Ann Stat. Author manuscript; available in PMC 2013 March 29.
Published in final edited form as:
Ann Stat. 2013 February 1; 41(1): 1–40.
PMCID: PMC3611334
NIHMSID: NIHMS426027

# The linear stochastic order and directed inference for multivariate ordered distributions

## Abstract

Researchers are often interested in drawing inferences regarding the order between two experimental groups on the basis of multivariate response data. Since standard multivariate methods are designed for two-sided alternatives, they may not be ideal for testing for order between two groups. In this article we introduce the notion of the linear stochastic order and investigate its properties. Statistical theory and methodology are developed to both estimate the direction which best separates two arbitrary ordered distributions and to test for order between the two groups. The new methodology generalizes Roy's classical largest root test to the nonparametric setting and is applicable to random vectors with discrete and/or continuous components. The proposed methodology is illustrated using data obtained from a 90–day pre–chronic rodent cancer bioassay study conducted by the National Toxicology Program (NTP).

Keywords: nonparametric tests, order restricted statistical inference, stochastic order relations

## 1. Introduction

In a variety of applications researchers are interested in comparing two treatment groups on the basis of several, potentially dependent, outcomes. For example, to evaluate if a chemical is a neuro–toxicant, toxicologists compare a treated group of animals with an untreated control group in terms of various correlated outcomes such as: Tail-pinch response, Click response, and Gait score, among many others (cf. Moser, 2000). The statistical problem of interest is to compare the multivariate distributions of the outcomes in the control and treatment groups. Moreover, the outcome distributions are expected to be ordered in some sense. The theory of stochastic order relations (Shaked and Shanthikumar, 2007) provides the theoretical foundation for such comparisons.

To fix ideas let X and Y be p-dimensional random variables (RVs); X is said to be smaller than Y in the multivariate stochastic order, denoted X ≼st Y, provided P(X ∈ U) ≤ P(Y ∈ U) for all upper sets U ⊆ ℝp (Shaked and Shanthikumar, 2007). If for some upper set the above inequality is strict we say that X is strictly smaller than Y (in the multivariate stochastic order), which we denote by X ≺st Y. Recall that a set U ⊆ ℝp is called an upper set if u ∈ U implies that v ∈ U whenever u ≤ v, i.e., whenever ui ≤ vi for i = 1,…, p. Note that comparing X and Y with respect to the multivariate stochastic order requires comparing their distributions over all upper sets in ℝp. This turns out to be a very high dimensional problem. For example if X and Y are multivariate binary RVs then X ≼st Y provided Σt∈U pX(t) ≤ Σt∈U pY(t), where pX(t) and pY(t) are the corresponding probability mass functions. Here U ∈ 𝒰p where 𝒰p is the family of upper sets defined on the support of a p-dimensional multivariate binary RV. It turns out that the cardinality of 𝒰p, denoted by Np, grows super–exponentially with p. In fact N1 = 1, N2 = 4, N3 = 18, N4 = 166, N5 = 7579 and N6 = 7828352. The values of N7 and N8 are also known but N9 is not. However good approximations for Np are available for all p (cf. Davidov and Peddada 2011). Obviously the number of upper sets for general multivariate RVs is much larger. Since in many applications p is large it would seem that the analysis of high dimensional stochastically ordered data is practically hopeless. As a consequence, the methodology for analyzing multivariate ordered data is underdeveloped. It is worth mentioning that Sampson and Whitaker (1989) as well as Lucas and Wright (1991) studied stochastically ordered bivariate multinomial distributions. They noted the difficulty of extending their methodology to high dimensional data due to the large number of constraints that need to be imposed.
Recently Davidov and Peddada (2011) proposed a framework for testing for order among K, p–dimensional, ordered multivariate binary distributions.
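The small values of Np quoted above can be checked by brute force; the following sketch (our own illustration, not the authors' approximation method) enumerates the nonempty, proper upper subsets of the binary support {0,1}^p:

```python
from itertools import product

def count_upper_sets(p):
    """Count the nonempty, proper upper subsets of {0,1}^p under the
    componentwise partial order (the quantity N_p in the text)."""
    pts = list(product([0, 1], repeat=p))
    n = len(pts)
    # up[i]: bitmask of all points componentwise >= pts[i]
    up = []
    for a in pts:
        mask = 0
        for j, b in enumerate(pts):
            if all(x <= y for x, y in zip(a, b)):
                mask |= 1 << j
        up.append(mask)
    count = 0
    for subset in range(1, (1 << n) - 1):  # skip the empty and full set
        # a set is an upper set iff it contains the upper cone of each member
        if all((subset & up[i]) == up[i]
               for i in range(n) if (subset >> i) & 1):
            count += 1
    return count
```

Running this reproduces N1 = 1, N2 = 4, N3 = 18 and N4 = 166; the super-exponential growth makes direct enumeration hopeless already for p ≥ 5, which is exactly the dimensionality problem the paper addresses.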

In this paper we address the dimensionality problem by considering an easy to understand stochastic order which we refer to as the linear stochastic order.

Definition 1.1. The RV X is said to be smaller than the RV Y in the (multivariate) linear stochastic order, denoted X ≼l-st Y, if for all s ∈ ℝp with s ≥ 0

$s^{T}X\preceq_{st}s^{T}Y$
(1.1)

where ≼st in (1.1) denotes the usual (univariate) stochastic order.

Note that it is enough to limit (1.1) to all non-negative real vectors satisfying ║s║ = 1 and accordingly we denote by $S_{+}^{p-1}$ the positive part of the unit sphere in ℝp. We call each s ∈ $S_{+}^{p-1}$ a “direction”. In other words the RVs X and Y are ordered by the linear stochastic order if every non-negative linear combination of their components is ordered by the usual (univariate) stochastic order. Thus instead of considering all upper sets in ℝp we need, for each s ∈ $S_{+}^{p-1}$, to consider only upper sets in ℝ. This is a substantial reduction in dimensionality. In fact we will show that only one value of s, determined by the data, need be considered. Note that the linear stochastic order, like the multivariate stochastic order, is a generalization of the usual univariate stochastic order to multivariate data. Both of these orders indicate, in different ways, that one random vector is more likely than another to take on large values. In this paper we develop the statistical theory and methodology for estimation and testing for linearly ordered multivariate distributions. For completeness we note that weaker notions of the linear stochastic order are discussed by Hu et al. (2011) and applied to various optimization problems in queuing and finance.

Comparing linear combinations has a long history in statistics. For example, in Phase I clinical trials it is common to compare dose groups using an overall measure of toxicity. Typically, this quantity is an ad hoc weighted average of individual toxicities where the weights are often known as “severity weights” (cf. Bekele and Thall, 2004, Ivanova and Murphy, 2009). This strategy of dimension reduction is not new in the statistical literature and has been used in classical multivariate analysis when comparing two or more multivariate normal populations. For example, using the union-intersection principle, the comparison of multivariate normal populations can be reduced to the comparison of all possible linear combinations of their mean vectors. This approach is the basis of Roy's classical largest root test (Roy 1953, Johnson and Wichern, 1998). Our proposed test may be viewed as a nonparametric generalization of the classical normal theory method described above with the exception that we limit consideration only to non-negative linear combinations (rather than all possible linear combinations) since our main focus is to make comparisons in terms of stochastic order. We emphasize that the linear stochastic order will allow us to address the much broader problem of directional ordering for multivariate ordered data, i.e., to find the direction which best separates two ordered distributions. Based on our survey of the literature, we are not aware of any methodology that addresses the problems investigated here.

This paper is organized in the following way. In Section 2 some probabilistic properties of the linear stochastic order are explored and its relationships with other multivariate stochastic orders are clarified. In Section 3 we provide the background and motivation for directional inference under the linear stochastic order and develop estimation and testing procedures for independent as well as paired samples. In particular the estimator of the best separating direction is presented and its large sample properties derived. We note that the problem of estimating the best separating direction is a non-smooth optimization problem. The limiting distribution of the best separating direction is derived in a variety of settings. Tests for the linear stochastic order based on the best separating direction are also developed. One advantage of our approach is that it avoids the estimation of multivariate distributions subject to order restrictions. Simulation results, presented in Section 4, reveal that for large sample sizes the proposed estimator has negligible bias and mean squared error (MSE). The bias and MSE seem to depend on the true value of the best separating direction, the dependence structure and the dimension of the problem. Furthermore, the proposed test honors the nominal type I error rate and has sufficient power. In Section 5 we illustrate the methodology using data obtained from the National Toxicology Program (NTP). Concluding remarks and some open research problems are provided in Section 6. For convenience all proofs are provided in an Appendix where additional concepts are defined when needed.

## 2. Some properties of the linear stochastic order

We start by clarifying the relationship between the linear stochastic order and the multivariate stochastic order. First note that X ≼l-st Y if and only if P(sTX ≥ t) ≤ P(sTY ≥ t) for all s ∈ $S_{+}^{p-1}$ and t ∈ ℝ, which is equivalent to P(X ∈ H) ≤ P(Y ∈ H) for all H ∈ ℋ, where ℋ is the collection of all upper-half-planes, i.e., sets which are both half planes and upper sets. Thus X ≼st Y implies X ≼l-st Y. The converse does not hold in general.

Example 2.1. Let X and Y be bivariate RVs such that P(X = (1,1)) = P(X = (0,1)) = P(X = (1,0)) = 1/3 and P(Y = (3/4,3/4)) = P(Y = (1,2)) = P(Y = (2,1)) = 1/3. It is easy to show that X is smaller than Y in the linear stochastic order but not in the multivariate stochastic order.
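Example 2.1 can be checked numerically. Since both supports carry equal mass 1/3, the projections sTX ≼st sTY hold iff the sorted projected X-support is dominated pointwise by the sorted projected Y-support, while an upper set that is not a half-plane witnesses the failure of the multivariate order. A sketch (the direction grid, helper names and the particular set U are ours):

```python
import numpy as np

X = np.array([[1, 1], [0, 1], [1, 0]])        # each support point has mass 1/3
Y = np.array([[0.75, 0.75], [1, 2], [2, 1]])  # each support point has mass 1/3

def linearly_ordered(X, Y, k=2001):
    """Check s^T X <=_st s^T Y over a grid of directions on the quarter circle.
    With equal masses this reduces to comparing sorted projected supports."""
    for t in np.linspace(0, np.pi / 2, k):
        s = np.array([np.cos(t), np.sin(t)])
        if np.any(np.sort(X @ s) > np.sort(Y @ s)):
            return False
    return True

# U = {z >= (0,1)} union {z >= (1,0)} is an upper set but not a half-plane
def in_U(z):
    return (z[0] >= 0 and z[1] >= 1) or (z[0] >= 1 and z[1] >= 0)

pX = np.mean([in_U(z) for z in X])  # all three X points lie in U
pY = np.mean([in_U(z) for z in Y])  # (3/4, 3/4) is excluded
```

Here pX = 1 while pY = 2/3, so P(X ∈ U) > P(Y ∈ U) and the multivariate stochastic order fails, even though every direction on the grid passes the univariate comparison.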

The following theorem provides some closure results for the linear stochastic order.

### Theorem 2.1

(i) If X ≼l-st Y then g(X) ≼l-st g(Y) for any increasing affine function g; (ii) If X ≼l-st Y then XI ≼l-st YI for each subset I ⊆ {1,…, p}; (iii) If X|Z = z ≼l-st Y|Z = z for all z in the support of Z then X ≼l-st Y; (iv) If X1,…,Xn are independent RVs with dimensions pi, and similarly for Y1,…,Yn, and if in addition Xi ≼l-st Yi, then (X1,…,Xn) ≼l-st (Y1,…,Yn); (v) Finally, if Xn → X and Yn → Y, where convergence can be in distribution, in probability or almost surely, and if Xn ≼l-st Yn for all n, then X ≼l-st Y.

Theorem 2.1 shows that the linear stochastic order is closed under increasing affine transformations, marginalization, mixtures, concatenation and convergence. In particular parts (i) and (ii) of Theorem 2.1 imply that if X ≼l-st Y then Xi ≼st Yi and Xi + Xj ≼st Yi + Yj for all i and j, i.e., all marginals are ordered as are all convolutions. Although the multivariate stochastic order is in general stronger than the linear stochastic order there are situations in which both orders coincide.

### Theorem 2.2

Let X and Y be continuous elliptically distributed RVs supported on ℝp with the same generator. Then X ≼l-st Y if and only if X ≼st Y.

Note that the elliptical family of distributions is large and includes the multivariate normal, multivariate t and the exponential power family, see Fang et al. (1987). Thus Theorem 2.2 shows that the multivariate stochastic order coincides with the linear stochastic order in the normal family. Incidentally, in the proof of Theorem 2.2 we generalize the results of Ding and Zhang (2004) on multivariate stochastic ordering of elliptical RVs. Another interesting example is:

### Theorem 2.3

Let X and Y be multivariate binary RVs. Then X ≼l-st Y is equivalent to X ≼st Y if and only if p ≤ 3.

Remark 2.1. In the proof of Theorem 2.2 distributional properties of the elliptical family play a major role. In contrast, Theorem 2.3 is a consequence of the geometry of the upper sets of multivariate binary RVs which turn out to be upper half planes if and only if p ≤ 3.

We now explore the role of the dependence structure.

### Theorem 2.4

Let X and Y have the same copula. Then X ≼l-st Y if and only if X ≼st Y.

Theorem 2.4 establishes that if two RVs have the same dependence structure as quantified by their copula function (cf. Joe 1997) then the linear and multivariate stochastic orders coincide. Such situations arise when the correlation structure among outcomes is not expected to vary with dose.

The orthant orders are also of interest in statistical applications. We say that X is smaller than Y in the upper orthant order, denoted X ≼uo Y, if P(X ∈ O) ≤ P(Y ∈ O) for all O ∈ 𝒪, where 𝒪 is the collection of upper orthants, i.e., sets of the form {z : z ≥ x} for some fixed x ∈ ℝp. The lower orthant order is similarly defined (cf. Shaked and Shanthikumar, 2007 or Davidov and Herman 2011). It is obvious that the orthant orders are weaker than the usual multivariate stochastic order, i.e., X ≼st Y implies X ≼uo Y and X ≼lo Y. In general the linear stochastic order does not imply the upper (or lower) orthant order, nor is the converse true. However, as stated below, under some conditions on the copula functions the linear stochastic order implies the upper (or lower) orthant order.

### Theorem 2.5

If X ≼l-st Y and CX(u) ≤ CY(u) for all u ∈ [0,1]p then X ≼lo Y. Similarly if X ≼l-st Y and C̄X(u) ≤ C̄Y(u) for all u ∈ [0,1]p then X ≼uo Y.

Note that CX(u) and C̄X(u) above are the copula and tail-copula functions for the RV X (cf. Joe 1994) and are defined in the Appendix; similarly for CY(u) and C̄Y(u). Further note that the relations CX(u) ≤ CY(u) and/or C̄X(u) ≤ C̄Y(u) indicate that the components of Y are more strongly dependent than the components of X. This particular dependence ordering is known as positive quadrant dependence. It can be further shown that strong dependence and the linear stochastic order do not in general imply stochastic ordering.

Additional properties of the linear stochastic order as they relate to estimation and testing problems are given in Section 3.

## 3. Directional inference

### 3.1. Background and motivation

There exists a long history of well developed theory for comparing two or more multivariate normal (MVN) populations. Methods for assessing whether there are any differences between the populations, which populations differ, in which component(s), and by how much, have been addressed in the literature using a variety of simultaneous confidence intervals and multiple comparison methods (cf. Johnson and Wichern, 1998). Of particular interest to us is Roy's largest root test. To fix ideas consider two multivariate normal random vectors X and Y with means μ and ν, respectively, and a common variance matrix Σ. Using the union-intersection principle Roy (1953) expressed the problem of testing H0 : μ = ν versus H1 : μ ≠ ν as a collection of univariate testing problems, by showing that H0 and H1 are equivalent to ∩s∈ℝp H0,s and ∪s∈ℝp H1,s where H0,s : sTμ = sTν and H1,s : sTμ ≠ sTν. Implicitly Roy's test identifies the linear combination $s_{max}^{T}(\nu-\mu)$ that corresponds to the largest “distance” between the mean vectors, i.e., the direction which best separates their distributions. The resulting test, known as Roy's largest root test, is given by the largest eigenvalue of BS−1 where B is the matrix of between groups (or populations) sums of squares and cross products and S is the usual unbiased estimator of Σ. In the special case when there are only two populations this test statistic is identical to Hotelling's T2 statistic. From the simultaneous confidence intervals point of view, the critical values derived from the null distribution of this statistic can be used for constructing Scheffe's simultaneous confidence intervals for all possible linear combinations of the difference (ν − μ). Further note that the estimated direction corresponding to Roy's largest root test is S−1(Ȳ − X̄) where X̄ and Ȳ are the respective sample means based on n and m observations.
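For two groups the ingredients above can be sketched in a few lines: B has rank one, so the largest eigenvalue of BS−1 reduces to the closed form (nm/N)(Ȳ − X̄)ᵀS−1(Ȳ − X̄), with maximizing direction proportional to S−1(Ȳ − X̄). A numerical sketch under the equal-covariance normal model (the data and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 40, 50, 3
X = rng.normal(size=(n, p))
Y = rng.normal(size=(m, p)) + np.array([1.0, 0.5, 0.0])

d = Y.mean(axis=0) - X.mean(axis=0)
# pooled unbiased estimator of the common covariance matrix
S = ((n - 1) * np.cov(X, rowvar=False)
     + (m - 1) * np.cov(Y, rowvar=False)) / (n + m - 2)

# two-group between-group SSCP matrix: B = (nm/N) d d^T (rank one)
N = n + m
B = (n * m / N) * np.outer(d, d)

# Roy's largest root and the corresponding separating direction
largest_root = np.max(np.linalg.eigvals(B @ np.linalg.inv(S)).real)
direction = np.linalg.inv(S) @ d

# rank-one identity: the largest root equals (nm/N) d^T S^{-1} d
closed_form = (n * m / N) * d @ np.linalg.inv(S) @ d
```

The agreement of `largest_root` and `closed_form` illustrates why, with two populations, Roy's statistic collapses to (a multiple of) Hotelling's T2.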

Our objective is to extend and generalize the classical multivariate method, described above, to non-normal multivariate ordered data. Our approach will be nonparametric. Recall that comparing MVNs is done by considering the family of statistics Tn,m(s) = sT(Ȳ − X̄) for all s ∈ ℝp. In the case of non-normal populations, the population mean alone may not be enough to characterize the distribution. In such cases, it may not be sufficient to compare the means of the populations but one may have to compare entire distributions. One possible way of doing so is by considering rank statistics. Suppose X1,…,Xn and Y1,…,Ym are independent random samples from two multivariate populations. Let

$R_k(s)=\sum_{i=1}^{n}I(s^{T}X_i\le s^{T}X_k)+\sum_{j=1}^{m}I(s^{T}Y_j\le s^{T}X_k)$

be the rank of sTXk in the combined sample sTX1,…, sTXn, sTY1,…, sTYm. For fixed s ∈ ℝp the distributions of sTX and sTY can be compared using a rank test. For example if we use $W_{n,m}(s)=\sum_{i=1}^{n}R_i(s)$ our comparison is done in terms of Wilcoxon's rank sum statistic. It is well known that rank tests are well suited for testing for univariate stochastic order (cf. Hajek et al. 1999, Davidov 2012). Although any rank test can be used, the Mann-Whitney form of Wilcoxon's (WMW) statistic is particularly attractive in this application. Therefore in the rest of this paper we develop estimation and testing procedures for the linear stochastic order based on the family of statistics

$\Psi_{n,m}(s)=\frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}I(s^{T}X_i\le s^{T}Y_j)$
(3.1)

where s varies over $S_{+}^{p-1}$. Note that (3.1) unbiasedly estimates

$\Psi(s)=P(s^{T}X\le s^{T}Y).$
(3.2)

The following result is somewhat surprising.

#### Proposition 3.1

Let X and Y be independent MVNs with means μ ≠ ν and common variance matrix Σ. Then Roy's maximal separating direction Σ−1(ν − μ) also maximizes P(sTX ≤ sTY).

Proposition 3.1 shows that the direction which separates the means, in the sense of Roy, also maximizes (3.2). Thus it provides further support for choosing (3.1) as our test statistic. Note that in general Σ−1(ν − μ) may not belong to $S_{+}^{p-1}$. Since we focus on the linear stochastic order, we restrict ourselves to s ∈ $S_{+}^{p-1}$. Consequently we define smax = arg max{Ψ(s) : s ∈ $S_{+}^{p-1}$} and refer to smax as the best separating direction. Further note that if X and Y are independent and continuous and if X ≼l-st Y then Ψ(s) ≥ 1/2 for all s ∈ $S_{+}^{p-1}$. This simply means that sTX tends to be smaller than sTY more than 50% of the time. Note that probabilities of the type (3.2) were introduced by Pitman (1937) and further studied by Peddada (1985) for comparing estimators. Random variables satisfying such a condition are said to be ordered by the precedence order (Arcones et al. 2002).

Once smax is estimated we can plug it into (3.1) to get a test statistic. Hence our test may be viewed as a natural generalization of Roy's largest root test from MVNs to arbitrary ordered distributions. However, unlike Roy's method, which does not explicitly estimate smax, we do. On the other hand the proposed test does not require the computation of the inverse of the sample covariance matrix whereas Roy's test and Hotelling's T2 test require such computations. Consequently, such tests cannot be used when n < p whereas our test can be used in all such instances.

Remark 3.1. In the above description Xi and Yj are independent for all i and j and therefore the probability P(sTXi ≤ sTYj) is independent of both i and j. However, in many applications such as repeated measurement and crossover designs, the data are a random sample of dependent pairs (X1, Y1),…,(XN, YN) for which Zi = Yi − Xi are IID. For example, such a situation may arise when $X_i=X_i^{\prime}+\varepsilon_i$ and $Y_i=Y_i^{\prime}+\varepsilon_i$ where the εi are pair-specific random effects and the RVs $Y_i^{\prime}$ (as well as $X_i^{\prime}$) are IID. In this situation P(sTXi ≤ sTYi) is independent of i and smax is well defined. Moreover the objective function analogous to (3.1) is

$\Psi_{N}(s)=\frac{1}{N}\sum_{i=1}^{N}I(s^{T}X_i\le s^{T}Y_i).$
(3.3)

In the following we consider both sampling designs which we refer to as: (a) independent samples; and (b) paired or dependent samples. Results are developed primarily for independent samples but modifications for paired samples are mentioned as appropriate.

### 3.2. Estimating the best separating direction

Consider first the case of independent samples, i.e., X1,…,Xn and Y1,…,Ym are random samples from the two populations. Rewrite (3.1) as

$\Psi_{n,m}(s)=\frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}I(s^{T}Z_{ij}\ge 0)$
(3.4)

where Zij = Yj − Xi. The maximizer of (3.4) is denoted by ŝmax, i.e.,

$\hat{s}_{max}=\arg\max\{\Psi_{n,m}(s):s\in S_{+}^{p-1}\}.$
(3.5)

Finding (3.5) with s ∈ $S_{+}^{p-1}$ is a non-smooth optimization problem. Consider first the situation where p = 2. In this case we maximize (3.4) subject to s ∈ $S_{+}^{1}$. Geometrically $S_{+}^{1}$ is a quarter circle spanning the first quadrant. Now let Z = (Z1, Z2) and without any loss of generality assume that ║Z║ = 1. We examine the behavior of the function I(sTZ ≥ 0) as a function of s. Clearly if Z ≥ 0, i.e., if Z1 ≥ 0 and Z2 ≥ 0, then for all s ∈ $S_{+}^{1}$ we have I(sTZ ≥ 0) = 1. In other words any value of s on the arc $S_{+}^{1}$ maximizes I(sTZ ≥ 0). Similarly if Z < 0 then for all s ∈ $S_{+}^{1}$ we have I(sTZ ≥ 0) = 0 and again the entire arc $S_{+}^{1}$ maximizes I(sTZ ≥ 0). Now let Z1 ≥ 0 and Z2 < 0 and write s = (cos(θ), sin(θ)). It follows that I(sTZ ≥ 0) = 1 for all s with angle on the arc [0, θ] for some θ. If Z1 < 0 and Z2 ≥ 0 the situation is reversed and I(sTZ ≥ 0) = 1 for all angles on the arc [θ, π/2]. The value of θ is given by (3.6). In other words each Z is mapped to an arc on $S_{+}^{1}$ as described above. Now, the function (3.4) simply counts the number of arcs covering each s ∈ $S_{+}^{1}$. The maximizer of (3.4) lies in the region where the maximum number of arcs overlap. Clearly this implies that the maximizer of (3.4) is not unique. A quick way to find the maximizer is:

#### Algorithm 3.1

Let M denote the number of Zij's which belong to the second or fourth quadrant. Map each such Zij = (Zij,1, Zij,2) to the angle at which sTZij = 0, i.e.,

$\theta_{ij}=\arctan\left(-Z_{ij,1}/Z_{ij,2}\right).$
(3.6)

Relabel and order the resulting angles as θ[1] <…< θ[M]. Also define θ[0] = 0 and θ[M+1] = π/2. Evaluate Ψn,m(s[i]) for i = 1,…,M where s[i],1 = cos(θ[i]) and s[i],2 = sin(θ[i]). If a maximum is attained at θ[j] then any value in [θ[j−1], θ[j]] or [θ[j], θ[j+1]] maximizes (3.1).

In light of the above discussion it can be easily proved that:

#### Proposition 3.2

For p = 2 Algorithm 3.1 maximizes (3.1).
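A minimal numpy implementation of Algorithm 3.1 (our own code; for numerical robustness it also evaluates Ψn,m at arc midpoints, which is justified since Ψn,m is constant between consecutive mapped angles, and it returns a single maximizer):

```python
import numpy as np

def psi(s, X, Y):
    # Psi_{n,m}(s) in (3.1): fraction of pairs with s^T X_i <= s^T Y_j
    return np.mean((X @ s)[:, None] <= (Y @ s)[None, :])

def best_direction_p2(X, Y):
    """Algorithm 3.1 for p = 2: return one maximizer of (3.1) on S_+^1."""
    Z = (Y[None, :, :] - X[:, None, :]).reshape(-1, 2)
    # only Z's in the second or fourth quadrant contribute boundary angles;
    # the boundary s^T Z = 0 corresponds to tan(theta) = |Z_1| / |Z_2|
    mixed = (Z[:, 0] >= 0) != (Z[:, 1] >= 0)
    thetas = np.arctan2(np.abs(Z[mixed, 0]), np.abs(Z[mixed, 1]))
    grid = np.unique(np.concatenate(([0.0, np.pi / 2], thetas)))
    # evaluate at the mapped angles and at the midpoints of adjacent angles
    cand = np.concatenate((grid, (grid[:-1] + grid[1:]) / 2))
    S = np.column_stack((np.cos(cand), np.sin(cand)))
    vals = [psi(s, X, Y) for s in S]
    return S[int(np.argmax(vals))]
```

For instance, with X concentrated at the origin and Y placing mass at (1, −0.4) and (−0.5, 1), the overlapping arcs single out an interior direction along which the projected samples are perfectly separated.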

In the general case, i.e., for p ≥ 3, each observation Zij is associated with a “slice” of $S_{+}^{p-1}$. The boundaries of the slice are the intersection of $S_{+}^{p-1}$ and some half-plane. Note that when p = 2 the slices are arcs. The shape of the slice depends on the quadrant to which Zij belongs. The maximizer of (3.1) is again the value of s which belongs to the largest number of slices. Although the geometry of the resulting optimization problem is easy to understand, we have not been able to devise a simple algorithm, which scales with p, based on the ideas above. However, we have found that (3.5) can be obtained by converting the data into polar coordinates and then using the Nelder-Mead algorithm, which does not require the objective function to be differentiable. We emphasize that this maximization process results in a single maximizer of (3.1) and we do not attempt to find the entire set of maximizers. For completeness we note that there are methods for optimizing (3.1) specifically designed for non-smooth problems. For more details see Price et al. (2008) and Audet et al. (2008) and the references therein for both algorithms and convergence results.
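The polar-coordinate idea can be illustrated without a full Nelder-Mead implementation; as a crude stand-in (not the authors' optimizer), a grid search over the two polar angles of $S_+^{2}$ already conveys the parameterization for p = 3:

```python
import numpy as np

def psi(s, X, Y):
    # Psi_{n,m}(s) in (3.1): fraction of pairs with s^T X_i <= s^T Y_j
    return np.mean((X @ s)[:, None] <= (Y @ s)[None, :])

def grid_search_direction(X, Y, k=60):
    """Maximize (3.1) over S_+^2 by searching the polar angles (theta, phi);
    a derivative-free optimizer such as Nelder-Mead would replace the loops."""
    best_s, best_val = None, -1.0
    for theta in np.linspace(0, np.pi / 2, k):
        for phi in np.linspace(0, np.pi / 2, k):
            # polar parameterization keeps s on the positive unit sphere
            s = np.array([np.sin(theta) * np.cos(phi),
                          np.sin(theta) * np.sin(phi),
                          np.cos(theta)])
            val = psi(s, X, Y)
            if val > best_val:
                best_s, best_val = s, val
    return best_s, best_val
```

The angular parameterization automatically enforces both constraints s ≥ 0 and ║s║ = 1, which is precisely why the conversion to polar coordinates is convenient for unconstrained optimizers.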

Remark 3.2. It is clear that the estimation procedure for paired samples is the same as for independent samples.

### 3.3. Large sample behavior

We find three different asymptotic regimes for ŝmax depending on distributional assumptions and the sampling scheme (paired versus independent samples).

Note that the parameter space is the unit sphere, not the usual Euclidean space. There are several ways of dealing with this irregularity, one of which is to re-express the last coordinate of s ∈ $S_{+}^{p-1}$ as $s_p=\sqrt{1-s_1^2-\dots-s_{p-1}^2}$ and consider the parameter space $\{s\ge 0:s_1^2+\dots+s_{p-1}^2\le 1\}$ which is a compact subset of ℝp−1. Clearly these parameterizations are equivalent and without any further ambiguity we will denote them both by $S_{+}^{p-1}$. Thus in the proofs below both views of $S_{+}^{p-1}$ are used interchangeably as convenient.

We begin our discussion with independent samples assuming continuous distributions for both X and Y.

#### Theorem 3.1

Let X and Y have continuously differentiable densities and suppose Ψ(s) is uniquely maximized by smax ∈ interior($S_{+}^{p-1}$). Then ŝmax, the maximizer of (3.1), is strongly consistent, i.e., $\hat{s}_{max}\overset{a.s.}{\to}s_{max}$. Furthermore ŝmax = smax + Op(N−1/2) where N = n + m. Finally, if n/(n + m) → λ ∈ (0, 1) then

$N^{1/2}(\hat{s}_{max}-s_{max})\Rightarrow N(0,\Sigma)$

where the matrix Σ is defined in the body of the proof.

Although (3.1) is not continuous (nor differentiable) its U–statistic structure guarantees that it is “almost” so (i.e., it is continuous up to an op(1/N) term) and therefore its maximizer converges at a $\sqrt{N}$ rate to a normal limit (Sherman, 1993). We also note that it is difficult to estimate the asymptotic variance Σ directly since it depends on unknown functions (the derivatives ∇ψj and ∇2ψj for j = 1, 2, where the functions ψj are defined in the body of the proof). Nevertheless bootstrap variance estimates are easily derived.

Remark 3.3. Note that if either X or Y are continuous RVs then Ψ(s) is continuous. This is a necessary condition for the uniqueness of smax.

We have not been able to find general condition(s) for a unique maximizer for Ψ(s) although important sufficient conditions can be found. For example,

#### Proposition 3.3

If Z = Y − X and there exist δ = ν − μ ≥ 0 and Σ such that the distribution of

$\frac{s^{T}Z-s^{T}\delta}{\sqrt{s^{T}\Sigma s}}$

is independent of s then the maximizer of Ψ(s) is unique.

The condition above is satisfied by location-scale families and it may be convenient to think of δ and Σ as the location and scale parameters for Z. In general, however, Ψ(s) may not have a unique maximum nor be continuous. For example, if both X and Y are discrete RVs then Ψ(s) is a step function. In such situations smax is set valued and we may denote it by Smax. As we have seen earlier the maximizer of Ψn,m(s) is always set valued (typically however we find only one maximizer). Consider for example the case where P(X = (−1,−1)) = P(X = (1,1)) = 1/2 and let P(Y = (−1,−1)) = 1/2 − ε and P(Y = (1,1)) = 1/2 + ε for some ε > 0. It is clear that X ≺st Y. Further note that Z ∈ {(2, 2), (0, 0), (−2, −2)} and it follows that Ψ(s) is constant on $S_{+}^{1}$ which implies that Smax coincides with $S_{+}^{1}$. Similarly Ψn,m(s) is constant on $S_{+}^{1}$ and therefore ŝmax ∈ Smax for all n, m. This means that consistency is guaranteed and the limiting distribution is degenerate. More generally, we have:

#### Theorem 3.2

If X and Y have discrete distributions with finite support and ŝmax is a maximizer of (3.1), then

$P(\hat{s}_{max}\notin S_{max})\le C_1e^{-C_2N}$

for some positive constants C1 and C2.

Theorem 3.2 shows that the probability that a maximizer of Ψn,m is not in Smax is exponentially small when the underlying distributions of X and Y are discrete. Hence ŝmax is consistent and converges exponentially fast. In fact the proof of Theorem 3.2 implies that the set of maximizers of Ψn,m(•) converges to the set of maximizers of Ψ(•), that is ρH(Ŝmax, Smax) → 0, where ρH is the Hausdorff metric defined on compact sets. A careful reading of the proof shows that Theorem 3.2 also holds under paired samples.

Finally, we consider the case of continuous RVs under paired samples. Then under the conditions of Theorem 3.1 and provided the density of Z = Y − X is bounded we have:

#### Theorem 3.3

Under the above mentioned conditions ŝmax, the maximizer of (3.3), is strongly consistent, i.e., $\hat{s}_{max}\overset{a.s.}{\to}s_{max}$, converges at a cube root rate, that is ŝmax = smax + Op(N−1/3), and

$N^{1/3}(\hat{s}_{max}-s_{max})\Rightarrow W$

where W has the distribution of the almost surely unique maximizer of the process s ↦ −[Q(s) + W(s)] on $S_{+}^{p-1}$, where Q(s) is a quadratic function and W(s) is a zero mean Gaussian process described in the body of the proof.

Theorem 3.3 shows that in paired samples ŝmax is consistent but in contrast with Theorem 3.1 it converges at a cube-root rate to a non-normal limit. The cube root rate is due to the discontinuous nature of the objective function (3.3). General results dealing with this kind of asymptotics for independent observations are given by Kim and Pollard (1990). The main difference between Theorems 3.1 and 3.3 is that the objective function (3.1) is smoothed by its U-statistic structure while (3.3) is not.

### 3.4. A confidence set for ŝmax

Since the parameter space is the surface of a unit sphere it is natural to define the (1−α) × 100% confidence set for smax centered at ŝmax by

$\{s\in S_{+}^{p-1}:1-\hat{s}_{max}^{T}s\le C_{\alpha,N}\}$

where Cα,N satisfies $P(1-\hat{s}_{max}^{T}s_{max}\le C_{\alpha,N})=1-\alpha$. For more details see Fisher and Hall (1989) or Peddada and Chang (1996). Hence the confidence set is the set of all s ∈ $S_{+}^{p-1}$ which have a small angle with ŝmax. In theory one may appeal to Theorem 3.1 to derive the critical value for any α ∈ (0, 1). However the limit law in Theorem 3.1 requires knowledge of unknown parameters and functions. For this reason, we explore the bootstrap for estimating Cα,N.

Remark 3.4. Since in the case of paired samples the estimator converges at a cube root rate rather than the square root rate, the standard bootstrap methodology may yield inaccurate coverage probabilities (see Abrevaya and Huang 2005 and Sen et al. 2010). For this reason we recommend the “M out of N” bootstrap methodology. For further discussion on the “M out of N” bootstrap methodology one may refer to Lee (1999), Delgado et al. (2001) and Bickel and Sakov (2008).
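The mechanics of the “M out of N” bootstrap are simple: the estimator is recomputed on subsamples of size m much smaller than N. A generic sketch (the subsample-size rule m = ⌈N^(2/3)⌉ and the sample median used as a stand-in estimator are our illustrative choices, not prescriptions of the paper):

```python
import numpy as np

def m_out_of_n_bootstrap(estimator, Z, B=500, seed=0):
    """Recompute `estimator` on B subsamples of size m = ceil(N^(2/3)),
    drawn with replacement from the N observations in Z."""
    rng = np.random.default_rng(seed)
    N = len(Z)
    m = int(np.ceil(N ** (2 / 3)))  # illustrative rule of thumb, m << N
    reps = []
    for _ in range(B):
        idx = rng.integers(0, N, size=m)
        reps.append(estimator(Z[idx]))
    return np.array(reps)

# toy illustration: the sample median stands in for the actual estimator
Z = np.linspace(0.0, 1.0, 101)
reps = m_out_of_n_bootstrap(np.median, Z, B=200)
```

The empirical quantiles of `reps` (suitably rescaled) would then supply the critical value Cα,N; in the application the estimator recomputed on each subsample is ŝmax itself.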

### 3.5. Testing for order

Consider first the case of independent samples where interest is in testing the hypothesis

$H_{0}:X=_{st}Y\quad\text{versus}\quad H_{1}:X\prec_{l\text{-}st}Y.$
(3.7)

Thus (3.7) tests whether the distributions of X and Y are equal or ordered (later on we briefly discuss testing H0 : X =st Y versus H1 : X ≺st Y). In this section we propose a new test for detecting an ordering among two multivariate distributions based on the maximal separating direction. The test is based on the following observation:

#### Theorem 3.4

Let X and Y be independent and continuous RVs. If X =st Y then P(sTX ≤ sTY) = 1/2 for all s ∈ $S_{+}^{p-1}$; and if both (i) X ≼st Y and (ii) P(sTX ≤ sTY) > 1/2 for some s ∈ $S_{+}^{p-1}$ hold, then X ≺st Y.

Theorem 3.4 says that if it is known a priori that {X ≼st Y} = {X =st Y} ∪ {X ≺st Y}, i.e., the RVs are either equal or ordered (which is exactly what (3.7) implies), then a strict linear stochastic ordering implies a strict ordering by the usual multivariate stochastic order. In particular under the alternative there must exist a direction s for which sTX ≺st sTY.

Remark 3.5. The assumption that X ≼st Y is natural in applications such as environmental sciences where high exposures are associated with increased risk. Nevertheless if the assumption that X ≼st Y is not warranted then the alternative hypothesis formulated in terms of the linear stochastic order actually tests whether there exists an s for which P(sTX ≤ sTY) > 1/2. This amounts to a precedence (or Pitman) ordering between sTX and sTY.

Remark 3.6. In the proof of Theorem 3.4 we use the fact that given X ≼st Y we have X ≺st Y provided Xi ≺st Yi for some 1 ≤ i ≤ p. Note that if Xi ≺st Yi then E(Xi) < E(Yi). Thus it is possible to test (3.7) by comparing means (or any other monotone function of the data). Although such a test will be consistent it may lack power because tests based on means are often far from optimal when the data is not normally distributed. The WMW procedure, however, is known to have high power for a broad collection of underlying distributions.

Hence (3.7) can be reformulated in terms of the linear stochastic order. In particular this justifies using the statistic

$S_{n,m}=N^{1/2}\left(\Psi_{n,m}(\hat{s}_{max})-1/2\right).$
(3.8)

To the best of our knowledge this is the first general test for multivariate ordered distributions. In practice we first estimate ŝmax and then define $\hat{U}_i=\hat{s}_{max}^{T}X_i$ and $\hat{V}_j=\hat{s}_{max}^{T}Y_j$ where i = 1,…,n and j = 1,…,m. Hence (3.8) is nothing but a WMW test based on the Û's and V̂'s. It is also a Kolmogorov-Smirnov type test.
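Given a direction, the statistic (3.8) is just a centered and scaled Mann-Whitney statistic on the projected samples; a sketch (the direction is taken as given here, i.e., the estimation step of ŝmax is assumed done):

```python
import numpy as np

def s_nm(s_hat, X, Y):
    """The statistic (3.8): S_{n,m} = N^{1/2} (Psi_{n,m}(s_hat) - 1/2)."""
    n, m = len(X), len(Y)
    U, V = X @ s_hat, Y @ s_hat               # projected samples U-hat, V-hat
    psi = np.mean(U[:, None] <= V[None, :])   # Mann-Whitney form of WMW
    return np.sqrt(n + m) * (psi - 0.5)
```

Large positive values of `s_nm` indicate that the projected Y-sample tends to exceed the projected X-sample, i.e., evidence against the null in (3.7).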

The large sample distribution of (3.8) is given in the following.

#### Theorem 3.5

Suppose the null in (3.7) holds. Let n, m → ∞ and n/(n + m) → λ ∈ (0, 1). Then

$S_{n,m}\Rightarrow\sup_{s\in S_{+}^{p-1}}G(s)$

where G(s) is a zero mean Gaussian process with covariance function given by (7.20).

Remark 3.7. Since $\hat{s}_{max}\overset{a.s.}{\to}s_{max}$, by Slutsky's theorem the power of the test (3.8) converges to the power of a WMW test comparing the samples $(s_{max}^{T}X_1,\dots,s_{max}^{T}X_n)$ and $(s_{max}^{T}Y_1,\dots,s_{max}^{T}Y_m)$. The “synthetic” test assuming that smax is known serves as a gold standard, as verified by our simulation study.

Remark 3.8. Furthermore, the power of the test under local alternatives, i.e., when $Y =_{st} X + N^{-1/2}\delta$ and N → ∞, is bounded by the power of the WMW test comparing the distributions of $s_{\max}^TX$ and $s_{\max}^TY=s_{\max}^TX+N^{-1/2}s_{\max}^T\delta$.

Alternatives to the “sup” statistic (3.8) are the “integrated” statistics

$I_{n,m}=\int_{S_+^{p-1}}N^{1/2}\left(\Psi_{n,m}(s)-1/2\right)ds\quad\text{and}\quad I_{n,m}^+=\int_{S_+^{p-1}}N^{1/2}\left[\Psi_{n,m}(s)-1/2\right]_+ds$
(3.9)

where [x]+ = max(0, x). It is clear that $I_{n,m}\Rightarrow N(0,\sigma^2)$ where $\sigma^2=\int\int C(u,v)\,du\,dv$ and C(u, v), the covariance function of G, is given by (7.20). Also

$I_{n,m}^+\Rightarrow\int_{S_+^{p-1}}\left[G(s)\right]_+ds.$

This distribution does not have a closed form. The statistics $I_{n,m}$ and $I_{n,m}^+$ have univariate analogues, cf. Davidov and Herman (2012). Finally,

#### Theorem 3.6

The tests (3.8) and (3.9) are consistent. Furthermore, if X ⪯l-st Y ⪯l-st Z then all three tests for H0: X =st Z versus H1: X ≺l-st Z are more powerful than the respective tests for H0: X =st Y versus H1: X ≺l-st Y.

Theorem 3.6 shows that the tests are consistent and that their power function is monotone in the linear stochastic order.

Remark 3.9. Qualitatively similar results are obtainable in the paired sampling case; the only difference is the limiting process. For example, it is easy to see that the paired sample analogue of (3.8) satisfies the same type of limit law, with the limit expressed in terms of the empirical process on $S_+^{p-1}$ associated with (3.3). Analogues of $I_{n,m}$ and $I_{n,m}^+$ are similarly defined and analyzed. Tests for paired samples may be similarly implemented using bootstrap or permutation methods.

## 4. Simulations

For simplicity of exposition, and motivated by the fact that the example we analyzed in this paper deals with independent samples, we limit our simulations to the case of independent samples.

### 4.1. The distribution of ŝmax

We start by investigating the distribution of ŝmax by simulation. For simplicity we choose p = 3 and generated Xi (i = 1,…,n) distributed as N3(0, Σ) and Yj (j = 1,…,m) distributed as N3(δ, Σ) where Σ = (1 − ρ)I + ρJ, I is the identity matrix and J is a square matrix of 1's. We simulated 1000 realizations of ŝmax for various sample sizes and correlation coefficients. To get a visual description of the density of ŝmax, we provide a pair of plots for each configuration of ρ and sample size n. In Figure 1 we provide the joint density of the 2-dimensional polar angles (θ, φ) of ŝmax. There are four panels in Figure 1, corresponding to all combinations of ρ = 0, 0.9 and n = 10, 100. The mean vector δ in this plot was taken to be δ = (2, 2, 2)T. In Figure 2 we provide the density of the polar residual defined by $1-\hat{s}_{\max}^Ts_{\max}$. The four panels of Figure 2 correspond to combinations of ρ = 0, 0.9 and n = 10, 100 and two patterns of δ, namely, (2, 2, 2)T and (3, 2, 1)T. We see from Figure 1 that ŝmax converges to a unimodal, normal looking, distribution as the sample size increases. Interestingly, from Figure 2 we see that the concentration of the distribution around the true parameter depends upon the values of δ and ρ (which together determine smax). If the components of the underlying random vector are exchangeable (e.g., δ = (2, 2, 2)T) the residuals tend to concentrate more closely around zero (Figures 2(a) and 2(c)) compared to the case when they are not exchangeable (Figures 2(b) and 2(d)).

Figure 1(a): δ = (2, 2, 2), ρ = 0, n = 10.
Figure 2(a): δ = (2, 2, 2), ρ = 0.
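The simulation above can be reproduced in outline as follows. This sketch is our own: it uses a crude random search over directions in place of Algorithm 3.1, and all names and settings are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def psi_hat(s, X, Y):
    # empirical Psi_{n,m}(s): proportion of pairs with s'X_i <= s'Y_j
    return ((X @ s)[:, None] <= (Y @ s)[None, :]).mean()

def estimate_smax(X, Y, n_dirs=2000):
    # random search over the positive part of the unit sphere
    p = X.shape[1]
    S = np.abs(rng.standard_normal((n_dirs, p)))
    S /= np.linalg.norm(S, axis=1, keepdims=True)
    vals = [psi_hat(s, X, Y) for s in S]
    return S[int(np.argmax(vals))]

# set-up mimicking Section 4.1: p = 3, delta = (2, 2, 2), rho = 0
p, n, m = 3, 100, 100
Sigma = np.eye(p)
X = rng.multivariate_normal(np.zeros(p), Sigma, n)
Y = rng.multivariate_normal(np.full(p, 2.0), Sigma, m)
s_hat = estimate_smax(X, Y)
s_true = np.ones(p) / np.sqrt(p)     # Sigma^{-1} delta, normalized
print(1.0 - s_hat @ s_true)          # polar residual, close to zero
```

Repeating the last block over many replications yields the empirical distribution of the polar residual plotted in Figure 2.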

### 4.2. Study design

The simulation study consists of three parts. In the first part we evaluate the accuracy and precision of ŝmax by estimating its bias and mean squared error (MSE). In the second part we investigate the coverage probability of bootstrap confidence intervals. In the third part we estimate the type I errors and powers of the proposed test $S_{n,m}$ as well as the integral tests $I_{n,m}$ and $I_{n,m}^+$.

To evaluate the bias and MSEs we generated X1,…,Xn ~ N3(0, Σ) and Y1,…,Ym ~ N3(δ, Σ) where n = m = 20 or 100. The common variance matrix is assumed to have an intra-class correlation structure, i.e., Σ = (1 − ρ)I + ρJ, where I is the identity matrix and J is a matrix of ones. Various patterns of the mean vector δ and correlation coefficient ρ were considered, as described in Table 1.

Bias and MSE of ŝmax.

We conducted extensive simulation studies to evaluate the performance of the bootstrap confidence intervals. In this paper we present a small sample of our study. We generated data from two 5-dimensional normal populations with means 0 and δ, respectively and a common covariance Σ = (1 − ρ) I + ρJ. We considered 5 patterns of ρ and 2 patterns of sample sizes (n = m = 20 and 40). The nominal coverage probability was 0.95. Results are summarized in Table 2.

Coverage probabilities for the bootstrap confidence intervals for p = 5 normal data. Pattern i = 1,2 corresponds to δ1 = (.1, .25, .5, .75, .9) and δ2 = (.5, .5, .5, .5, .5).

The goal of the third part of our simulation study was to evaluate the type I error and the power of the test (3.8). To evaluate the type I error three different baseline distributions for the two populations X and Y were employed as follows: (1) both distributed as N(0, Σ); (2) both distributed as πN(0, Σ) + (1 − π)N(δ, Σ) with π = 0.2 or π = 0.8; and (3) both distributed as exp(Z) = (exp(Z1),…,exp(Zp)) where Z follows a N(δ, Σ) distribution. We refer to this distribution as the multivariate lognormal distribution. Throughout, the variance matrix is assumed to have the intra-class structure described above. Various patterns of the mean vector δ, the correlation coefficient ρ and the dimension p were considered, as described in Table 3. Sample sizes of n = m = 15 or 25 are reported.

Type I errors for the proposed procedure with nominal level α = .05. Three types of distributions are considered: MVNs, MV-LogN (multivariate lognormal) and Mix-MVN (mixtures of MVNs).

Power comparisons were carried out for data generated from X1,…,Xn ~ Np(0, Σ) and Y1,…,Ym ~ Np(δ, Σ) where p = 3 or 5 and a variety of patterns for δ as described in Table 4. If Roy's maximal separating direction (cf. Proposition 3.1) were known then a “natural gold standard” would be the test based on Ψn,m(smax). We shall refer to this test as the true maximal direction (TMD) test. Clearly the TMD test cannot be used in practice since it involves the unknown direction smax. Nevertheless the TMD test provides an upper bound for the power of the proposed test, which uses the estimated direction. Hence we compute the efficiency of the proposed test relative to the TMD test. An additional test, referred to as the RMD test, is also compared. The RMD test has the same form but uses Roy's maximal direction given by $S^{-1}(\bar{Y}-\bar{X})$, where S is the pooled sample covariance matrix. As suggested by a reviewer we also evaluated the power of the two integral based tests, described in (3.9), which do not require the determination of the best separating direction.

Type I errors and power for some settings with p ≥ n. Here n = m = 10, δ1 has all components equal to 1/2 and δ2 has i-th component equal to i/p.

Additionally, in Table 5 we evaluate the type I error and power of our test when X1,…,Xn ~ Np(0, Σ) and Y1,…,Ym ~ Np(δ, Σ) with n = m = p = 10, and with n = m = 10 and p = 20 (i.e., the p > n set up). Note that in neither of these cases can the standard Hotelling's T2 (or Roy's largest root test) be computed, whereas the proposed test can be calculated.

Power comparisons of the two proposed test procedures with type I error of .050. Here δ1 = (0.1, 0.5, .9), δ2 = (0.1, 0.25, 0.5, 0.75, 0.9), δ3 = (0.5, 0.5, 0.5) and δ4 = (0.5, 0.5, 0.5, 0.5, 0.5) and n = m = 15.

Simulation results reported in this paper are based on 1000 simulation runs. Confidence sets are calculated using 1000 bootstrap samples. The bootstrap critical values for estimating the type I error were based on 500 bootstrap samples. Since the results based on 100 bootstrap samples and on 500 bootstrap samples did not differ by much, all powers were estimated using 100 bootstrap samples.
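As a rough illustration of how bootstrap critical values can be obtained, the sketch below resamples both groups from the pooled data, which mimics the null X =st Y. For simplicity the direction is held fixed across bootstrap replicates; re-estimating ŝmax within each replicate is more faithful to the procedure studied here but slower. All names are ours.

```python
import numpy as np

def stat(X, Y, s):
    # S_{n,m} for a fixed direction s
    u, v = X @ s, Y @ s
    psi = (u[:, None] <= v[None, :]).mean()
    return np.sqrt(len(u) + len(v)) * (psi - 0.5)

def bootstrap_pvalue(X, Y, s_hat, B=500, seed=0):
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])            # pooled sample enforces the null
    n, m = len(X), len(Y)
    t_obs = stat(X, Y, s_hat)
    t_null = np.empty(B)
    for b in range(B):
        Zb = Z[rng.integers(0, n + m, n + m)]
        t_null[b] = stat(Zb[:n], Zb[n:], s_hat)
    return (t_null >= t_obs).mean()
```

With two widely separated samples the observed statistic is extreme relative to the pooled-bootstrap null distribution and the p-value is near zero.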

### 4.3. Simulation results

The Bias and MSEs for the patterns considered are summarized in Table 1.

It is clear that the bias decreases with the sample size as do the MSEs. We observe that the bias tends to be smaller under independence and negative dependence compared with positive dependence. It also tends to be smaller when the data are exchangeable. Although results are not presented, we evaluated squared bias and MSE for larger values of p (e.g., p = 5, 10 and 20) and as expected the total squared bias and total MSE increased with the dimension p.

In Table 2 we summarize the estimated coverage probabilities of the bootstrap confidence intervals when p = 5. Our simulation study suggests that the proposed bootstrap methodology seems to perform better for larger sample sizes but rather poorly for smaller sample sizes.

Type I errors for different patterns considered in our simulation study are summarized in Table 3.

Our simulation studies suggest that in every case the proposed bootstrap based test maintains the nominal level of 0.05. In general it is slightly conservative. The performance of the test is not affected by the shape of the underlying distribution. That is not surprising owing to the nonparametric nature of the test. Furthermore, we evaluated the type I error of the proposed bootstrap test for testing the null hypothesis (3.7) for p as large as 20 with n = m = 10 and discovered that the proposed test attains the nominal level of 0.05 even when p ≥ n. See Table 4. As commented earlier in the paper, Hotelling's T2 statistic cannot be applied here since the Wishart matrix is singular in this case. However, the proposed method is still applicable since the estimation of the best direction does not require the inversion of a matrix.

The power of the tests (3.8) and (3.9) for the various patterns considered in our simulation study is summarized in Tables 5a and 5b.

Power comparisons of the two proposed test procedures with type I error of .050. Here δ1 = (0.1, 0.5, .9), δ2 = (0.1, 0.25, 0.5, 0.75, 0.9), δ3 = (0.5, 0.5, 0.5) and δ4 = (0.5, 0.5, 0.5, 0.5, 0.5) and n = m = 25.

As expected, in every case the power of the TMD test is higher than that of the Sn,m test and the RMD test. The Sn,m test is almost always more powerful than the RMD test. The efficiency of Sn,m relative to the TMD test is quite high in most cases. When n = m = 15 the relative efficiency ranges between 65% and 95%. It is almost always above 90% when the sample size increases to 25 per group. In general the two integral tests had very similar power. They had larger power than Sn,m when ρ < 0. As ρ increased, the power of Sn,m improved relative to the two integral tests. The test (3.8) seems to perform better when the components of δ are unequal. We also note that when the integral tests outperform Sn,m the difference is usually small, whereas the Sn,m test can outperform the integral tests substantially. For example, observe pattern 2, where the powers of Sn,m and In,m are 0.93 and 0.97, respectively, when ρ = −0.25, but 0.63 versus 0.40 when ρ = 0.90.

## 5. Illustration

Prior to conducting a 2-year rodent cancer bioassay to evaluate the toxicity/carcinogenicity of a chemical, the National Toxicology Program (NTP) routinely conducts a 90-day pre-chronic dose finding study. One of the goals of the 90-day study is to determine the maximum tolerated dose (MTD) that can be used in the 2-year chronic exposure study. Accurate determination of the MTD is critical for the success of the 2-year cancer bioassay. Cancer bioassays are typically very expensive and time consuming. Therefore their proper design, i.e., choosing the correct dosing levels, is very important. When the highest dose used in the 2-year study exceeds the MTD, a large proportion of animals in the high dose group(s) may die well before the end of the study and the data from such group(s) cannot be used reliably. This results in inefficiency and wasted resources.

Typically the NTP uses the 90-day study to determine the MTD on the basis of a large number of correlated endpoints that provide information regarding toxicity. These include body weight, organ weights, clinical chemistry (red blood cell counts, cell volume, hemoglobin, hematocrit, lymphocytes, etc.), histopathology (lesions in various target organs), number of deaths and so forth. The dose response data is analyzed for each variable separately using Dunnett's or the Williams’ test (or their nonparametric versions, Dunn's test and Shirley's test, respectively). NTP combines results from all such analyses qualitatively and uses other biological and toxicological information when making decisions regarding the highest dose for the 2-year cancer bioassay. Analyzing correlated variables one at a time may result in loss of information. The proposed methodology provides a convenient method to combine information from several outcome variables to make comparisons between groups.

We now illustrate our methodology by re-analyzing data obtained from a recent NTP study of the chemical Citral (NTP, 2003). Citral is a flavoring agent that is widely used in a variety of food items. The NTP assigned a random sample of 10 male rats to the control group and 10 to the 1785 mg/Kg dose group. Hematological and clinical chemistry measurements such as the number of Platelets (in 1000 per μL), Urea Nitrogen (UN) (in mg/dL), Alkaline Phosphatase (AP) (in IU/L), and Bile Acids (BA) (in μmol/L) were recorded on each animal at the end of the study. The NTP performed univariate analyses on each of these variables and found no significant difference between the control and dose groups except for the concentration of Urea Nitrogen, which was increased in the high dose group. This increase was marginally significant at the 5% level and not significant after correcting for multiplicity. We applied the proposed methodology to compare the control group with the high dose group (1785 mg/Kg) in terms of all non-negative linear combinations of the above mentioned four variables. We test the null hypothesis of no difference between the control and the high dose group against the alternative that the high dose group is stochastically larger (in the above four variables) than the control group. The resulting p-value, based on 10,000 bootstrap samples, was 0.025, which is significant at the 5% level. The estimated value of smax was (0.074, 0.986, 0.012, 0.150)T and the estimated 95% confidence region includes any s which is within 21.5° of ŝmax. This is a relatively large set due to the small sample sizes. Thus our methodology appears to be sensitive enough to detect statistical differences which were not noted by the NTP. Furthermore, our methodology allows us to infer that the 1785 mg/Kg dose group is indeed larger in the linear stochastic order than the control group.
This is a much stronger conclusion than the simple ordering of their means. Thus we believe that the proposed framework and methodology for studying ordered distributions can serve as a useful tool in toxicology and is also applicable to a wide range of other problems as alluded to in this paper.

## 6. Concluding remarks and some open problems

In many applications, researchers are interested in comparing two experimental conditions, e.g., a treatment and a control group, in terms of a multivariate response. In classical multivariate analysis one addresses such problems by comparing the mean vectors using Hotelling's T2 statistic. The assumption of multivariate normality (MVN), underlying Hotelling's T2 test, may not hold in practice. Moreover, if the data are not MVN then the comparison of population means may not provide complete information regarding the differences between the two experimental groups. Furthermore, Hotelling's T2 statistic is designed for two-sided alternatives and may not be ideal when a researcher is interested in one-sided, i.e., ordered, alternatives. Addressing such problems requires one to compare the two experimental groups nonparametrically in terms of the multivariate stochastic order. Such comparisons, however, are very high dimensional and not easy to perform.

In this article we circumvent this challenge by considering the notion of the linear stochastic order between two random vectors. The linear stochastic order is a “weak” generalization of the univariate stochastic order. The linear stochastic order is simple to interpret and has an intuitive appeal. Using this notion of ordering, we developed nonparametric directional inference procedures. Intuitively, the proposed methodology seeks to determine the direction that best separates two multivariate populations. Asymptotic properties of the estimated direction are derived. Our test based on the best separating direction may be viewed as a generalization of Roy's classical largest root test for comparing several MVN populations. To the best of our knowledge this is the first general test for multivariate ordered distributions. Since in practice sample sizes are small, we use the bootstrap methodology for drawing inferences.

We illustrated the proposed methodology using data obtained from a recent toxicity/carcinogenicity study conducted by the US National Toxicology Program (NTP) on the chemical Citral. A re-analysis of their 90-day data using our proposed methodology revealed a linear stochastic increase in Platelets, Urea Nitrogen, Alkaline Phosphatase, and Bile Acids in the high dose group relative to the control group, which was not seen in the original univariate analysis conducted by the NTP. These findings suggest that the proposed methodology may have greater sensitivity than the commonly used univariate statistical procedures. Our methodology is sufficiently general since it is nonparametric and can be applied to discrete and/or continuous outcome variables. Furthermore, our methodology exploits the underlying dependence structure in the data, rather than analyzing one variable at a time.

We note that our example and some of our results pertain to continuous RVs. However, the methodology may be used, with appropriate modifications (e.g., methods for dealing with ties), with discrete (or mixed) data. Although the focus of this paper has been the comparison of two multivariate vectors, in many applications, especially in dose response studies, researchers may be interested in determining trends (order) among several groups. As in the classical parametric order restricted inference literature, one could generalize the methodology developed in this paper to test for order restrictions among multiple populations. For example, one could extend the results to K ≥ 2 RVs ordered by the simple ordering, i.e., X1 ⪯l-st X2 ⪯l-st … ⪯l-st XK, or to RVs ordered by the tree ordering, i.e., X1 ⪯l-st Xj for j = 2,…,K. As pointed out by a referee, the hypotheses H0: X ⪯l-st Y versus H1: X ⋠l-st Y can also be formulated and tested using the approach described. First note that the null hypothesis implies Ψ(s) ≥ 1/2 for all $s \in S_+^{p-1}$. On the other hand, under the alternative there is an $s \in S_+^{p-1}$ for which Ψ(s) < 1/2. Thus a test may be based on the statistic

$N^{1/2}\left(\Psi_{n,m}(\hat{s}_{\min})-1/2\right)$

where ŝmin is the value which minimizes Ψn,m(s). It is also clear that the least favorable configuration occurs when Ψ(s) = 1/2 for all $s \in S_+^{p-1}$, which is equivalent to X =st Y.

We believe that the results obtained here may be useful beyond order restricted inference. Our simulation study suggests that our estimator of the best separating direction, i.e., (3.5), may be useful even in the context of classical multivariate analysis, where it may be viewed as a robust alternative to Roy's classical estimate. Finally, we note that the linear stochastic order may be useful in a variety of other statistical problems. For example, we believe that it provides a useful framework for linearly combining the results of several diagnostic markers. This is a well known problem in the context of ROC curve analysis in diagnostic medicine.

## 7. Proofs

### Proof of Theorem 2.1

Proof. (i) Let g: ℝ^p → ℝ^n be an affine increasing function. Clearly g(x) = v + Mx for some n-vector v and n × p matrix M with non-negative elements. Thus for any $s \in S_+^{n-1}$ we have $s^Tg(X)=s^Tv+(M^Ts)^TX$ where $M^Ts \ge 0$. Hence

$P(s^Tg(X)>t)=P((M^Ts)^TX>t-s^Tv)\le P((M^Ts)^TY>t-s^Tv)=P(s^Tg(Y)>t)$

as required, where the inequality holds because X ⪯l-st Y. (ii) Fix I ⊆ {1,…,p}. Let X = (XI, XĪ), Y = (YI, YĪ), where Ī is the complement of I in {1,…,p}. Further define $s^T=(s_I^T,s_{\bar{I}}^T)$ and set $s_{\bar{I}}^T=0$. It follows that for all $s_I \in \mathbb{R}_+^{\dim(I)}$ we have

$P(s_I^TX_I>t)=P(s^TX>t)\le P(s^TY>t)=P(s_I^TY_I>t)$

as required. (iii) Let φ: ℝ → ℝ be any increasing function. Note that

$E[\varphi(s^TX)]=E\left[E[\varphi(s^TX)\,|\,Z]\right]\le E\left[E[\varphi(s^TY)\,|\,Z]\right]=E[\varphi(s^TY)].$

The inequality is a consequence of X|Z = z ⪯l-st Y|Z = z. Since φ is arbitrary it follows that X ⪯l-st Y as required. (iv) Let X = (X1,…,Xn) and define Y similarly. Let $s \in S_+^{p-1}$ where p = p1 + … + pn. Now

$s^TX=\sum_{i=1}^ns_i^TX_i\quad\text{where}\quad s_i^TX_i\preceq_{st}s_i^TY_i$

by assumption for i = 1,…,n. In addition $s_i^TX_i$ and $s_j^TX_j$ are independent for i ≠ j. It follows from Theorem 1.A.3 in Shaked and Shanthikumar (2007) that $s^TX\preceq_{st}s^TY$, i.e., X ⪯l-st Y as required. (v) By assumption Xn ⇒ X and Yn ⇒ Y, where the symbol ⇒ denotes convergence in distribution. By the continuous mapping theorem sTXn ⇒ sTX and sTYn ⇒ sTY. It follows that

$P(sTXn≥t)→P(sTX≥t)andP(sTYn≥t)→P(sTY≥t).$
(7.1)

Moreover, since Xn ⪯l-st Yn, we have

$P(s^TX_n\ge t)\le P(s^TY_n\ge t).$
(7.2)

Combining (7.1) and (7.2) we have $P(s^TX\ge t)\le P(s^TY\ge t)$, i.e., X ⪯l-st Y as required.

Before proving Theorem 2.2 we provide a definition and a preliminary Lemma.

Definition 7.1. We say that the RV X has an elliptical distribution with parameters μ and Σ and generator φ(•), denoted X ~ Ep(μ, Σ, φ), if its characteristic function is given by exp(itTμ)φ(tTΣt).

For this and other facts about elliptical distributions which we use in the proofs below see Fang et al. (1987).

Lemma 7.1. Let X ~ E1(μ, σ, φ) and Y ~ E1(μ′, σ′, φ) be univariate elliptical RVs supported on ℝ. Then X ⪯st Y if and only if μ ≤ μ′ and σ = σ′.

Proof. Since X and Y have the same generator they have the stochastic representation:

$X=stμ+σRUandY=stμ′+σ′RU$
(7.3)

where R is a non-negative RV independent of the RV U satisfying P(U = ±1) = 1/2 (cf., Fang et al. 1987). It follows that RU is a symmetric RV supported on ℝ with a strictly increasing DF which we denote by F0. Let FX and FY denote the DFs of X and Y respectively. Note that X ⪯st Y if and only if FX(t) ≥ FY(t) for all t ∈ ℝ, or equivalently, by (7.3), if and only if

$F0(t−μσ)≥F0(t−μ′σ′)$
(7.4)

for all t ∈ ℝ. It is obvious that (7.4) holds when μ ≤ μ′ and σ = σ′, establishing sufficiency. Now assume that X ⪯st Y. Put t = μ in (7.4) and use the strict monotonicity of F0 to get 0 ≥ (μ − μ′)/σ′, i.e., μ ≤ μ′. Suppose now that σ > σ′. It follows from (7.4) and the strict monotonicity of F0 that (t − μ)/σ ≥ (t − μ′)/σ′, which is equivalent to t ≤ (μσ′ − μ′σ)/(σ′ − σ). The latter, however, contradicts the fact that (7.4) holds for all t ∈ ℝ. A similar argument shows that σ′ > σ cannot hold; hence we must have σ = σ′ as required.

Remark 7.1. Note that Lemma 7.1 may not hold for distributions with finite support. For example, if R ~ U(0, 1) then by (7.3) X ~ U(μ − σ, μ + σ) and Y ~ U(μ′ − σ′, μ′ + σ′). It is easily verified that in this case X ⪯st Y if and only if Δ = μ′ − μ ≥ 0 and −Δ ≤ σ′ − σ ≤ Δ, i.e., it is not required that σ = σ′. Hence the assumption that X and Y are supported on ℝ is necessary.

We continue with the proof of Theorem 2.2:

Proof. Let X and Y be Ep(μ, Σ, φ) and Ep(μ′, Σ′, φ) supported on ℝ^p. Suppose that X ⪯l-st Y. Choose s = ei where eik = 1 if i = k and 0 otherwise. It now follows from Definition 1.1 that Xi ⪯st Yi. Since Xi and Yi are marginally elliptically distributed RVs with the same generator and supported on ℝ, by Lemma 7.1 we must have

$μi≤μi′andσii=σii′$
(7.5)

The latter holds, of course, for all 1 ≤ i ≤ p. Choosing s = ei + ej we have Xi + Xj ⪯st Yi + Yj. Note that Xi + Xj and Yi + Yj are supported on ℝ and follow a univariate elliptical distribution with the same generator (Fang et al. 1987). Applying Lemma 7.1 again we find that

$μi+μj≤μi′+μj′andσii+σjj+2σij=σii′+σjj′+2σij′.$
(7.6)

The latter holds, of course, for all 1 ≤ i < j ≤ p. It is easy to see that equations (7.5) and (7.6) imply that μ ≤ μ′ and Σ = Σ′. Recall (cf. Fang et al. 1987) that we may write X =st μ + RSU and Y =st μ′ + RSU where Σ = STS, U is a uniform RV on the unit sphere and R is a non-negative RV. Let S be an upper set in ℝ^p. Clearly the set [S − μ] := {x − μ: x ∈ S} is also an upper set and [S − μ] ⊆ [S − μ′] since μ ≤ μ′. Now,

$P(X\in S)=P(X_0\in[S-\mu])\le P(X_0\in[S-\mu'])=P(Y\in S)$

where X0 = RSU, hence X ⪯st Y. This proves the if part. The only if part follows immediately.

### Proof of Theorem 2.3

Proof. Let χp = {x: (x1,…,xp) ∈ {0,1}^p} denote the support of a p dimensional multivariate binary (MVB) RV. By definition the relationship X ⪯l-st Y implies that for all $s \in S_+^{p-1}$ and t ∈ ℝ

$P(sTX>t)≤P(sTY>t).$
(7.7)

Now note that

$P(s^TX>t)=\sum_{x\in\chi_p}I(s^Tx>t)f(x)\quad\text{and}\quad P(s^TY>t)=\sum_{x\in\chi_p}I(s^Tx>t)g(x)$
(7.8)

where f and g are the probability mass functions of X and Y, respectively. Let U be an upper set in χp. It is well known (cf. Davey and Priestley 2002) that U can be written as

$U=\bigcup_{j\in J}U(x_j)$
(7.9)

where the xj are the distinct minimal elements of U and U(xj) = {x: x ≥ xj} are themselves upper sets (in fact U(xj) is an upper orthant). The set {xj: j ∈ J} is often referred to as an anti–chain. Now observe that for any $s \in S_+^{p-1}$ the set {x: sTx > t} is an upper set. Hence it must be of the form (7.9) for some anti–chain {xj: j ∈ J}. Suppose now, that for some U ⊆ χp there is a vector $s_U \in S_+^{p-1}$ such that $U=\{x:s_U^Tx>t\}$ for some fixed t > 0. Then using (7.7) and (7.8) we have

$P(X\in U)\le P(Y\in U).$

We will complete the proof by showing that for each upper set U ⊆ χp we can find a vector sU for which $s_U^Tx>t$ for x ∈ U and $s_U^Tx\le t$ for x ∈ Uc = χp\U if and only if p ≤ 3. To do so we first solve the system of equations sTxj = t for j ∈ J. This system can also be written as Xs = t, where X is a J × p matrix whose rows are the members of the anti–chain defining U and t = (t,…,t)T has dimension J. Clearly the elements of X are ones and zeros. If J ≤ p the matrix X is of full rank since its rows are linearly independent by the fact that they form an anti–chain. Hence a solution for s exists. With a bit of algebra we can further show that a solution s ≥ 0 exists. This, of course, is trivially verified when p ≤ 3. Now set sU = s + ε for some ε ≥ 0. It is clear that we can choose ε small enough to guarantee that $s_U^Tx>t$ if and only if x ∈ U. Hence if J ≤ p the upper set (7.9) can be mapped to a vector sU. However, the inequality J ≤ p for all upper sets U ⊆ χp holds if and only if p ≤ 3. This can be easily shown by enumerating all 18 upper sets belonging to χ3 (cf. Davidov and Peddada 2011) and noting that they have at most three minimal elements. Hence if p ≤ 3 then X ⪯l-st Y implies X ⪯st Y as required.
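The counting step in this argument is easy to verify by brute force. The following check, which is our own and not from the paper, enumerates all nonempty proper upper sets of χ3 and confirms that there are 18 of them, each with at most three minimal elements:

```python
from itertools import combinations, product

points = list(product([0, 1], repeat=3))      # the support chi_3

def leq(x, y):
    # coordinatewise partial order on {0,1}^3
    return all(a <= b for a, b in zip(x, y))

def is_upper(U):
    # U is an upper set if it is closed under going up coordinatewise
    return all(y in U for x in U for y in points if leq(x, y))

upper_sets = [set(c) for r in range(1, len(points))       # nonempty, proper
              for c in combinations(points, r) if is_upper(set(c))]

def minimal(U):
    # the minimal elements of U form the anti-chain generating it
    return [x for x in U if not any(leq(y, x) and y != x for y in U)]

print(len(upper_sets))                            # number of upper sets
print(max(len(minimal(U)) for U in upper_sets))   # largest anti-chain
```

The same enumeration for p = 4 produces upper sets with more than four minimal elements, consistent with the counterexample that follows.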

Now let p = 4 and consider the upper set U generated by the anti–chain xj, j = 1,…,J, where the xj are all the distinct permutations of the vector (1, 1, 0, 0). Clearly J = 6. Note that although J > p the system of equations Xs = t is uniquely solved by s = (t/2, t/2, t/2, t/2)T. However, this solution coincides with the solution of the system X′s = t where X′ is any matrix obtained from X by deleting any two (or just one) of its rows. Note that the rows of X′ correspond to an upper set U′ ⊂ U. This, in turn, implies that for any such U′ one cannot find a vector sU′ satisfying $s_{U'}^Tx>t$ if and only if x ∈ U′, because the inequality will hold for all x ∈ U. Thus U′ does not define an upper half–space. This shows that the linear stochastic order and the multivariate stochastic order do not coincide when p = 4. A similar argument may be used for any p ≥ 5. This completes the proof.

We first define the term copula.

Definition 7.2. Let F be the DF of a p dimensional RV with marginal DFs F1,…,Fp. The copula C associated with F is a DF such that

$F(x)=C(F_1(x_1),\ldots,F_p(x_p)).$

It follows that the tail–copula C̄(•) is nothing but the tail of the DF C(•).

### Proof of Theorem 2.4

Proof. Suppose that X and Y have the same copula. Let X ⪯l-st Y. Choosing s = ei, where eik = 1 if i = k and 0 otherwise, we find using the definition that Xi ⪯st Yi. The latter holds, of course, for all 1 ≤ i ≤ p. Applying Theorem 6.B.14 in Shaked and Shanthikumar (2007) we find that X ⪯st Y. The reverse direction is immediate.

### Proof of Theorem 2.5

Proof. Note that for any x ∈ ℝ^p we have

$F(x)=C_X(F_1(x_1),\ldots,F_p(x_p))\ge C_X(G_1(x_1),\ldots,G_p(x_p))\ge C_Y(G_1(x_1),\ldots,G_p(x_p))=G(x).$

This means that X ⪯lo Y. The other part of the theorem is proved similarly.

### Proof of Proposition 3.1

Proof. Let X and Y be independent MVNs with means μ and ν and common variance matrix Σ. Clearly

$P(s^TX\le s^TY)=\Phi\left(-\frac{s^T(\mu-\nu)}{\sqrt{2s^T\Sigma s}}\right)$

where Φ is the DF of a standard normal RV. It follows that $P(s^TX\le s^TY)$ is maximized when the ratio $s^T(\nu-\mu)/\sqrt{s^T\Sigma s}$ is maximized. From the Cauchy–Schwarz inequality we have

$\frac{s^T(\nu-\mu)}{\sqrt{s^T\Sigma s}}\le\sqrt{(\nu-\mu)^T\Sigma^{-1}(\nu-\mu)}$
(7.10)

for all s. It is now easily verified that s = Σ^{−1}(ν − μ) maximizes the left hand side of (7.10).
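The maximization in Proposition 3.1 is easy to check numerically. In the sketch below, which is our own illustration with arbitrary parameter values, no random direction beats the Cauchy–Schwarz bound, which is attained at s = Σ^{−1}(ν − μ):

```python
import numpy as np

rng = np.random.default_rng(1)

Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])      # a positive definite covariance
delta = np.array([1.0, 0.5, 0.25])       # delta = nu - mu

def ratio(s):
    # the quantity maximized in the proof of Proposition 3.1
    return (s @ delta) / np.sqrt(s @ Sigma @ s)

s_star = np.linalg.solve(Sigma, delta)   # Roy's direction Sigma^{-1}(nu - mu)
bound = np.sqrt(delta @ np.linalg.solve(Sigma, delta))

print(abs(ratio(s_star) - bound))        # s_star attains the bound
print(max(ratio(s) for s in rng.standard_normal((5000, 3))))
```

The second printed value stays below the bound, as (7.10) requires.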

### Proof of Proposition 3.2

Proof. Let Qq,q = 1, …,4 be the four quadrants. It is clear that maximizing (3.1) is equivalent to maximizing

(7.11)

It is also clear that for any s the indicators $I(s^TZ_{ij}\ge 0)$ are independent of the length of Zij, which we therefore take to be unity. Observe that the value of (7.11) is constant on the intervals (θ[i], θ[i+1]) where the θ[i] are defined in Algorithm 3.1. At each point θ[i], i = 0,…,M + 1, the value of (7.11) may increase or decrease. It follows that $\sup_s\Psi'_{n,m}(s)=\max_{0\le i\le M+1}\Psi'_{n,m}(s_{[i]})$ where the s[i] are defined in Algorithm 3.1. Therefore the maximum value of (3.1) is an element of the above list. Now suppose that s[i] is a global maximizer of (7.11). Clearly either $\Psi'_{n,m}(s_{[i]})=\Psi'_{n,m}(s_{[i-1]})$ or $\Psi'_{n,m}(s_{[i]})=\Psi'_{n,m}(s_{[i+1]})$ must hold, in which case any value in [θ[i−1], θ[i]] or [θ[i], θ[i+1]], respectively, is a global maximizer. This concludes the proof.
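For p = 2 the piecewise-constant structure used in this proof translates directly into code: Ψn,m can only change at angles where s(θ) is orthogonal to some Zij = Yj − Xi, so it suffices to evaluate it between consecutive critical angles. The sketch below is our own reading of this idea, not a verbatim copy of Algorithm 3.1, and the names are ours:

```python
import numpy as np

def psi_hat(theta, X, Y):
    s = np.array([np.cos(theta), np.sin(theta)])
    return ((X @ s)[:, None] <= (Y @ s)[None, :]).mean()

def smax_p2(X, Y):
    # critical angles solve s(theta)'Z = 0, i.e. cos(t) z1 + sin(t) z2 = 0,
    # for each difference Z = Y_j - X_i
    Z = (Y[None, :, :] - X[:, None, :]).reshape(-1, 2)
    ang = np.arctan2(Z[:, 0], -Z[:, 1]) % np.pi
    ang = ang[(ang > 0) & (ang < np.pi / 2)]   # keep angles inside S_+^1
    knots = np.sort(np.concatenate([[0.0], ang, [np.pi / 2]]))
    mids = (knots[:-1] + knots[1:]) / 2        # Psi is constant between knots
    best = max(mids, key=lambda t: psi_hat(t, X, Y))
    return np.array([np.cos(best), np.sin(best)])
```

Because Ψn,m is evaluated only once per interval, the search is exact up to the choice of representative point within each maximizing interval.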

### Proof of Theorem 3.1

Proof. Using Hajek's projection, for any s we may write

$\Psi_{n,m}(s)=\Psi(s)+n^{-1}\sum_{i=1}^n\psi_1(X_i,s)+m^{-1}\sum_{j=1}^m\psi_2(Y_j,s)+R_{n,m}(s)$
(7.12)

where

$\psi_1(X_i,s)=\bar{G}(s^TX_i)-\Psi(s),\qquad\psi_2(Y_j,s)=F(s^TY_j)-\Psi(s),$

and Rn,m(s) is a remainder term. Here $\bar{G}(s^Tx)=P(s^TY\ge s^Tx)$, $F(s^Ty)=P(s^TX\le s^Ty)$ and $\Psi(s)=E(\bar{G}(s^TX_i))=E(F(s^TY_j))$. Clearly E[ψ1(Xi, s)] = E[ψ2(Yj, s)] = 0 for all i and j, so by the strong law of large numbers $n^{-1}\sum_{i=1}^n\psi_1(X_i,s)$ and $m^{-1}\sum_{j=1}^m\psi_2(Y_j,s)$ both converge to zero with probability one. Now,

The set $S_+^{p-1}$ is compact and the function ψ1(x, s) is continuous in s for all values of x and bounded (in fact |ψ1(x, s)| ≤ 2). Thus the conditions in Theorem 3.1 in DasGupta (2008) are satisfied and it follows that $\sup_s|n^{-1}\sum_{i=1}^n\psi_1(X_i,s)|\to_{a.s.}0$.

Similarly $\sup_s|m^{-1}\sum_{j=1}^m\psi_2(Y_j,s)|\to_{a.s.}0$. Since Ψn,m(s) is bounded all its moments exist; therefore from Theorem 5.3.3 in Serfling (1980) we have, with probability one, that Rn,m(s) = o(1/N). Moreover it is clear that the latter holds uniformly in s. Thus,

By assumption Ψ(smax) > Ψ(s) for all s ≠ smax, so we can apply Theorem 2.12 in Kosorok (2008) to conclude that

$s^max→a.s.smax$

i.e., ŝmax is strongly consistent. This completes the first part of the proof.

Since the densities of X and Y are differentiable it follows that Ψ(s) is continuous and twice differentiable. In particular, at smax the matrix $-\nabla^2\Psi(s_{\max})$ exists and is positive definite. A Taylor expansion implies that

$sup‖s−smax‖<δΨ(s)−Ψ(smax)≤−Cδ2.$

It is also obvious that

$n^{-1}\sum_{i=1}^n\psi_1(X_i,s_{\max})=O_p(N^{-1/2})\quad\text{and}\quad m^{-1}\sum_{j=1}^m\psi_2(Y_j,s_{\max})=O_p(N^{-1/2}).$

Finally, as noted above, |Rn,m(s)| = o(1/N) for all s as n, m → ∞. Therefore by Theorem 1 in Sherman (1993) we have that

$s^max=smax+Op(N−1/2)$
(7.13)

i.e., ŝmax converges to smax at a N1/2 rate. This completes the second part of the proof.

The functions Ψ(s), ψ1(Xi, s) and ψ2(Yj, s) on the right hand side of (7.12) all admit a quadratic expansion. A bit of algebra shows that for s in an Op(N−1/2) neighborhood of smax we have

$\Psi_{n,m}(s)=\Psi(s_{\max})+(s-s_{\max})^TM_{n,m}/N^{1/2}+\frac{1}{2}(s-s_{\max})^TV(s-s_{\max})+o_p(1/N)$
(7.14)

where

$M_{n,m}=\lambda_{n,m}^{-1/2}n^{-1/2}\sum_{i=1}^n\nabla\psi_1(X_i,s_{\max})+(1-\lambda_{n,m})^{-1/2}m^{-1/2}\sum_{j=1}^m\nabla\psi_2(Y_j,s_{\max}),$

λn,m = n/N, for j = 1, 2 the function $\nabla\psi_j(\cdot,s_{\max})$ is the gradient of ψj(•, s) evaluated at smax, and the matrix V is the Hessian $V=\nabla^2\Psi(s_{\max})$.

Note that the op(1/N) term in (7.14) absorbs Rn,m(s) in (7.12) as well as the higher order terms in the quadratic expansions of $\Psi(s)$, $n^{-1}\sum_{i=1}^n\psi_1(X_i,s)$ and $m^{-1}\sum_{j=1}^m\psi_2(Y_j,s)$.

Now by the CLT and Slutzky's Theorem we have that $M_{n,m}\Rightarrow N(0,\Delta)$ where

$\Delta=\lambda^{-1}\mathrm{Var}\left(\nabla\psi_1(X,s_{\max})\right)+(1-\lambda)^{-1}\mathrm{Var}\left(\nabla\psi_2(Y,s_{\max})\right).$

Finally it follows by Theorem 2 in Sherman (1993) that

$N^{1/2}(\hat{s}_{\max}-s_{\max})\Rightarrow N(0,\Sigma)$

where Σ = V^{−1}ΔV^{−1}, completing the proof.

### Proof of Theorem 3.2

Proof. Suppose that X and Y are discrete RVs with finite support. Let pa = P(X = xa) > 0 and qb = P(Y = yb) > 0 where a = 1,…,A and b = 1,…,B; A and B are finite. Define the sets $S_{ab}=\{s\in S_+^{p-1}:s^Tx_a\le s^Ty_b\}$. A simple argument shows that

$\Psi(s)=\sum_{k=1}^K\alpha_kI(s\in S_k)$

where K ≤ 2^{AB} is finite, the sets Sk are distinct, and $\alpha_k=\sum_{(a,b)\in J_k}p_aq_b$ with Jk = {(a, b): Sk ⊆ Sab}. Thus Ψ(s) is a simple function on $S_+^{p-1}$ and Smax is the set associated with the largest αk. We will assume, without any loss of generality, that α1 > αk for all k ≥ 2. Now note that

$\Psi_{n,m}(s)=\frac{1}{nm}\sum_{a=1}^A\sum_{b=1}^Bn_am_bI(s\in S_{ab})$

where $n_a=\sum_{i=1}^{n} I(X_i=x_a)$ and $m_b=\sum_{j=1}^{m} I(Y_j=y_b)$ for a = 1, …, A and b = 1, …, B. Clearly $\Psi_{n,m}(s)$ is also a simple function. Moreover for large enough n and m we will have $n_a > 0$ and $m_b > 0$ for all a = 1, …, A and b = 1, …, B and consequently $\Psi_{n,m}(s)$ is defined over the same sets as $\Psi(s)$, i.e.,

where $\hat{\alpha}_k = \sum_{(a,b)\in J_k} \hat{p}_a\hat{q}_b$ with $\hat{p}_a = n_a/n$ and $\hat{q}_b = m_b/m$. Furthermore the maximizer of $\Psi_{n,m}(s)$ is any $s \in S_k$ provided that $S_k$ is associated with the largest $\hat{\alpha}_k$. Hence,

(7.15)

A bit of rearranging shows that

where

Note that $Z_{ij}^{(k)}$ may be viewed as a kernel of a two-sample U-statistic. Moreover

$-|J_k\setminus J_1| \le Z_{ij}^{(k)} \le |J_1\setminus J_k|$

is bounded (here |·| denotes set cardinality) and $E(Z_{ij}^{(k)})=\mu_k=E(\hat{\alpha}_1-\hat{\alpha}_k)>0$ by assumption. Applying Theorem 2 and the derivations in Section 5b of Hoeffding (1963) we have that

$P\left(n^{-1}m^{-1}\sum_{i=1}^{n}\sum_{j=1}^{m} Z_{ij}^{(k)} \le 0\right) \le \exp\left(-\frac{N\min(\lambda_{n,m},1-\lambda_{n,m})\,\mu_k^2}{2\left(|J_k\setminus J_1|+|J_1\setminus J_k|\right)^2}\right)$
(7.16)

where $\lambda_{n,m} = n/N \to \lambda \in (0,1)$ as n, m → ∞. Finally from (7.15) and (7.16) we have that

where $C_1 = K - 1$ and $C_2=\min(\lambda,1-\lambda)\min_{k\ge 2}\mu_k^2\left(2\left(|J_k\setminus J_1|+|J_1\setminus J_k|\right)^2\right)^{-1}$, completing the proof.
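The exponential bound (7.16) is a Hoeffding-type inequality: the probability that an average of bounded terms with positive mean falls at or below zero decays exponentially in the sample size. A minimal one-sample numerical sketch (the constants and the uniform distribution here are illustrative choices, not those of the proof):

```python
import math
import random

random.seed(7)
n, mu, trials = 60, 0.2, 20000

def mean_nonpositive():
    """One trial: average of n variables uniform on [-1+mu, 1+mu], so E = mu > 0."""
    return sum(random.uniform(-1.0, 1.0) + mu for _ in range(n)) / n <= 0.0

# Empirical frequency of the event {sample mean <= 0}.
freq = sum(mean_nonpositive() for _ in range(trials)) / trials

# Hoeffding (1963): P(mean - mu <= -t) <= exp(-2 n t^2 / (b - a)^2); here t = mu, b - a = 2.
bound = math.exp(-2.0 * n * mu ** 2 / 2.0 ** 2)
```

The empirical frequency stays below the exponential bound, which is conservative but, as in the proof, already strong enough to yield exponentially fast convergence.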

### Proof of Theorem 3.3

Proof. Choose ε > 0. We have already seen that under the stated conditions $\Psi(s)$ is continuous and therefore for each s the set $B_{s,\varepsilon} = \{s': |\Psi(s') - \Psi(s)| < \varepsilon\}$ is open. The collection $\{B_{s,\varepsilon}: s \in S_+^{p-1}\}$ is an open cover of $S_+^{p-1}$. Since $S_+^{p-1}$ is compact there exists a finite subcover $B_{s_1,\varepsilon},\ldots, B_{s_K,\varepsilon}$ of $S_+^{p-1}$ where K < ∞. Hence each s belongs to some $B_{s_i,\varepsilon}$ and therefore

By construction $\sup_{s\in B_{s_i,\varepsilon}} |\Psi(s_i) - \Psi(s)| < \varepsilon$ for all i. By the law of large numbers $\Psi_N(s_i)\to_{a.s.}\Psi(s_i)$ as N → ∞ for each i = 1, …, K. Since K is finite

$\max_{1\le i\le K}\left|\Psi_N(s_i)-\Psi(s_i)\right| \le \sum_{i=1}^{K}\left|\Psi_N(s_i)-\Psi(s_i)\right| \to_{a.s.} 0.$

Now for any $s' \in B_{s_i,\varepsilon}$ we have $|\Psi(s') - \Psi(s_i)| < \varepsilon$. This implies that we can choose N large enough so that $|\Psi_N(s') - \Psi_N(s_i)| < 2\varepsilon$. Moreover this bound holds for all $s'$ and i, so

on $S_+^{p-1}$. Since ε is arbitrary we conclude that $\sup_{s}|\Psi_N(s)-\Psi(s)| \to_{a.s.} 0$ as N → ∞. By assumption $\Psi(s_{\max}) > \Psi(s)$ for all $s \ne s_{\max}$ so we can apply Theorem 2.12 in Kosorok (2008) to conclude that

$\hat{s}_{\max} \to_{a.s.} s_{\max},$

i.e., ŝmax is strongly consistent. This completes the first part of the proof. We have already seen that

$\sup_{\|s-s_{\max}\|<\delta}\left(\Psi(s)-\Psi(s_{\max})\right) \le -C\delta^2$

holds. We now need to bound the expected supremum of the empirical process over $\{s: \|s-s_{\max}\| < \delta\}$, where $E^*$ denotes the outer expectation and $m_s(Z)=I(s^TZ\ge 0)-\Psi(s)$. We first note that the bracketing entropy of the upper half-planes is of the order $\delta/\varepsilon^2$. The envelope function of the class $I(s^Tz\ge 0)-I(s_{\max}^Tz\ge 0)$, where $\|s-s_{\max}\| < \delta$, is bounded by $I(s^Tz\ge 0 > s_{\max}^Tz)+I(s_{\max}^Tz\ge 0 > s^Tz)$ whose squared $L_2$ norm is

$P(s^TZ\ge 0 > s_{\max}^TZ)+P(s_{\max}^TZ\ge 0 > s^TZ).$
(7.17)

Note that we may replace the RV Z in (7.17) with the RV $Z' = Z/\|Z\|$ whose mass is concentrated on the unit sphere. The condition $\|s-s_{\max}\| < \delta$ implies that the angle between s and $s_{\max}$ is of the order O(δ) and therefore $P(s^TZ'\ge 0 > s_{\max}^TZ')$ is computed as a surface integral over a spherical wedge of maximum width δ. It follows that (7.17) is bounded by $2A_{p-1}\delta\|h'\|$ where $A_{p-1}$ is the area of $S_+^{p-1}$ and $\|h'\|$ is the supremum of the density of Z′. Clearly $\|h'\| < \infty$ since the density of Z is bounded by assumption. Thus by Corollary 19.35 in van der Vaart (2000) we have

It now follows that

(7.18)

which implies by Theorem 5.52 in van der Vaart (2000) and Theorem 14.4 of Kosorok (2008) that

$\hat{s}_{\max}=s_{\max}+O_p(N^{-1/3}),$

i.e., $\hat{s}_{\max}$ converges to $s_{\max}$ at a cube root rate. This completes the second part of the proof. The limit distribution is derived by verifying the conditions of Theorem 1.1 in Kim and Pollard (1990), denoted henceforth by KP. First note that (7.18) is condition (i) in KP. Since $\hat{s}_{\max}$ is consistent, condition (ii) also holds, and condition (iii) holds by assumption. The differentiability of the density of Z implies that $\Psi(s)$ is twice differentiable. The uniqueness of the maximizer implies that $-\nabla^2\Psi(s_{\max})$ is positive definite, hence condition (iv) holds (see also Example 6.4 in KP for related calculations). Condition (v) in KP is equivalent to the existence of the limit $H(u,v)=\lim_{\alpha\to\infty}\alpha E\left(m_{s_{\max}+u/\alpha}(Z)\,m_{s_{\max}+v/\alpha}(Z)\right)$, where $m_s(z)=I(s^Tz\ge 0)-\Psi(s)$, which can be rewritten as

$\lim_{\beta\to 0}\frac{1}{\beta}\left[P\left((s_{\max}+\beta u)^TZ\ge 0,(s_{\max}+\beta v)^TZ\ge 0\right)-P\left((s_{\max}+\beta u)^TZ\ge 0\right)P\left((s_{\max}+\beta v)^TZ\ge 0\right)\right].$

With some algebra we find that this limit exists and equals

where $\delta(s_{\max}^Tz)$ is the usual Dirac delta function; hence integration is with respect to the surface measure on $\{s_{\max}^Tz=0\}$. It follows that condition (v) also holds. Conditions (vi) and (vii) were verified in the second part of the proof. Thus we may apply Theorem 1.1 in KP to get

where by KP $Q(s)= s^T\nabla^2\Psi(s_{\max})s$ and W(s) is a zero mean Gaussian process with covariance function H(u,v). This completes the proof.

### Proof of Proposition 3.3

Proof. Note that

$\Psi(s)=P(s^TX\le s^TY)=P(s^TZ\ge 0)=P\left(\frac{s^TZ-s^T\delta}{\sqrt{s^T\Sigma s}}\ge \frac{-s^T\delta}{\sqrt{s^T\Sigma s}}\right)=1-F\left(\frac{-s^T\delta}{\sqrt{s^T\Sigma s}}\right).$

Now, by assumption the DF F is independent of s. Therefore Ψ(s) is uniquely maximized on $S+p−1$ if and only if the function

$\chi(s)=\frac{s^T\delta}{\sqrt{s^T\Sigma s}}$

is uniquely maximized on $S_+^{p-1}$. If Σ = I then $\chi(s)= s^T\delta$ and we wish to maximize a linear function on $S_+^{p-1}$. It is easily verified (using ideas from linear programming) that the maximizer is unique if δ ≥ 0, which is true by assumption. Incidentally, it is easy to show directly that $\chi(s)$ is maximized at $s^*/\|s^*\|$ where

Now let Σ ≠ I and assume that a unique maximizer does not exist, i.e., suppose that $\chi(s)$ is maximized by both $s_1$ and $s_2$. It is clear that $\chi(\lambda_1 s_1) = \chi(\lambda_2 s_2)$ for all $\lambda_1, \lambda_2 > 0$, i.e., the value of $\chi(\cdot)$ is constant along rays through the origin. The rays passing through $s_1$ and $s_2$, respectively, intersect the ellipsoid $s^T\Sigma s = 1$ at the points $p_1$ and $p_2$. It follows that $\chi(p_1) = \chi(p_2)$; moreover $p_1$ and $p_2$ maximize $\chi(\cdot)$ on the ellipsoid. Now since $p_1^T\Sigma p_1 = 1 = p_2^T\Sigma p_2$ we must have $p_1^T\delta = p_2^T\delta$. Recall that a linear function on an ellipsoid is uniquely maximized (just as on a sphere; see the comment above). Therefore we must have $p_1 = p_2$, which implies that $s_1 = s_2$ as required.
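The unique maximizer can also be checked numerically: by the Cauchy-Schwarz inequality in the Σ inner product, the ratio $\chi(s)$ is maximized along the ray through $\Sigma^{-1}\delta$, with maximal value $\sqrt{\delta^T\Sigma^{-1}\delta}$. The 2 x 2 example below (specific δ and Σ chosen for illustration only) compares this candidate against a grid of directions:

```python
import math

delta = [1.0, 2.0]                      # delta >= 0, as assumed in the proposition
Sigma = [[2.0, 0.5], [0.5, 1.0]]        # positive definite covariance

def chi(s):
    """chi(s) = s'delta / sqrt(s' Sigma s)."""
    num = s[0] * delta[0] + s[1] * delta[1]
    quad = (s[0] * (Sigma[0][0] * s[0] + Sigma[0][1] * s[1])
            + s[1] * (Sigma[1][0] * s[0] + Sigma[1][1] * s[1]))
    return num / math.sqrt(quad)

# Candidate maximizer s* = Sigma^{-1} delta (2x2 inverse computed by hand).
det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
s_star = [(Sigma[1][1] * delta[0] - Sigma[0][1] * delta[1]) / det,
          (Sigma[0][0] * delta[1] - Sigma[1][0] * delta[0]) / det]

# Grid of directions on the positive quarter circle.
grid = [(math.cos(t), math.sin(t)) for t in [k * math.pi / 2000 for k in range(1001)]]
assert all(chi(s) <= chi(s_star) + 1e-9 for s in grid)
```

Here $\chi(s^*) = \sqrt{\delta^T\Sigma^{-1}\delta} = 2$ exactly, and no grid direction exceeds it, consistent with uniqueness of the maximizing ray.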

### Proof of Theorem 3.4

Proof. If $X =_{st} Y$ then for all s we have $s^TX =_{st} s^TY$. By assumption both $s^TX$ and $s^TY$ are continuous RVs so $P(s^TX \le s^TY) = 1/2$. Suppose now that both $X \preceq_{st} Y$ and $P(s^TX \le s^TY) > 1/2$ for some $s \in S_+^{p-1}$ hold. Then we must have $X \prec_{l\text{-}st} Y$. Since $X \preceq_{st} Y$ we have $X_j \preceq_{st} Y_j$ for 1 ≤ j ≤ p. One of these inequalities must be strict, for otherwise $X =_{st} Y$, contradicting the fact that $X \prec_{l\text{-}st} Y$. Now use Theorem 1 in Davidov and Peddada (2011) to complete the proof.

### Proof of Theorem 3.5

Proof. The functions $\psi_1$ and $\psi_2$ defined in the proof of Theorem 3.1 are Donsker (cf. Example 19.7 in van der Vaart, 2000). Hence by the theory of empirical processes applied to (7.12) we find that

(7.19)

where G(s) is a zero mean Gaussian process and convergence holds for all $s \in S_+^{p-1}$. We also note that (7.19) is a two-sample U-process. A central limit theorem for such processes is given by Neumeyer (2004). Hence, by the continuous mapping theorem and under $H_0$, the limit law follows, where the covariance function of G(s), denoted by C(u,v), is given by

$\frac{1}{\lambda}P(u^TX_1\le u^TX_2,\, v^TX_1\le v^TX_3)+\frac{1}{1-\lambda}P(u^TX_1\le u^TX_2,\, v^TX_3\le v^TX_2)-\frac{1}{4\lambda(1-\lambda)}$
(7.20)

where X1,X2, X3 are IID from the common DF.
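As a sanity check on (7.20), take u = v: the projections $u^TX_i$ are then IID continuous scalars, both joint probabilities equal 1/3, and C(u,u) reduces to $\frac{1}{3\lambda}+\frac{1}{3(1-\lambda)}-\frac{1}{4\lambda(1-\lambda)}=\frac{1}{12\lambda(1-\lambda)}$, the familiar Mann-Whitney variance. A Monte Carlo sketch of this special case (λ = 1/2, so C(u,u) = 1/3; purely illustrative):

```python
import random

random.seed(3)
lam, trials = 0.5, 200000
c1 = c2 = 0
for _ in range(trials):
    # With u = v, the projections u'X_1, u'X_2, u'X_3 are IID continuous scalars.
    a, b, c = random.random(), random.random(), random.random()
    c1 += (a <= b) and (a <= c)   # estimates P(u'X1 <= u'X2, u'X1 <= u'X3) = 1/3
    c2 += (a <= b) and (c <= b)   # estimates P(u'X1 <= u'X2, u'X3 <= u'X2) = 1/3

# Plug the estimated probabilities into (7.20) with u = v.
C_uu = c1 / trials / lam + c2 / trials / (1.0 - lam) - 1.0 / (4.0 * lam * (1.0 - lam))
```

The simulated value of C(u,u) is close to 1/3, in agreement with the closed form.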

### Proof of Theorem 3.6

Proof. Suppose that $X \prec_{l\text{-}st} Y$. Then for some $s \in S_+^{p-1}$ we have $s^TX \prec_{st} s^TY$, which implies that $\Psi(s) > 1/2$. By definition $\Psi(s_{\max}) \ge \Psi(s)$, so $\Psi(s_{\max}) > 1/2$. It follows from the proof of Theorem 3.1 that $\Psi_{n,m}(\hat{s}_{\max}) \to \Psi(s_{\max})$ with probability one. Thus,

$S_{n,m}=N^{1/2}\left(\Psi_{n,m}(\hat{s}_{\max})-1/2\right)\to_{a.s.}\infty \quad\text{as } n,m\to\infty.$

Therefore by Slutsky's theorem

$P(S_{n,m}>q_{n,m,1-\alpha};H_1)\to 1 \quad\text{as } n,m\to\infty,$

where $q_{n,m,1-\alpha}$ is the critical value for an α level test based on samples of size n and m and $q_{n,m,1-\alpha} \to q_{1-\alpha}$. Hence the test based on $S_{n,m}$ is consistent. Consistency of $I_{n,m}$ and $I_{n,m}^+$ is established in a similar manner.

Now assume that $X \preceq_{l\text{-}st} Y \preceq_{l\text{-}st} Z$ so that $s^TY \preceq_{st} s^TZ$ for all $s \in S_+^{p-1}$. Fix $x_i$, i = 1,…, n and choose $s \in S_+^{p-1}$. Without any loss of generality assume that $s^Tx_1 \le s^Tx_2 \le \cdots \le s^Tx_n$. Define $U_j=\sum_{i=1}^{n} I(s^Tx_i\le s^TY_j)$ and $V_j=\sum_{i=1}^{n} I(s^Tx_i\le s^TZ_j)$. Clearly $U_j$ and $V_j$ take values in J = {0,…, n}. Now, for $k \in J$ we have

$P(U_j\ge k)=P(s^TY_j\ge s^Tx_k)\le P(s^TZ_j\ge s^Tx_k)=P(V_j\ge k),$

where we use the fact that $s^TY \preceq_{st} s^TZ$. It follows that $U_j \preceq_{st} V_j$ for j = 1,…, m. Moreover the $\{U_j\}$ and $\{V_j\}$ are all independent and it follows from Theorem 1.A.3 in Shaked and Shanthikumar (2007) that $\sum_{j=1}^{m} U_j \preceq_{st} \sum_{j=1}^{m} V_j$. The latter holds for every value of $x_1,\ldots, x_n$ and therefore it holds unconditionally as well, i.e.,

It follows that $\Psi^{X,Y}_{n,m}(s) \preceq_{st} \Psi^{X,Z}_{n,m}(s)$ for all $s \in S_+^{p-1}$, where $\Psi^{X,Y}_{n,m}(s)$ and $\Psi^{X,Z}_{n,m}(s)$ are defined in (3.1) and the superscripts emphasize the different arguments used to evaluate them. Thus,

and as a consequence $P(S^{X,Y}_{n,m}>q_{n,m,1-\alpha}) \le P(S^{X,Z}_{n,m}>q_{n,m,1-\alpha})$ as required. The monotonicity of the power functions of $I_{n,m}$ and $I_{n,m}^+$ follows immediately from the fact that $\Psi^{X,Y}_{n,m}(s) \preceq_{st} \Psi^{X,Z}_{n,m}(s)$ for all $s \in S_+^{p-1}$.

## Acknowledgments

The research of Ori Davidov was partially supported by the Israeli Science Foundation Grant No 875/09 and was conducted in part when visiting Shyamal Das Peddada at the NIEHS. This research [in part] was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (Z01 ES101744-04). We thank Grace Kissling (NIEHS), Alexander Goldenshluger, Yair Goldberg and Danny Segev (University of Haifa), for their useful comments and suggestions.

## Contributor Information

Ori Davidov, Department of Statistics, University of Haifa, Mount Carmel, Haifa 31905 Israel.

Shyamal Peddada, Biostatistics Branch, National Institute of Environmental Health Sciences, Alexander Drive, RTP, NC 27709.

## References

• Abrevaya J, Huang J. On the bootstrap of the maximum score estimator. Econometrica. 2005;73:1175–1204.
• Arcones MA, Kvam PH, Samaniego FJ. Nonparametric estimation of a distribution subject to a stochastic precedence constraint. Journal of the American Statistical Association. 2002;97:170–182.
• Audet C, Bechard V, Le Digabel S. Nonsmooth optimization through mesh adaptive direct search and variable neighborhood search. Journal of Global Optimization. 2008;41:299–318.
• Bekele BN, Thall PF. Dose-finding based on multiple toxicities in a soft tissue sarcoma trial. Journal of the American Statistical Association. 2004;99:26–35.
• Bickel P, Sakov A. On the choice of m in the m out of n bootstrap and confidence bounds for extrema. Statistica Sinica. 2008;18:967–985.
• DasGupta A. Asymptotic Theory of Statistics and Probability. Springer; 2008.
• Davey BA, Priestley HA. Introduction to Lattices and Order. Cambridge University Press; 2002.
• Davidov O, Herman A. Multivariate stochastic orders induced by case-control sampling. Methodology and Computing in Applied Probability. 2011;13:139–154.
• Davidov O, Peddada SD. Order restricted inference for multivariate binary data with application to toxicology. Journal of the American Statistical Association. 2011;106:1394–1404. [PubMed]
• Davidov O. Ordered inference, rank statistics and combining p-values: a new perspective. Statistical Methodology. 2012;9:456–465.
• Davidov O, Herman A. Ordinal dominance curve based inference for stochastically ordered distributions. Journal of the Royal Statistical Society, Series B. 2012;74:825–847.
• Delgado MA, Rodriguez-Poo JM, Wolf M. Subsampling inference in cube root asymptotics with an application to Manski's maximum score estimator. Economics Letters. 2001;73:241–250.
• Ding Y, Zhang X. Some stochastic orders for Kotz type distributions. Statistics and Probability Letters. 2004;69:389–396.
• Fang KT, Kotz S, Ng KW. Symmetric Multivariate and Related Distributions. Chapman and Hall; 1989.
• Fisher NI, Hall P. Bootstrap confidence regions for directional data. Journal of the American Statistical Association. 1989;84:996–1002.
• Hajek J, Sidak Z, Sen PK. Theory of Rank Tests. Academic Press; 1999.
• Hoeffding W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association. 1963;58:13–30.
• Hu J, Homem-de-Mello T, Mehrotra S. Concepts and applications of stochastically weighted stochastic dominance. 2011 Unpublished manuscript, see http://www.optimization-online.org/DB_FILE/2011/04/2981.pdf.
• Ivanova A, Murphy M. An adaptive first in man dose-escalation study of NGX267: statistical, clinical, and operational considerations. Journal of Biopharmaceutical Statistics. 2009;19:247–255. [PubMed]
• Joe H. Multivariate Models and Dependence Concepts. Chapman and Hall; 1997.
• Johnson R, Wichern D. Applied Multivariate Statistical Analysis. Prentice Hall; 1998.
• Kim J, Pollard D. Cube root asymptotics. The Annals of Statistics. 1990;18:191–219.
• Kosorok MR. Introduction to Empirical Processes and Semiparametric Inference. Springer; 2008.
• Lee S. On a class of m out of n bootstrap confidence intervals. Journal of the Royal Statistical Society, Series B. 1999;61:901–911.
• Lucas LA, Wright FT. Testing for and against stochastic ordering between multivariate multinomial populations. Journal of Multivariate Analysis. 1991;38:167–186.
• Moser V. Observational batteries in neurotoxicity testing. International Journal of Toxicology. 2000;19:407–411.
• Neumeyer N. A central limit theorem for two-sample U-processes. Statistics and Probability Letters. 2004;67:73–85.
• NTP. NTP toxicology and carcinogenesis studies of citral (microencapsulated) (CAS No 5392-40-5) in F344/N rats and B6C3F1 mice (feed studies) 2003 [PubMed]
• Peddada SD. A short note on the Pitman measure of nearness. American Statistician. 1985;39:298–299.
• Peddada SD, Chang T. Bootstrap confidence region estimation of the motion of rigid bodies. Journal of the American Statistical Association. 1996;91:231–241.
• Pitman EJG. The closest estimates of statistical parameters. Proceedings of the Cambridge Philosophical Society. 1937;33:212–222.
• Price CJ, Reale M, Robertson BL. A direct search method for smooth and non–smooth unconstrained optimization problems. Australian & New Zealand Industrial and Applied Mathematics Journal. 2008;48:927–948.
• Roy SN. On a heuristic method of test construction and its use in multivariate analysis. Annals of Mathematical Statistics. 1953;24:220–238.
• Sampson AR, Whitaker LR. Estimation of multivariate distributions under stochastic ordering. Journal of the American Statistical Association. 1989;84:541–548.
• Sen B, Banerjee M, Woodroofe M. Inconsistency of bootstrap: The Grenander estimator. The Annals of Statistics. 2010;38:1953–1977.
• Serfling RJ. Approximation Theorems of Mathematical Statistics. Wiley; 1980.
• Shaked M, Shanthikumar JG. Stochastic Orders. Springer; 2007.
• Sherman RP. The limiting distribution of the maximum rank correlation estimator. Econometrica. 1993;61:123–137.
• Silvapulle MJ, Sen PK. Constrained Statistical Inference. Wiley & Sons; 2005.
• van der Vaart AW. Asymptotic Statistics. Cambridge University Press; 2000.
