Ann Stat. Author manuscript; available in PMC 2013 March 29.

Published in final edited form as:

Ann Stat. 2013 February 1; 41(1): 1–40.

doi: 10.1214/12-AOS1062. PMCID: PMC3611334

NIHMSID: NIHMS426027

Ori Davidov, Department of Statistics, University of Haifa, Mount Carmel, Haifa 31905, Israel.

Ori Davidov: davidov@stat.haifa.ac.il; Shyamal Peddada: peddada@niehs.nih.gov


**Abstract**

Researchers are often interested in drawing inferences regarding the order between two experimental groups on the basis of multivariate response data. Since standard multivariate methods are designed for two-sided alternatives, they may not be ideal for testing for order between two groups. In this article we introduce the notion of the linear stochastic order and investigate its properties. Statistical theory and methodology are developed both to estimate the direction which best separates two arbitrary ordered distributions and to test for order between the two groups. The new methodology generalizes Roy's classical largest root test to the nonparametric setting and is applicable to random vectors with discrete and/or continuous components. The proposed methodology is illustrated using data obtained from a 90-day pre-chronic rodent cancer bioassay study conducted by the National Toxicology Program (NTP).

**1. Introduction**

In a variety of applications researchers are interested in comparing two treatment groups on the basis of several, potentially dependent, outcomes. For example, to evaluate whether a chemical is a neurotoxicant, toxicologists compare a treated group of animals with an untreated control group in terms of various correlated outcomes such as tail-pinch response, click response, and gait score, among many others (cf. Moser, 2000). The statistical problem of interest is to compare the multivariate distributions of the outcomes in the control and treatment groups. Moreover, the outcome distributions are expected to be ordered in some sense. The theory of stochastic order relations (Shaked and Shanthikumar, 2007) provides the theoretical foundation for such comparisons.

To fix ideas let **X** and **Y** denote the multivariate responses in the control and treatment groups, respectively.

In this paper we address the dimensionality problem by considering an easy-to-understand stochastic order which we refer to as the linear stochastic order.

**Definition 1.1.** The RV **X** is said to be smaller than the RV **Y** in the linear stochastic order, denoted **X** ≺_{l-st} **Y**, if for all non-negative real vectors **s**

$$\mathbf{s}^{T}\mathbf{X} \preceq_{st} \mathbf{s}^{T}\mathbf{Y}$$

(1.1)

where ⪯_{st} in (1.1) denotes the usual (univariate) stochastic order.

Note that it is enough to limit (1.1) to non-negative real vectors satisfying ‖**s**‖ = 1 and accordingly we denote by $S_{+}^{p-1}$ the positive part of the unit sphere in $\mathbb{R}^{p}$. We call each $\mathbf{s} \in S_{+}^{p-1}$ a "direction". In other words, the RVs **X** and **Y** are ordered in the linear stochastic order whenever all non-negative linear combinations of their components are ordered in the univariate stochastic order.
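For intuition, here is a quick numerical sketch (all parameter values are assumed for illustration only): when **X** and **Y** are bivariate normal with a common covariance matrix and ordered means, every projection $\mathbf{s}^{T}\mathbf{X}$ is univariate normal, so the univariate stochastic order of the projections can be checked directly from the projected CDFs.

```python
import math

def norm_cdf(x, mu, sigma):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical example: X ~ N(mu_x, Sigma), Y ~ N(mu_y, Sigma), mu_x <= mu_y.
mu_x, mu_y = [0.0, 0.0], [0.5, 1.0]
Sigma = [[1.0, 0.3], [0.3, 1.0]]

def proj_params(s, mu, Sigma):
    # Mean and standard deviation of the projection s^T X for X ~ N(mu, Sigma).
    m = s[0] * mu[0] + s[1] * mu[1]
    v = sum(s[i] * Sigma[i][j] * s[j] for i in range(2) for j in range(2))
    return m, math.sqrt(v)

# Check s^T X <=_st s^T Y, i.e. F_{s^T X}(t) >= F_{s^T Y}(t) for all t,
# along one direction s in S_+^1 over a grid of t values.
s = (math.cos(math.pi / 6), math.sin(math.pi / 6))
mx, sd = proj_params(s, mu_x, Sigma)
my, _ = proj_params(s, mu_y, Sigma)
ordered = all(norm_cdf(t, mx, sd) >= norm_cdf(t, my, sd)
              for t in [x / 10.0 for x in range(-50, 51)])
```

Because the two projections share the same variance and have ordered means, the CDF of the projected control sample dominates that of the projected treatment sample pointwise, which is exactly the univariate stochastic order required by (1.1).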

Comparing linear combinations has a long history in statistics. For example, in Phase I clinical trials it is common to compare dose groups using an overall measure of toxicity. Typically, this quantity is an ad hoc weighted average of individual toxicities where the weights are often known as "severity weights" (cf. Bekele and Thall, 2004; Ivanova and Murphy, 2009). This strategy of dimension reduction is not new in the statistical literature and has been used in classical multivariate analysis when comparing two or more multivariate normal populations. For example, using the union-intersection principle, the comparison of multivariate normal populations can be reduced to the comparison of all possible linear combinations of their mean vectors. This approach is the basis of Roy's classical largest root test (Roy, 1953; Johnson and Wichern, 1998). Our proposed test may be viewed as a nonparametric generalization of the classical normal theory method described above, with the exception that we limit consideration to non-negative linear combinations (rather than all possible linear combinations) since our main focus is to make comparisons in terms of stochastic order. We emphasize that the linear stochastic order allows us to address the much broader problem of directional ordering for multivariate ordered data, i.e., finding the direction which best separates two ordered distributions. Based on our survey of the literature, we are not aware of any other methodology that addresses the problems investigated here.

This paper is organized in the following way. In Section 2 some probabilistic properties of the linear stochastic order are explored and its relationships with other multivariate stochastic orders are clarified. In Section 3 we provide the background and motivation for directional inference under the linear stochastic order and develop estimation and testing procedures for independent as well as paired samples. In particular, the estimator of the best separating direction is presented and its large sample properties are derived. We note that the problem of estimating the best separating direction is a non-smooth optimization problem. The limiting distribution of the best separating direction is derived in a variety of settings. Tests for the linear stochastic order based on the best separating direction are also developed. One advantage of our approach is that it avoids the estimation of multivariate distributions subject to order restrictions. Simulation results, presented in Section 4, reveal that for large sample sizes the proposed estimator has negligible bias and mean squared error (MSE). The bias and MSE seem to depend on the true value of the best separating direction, the dependence structure and the dimension of the problem. Furthermore, the proposed test honors the nominal type I error rate and has satisfactory power. In Section 5 we illustrate the methodology using data obtained from the National Toxicology Program (NTP). Concluding remarks and some open research problems are provided in Section 6. For convenience all proofs are provided in an Appendix, where additional concepts are defined as needed.

**2. Some properties of the linear stochastic order**

We start by clarifying the relationship between the linear stochastic order and the multivariate stochastic order. First note that **X** ≺_{l-st} **Y** if and only if (1.1) holds for every $\mathbf{s} \in S_{+}^{p-1}$.

**Example 2.1.** Let ** X** and

The following theorem provides some closure results for the linear stochastic order.

**Theorem 2.1.** (*i*) If **X** ≺_{l-st} **Y** then

Theorem 2.1 shows that the linear stochastic order is closed under increasing linear transformations, marginalization, mixtures, conjugations and convergence. In particular parts (*ii*) and (*iii*) of Theorem 2.1 imply that if **X** ≺_{l-st} **Y** then

**Theorem 2.2.** Let **X** and **Y** be continuous elliptically distributed RVs supported on $\mathbb{R}^{p}$ with the same generator. Then **X** ≺_{l-st} **Y** if and only if **X** ≺_{st} **Y**.

Note that the elliptical family of distributions is large and includes the multivariate normal, multivariate *t* and the exponential power family, see Fang et al. (1987). Thus Theorem 2.2 shows that the multivariate stochastic order coincides with the linear stochastic order in the normal family. Incidentally, in the proof of Theorem 2.2 we generalize the results of Ding and Zhang (2004) on multivariate stochastic ordering of elliptical RVs. Another interesting example is:

**Theorem 2.3.** Let **X** and **Y** be multivariate binary RVs. Then **X** ≺_{l-st} **Y** is equivalent to **X** ≺_{st} **Y** if and only if *p* ≤ 3.

**Remark 2.1.** In the proof of Theorem 2.2 distributional properties of the elliptical family play a major role. In contrast, Theorem 2.3 is a consequence of the geometry of the upper sets of multivariate binary RVs which turn out to be upper half planes if and only if *p* ≤ 3.

We now explore the role of the dependence structure.

**Theorem 2.4.** Let **X** and **Y** have the same copula. Then **X** ≺_{l-st} **Y** if and only if **X** ≺_{st} **Y**.

Theorem 2.4 establishes that if two RVs have the same dependence structure as quantified by their copula function (cf. Joe 1997) then the linear and multivariate stochastic orders coincide. Such situations arise when the correlation structure among outcomes is not expected to vary with dose.

The orthant orders are also of interest in statistical applications. We say that ** X** is smaller than

If **X** _{i-st}
**Y** and C** _{X}** (

Note that *C***_{X}** (

Additional properties of the linear stochastic order as they relate to estimation and testing problems are given in Section 3.

**3. Directional inference**

There exists a long history of well developed theory for comparing two or more multivariate normal (MVN) populations. Methods for assessing whether there are any differences between the populations, which components differ, and by how much, have been addressed in the literature using a variety of simultaneous confidence intervals and multiple comparison methods (cf. Johnson and Wichern, 1998). Of particular interest to us is Roy's largest root test. To fix ideas consider two multivariate normal random vectors **X** and

Our objective is to extend and generalize the classical multivariate method, described above, to non-normal multivariate ordered data. Our approach will be nonparametric. Recall that comparing MVNs is done by considering the family of statistics $T_{n,m}(\mathbf{s})$ indexed by the direction $\mathbf{s}$. For each *k* let

$$R_{k}(\mathbf{s}) = \sum_{i=1}^{n} \mathbb{I}\left(\mathbf{s}^{T}\mathbf{X}_{i} \le \mathbf{s}^{T}\mathbf{X}_{k}\right) + \sum_{j=1}^{m} \mathbb{I}\left(\mathbf{s}^{T}\mathbf{Y}_{j} \le \mathbf{s}^{T}\mathbf{X}_{k}\right)$$

be the rank of $\mathbf{s}^{T}\mathbf{X}_{k}$ in the combined sample of projections.

$$\Psi_{n,m}(\mathbf{s}) = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} \mathbb{I}\left(\mathbf{s}^{T}\mathbf{X}_{i} \le \mathbf{s}^{T}\mathbf{Y}_{j}\right)$$

(3.1)

where **s** varies over $S_{+}^{p-1}$. Note that (3.1) unbiasedly estimates

$$\Psi(\mathbf{s}) = \mathbb{P}\left(\mathbf{s}^{T}\mathbf{X} \le \mathbf{s}^{T}\mathbf{Y}\right).$$

(3.2)
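The statistic (3.1) is simply the proportion of pairs whose projections are concordant with the conjectured order. A minimal numpy sketch (the data and direction below are hypothetical, chosen only for illustration):

```python
import numpy as np

def psi_nm(s, X, Y):
    """Empirical Psi_{n,m}(s) of (3.1): the proportion of pairs (i, j)
    with s^T X_i <= s^T Y_j.  X is (n, p), Y is (m, p), s is (p,)."""
    u = X @ s          # projected X sample, shape (n,)
    v = Y @ s          # projected Y sample, shape (m,)
    # Compare every projected X_i with every projected Y_j.
    return np.mean(u[:, None] <= v[None, :])

# Toy data (hypothetical): Y is shifted upward, so Psi should exceed 1/2.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(50, 3))
Y = rng.normal(1.0, 1.0, size=(60, 3))
s = np.ones(3) / np.sqrt(3.0)   # a direction in S_+^{2}
val = psi_nm(s, X, Y)
```

Values of `psi_nm` near 1/2 are consistent with equality in distribution along **s**, while values near 1 indicate that the projected **Y** sample is stochastically larger.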

The following result is somewhat surprising.

**Proposition 3.1.** Let **X** and **Y** be independent MVNs with means **μ** ≤ **ν** and common variance matrix **Σ**. Then Roy's maximal separating direction **Σ**^{−1}(**ν** − **μ**) also maximizes ℙ(**s**^{T}**X** ≤ **s**^{T}**Y**).

Proposition 3.1 shows that the direction which separates the means, in the sense of Roy, also maximizes (3.2). Thus it provides further support for choosing (3.1) as our test statistic. Note that in general **Σ**^{−1}(**ν** − **μ**) need not have non-negative components and therefore need not belong to $S_{+}^{p-1}$.
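Proposition 3.1 can be checked numerically: with a common **Σ** the difference **Y** − **X** is normal, so (3.2) has the closed form Ψ(**s**) = Φ(**s**^{T}(**ν** − **μ**)/√(2 **s**^{T}**Σs**)). The sketch below (parameter values are assumed for illustration) compares Ψ at Roy's direction with Ψ over a grid of directions on $S_{+}^{1}$.

```python
import math
import numpy as np

def psi_normal(s, delta, Sigma):
    # Psi(s) = P(s^T X <= s^T Y) when Y - X ~ N(delta, 2 * Sigma).
    z = (s @ delta) / math.sqrt(2.0 * (s @ Sigma @ s))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical parameters with mu <= nu.
delta = np.array([0.3, 0.8])          # nu - mu
Sigma = np.array([[1.0, 0.4], [0.4, 2.0]])

roy = np.linalg.solve(Sigma, delta)   # Roy's direction Sigma^{-1}(nu - mu)
roy /= np.linalg.norm(roy)

# Psi at Roy's direction should dominate Psi over a grid of directions.
grid = [np.array([math.cos(t), math.sin(t)])
        for t in np.linspace(0.0, math.pi / 2, 181)]
best_on_grid = max(psi_normal(s, delta, Sigma) for s in grid)
ok = psi_normal(roy, delta, Sigma) >= best_on_grid - 1e-9
```

Since Ψ is increasing in **s**^{T}**δ**/√(**s**^{T}**Σs**), which by the Cauchy–Schwarz inequality is maximized at **s** ∝ **Σ**^{−1}**δ**, the grid search cannot beat Roy's direction.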

Once **s**_{max} is estimated we can plug it into (3.1) to get a test statistic. Hence our test may be viewed as a natural generalization of Roy's largest root test from MVNs to arbitrary ordered distributions. However, unlike Roy's method, which does not explicitly estimate **s**_{max}, we do. On the other hand the proposed test does not require the computation of the inverse of the sample covariance matrix, whereas Roy's test and Hotelling's *T*^{2} test require such computations. Consequently, such tests cannot be used when the sample covariance matrix is singular, e.g., when *p* exceeds the sample size, whereas our test can.

**Remark 3.1.** In the above description the *X*_{i} and *Y*_{j} are sampled independently. If instead the observations are sampled in pairs (*X*_{i}, *Y*_{i}), *i* = 1,…,*N*, then (3.1) is replaced by

$$\Psi_{N}(\mathbf{s}) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{I}\left(\mathbf{s}^{T}\mathbf{X}_{i} \le \mathbf{s}^{T}\mathbf{Y}_{i}\right).$$

(3.3)

In the following we consider both sampling designs, which we refer to as: (a) independent samples; and (b) paired or dependent samples. Results are developed primarily for independent samples, but modifications for paired samples are mentioned as appropriate.

Consider first the case of independent samples, i.e., *X*_{1},…,*X*_{n} and *Y*_{1},…,*Y*_{m} are independent random samples from the distributions of **X** and **Y**, respectively. It is convenient to rewrite (3.1) as

$$\Psi_{n,m}(\mathbf{s}) = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} \mathbb{I}\left(\mathbf{s}^{T}\mathbf{Z}_{ij} \ge 0\right)$$

(3.4)

where **Z**_{ij} = **Y**_{j} − **X**_{i}. The best separating direction is estimated by

$$\hat{\mathbf{s}}_{\max} = \arg\max_{\mathbf{s} \in S_{+}^{p-1}} \Psi_{n,m}(\mathbf{s}).$$

(3.5)

Finding (3.5) with $\mathbf{s} \in S_{+}^{p-1}$ is a non-smooth optimization problem. Consider first the situation where *p* = 2. In this case we maximize (3.4) subject to $\mathbf{s} \in S_{+}^{1} = \{(s_{1}, s_{2}) : s_{1}^{2} + s_{2}^{2} = 1,\ (s_{1}, s_{2}) \ge 0\}$. Geometrically $S_{+}^{1}$ is a quarter circle spanning the first quadrant. Now let **Z** = (*Z*_{1}, *Z*_{2}) and without any loss of generality assume that ‖**Z**‖ = 1; normalizing **Z** does not change the sign of $\mathbf{s}^{T}\mathbf{Z}$.

Let *M* denote the number of **Z**_{ij}'s which belong to the second or fourth quadrant. Map

$$\mathbf{Z}_{ij} \mapsto \theta_{ij} = \begin{cases} \pi/2 - \cos^{-1}(Z_{ij,1}) & \text{if } Z_{ij,1} \ge 0,\ Z_{ij,2} < 0, \\ \cos^{-1}(Z_{ij,1}) - \pi/2 & \text{if } Z_{ij,1} < 0,\ Z_{ij,2} \ge 0. \end{cases}$$

(3.6)

Relabel and order the resulting angles as θ_{[1]} < … < θ_{[M]}. Also define θ_{[0]} = 0 and θ_{[M+1]} = π/2. Evaluate Ψ_{n,m}(**s**(θ)) at the angles θ_{[k]} and at the midpoints of the intervals (θ_{[k]}, θ_{[k+1]}), *k* = 0,…,*M*, and select a maximizing angle.

In light of the above discussion it can be easily proved that:

For p = 2 Algorithm 3.1 maximizes (3.1).
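The angle-sweep just described can be sketched in code; the parametrization **s**(θ) = (cos θ, sin θ) and the toy data are assumptions made for illustration, not the paper's exact implementation.

```python
import math
import numpy as np

def best_direction_p2(Z):
    """Sketch of the p = 2 angle sweep (Algorithm 3.1).  Z is an (N, 2)
    array of pairwise differences Z_ij = Y_j - X_i; returns (s_hat, psi_hat)
    with directions parametrized as s(theta) = (cos theta, sin theta)."""
    Z = np.asarray(Z, dtype=float)

    def psi(theta):
        s = np.array([math.cos(theta), math.sin(theta)])
        return np.mean(Z @ s >= 0.0)

    # On [0, pi/2], s^T Z changes sign only for observations whose two
    # coordinates have opposite signs; the crossing angle solves
    # tan(theta) = -z1 / z2.
    brk = sorted(math.atan2(abs(z1), abs(z2))
                 for z1, z2 in Z if (z1 < 0) != (z2 < 0))
    knots = [0.0] + brk + [math.pi / 2]
    # Psi is piecewise constant in theta, so evaluating the knots and the
    # interval midpoints is enough to attain the maximum.
    cand = knots + [(a + b) / 2 for a, b in zip(knots, knots[1:])]
    t_best = max(cand, key=psi)
    return np.array([math.cos(t_best), math.sin(t_best)]), psi(t_best)

# Hypothetical toy differences.
rng = np.random.default_rng(1)
Z = rng.normal(0.3, 1.0, size=(40, 2))
s_hat, psi_hat = best_direction_p2(Z)
```

Because the objective is piecewise constant with finitely many breakpoints, this finite candidate set yields the exact maximizer of (3.4), which a brute-force grid search can only approach from below.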

In the general case, i.e., for *p* ≥ 3, each observation **Z**_{ij} is associated with a "slice" of $S_{+}^{p-1}$ on which $\mathbf{s}^{T}\mathbf{Z}_{ij} \ge 0$. The boundaries of the slice are the intersection of $S_{+}^{p-1}$ and the hyperplane $\{\mathbf{s} : \mathbf{s}^{T}\mathbf{Z}_{ij} = 0\}$. Note that when

**Remark 3.2.** It is clear that the estimation procedure for paired samples is the same as for independent samples.

We find three different asymptotic regimes for *ŝ*_{max} depending on distributional assumptions and the sampling scheme (paired versus independent samples).

Note that the parameter space is the unit sphere, not the usual Euclidean space. There are several ways of dealing with this irregularity, one of which is to re-express the last coordinate of $\mathbf{s} \in S_{+}^{p-1}$ as $s_{p} = \sqrt{1 - s_{1}^{2} - \cdots - s_{p-1}^{2}}$ and consider the parameter space $\{\mathbf{s} \ge \mathbf{0} : s_{1}^{2} + \cdots + s_{p-1}^{2} \le 1\}$, which is a compact subset of $\mathbb{R}^{p-1}$. Clearly these parameterizations are equivalent and without any further ambiguity we will denote them both by $S_{+}^{p-1}$; in the proofs below both views of $S_{+}^{p-1}$ are used interchangeably as convenient.

We begin our discussion with independent samples, assuming continuous distributions for both **X** and **Y**.

**Theorem 3.1.** Let **X** and **Y** be independent continuous RVs and suppose that the maximizer **s**_{max} of Ψ(**s**) is unique. Then

$$N^{1/2}(\hat{\mathbf{s}}_{\max} - \mathbf{s}_{\max}) \Rightarrow N(\mathbf{0}, \Sigma)$$

where the matrix Σ is defined in the body of the proof.

Although (3.1) is not continuous (nor differentiable), its U-statistic structure guarantees that it is "almost" so (i.e., it is continuous up to an *o*_{p}(1) term).

**Remark 3.3**. Note that if either ** X** or

We have not been able to find general conditions for a unique maximizer of Ψ(**s**), although important sufficient conditions can be found. For example:

If **Z** = **Y** − **X** has mean **δ** and covariance matrix **Σ**, and the distribution of

$$\frac{\mathbf{s}^{T}\mathbf{Z} - \mathbf{s}^{T}\boldsymbol{\delta}}{\sqrt{\mathbf{s}^{T}\boldsymbol{\Sigma}\mathbf{s}}}$$

is independent of **s**, then the maximizer of Ψ(**s**) is unique.

The condition above is satisfied by location-scale families and it may be convenient to think of **δ** and **Σ** as the location and scale parameters.

**Theorem 3.2.** If **X** and **Y** have discrete distributions with finite support and $\hat{\mathbf{s}}_{\max}$ is a maximizer of (3.1), then

$$\mathbb{P}\left(\hat{\mathbf{s}}_{\max} \notin S_{\max}\right) \le C_{1} \exp(-C_{2} N)$$

for some positive constants *C*_{1} and *C*_{2}, where $S_{\max}$ denotes the set of maximizers of Ψ(**s**).

Theorem 3.2 shows that the probability that a maximizer of Ψ_{n,m} is not in *S*_{max} is exponentially small when the underlying distributions of **X** and **Y** have finite supports.

Finally, we consider the case of continuous RVs under paired samples. Then under the conditions of Theorem 3.1, and provided the density of **Z** = **Y** − **X** is suitably regular, we have:

**Theorem 3.3.** Under the above mentioned conditions $\hat{\mathbf{s}}_{\max}$, the maximizer of (3.3), is strongly consistent, i.e., $\hat{\mathbf{s}}_{\max} \stackrel{a.s.}{\to} \mathbf{s}_{\max}$; it converges at a cube-root rate, that is, $\hat{\mathbf{s}}_{\max} = \mathbf{s}_{\max} + O_{p}(N^{-1/3})$; and

$$N^{1/3}(\hat{\mathbf{s}}_{\max} - \mathbf{s}_{\max}) \Rightarrow \mathbf{W}$$

where **W** has the distribution of the almost surely unique maximizer of the process $\mathbf{s} \mapsto Q(\mathbf{s}) + \mathbb{W}(\mathbf{s})$, with *Q* a quadratic drift and $\mathbb{W}$ a Gaussian process.

Theorem 3.3 shows that in paired samples *ŝ*_{max} is consistent but in contrast with Theorem 3.1 it converges at a cube-root rate to a non-normal limit. The cube root rate is due to the discontinuous nature of the objective function (3.3). General results dealing with this kind of asymptotics for independent observations are given by Kim and Pollard (1990). The main difference between Theorems 3.1 and 3.3 is that the objective function (3.1) is smoothed by its U-statistic structure while (3.3) is not.

Since the parameter space is the surface of a unit sphere, it is natural to define the (1 − α) × 100% confidence set for **s**_{max} centered at $\hat{\mathbf{s}}_{\max}$ by

$$\left\{\mathbf{s} \in S_{+}^{p-1} : \hat{\mathbf{s}}_{\max}^{T}\mathbf{s} \ge C_{\alpha,N}\right\}$$

where *C*_{α,N} satisfies $\mathbb{P}(\hat{\mathbf{s}}_{\max}^{T}\mathbf{s}_{\max} \ge C_{\alpha,N}) = 1 - \alpha$. For more details see Fisher and Hall (1989) or Peddada and Chang (1996). Hence the confidence set is the set of all $\mathbf{s} \in S_{+}^{p-1}$ which have a small angle with $\hat{\mathbf{s}}_{\max}$, since a large inner product between unit vectors corresponds to a small angle.

**Remark 3.4.** Since in the case of paired samples the estimator converges at a cube-root rate rather than the square-root rate, the standard bootstrap methodology may yield inaccurate coverage probabilities (see Abrevaya and Huang 2005 and Sen et al. 2010). For this reason we recommend the "M out of N" bootstrap methodology. For further discussion on the "M out of N" bootstrap one may refer to Lee (1999), Delgado et al. (2001) and Bickel and Sakov (2008).
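The resampling scheme can be sketched generically. The estimator (the sample median), the resample size, and the synthetic data below are placeholders chosen for illustration, not the paper's procedure.

```python
import numpy as np

def m_out_of_n_bootstrap(data, estimator, m, n_boot=200, seed=0):
    """Draw m < N observations with replacement and return the n_boot
    bootstrap replicates of the estimator.  Using a resample size smaller
    than N restores bootstrap consistency for cube-root-rate estimators,
    for which the ordinary (N out of N) bootstrap may fail."""
    rng = np.random.default_rng(seed)
    N = len(data)
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, N, size=m)   # m out of N, with replacement
        reps.append(estimator(data[idx]))
    return np.array(reps)

# Illustration with a simple estimator (the sample median of normal data).
data = np.random.default_rng(42).normal(0.0, 1.0, size=500)
reps = m_out_of_n_bootstrap(data, np.median, m=100)
lo, hi = np.quantile(reps, [0.025, 0.975])
```

In the directional problem `estimator` would be replaced by the maximizer of (3.3), with the quantiles of the replicated inner products supplying the cutoff *C*_{α,N}.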

Consider first the case of independent samples where interest is in testing the hypothesis

$$H_{0}: \mathbf{X} =_{st} \mathbf{Y} \quad \text{versus} \quad H_{1}: \mathbf{X} \prec_{st} \mathbf{Y}.$$

(3.7)

Thus (3.7) tests whether the distributions of **X** and **Y** are equal against the alternative that **Y** is stochastically larger than **X**.

**Theorem 3.4.** Let **X** and **Y** be independent and continuous RVs. If **X** ⪯_{st} **Y**, then **X** =_{st} **Y** if and only if Ψ(**s**) = 1/2 for all $\mathbf{s} \in S_{+}^{p-1}$.

Theorem 3.4 says that if it is known a priori that **X** ⪯_{st} **Y**, then (3.7) can be tested through the projections $\mathbf{s}^{T}\mathbf{X}$ and $\mathbf{s}^{T}\mathbf{Y}$.

**Remark 3.5.** The assumption that **X** ⪯_{st} **Y** is natural in applications such as the environmental sciences where high exposures are associated with increased risk. Nevertheless, if the assumption that **X** ⪯_{st} **Y** is not warranted, then the alternative hypothesis formulated in terms of the linear stochastic order actually tests whether there exists an $\mathbf{s} \in S_{+}^{p-1}$ for which $\mathbf{s}^{T}\mathbf{X} \prec_{st} \mathbf{s}^{T}\mathbf{Y}$.

**Remark 3.6.** In the proof of Theorem 3.4 we use the fact that given **X** ⪯_{st} **Y** we have **X** =_{st} **Y** provided *X*_{i} =_{st} *Y*_{i} for *i* = 1,…,*p*.

Hence (3.7) can be reformulated in terms of the linear stochastic order. In particular this justifies using the statistic

$$S_{n,m} = N^{1/2}\left(\Psi_{n,m}(\hat{\mathbf{s}}_{\max}) - 1/2\right).$$

(3.8)

To the best of our knowledge this is the first general test for multivariate ordered distributions. In practice we first compute $\hat{\mathbf{s}}_{\max}$ and then define $\hat{U}_{i} = \hat{\mathbf{s}}_{\max}^{T}\mathbf{X}_{i}$ and $\hat{V}_{j} = \hat{\mathbf{s}}_{\max}^{T}\mathbf{Y}_{j}$, where *i* = 1,…,*n* and *j* = 1,…,*m*. Hence (3.8) is nothing but a Wilcoxon–Mann–Whitney (WMW) statistic based on the $\hat{U}_{i}$'s and $\hat{V}_{j}$'s.
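The statistic (3.8) is therefore just the centered and scaled Mann–Whitney proportion computed on the projected data; a sketch follows, where the direction and the data are hypothetical and, for simplicity, the direction is taken as given rather than estimated from the same data.

```python
import numpy as np

def s_nm(s_hat, X, Y):
    """The statistic (3.8): S_{n,m} = N^{1/2} (Psi_{n,m}(s_hat) - 1/2),
    a WMW-type statistic applied to the data projected onto s_hat."""
    u = X @ s_hat                    # projected X sample (U-hat values)
    v = Y @ s_hat                    # projected Y sample (V-hat values)
    psi = np.mean(u[:, None] <= v[None, :])
    N = len(u) + len(v)
    return np.sqrt(N) * (psi - 0.5)

# Hypothetical data; in practice critical values are obtained by bootstrap
# because s_hat is itself estimated from the sample.
rng = np.random.default_rng(3)
X = rng.normal(0.0, 1.0, size=(30, 2))
Y = rng.normal(0.8, 1.0, size=(30, 2))
s_hat = np.array([1.0, 1.0]) / np.sqrt(2.0)
stat = s_nm(s_hat, X, Y)
```

Under the null the projected samples are exchangeable and `stat` fluctuates around zero, whereas stochastically larger **Y** samples push it to large positive values.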

The large sample distribution of (3.8) is given in the following.

**Theorem 3.5.** Suppose the null in (3.7) holds. Let *n*, *m* → ∞ and *n*/(*n* + *m*) → λ ∈ (0, 1). Then

$$S_{n,m} \Rightarrow S = \sup_{\mathbf{s} \in S_{+}^{p-1}} \mathbb{G}(\mathbf{s})$$

where $\mathbb{G}(\mathbf{s})$ is a mean-zero Gaussian process indexed by $\mathbf{s} \in S_{+}^{p-1}$.

**Remark 3.7.** Since $\hat{\mathbf{s}}_{\max} \stackrel{a.s.}{\to} \mathbf{s}_{\max}$, by Slutsky's theorem the power of the test (3.8) converges to the power of a WMW test comparing the samples $(\mathbf{s}_{\max}^{T}\mathbf{X}_{1},\dots,\mathbf{s}_{\max}^{T}\mathbf{X}_{n})$ and $(\mathbf{s}_{\max}^{T}\mathbf{Y}_{1},\dots,\mathbf{s}_{\max}^{T}\mathbf{Y}_{m})$. This "synthetic" test, which assumes that **s**_{max} is known, serves as a gold standard, as verified by our simulation study.

**Remark 3.8.** Furthermore, the power of the test under local alternatives, i.e., when $\mathbf{Y} =_{st} \mathbf{X} + N^{-1/2}\boldsymbol{\delta}$ and *N* → ∞, is bounded by the power of the WMW test comparing the distributions of $\mathbf{s}_{\max}^{T}\mathbf{X}$ and $\mathbf{s}_{\max}^{T}\mathbf{X} + N^{-1/2}\mathbf{s}_{\max}^{T}\boldsymbol{\delta}$.

Alternatives to the “sup” statistic (3.8) are the “integrated” statistics

$$I_{n,m} = \int_{\mathbf{s} \in S_{+}^{p-1}} N^{1/2}\left(\Psi_{n,m}(\mathbf{s}) - 1/2\right) d\mathbf{s} \quad \text{and} \quad I_{n,m}^{+} = \int_{\mathbf{s} \in S_{+}^{p-1}} \left[N^{1/2}\left(\Psi_{n,m}(\mathbf{s}) - 1/2\right)\right]_{+} d\mathbf{s}$$

(3.9)

where [*x*]_{+} = max(0, *x*). Under the null hypothesis,

$$I_{n,m}^{+} \Rightarrow \int_{\mathbf{s} \in S_{+}^{p-1}} \left[\mathbb{G}(\mathbf{s})\right]_{+} d\mathbf{s}.$$

This distribution does not have a closed form. The statistics $I_{n,m}$ and $I_{n,m}^{+}$ have univariate analogues, cf. Davidov and Herman (2012). Finally, we have the following.

**Theorem 3.6.** The tests (3.8) and (3.9) are consistent. Furthermore, if **X** ⪯_{st} **Y** ⪯_{st} **Z**, then all three tests for *H*_{0}: **X** =_{st} **Z** versus *H*_{1}: **X** ≺_{st} **Z** are more powerful than the respective tests for *H*_{0}: **X** =_{st} **Y** versus *H*_{1}: **X** ≺_{st} **Y**.

Theorem 3.6 shows that the tests are consistent and that their power function is monotone in the linear stochastic order.

**Remark 3.9.** Qualitatively similar results are obtainable in the paired sampling case, the only difference being the limiting process. For example, it is easy to see that the paired-sample analogue of (3.8) satisfies

$$N^{1/2}\left(\Psi_{N}(\hat{\mathbf{s}}_{\max}) - 1/2\right) \Rightarrow \sup_{\mathbf{s} \in S_{+}^{p-1}} \mathbb{Q}(\mathbf{s})$$

where $\mathbb{Q}(\mathbf{s})$ is the limit of the empirical process on $S_{+}^{p-1}$ associated with (3.3). Analogues of $I_{n,m}$ and $I_{n,m}^{+}$ are obtained similarly.

**4. Simulations**

For simplicity of exposition, and motivated by the fact that the example we analyzed in this paper deals with independent samples, we limit our simulations to the case of independent samples.

We start by investigating the distribution of *ŝ*_{max} by simulation. For simplicity we chose *p* = 3 and generated *X*_{i} (*i* = 1,…,

The simulation study consists of three parts. In the first part we evaluate the accuracy and precision of *ŝ*_{max} by estimating its bias and mean squared error (MSE). In the second part we investigate the coverage probability of bootstrap confidence intervals. In the third part we estimate the type I error and power of the proposed test *S*_{n,m} as well as the integral tests (3.9).

To evaluate the bias and MSEs we generated *X*_{1},…., *X** _{n}* ~

We conducted extensive simulation studies to evaluate the performance of the bootstrap confidence intervals. In this paper we present a small sample of our study. We generated data from two 5-dimensional normal populations with means **0** and **δ**, respectively, and a common covariance matrix.

**Table 2.** Coverage probabilities for the bootstrap confidence intervals for *p* = 5 normal data. Pattern *i* = 1, 2 corresponds to *δ*_{1} = (.1, .25, .5, .75, .9) and *δ*_{2} = (.5, .5, .5, .5, .5).

The goal of the third part of our simulation study was to evaluate the type I error and the power of the test (3.8). To evaluate the type I error, three different baseline distributions were considered for the two populations **X** and **Y**: multivariate normal, multivariate lognormal, and mixtures of multivariate normals.

**Table 3.** Type I errors for the proposed procedure with nominal level *α* = .05. Three types of distributions are considered: MVN (multivariate normal), MV-LogN (multivariate lognormal) and Mix-MVN (mixtures of MVNs).

Power comparisons were carried out for data generated from *X*_{1},…., *X** _{n}* ~

Additionally, in Table 5 we evaluate the type I error and power of our test when *X*_{1},….,*X** _{n}* ~

**Table 5a.** Power comparisons of the two proposed test procedures with type I error of .050. Here *δ*_{1} = (0.1, 0.5, 0.9), *δ*_{2} = (0.1, 0.25, 0.5, 0.75, 0.9), *δ*_{3} = (0.5, 0.5, 0.5), *δ*_{4} = (0.5, 0.5, 0.5, 0.5, 0.5) and *n* = *m* = 15.

Simulation results reported in this paper are based on 1000 simulation runs. Confidence sets are calculated using 1000 bootstrap samples. The bootstrap critical values for estimating the type I error were based on 500 bootstrap samples. Since the results based on 100 and 500 bootstrap samples did not differ by much, all powers were estimated using 100 bootstrap samples.

The bias and MSEs for the patterns considered are summarized in Table 1.

It is clear that the bias decreases with the sample size as do the MSEs. We observe that the bias tends to be smaller under independence and negative dependence compared with positive dependence. It also tends to be smaller when the data are exchangeable. Although results are not presented, we evaluated squared bias and MSE for larger values of *p* (e.g., *p* = 5, 10 and 20) and as expected the total squared bias and total MSE increased with the dimension *p*.

In Table 2 we summarize the estimated coverage probabilities of the bootstrap confidence intervals when *p* = 5. Our simulation study suggests that the proposed bootstrap methodology performs better for larger sample sizes but rather poorly for smaller sample sizes.

Type I errors for different patterns considered in our simulation study are summarized in Table 3.

Our simulation studies suggest that in every case the proposed bootstrap based test maintains the nominal level of 0.05. In general it is slightly conservative. The performance of the test is not affected by the shape of the underlying distribution. That is not surprising owing to the nonparametric nature of the test. Furthermore, we evaluated the type I error of the proposed bootstrap test for testing the null hypothesis (3.7) for *p* as large as 20 with *n* = *m* = 10 and discovered that the proposed test attains the nominal level of 0.05 even when *n* ≤ *p*. See Table 4. As commented earlier in the paper, Hotelling's *T*^{2} statistic cannot be applied here since the Wishart matrix is singular in this case. However, the proposed method is still applicable since the estimation of the best direction does not require the inversion of a matrix.

The power of the tests (3.8) and (3.9) for the various patterns considered in our simulation study is summarized in Tables 5a and 5b.

**Table 5b.** Power comparisons of the two proposed test procedures with type I error of .050. Here *δ*_{1} = (0.1, 0.5, 0.9), *δ*_{2} = (0.1, 0.25, 0.5, 0.75, 0.9), *δ*_{3} = (0.5, 0.5, 0.5), *δ*_{4} = (0.5, 0.5, 0.5, 0.5, 0.5) and *n* = *m* = 25.

As expected, in every case the power of the TMD test is higher than that of the *S*_{n,m} test and the RMD test.

**5. Illustration**

Prior to conducting a 2-year rodent cancer bioassay to evaluate the toxicity/carcinogenicity of a chemical, the National Toxicology Program (NTP) routinely conducts a 90-day pre-chronic dose finding study. One of the goals of the 90-day study is to determine the maximum tolerated dose (MTD) that can be used in the 2-year chronic exposure study. Accurate determination of the MTD is critical for the success of the 2-year cancer bioassay. Cancer bioassays are typically very expensive and time consuming. Therefore their proper design, i.e., choosing the correct dosing levels, is very important. When the highest dose used in the 2-year study exceeds the MTD, a large proportion of animals in the high dose group(s) may die well before the end of the study and the data from such group(s) cannot be used reliably. This results in inefficiency and wasted resources.

Typically the NTP uses the 90-day study to determine the MTD on the basis of a large number of correlated endpoints that provide information regarding toxicity. These include body weight, organ weights, clinical chemistry (red blood cell counts, cell volume, hemoglobin, hematocrit, lymphocytes, etc.), histopathology (lesions in various target organs), number of deaths and so forth. The dose response data is analyzed for each variable separately using Dunnett's or the Williams’ test (or their nonparametric versions, Dunn's test and Shirley's test, respectively). NTP combines results from all such analyses qualitatively and uses other biological and toxicological information when making decisions regarding the highest dose for the 2-year cancer bioassay. Analyzing correlated variables one at a time may result in loss of information. The proposed methodology provides a convenient method to combine information from several outcome variables to make comparisons between groups.

We now illustrate our methodology by re-analyzing data obtained from a recent NTP study of the chemical Citral (NTP, 2003). Citral is a flavoring agent that is widely used in a variety of food items. The NTP assigned a random sample of 10 male rats to the control group and 10 to the 1785 mg/kg dose group. Hematological and clinical chemistry measurements such as the number of Platelets (in 10³/µL), Urea Nitrogen (UN) (in mg/dL), Alkaline Phosphatase (AP) (in IU/L), and Bile Acids (BA) (in µmol/L) were recorded on each animal at the end of the study. The NTP performed univariate analyses on each of these variables and found no significant difference between the control and dose groups except for the concentration of Urea Nitrogen, which was increased in the high dose group. This increase was marginally significant at the 5% level and not significant after correcting for multiplicity. We applied the proposed methodology to compare the control group with the high dose group (1785 mg/kg) in terms of all non-negative linear combinations of the above mentioned four variables. We test the null hypothesis of no difference between the control and the high dose group against the alternative that the high dose group is stochastically larger (in the above four variables) than the control group. The resulting p-value, based on 10,000 bootstrap samples, was 0.025, which is significant at the 5% level. The estimated value of **s**_{max} was (0.074, 0.986, 0.012, 0.150)^{T} and the estimated 95% confidence region is given by $\{\mathbf{s} \in S_{+}^{p-1} : \hat{\mathbf{s}}_{\max}^{T}\mathbf{s} \ge 0.93\}$. Hence the confidence set includes any **s** which is within 21.5° of $\hat{\mathbf{s}}_{\max}$.
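The angular radius of such a confidence cone follows directly from the inner-product threshold, since for unit vectors $\hat{\mathbf{s}}_{\max}^{T}\mathbf{s}$ is the cosine of the angle between them. A quick check of the 0.93 cutoff quoted above:

```python
import math

# A confidence set of the form {s : s_hat^T s >= C} on the unit sphere is
# a cone around s_hat whose half-angle is arccos(C), because the inner
# product of two unit vectors equals the cosine of the angle between them.
C = 0.93
half_angle_deg = math.degrees(math.acos(C))
```

The computed half-angle is roughly 21.5 degrees, matching the cone described in the text.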

In many applications, researchers are interested in comparing two experimental conditions, e.g., a treatment and a control group, in terms of a multivariate response. In classical multivariate analysis one addresses such problems by comparing the mean vectors using Hotelling's *T*^{2} statistic. The assumption of MVN underlying Hotelling's *T*^{2} test may not hold in practice. Moreover, if the data are not MVN then the comparison of population means may not provide complete information regarding the differences between the two experimental groups. Furthermore, Hotelling's *T*^{2} statistic is designed for two-sided alternatives and may not be ideal when a researcher is interested in one sided, i.e., ordered, alternatives. Addressing such problems requires one to compare the two experimental groups nonparametrically in terms of the multivariate stochastic order. Such comparisons, however, are very high dimensional and not easy to perform.

In this article we circumvent this challenge by considering the notion of the linear stochastic order between two random vectors. The linear stochastic order is a “weak” generalization of the univariate stochastic order; it is simple to interpret and has intuitive appeal. Using this notion of ordering, we develop nonparametric directional inference procedures. Intuitively, the proposed methodology seeks the direction that best separates two multivariate populations. Asymptotic properties of the estimated direction are derived. Our test based on the best separating direction may be viewed as a generalization of Roy's classical largest root test for comparing several MVN populations. To the best of our knowledge this is the first general test for multivariate ordered distributions. Since in practice sample sizes are small, we use the bootstrap methodology for drawing inferences.

We illustrated the proposed methodology using data obtained from a recent toxicity/carcinogenicity study conducted by the US National Toxicology Program (NTP) on the chemical Citral. A re-analysis of their 90-day data using our proposed methodology revealed a linear stochastic increase in Platelets, Urea Nitrogen, Alkaline Phosphatase, and Bile Acids in the high dose group relative to the control group, which was not seen in the original univariate analysis conducted by the NTP. These findings suggest that the proposed methodology may have greater sensitivity than the commonly used univariate statistical procedures. Our methodology is quite general: it is nonparametric and can be applied to discrete and/or continuous outcome variables. Furthermore, it exploits the underlying dependence structure in the data, rather than analyzing one variable at a time.

We note that our example and some of our results pertain to continuous RVs. However, the methodology may be used, with appropriate modifications (e.g., methods for dealing with ties), with discrete or mixed data. Although the focus of this paper has been the comparison of two multivariate vectors, in many applications, especially in dose response studies, researchers may be interested in determining trends (order) among several groups. As in the classical parametric order restricted inference literature, one could generalize the methodology developed in this paper to test for order restrictions among multiple populations. For example one could extend the results to *K* ≥ 2 RVs ordered by the simple ordering, i.e.,

$${N}^{1/2}({\Psi}_{n,m}({\widehat{\mathit{s}}}_{min})-1/2)$$

where *ŝ*_{min} is the value which minimizes Ψ*_{n,m}*(*s*).

We believe that the results obtained here may be useful beyond order restricted inference. Our simulation study suggests that our estimator of the best separating direction, i.e., (3.5), may be useful even in the context of classical multivariate analysis, where it may be viewed as a robust alternative to Roy's classical estimate. Finally, we note that the linear stochastic order may be useful in a variety of other statistical problems. For example, we believe that it provides a useful framework for linearly combining the results of several diagnostic markers, a well known problem in the context of ROC curve analysis in diagnostic medicine.
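The link to ROC analysis can be made explicit: the empirical version of Ψ(*s*) = ℙ(*s*^{T}**X** ≤ *s*^{T}**Y**) is exactly the empirical AUC of the combined marker *s*^{T}**X**. A small sketch on hypothetical simulated markers (the shift, sample size, and combining direction are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def auc(controls, cases):
    """Empirical AUC = P(marker of a control <= marker of a case);
    for the combined marker s^T X this is exactly the empirical Psi(s)."""
    return float(np.mean(controls[:, None] <= cases[None, :]))

# hypothetical simulated data: two weakly informative, independent markers
n = 200
X = rng.normal(0.0, 1.0, (n, 2))        # controls
Y = rng.normal(0.5, 1.0, (n, 2))        # cases, shifted in both markers
s = np.array([1.0, 1.0]) / np.sqrt(2)   # one candidate combining direction

a_single = auc(X[:, 0], Y[:, 0])        # AUC of the first marker alone
a_combined = auc(X @ s, Y @ s)          # AUC of the linear combination
```

Under such a shift the combined marker typically attains a higher AUC than either marker alone, which is the sense in which a good separating direction combines diagnostic information.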

Proof. *(i)* Let *g*: ℝ^{p} → ℝ^{n} be an affine increasing function. Clearly *g*(*x*) *= v* + **Mx** for some

$${\mathit{u}}^{T}g(\mathit{X})={\mathit{u}}^{T}(\mathit{v}+\mathit{\text{MX}})={\mathit{u}}^{T}\mathit{v}+{\mathit{s}}^{T}\mathit{X}{\underset{\_}{\prec}}_{\mathit{s}\mathrm{t}}{\mathit{u}}^{T}\mathit{v}+{\mathit{s}}^{T}\mathit{Y}={u}^{T}(v+\mathit{\text{MY}})={\mathit{u}}^{T}g(\mathit{Y})$$

as required, where the inequality holds because *X* ≤_{st} **Y**. (

$${\mathit{s}}_{I}^{T}{\mathit{X}}_{I}={\mathit{s}}^{T}\mathit{X}{\underset{\_}{\prec}}_{\mathit{\text{st}}}{\mathit{s}}^{\mathit{T}}\mathit{Y}={\mathit{s}}_{I}^{T}{\mathit{Y}}_{I}$$

as required. (*iii*) Let *φ*: ℝ → ℝ be any increasing function. Note that

$$\mathbb{E}(\phi ({\mathit{s}}^{T}\mathit{X}))=\mathbb{E}(\mathbb{E}(\phi ({\mathit{s}}^{T}\mathit{X})|\mathit{Z}))\le \mathbb{E}(\mathbb{E}(\phi ({\mathit{s}}^{T}\mathit{Y})|\mathit{Z}))=\mathbb{E}(\phi ({\mathit{s}}^{T}\mathit{Y})).$$

The inequality is a consequence of ** X**|

$${\mathit{s}}^{T}\mathit{X}={\mathit{s}}_{1}^{T}{\mathit{X}}_{1}+\cdot \cdot \cdot +{\mathit{s}}_{n}^{T}{\mathit{X}}_{n}\phantom{\rule{0.2em}{0ex}}\text{and}\phantom{\rule{0.2em}{0ex}}{\mathit{s}}^{T}\mathit{Y}={\mathit{s}}_{1}^{T}{\mathit{Y}}_{1}+\cdot \cdot \cdot +{\mathit{s}}_{n}^{T}{\mathit{Y}}_{n}$$

by assumption
${\mathit{s}}_{i}^{T}{\mathit{X}}_{i}{\underset{\_}{\prec}}_{\mathit{s}\mathrm{t}}{\mathit{s}}_{i}^{T}{\mathit{Y}}_{i}$ for *i* = 1,…, *n*. In addition
${\mathit{s}}_{i}^{T}{\mathit{X}}_{i}$ and
${\mathit{s}}_{i}^{T}{\mathit{X}}_{j}$ are independent for *i* ≠ *j*. It follows from Theorem 1.A.3 in Shaked and Shanthikumar (2007) that
${\mathit{s}}_{1}^{T}{\mathit{X}}_{1}+\cdots +{\mathit{s}}_{n}^{T}{\mathit{X}}_{n}{\underset{\_}{\prec}}_{\mathit{s}\mathrm{t}}{\mathit{s}}_{1}^{T}{\mathit{Y}}_{1}+\cdots +{\mathit{s}}_{n}^{T}{\mathit{Y}}_{n}$, i.e., *X*_{st}** Y** as required. (

$$\mathbb{P}({\mathit{s}}^{T}{\mathit{X}}_{n}\ge t)\to \mathbb{P}({\mathit{s}}^{T}\mathit{X}\ge t)\phantom{\rule{0.2em}{0ex}}\text{and}\phantom{\rule{0.2em}{0ex}}\mathbb{P}({\mathit{s}}^{T}{\mathit{Y}}_{n}\ge t)\to \mathbb{P}({\mathit{s}}^{T}\mathit{Y}\ge t).$$

(7.1)

Moreover since *X*_{n} ≺_{l-st} *Y*_{n} we have

$$\mathbb{P}({\mathit{s}}^{T}{\mathit{X}}_{n}\ge t)\le \mathbb{P}({\mathit{s}}^{T}{\mathit{Y}}_{n}\ge t)\phantom{\rule{0.2em}{0ex}}\text{for all}\phantom{\rule{0.2em}{0ex}}n\in \mathbb{N}.$$

(7.2)

Combining (7.1) and (7.2) we have ℙ(*s*^{T}**X** ≥

Before proving Theorem 2.2 we provide a definition and a preliminary Lemma.

**Definition 7.1.** We say that the RV **X** has an elliptical distribution with parameters **μ** and **Σ** and generator φ(•), denoted X ~ E_{p}(**μ**, **Σ**, φ), if its characteristic function is given by exp(it^{T}**μ**)φ(t^{T}**Σt**).

For this and other facts about elliptical distributions which we use in the proofs below see Fang et al. (1987).

**Lemma 7.1.** Let X ~ E₁(μ, σ, φ) and Y ~ E₁(μ′, σ′, φ) be univariate elliptical RVs supported on ℝ. Then X ≤_{st} Y if and only if μ ≤ μ′ and σ = σ′.

**Proof.** Since *X* and *Y* have the same generator they have the stochastic representation:

$$\mathit{X}{=}_{\mathit{\text{st}}}\mu +\sigma \mathit{\text{RU}}\phantom{\rule{0.2em}{0ex}}\text{and}\phantom{\rule{0.2em}{0ex}}\mathit{Y}{=}_{\mathit{\text{st}}}{\mu}^{\prime}+{\sigma}^{\prime}\text{RU}$$

(7.3)

where *R* is a non-negative RV independent of the RV *U* satisfying ℙ(*U =* ±1) = 1/2 (cf., Fang et al. 1987). It follows that *RU* is a symmetric RV supported on ℝ with a strictly increasing DF which we denote by *F*_{0}. Let *F_{X}* and

$${F}_{0}(\frac{\mathrm{t}-\mu}{\sigma})\ge {F}_{0}(\frac{\mathrm{t}-{\mu}^{\prime}}{{\sigma}^{\prime}})$$

(7.4)

for all *t* ∈ ℝ. It is obvious that (7.4) holds when *μ* ≤ *μ′* and σ = σ′, establishing sufficiency. Now assume that *X* ≤_{st} *Y*. Put

**Remark 7.1.** Note that Lemma 7.1 may not hold for distributions with a finite support. For example if R ~ U(0, 1) then by (7.3) X ~ U(μ – σ, μ + σ) and Y ~ U(μ′ – σ′, μ′ + σ′). It is easily verified that in this case X ≤_{st} Y if and only if Δ = μ′ – μ ≥ 0 and –Δ ≤ σ′ – σ ≤ Δ, i.e., it is not required that σ = σ′. Hence the assumption that X and Y are supported on ℝ is necessary.
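The condition in Remark 7.1 is easy to check numerically by comparing the two uniform DFs on a grid; a small sketch with hypothetical parameter values (the grid-based check and the specific μ, σ values are illustrative assumptions):

```python
import numpy as np

def uniform_st_leq(mu, sigma, mu2, sigma2):
    """Check X ~ U(mu - sigma, mu + sigma) <=_st Y ~ U(mu2 - sigma2, mu2 + sigma2)
    by verifying F_X(t) >= F_Y(t) on a fine grid of t values."""
    lo = min(mu - sigma, mu2 - sigma2) - 1.0
    hi = max(mu + sigma, mu2 + sigma2) + 1.0
    t = np.linspace(lo, hi, 4001)
    FX = np.clip((t - (mu - sigma)) / (2.0 * sigma), 0.0, 1.0)
    FY = np.clip((t - (mu2 - sigma2)) / (2.0 * sigma2), 0.0, 1.0)
    return bool(np.all(FX >= FY - 1e-12))

# Delta = 1 and sigma' - sigma = 0.5 satisfy -Delta <= sigma' - sigma <= Delta
ordered = uniform_st_leq(0.0, 1.0, 1.0, 1.5)       # ordered although sigma != sigma'
# sigma' - sigma = 1.5 > Delta = 1 violates the condition
not_ordered = uniform_st_leq(0.0, 1.0, 1.0, 2.5)
```

The first pair is stochastically ordered even though σ ≠ σ′, exactly as the remark asserts for finitely supported distributions, while the second pair violates the condition and is not ordered.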

We continue with the proof of Theorem 2.2:

**Proof.** Let ** X** and

$${\mu}_{i}\le {\mu}_{i}^{\prime}\phantom{\rule{0.2em}{0ex}}\text{and}\phantom{\rule{0.2em}{0ex}}{\sigma}_{\mathit{\text{ii}}}={\sigma}_{\mathit{\text{ii}}}^{\prime}$$

(7.5)

The latter holds, of course, for all 1 ≤ *i* ≤ *p.* Choosing *s* = e_{i} + e_{j} we have *X _{i}+X_{j}*

$${\mu}_{i}+{\mu}_{j}\le {\mu}_{i}^{\prime}+{\mu}_{j}^{\prime}\phantom{\rule{0.2em}{0ex}}\text{and}\phantom{\rule{0.2em}{0ex}}{\sigma}_{\mathit{\text{ii}}}+{\sigma}_{\mathit{\text{jj}}}+2{\sigma}_{\mathit{\text{ij}}}={\sigma}_{\mathit{\text{ii}}}^{\prime}+{\sigma}_{\mathit{\text{jj}}}^{\prime}+2{\sigma}_{\mathit{\text{ij}}}^{\prime}.$$

(7.6)

The latter holds, of course, for all 1 ≤ *i* ≠ *j* ≤ *p*. It is easy to see that equations (7.5) and (7.6) imply that **μ** ≤ **μ**′ and **Σ** = **Σ**′. Recall (cf. Fang et al. 1987) that we may write *X* =_{st}

$$\mathbb{P}(\mathit{X}\in S)=\mathbb{P}({\mathit{X}}_{0}\in \left[S-\mu \right])\le \mathbb{P}({\mathit{X}}_{0}\in \left[S-{\mu}^{\prime}\right])=\mathbb{P}(\mathit{Y}\in S)$$

where *X*_{0} = *RSU*, hence *X* ≤_{st} **Y**. This proves the if part. The only if part follows immediately.

Proof. Let *χ _{p}=* {

$$\mathbb{P}({\mathit{s}}^{T}\mathit{X}>t)\le \mathbb{P}({\mathit{s}}^{T}\mathit{Y}>t).$$

(7.7)

Now note that

$$\mathbb{P}({\mathit{s}}^{T}\mathit{X}>t)=\sum _{\mathit{x}\in {\chi}_{p}}f(\mathit{x}){\mathbb{I}}_{({\mathit{s}}^{T}\mathit{x}>t)}\phantom{\rule{0.2em}{0ex}}\text{and}\phantom{\rule{0.2em}{0ex}}\mathbb{P}({\mathit{s}}^{T}\mathit{Y}>t)=\sum _{\mathit{x}\in {\chi}_{p}}g(\mathit{x}){\mathbb{I}}_{({\mathit{s}}^{T}\mathit{x}>t)}$$

(7.8)

where *f* and *g* are the probability mass functions of ** X** and

$$U={\cup}_{j\in J}U({\mathit{x}}_{j})$$

(7.9)

where *x _{j}* are the distinct minimal elements of

$$\mathbb{P}(\mathit{X}\in U)=\sum _{\mathit{x}\in \{\mathit{x}:{\mathit{s}}_{U}^{T}\mathit{x}>t\}}f(\mathit{x})=\mathbb{P}({\mathit{s}}_{U}^{T}\mathit{X}>t)\le \mathbb{P}({\mathit{s}}_{U}^{T}\mathit{Y}>t)=\sum _{\mathit{x}\in \{\mathit{x}:{\mathit{s}}_{U}^{T}\mathit{x}>t\}}g(\mathit{x})=\mathbb{P}(\mathit{Y}\in U).$$

We will complete the proof by showing that for each upper set *U* ⊆ χ*_{p}* we can find a vector

$$\mathrm{X}=\left(\begin{array}{c}{\mathit{x}}_{1}^{T}\\ \vdots \\ {\mathit{x}}_{J}^{T}\end{array}\right)$$

is a *J* × *p* matrix whose rows are the members of the anti–chain defining *U* and *t* = (*t*, …, *t*) has dimension *J*. Clearly the elements of **X** are ones and zeros. If *J* ≤ *p* the matrix **X** is of full rank since its rows are linearly independent by the fact that they are an anti–chain. Hence a solution for

Now let *p =* 4 and consider the upper set *U* generated by the antichain *x _{j}, j =* 1, …,

We first define the term copula.

**Definition 7.2.** Let F be the DF of a *p* dimensional RV with marginal DFs F_{1}, …, F_{p}. The copula C associated with F is a DF such that

$$F(\mathit{x})=C({F}_{1}({x}_{1}),\dots ,{F}_{p}({x}_{p})).$$

It follows that the tail–copula C̄(•) is nothing but the tail of the DF C(•).
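As a minimal numerical illustration of Definition 7.2, consider the simplest case, the independence copula C(u, v) = uv; the choice of copula and of standard normal margins here is an assumption of this sketch, not anything used in the proofs:

```python
import math

def std_normal_cdf(x):
    """DF of a standard normal RV."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def independence_copula(u, v):
    """C(u, v) = u * v, the copula of independent components."""
    return u * v

# joint DF at (x1, x2) for independent standard normal components:
# F(x1, x2) = C(F1(x1), F2(x2)) = F1(x1) * F2(x2)
x1, x2 = 0.3, -0.7
u, v = std_normal_cdf(x1), std_normal_cdf(x2)
joint = independence_copula(u, v)
```

The point of the definition is that the joint DF is always recoverable from the margins through some copula C; independence is merely the case where C is the product.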

Proof. Suppose that ** X** and

Proof. Note that for any *x* ∈ ℝ^{p} we have

$$\begin{array}{l}F(\mathit{x})={C}_{\mathit{X}}({F}_{1}({x}_{1}),\dots ,{F}_{p}({x}_{p}))\ge {C}_{\mathit{X}}({G}_{1}({x}_{1}),\dots ,{G}_{p}({x}_{p}))\phantom{\rule{0.2em}{0ex}}\\ \ge {C}_{\mathit{Y}}({G}_{1}({x}_{1}),\dots ,{G}_{p}({x}_{p}))=G(\mathit{x}).\end{array}$$

This means that *X* ≺_{lo} **Y**. The other part of the Theorem is proved similarly.

Proof. Let ** X** and

$$\mathbb{P}({\mathit{s}}^{T}\mathit{X}\le {\mathit{s}}^{T}\mathit{Y})=\Phi (-\frac{{\mathit{s}}^{T}(\mu -v)\phantom{\rule{0.2em}{0ex}}}{\sqrt{2{\mathit{s}}^{T}\sum \mathit{s}}}).$$

where Φ is the DF of a standard normal RV. It follows that ℙ(*s*^{T}**X** ≤

$$\frac{{\mathit{s}}^{T}(v-\mu )}{\sqrt{{\mathit{s}}^{T}\sum \mathit{s}}}\le \sqrt{{(v-\mu )}^{T}{\sum}^{-1}(v-\mu )}$$

(7.10)

for all *s*. It is now easily verified that *s* = Σ^{−1}(**ν** − **μ**) maximizes the left hand side of (7.10).
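This maximization is easy to verify numerically; a quick sketch with a hypothetical δ = ν − μ and Σ (illustrative values, not from the paper), checking that no random direction beats *s* = Σ^{−1}(ν − μ):

```python
import numpy as np

rng = np.random.default_rng(2)

def ratio(s, delta, Sigma):
    """Left-hand side of (7.10): s^T delta / sqrt(s^T Sigma s)."""
    return (s @ delta) / np.sqrt(s @ Sigma @ s)

# hypothetical mean difference and covariance
delta = np.array([1.0, 0.5])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])

s_star = np.linalg.solve(Sigma, delta)   # s = Sigma^{-1}(nu - mu)
best = ratio(s_star, delta, Sigma)       # attains sqrt(delta^T Sigma^{-1} delta)

# by the Cauchy-Schwarz inequality no other direction does better
others = rng.standard_normal((1000, 2))
assert all(ratio(s, delta, Sigma) <= best + 1e-9 for s in others)
```

The maximal value is the Mahalanobis distance √(δ^{T}Σ^{−1}δ), which is exactly the right hand side of (7.10).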

Proof. Let *Q _{q},q =* 1, …,4 be the four quadrants. It is clear that maximizing (3.1) is equivalent to maximizing

$${\Psi}_{n,m}^{\prime}(\mathit{s})=\sum _{{\mathit{Z}}_{ij}\in {Q}_{2}}{\mathbb{I}}_{({\mathit{s}}^{T}{\mathit{Z}}_{ij}\ge 0)}+\sum _{{\mathit{Z}}_{ij}\in {Q}_{4}}{\mathbb{I}}_{({\mathit{s}}^{T}{\mathit{Z}}_{ij}\ge 0)}$$

(7.11)

It is also clear that for any *s* the indicators ${\mathbb{I}}_{({\mathit{s}}^{T}{\mathit{Z}}_{ij}\ge 0)}$ are independent of the length of **Z**_{ij}, which we therefore take to have length unity. Observe that the value of (7.11) is constant on the intervals (θ_{[i]}, θ_{[i+1]}) where the θ_{[i]} are defined in Algorithm 3.1. At each point θ_{[i]}, *i* = 0, …, *M* + 1 the value of (7.11) may increase or decrease. It follows that for all
$\mathit{s}\in {S}_{+}^{p-1}{\Psi}_{n,m}^{\prime}(\mathit{s})\in \left\{{\Psi}_{n,m}^{\prime}({\mathit{s}}_{\left[0\right]}),\dots ,{\Psi}_{n,m}^{\prime}({\mathit{s}}_{\left[M+1\right]})\right\}$ where *s*_{[}_{i}_{]} are defined in Algorithm 3.1. Therefore the maximum value of (3.1) is an element of the above list. Now suppose that s_{[}_{i}_{]} is a global maximizer of (7.11). Clearly either
${\Psi}_{n,m}^{\prime}({\mathit{s}}_{\left[i\right]})={\Psi}_{n,m}^{\prime}({\mathit{s}}_{\left[i-1\right]})\phantom{\rule{0.2em}{0ex}}\text{or}\phantom{\rule{0.2em}{0ex}}{\Psi}_{n,m}^{\prime}({\mathit{s}}_{\left[i\right]})={\Psi}_{n,m}^{\prime}({\mathit{s}}_{\left[i+1\right]})$ must hold, in which case any value in [θ_{[i−1]}, θ_{[i]}] or [θ_{[i]}, θ_{[i+1]}] is a global maximizer. This concludes the proof.
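The piecewise-constant structure exploited in this proof can be illustrated for *p* = 2. The sketch below is a simplified stand-in for Algorithm 3.1 (not the paper's implementation): each **Z**_{i} contributes at most one critical angle in [0, π/2], and the objective need only be evaluated at the knots and once per interval between them.

```python
import numpy as np

def exact_max_count_2d(Z):
    """For p = 2, maximize sum_i I(s^T Z_i >= 0) over directions
    s = (cos t, sin t), t in [0, pi/2].  The objective is piecewise
    constant between critical angles, so it suffices to evaluate it
    at the knots and at interval midpoints."""
    crit = []
    for z1, z2 in Z:
        if z2 != 0.0:
            # s^T z = z1*cos(t) + z2*sin(t) changes sign at tan(t) = -z1/z2
            t = np.mod(np.arctan2(-z1, z2), np.pi)
            if t <= np.pi / 2:
                crit.append(t)
    knots = np.unique(np.concatenate([[0.0, np.pi / 2], crit]))
    cands = np.concatenate([knots, (knots[:-1] + knots[1:]) / 2])
    counts = [int(np.sum(Z @ np.array([np.cos(t), np.sin(t)]) >= 0))
              for t in cands]
    k = int(np.argmax(counts))
    return cands[k], counts[k]
```

This finite search is exact in the sense of the proof: the maximum over the continuum of directions is attained on the candidate list, and whole subintervals of maximizing angles are possible.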

Proof. Using Hájek's projection, for any *s* we may write

$${\Psi}_{n,m}(\mathit{s})=\Psi (\mathit{s})+{n}^{-1}\sum _{i=1}^{n}{\psi}_{1}({\mathit{X}}_{i},\mathit{s})+{m}^{-1}\sum _{j=1}^{m}{\psi}_{2}({\mathit{Y}}_{j},\mathit{s})+{R}_{n,m}(\mathit{s})$$

(7.12)

where

$$\begin{array}{l}{\psi}_{1}({\mathit{X}}_{i},\mathit{s})=\overline{G}({\mathit{s}}^{T}{\mathit{X}}_{i})-\Psi (\mathit{s}),\\ {\psi}_{2}({\mathit{Y}}_{j},\mathit{s})=F({\mathit{s}}^{T}{\mathit{Y}}_{j})-\Psi (\mathit{s}),\end{array}$$

and *R _{n,m}* (s) is a remainder term. Here (

$$\underset{\mathit{s}\in {S}_{+}^{p-1}}{\text{sup}}|{\Psi}_{n,m}(\mathit{s})-\Psi (\mathit{s})|\le \underset{\mathit{s}\in {S}_{+}^{p-1}}{\text{sup}}|{n}^{-1}\sum _{i=1}^{n}{\psi}_{1}({\mathit{X}}_{i},\mathit{s})|+\underset{\mathit{s}\in {S}_{+}^{p-1}}{\text{sup}}|{m}^{-1}\sum _{j=1}^{m}{\psi}_{2}({\mathit{Y}}_{j},\mathit{s})|+\underset{\mathit{s}\in {S}_{+}^{p-1}}{\text{sup}}|{R}_{n,m}(\mathit{s})|.$$

The set
${S}_{+}^{p-1}$ is compact and the function *ψ*_{1}(*x,s*) is continuous in
$\mathit{s}\in {S}_{+}^{p-1}$ for all values of *x* and bounded (in fact *|ψ _{1}* (

Similarly
${\text{sup}}_{\mathit{s}\in {S}_{+}^{p-1}}\left|{m}^{-1}{\sum}_{j=1}^{m}{\psi}_{2}({\mathit{Y}}_{j},\mathit{s})\right|\stackrel{a.s.}{\to}0\phantom{\rule{0.2em}{0ex}}\text{as}\phantom{\rule{0.2em}{0ex}}m\to \infty $. Since Ψ*_{n,m}*(s) is bounded all its moments exist; therefore from Theorem 5.3.3 in Serfling (1980) we have, with probability one, that

$$\underset{\mathit{s}\in {S}_{+}^{p-1}}{\text{sup}}|{\Psi}_{n,m}(\mathit{s})-\Psi (\mathit{s})|\stackrel{a.s.}{\to}0\phantom{\rule{0.2em}{0ex}}\text{as}\phantom{\rule{0.2em}{0ex}}n,m\to \infty .$$

By assumption Ψ(*s*_{max}) > Ψ(*s*) for all
$\mathit{s}\in {S}_{+}^{p-1}\backslash {\mathit{s}}_{max}$ so we can apply Theorem 2.12 in Kosorok (2008) to conclude that

$${\widehat{\mathit{s}}}_{max}\stackrel{a.\mathit{s}.}{\to}{\mathit{s}}_{max}$$

i.e., *ŝ*_{max} is strongly consistent. This completes the first part of the proof.

Since the densities of ** X** and

$$\underset{\Vert \mathit{s}-{\mathit{s}}_{max}\Vert <\delta}{\text{sup}}\Psi (\mathit{s})-\Psi ({\mathit{s}}_{max})\le -C{\delta}^{2}.$$

It is also obvious that

$${n}^{-1}\sum _{i=1}^{n}{\psi}_{1}({\mathit{X}}_{i},{\mathit{s}}_{max})={O}_{p}(1/\sqrt{N})\phantom{\rule{0.2em}{0ex}}\text{and}\phantom{\rule{0.2em}{0ex}}{m}^{-1}\sum _{j=1}^{m}{\psi}_{2}({\mathit{Y}}_{j},{\mathit{s}}_{max})={O}_{p}(1/\sqrt{N}).$$

Finally as noted above |*R _{n}*

$${\widehat{\mathit{s}}}_{max}={\mathit{s}}_{max}+{O}_{p}({N}^{-1/2})$$

(7.13)

i.e., *ŝ*_{max} converges to *s*_{max} at an *N*^{1/2} rate. This completes the second part of the proof.

The functions Ψ(*s*), *ψ*_{1}(*X*_{i}, *s*) and *ψ _{2}*(

$${\Psi}_{n,m}(\mathit{s})=\Psi ({\mathit{s}}_{max})+{(\mathit{s}-{\mathit{s}}_{max})}^{T}\frac{{M}_{n,m}}{{N}^{1/2}}+\frac{1}{2}{(\mathit{s}-{\mathit{s}}_{max})}^{T}V(\mathit{s}-{\mathit{s}}_{max})+{o}_{p}(1/N)$$

(7.14)

where

$${M}_{n,m}=\frac{1}{\sqrt{{\lambda}_{n,m}}}\frac{{\sum}_{i=1}^{n}\nabla {\psi}_{1}({\mathit{X}}_{i},{\mathit{s}}_{max})}{{n}^{1/2}}+\frac{1}{\sqrt{1-{\lambda}_{n,m}}}\frac{{\sum}_{j=1}^{m}\nabla {\psi}_{2}({\mathit{Y}}_{j},{\mathit{s}}_{max})}{{m}^{1/2}}$$

λ* _{n,m}* = n/N, for

$$V=\mathbb{E}({\nabla}^{2}{\psi}_{1}(\mathit{X},{\mathit{s}}_{max}))+\mathbb{E}({\nabla}^{2}{\psi}_{2}(\mathit{Y},{\mathit{s}}_{max})).$$

Note that the *o _{p}*(

Now by the CLT and Slutzky's Theorem we have that

$${M}_{n,m}\Rightarrow N(0,\Delta )$$

where

$$\Delta =\frac{1}{\lambda}\mathbb{E}\left(\nabla {\psi}_{1}(\mathit{X},{\mathit{s}}_{max})\nabla {\psi}_{1}{(\mathit{X},{\mathit{s}}_{max})}^{T}\right)+\frac{1}{1-\lambda}\mathbb{E}\left(\nabla {\psi}_{2}(\mathit{Y},{\mathit{s}}_{max})\nabla {\psi}_{2}{(\mathit{Y},{\mathit{s}}_{max})}^{T}\right).$$

Finally it follows by Theorem 2 in Sherman (1993) that

$${N}^{1/2}({\widehat{\mathit{s}}}_{max}-{\mathit{s}}_{max})\Rightarrow N(0,\sum )$$

where Σ = V^{−1}ΔV^{−1}, completing the proof.

Proof. Suppose that ** X** and

$$\Psi (\mathit{s})=\mathbb{P}({\mathit{s}}^{T}\mathit{X}\le {\mathit{s}}^{T}\mathit{Y})=\sum _{a=1}^{A}\sum _{b=1}^{B}{\mathbb{I}}_{(\mathit{s}\in {S}_{\mathit{\text{ab}}})}{p}_{a}{q}_{b}=\sum _{k=1}^{K}{\alpha}_{k}{\mathbb{I}}_{(\mathit{s}\in {S}_{k})}$$

where *K* ≤ 2^{AB} is finite, the sets *S*_{k} are distinct,
${\cup}_{k=1}^{K}{S}_{k}={S}_{+}^{p-1}$ and

$${\Psi}_{n,m}(\mathit{s})=\frac{1}{\mathit{\text{nm}}}\sum _{i=1}^{n}\sum _{j=1}^{m}{\mathbb{I}}_{({\mathit{s}}^{T}{\mathit{X}}_{i}\le {\mathit{s}}^{T}{\mathit{Y}}_{j})}=\sum _{a=1}^{A}\sum _{b=1}^{B}\frac{{n}_{a}{m}_{b}}{\mathit{\text{nm}}}{\mathbb{I}}_{(\mathit{s}\in {S}_{\mathit{\text{ab}}})}$$

where
${n}_{a}={\sum}_{i=1}^{n}{\mathbb{I}}_{({\mathit{X}}_{i}={\mathit{x}}_{a})}$ and ${m}_{b}={\sum}_{j=1}^{m}{\mathbb{I}}_{({\mathit{Y}}_{j}={\mathit{y}}_{b})}$ where *a =* 1, …, *A* and *b =* 1, …, *B*. Clearly Ψ*_{n,m}*(

$${\Psi}_{n,m}(\mathit{s})=\sum _{k=1}^{K}{\widehat{\alpha}}_{k}{\mathbb{I}}_{(\mathit{s}\in {S}_{k})}$$

where *α̂*_{k} = Σ

$$\begin{array}{l}\mathbb{P}({\widehat{\mathit{s}}}_{max}\notin {S}_{max})=\mathbb{P}(arg\underset{1\le k\le K}{max}{\widehat{\alpha}}_{k}\ne 1)=\mathbb{P}({\cup}_{k=2}^{K}\left\{{\widehat{\alpha}}_{1}\le {\widehat{\alpha}}_{k}\right\})\\ \le \sum _{k=2}^{K}\mathbb{P}({\widehat{\alpha}}_{1}\le {\widehat{\alpha}}_{k})\le (K-1)\underset{2\le k\le K}{max}\mathbb{P}({\widehat{\alpha}}_{1}\le {\widehat{\alpha}}_{k}).\end{array}$$

(7.15)

A bit of rearranging shows that

$$\mathbb{P}({\widehat{\alpha}}_{1}\le {\widehat{\alpha}}_{k})=\mathbb{P}(\sum _{(a,b)\in {J}_{1}}{\widehat{p}}_{a}{\widehat{q}}_{b}-\sum _{(a,b)\in {J}_{k}}{\widehat{p}}_{a}{\widehat{q}}_{b}\le 0)=\mathbb{P}({n}^{-1}{m}^{-1}\sum _{i=1}^{n}\sum _{j=1}^{m}{\mathit{Z}}_{ij}^{(k)}\le 0)$$

where

$${\mathit{Z}}_{\mathit{\text{ij}}}^{(k)}=\sum _{(a,b)\in {J}_{1}\backslash {J}_{k}}{\mathbb{I}}_{({\mathit{X}}_{i}={\mathit{x}}_{a})}{\mathbb{I}}_{({\mathit{Y}}_{j}={\mathit{y}}_{b})}-\sum _{(a,b)\in {J}_{k}\backslash {J}_{1}}{\mathbb{I}}_{({\mathit{X}}_{i}={\mathit{x}}_{a})}{\mathbb{I}}_{({\mathit{Y}}_{j}={\mathit{y}}_{b})}.$$

Note that ${\mathit{Z}}_{\mathit{\text{ij}}}^{(k)}$ may be viewed as a kernel of a two sample U–Statistic. Moreover

$$-|{J}_{k}\backslash {J}_{1}|\le {\mathit{Z}}_{\mathit{\text{ij}}}^{(k)}\le |{J}_{1}\backslash {J}_{k}|$$

is bounded (here |•| denotes set cardinality) and $\mathbb{E}({\mathit{Z}}_{\mathit{\text{ij}}}^{(k)})={\mu}_{k}=\mathbb{E}({\widehat{\alpha}}_{1}-{\widehat{\alpha}}_{k})>0$ by assumption. Applying Theorem 2 and the derivations in Section 5b in Hoeffding (1963) we have that

$$\mathbb{P}({n}^{-1}{m}^{-1}\sum _{i=1}^{n}\sum _{j=1}^{m}{\mathit{Z}}_{\mathit{\text{ij}}}^{(k)}\le 0)\le \text{exp}\left(-\frac{N\text{min}({\lambda}_{n,m},1-{\lambda}_{n,m}){\mu}_{k}^{2}}{2{(|{J}_{k}\backslash {J}_{1}|+|{J}_{1}\backslash {J}_{k}|)}^{2}}\right)$$

(7.16)

where λ*_{n,m}* = n/N → λ ∈ (0, 1) as

$$\mathbb{P}({\widehat{\mathit{s}}}_{max}\notin {S}_{max})\le {C}_{1}\text{exp}(-{C}_{2}N)$$

where *C*_{1} = *K* – 1 and
${C}_{2}=\text{min}(\lambda ,1-\lambda )\phantom{\rule{0.2em}{0ex}}\underset{2\le k\le K}{\text{min}}\phantom{\rule{0.2em}{0ex}}\frac{{\mu}_{k}^{2}}{2{(\left|{J}_{k}\backslash {J}_{1}\right|+\left|{J}_{1}\backslash {J}_{k}\right|)}^{2}}$, completing the proof.
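The exponential bound (7.16) is of Hoeffding type; the following sketch illustrates the analogous one-sample bound on simulated bounded variables (a simplification assumed here for illustration, not the two-sample kernel bound used above):

```python
import numpy as np

rng = np.random.default_rng(3)

def hoeffding_bound(n, mu, a, b):
    """One-sample Hoeffding bound: for n iid variables in [a, b]
    with mean mu > 0, P(sample mean <= 0) <= exp(-2 n mu^2 / (b - a)^2)."""
    return float(np.exp(-2.0 * n * mu ** 2 / (b - a) ** 2))

# hypothetical bounded variables: uniform on [-1, 1] shifted to mean mu
n, mu, reps = 200, 0.2, 2000
emp = np.mean([np.mean(rng.uniform(-1.0, 1.0, n) + mu) <= 0
               for _ in range(reps)])
bnd = hoeffding_bound(n, mu, -1.0 + mu, 1.0 + mu)
```

The empirical frequency of the event sits far below the bound, and the bound itself decays exponentially in *n*, which is exactly the behavior driving the *C*₁ exp(−*C*₂*N*) rate above.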

Proof. Choose ε *>* 0. We have already seen that under the stated conditions Ψ(*s*) is continuous and therefore for each *s* the set *B*_{s,ε}*=* {*s′*: |Ψ(*s′*) – Ψ(*s*)| < ε} is open. The collection
$\left\{{B}_{\mathit{s},\varepsilon}:\mathit{s}\in {S}_{+}^{p-1}\right\}$ is an open cover for
${S}_{+}^{p-1}$. Since
${S}_{+}^{p-1}$ is compact there exists a finite subcover *B _{s1,}*

$$\begin{array}{l}\underset{\mathit{s}\in {S}_{+}^{p-1}}{\text{sup}}|{\Psi}_{N}(\mathit{s})-\Psi (\mathit{s})|\le \underset{1\le i\le K}{max}\underset{\mathit{s}\in {B}_{\mathit{s}i,\varepsilon}}{\text{sup}}|{\Psi}_{N}(\mathit{s})-{\Psi}_{N}({\mathit{s}}_{i})|\\ +\underset{1\le i\le K}{max}|{\Psi}_{N}({\mathit{s}}_{i})-\Psi ({\mathit{s}}_{i})|+\underset{1\le i\le K}{max}\underset{\mathit{s}\in {B}_{\mathit{s}i,\varepsilon}}{\text{sup}}|\Psi ({\mathit{s}}_{i})-\Psi (\mathit{s})|.\end{array}$$

By construction sup_{*s*∈*B*_{*s*_{i},ε}}|Ψ(

$$\underset{1\le i\le K}{max}|{\Psi}_{N}({\mathit{s}}_{i})-\Psi ({\mathit{s}}_{i})|\le \sum _{i=1}^{K}|{\Psi}_{N}({\mathit{s}}_{i})-\Psi ({\mathit{s}}_{i})|\stackrel{a.\mathit{s}.}{\to}0.$$

Now for any
$\mathit{s}\prime \in {B}_{{\mathit{s}}_{i},\varepsilon}$ we have ${\Psi}_{N}(\mathit{s}\prime )\stackrel{a.s.}{\to}\Psi (\mathit{s}\prime )$, ${\Psi}_{N}({\mathit{s}}_{i})\stackrel{a.s.}{\to}\Psi ({\mathit{s}}_{i})$ and $|\Psi (\mathit{s}\prime )-\Psi ({\mathit{s}}_{i})|<\varepsilon $. This implies that we can choose *N* large enough so that |Ψ_{N}(*s′*) – Ψ_{N}(*s*_{i})| < 2ε. Moreover this bound holds for all
$\mathit{s}\prime \in {B}_{{\mathit{s}}_{i},\varepsilon}$ and

$$limsup|{\Psi}_{N}(\mathit{s})-\Psi (\mathit{s})|\le 3\varepsilon $$

on
${S}_{+}^{p-1}$. Since ε is arbitrary we conclude that
${sup}_{\mathit{s}\in {S}_{+}^{p-1}}\left|{\Psi}_{N}(\mathit{s})-\Psi (\mathit{s})\right|\stackrel{a.\mathit{s}.}{\to}0$
as *N* → ∞. By assumption Ψ(*s*_{max}) > Ψ(*s*) for all
$\mathit{s}\in {S}_{+}^{p-1}\backslash {\mathit{s}}_{max}$ so we can apply Theorem 2.12 in Kosorok (2008) to conclude that

$${\widehat{\mathit{s}}}_{max}\stackrel{a.\mathit{s}.}{\to}{\mathit{s}}_{max}$$

i.e., *ŝ*_{max} is strongly consistent. This completes the first part of the proof. We have already seen that

$$\underset{\Vert \mathit{s}-{\mathit{s}}_{max}\Vert <\delta}{\text{sup}}\Psi (\mathit{s})-\Psi ({\mathit{s}}_{max})\le -C{\delta}^{2}$$

holds. We now need to bound
${\mathbb{E}}^{\ast}{\text{sup}}_{\Vert \mathit{s}-{\mathit{s}}_{max}\Vert <\delta}{N}^{1/2}\left|{\mathbb{P}}_{N}(\overline{\Psi}(\mathit{s})-\overline{\Psi}({\mathit{s}}_{max}))\right|$, where 𝔼* denotes the outer expectation and $\overline{\Psi}(\mathit{s})={\mathbb{I}}_{({\mathit{s}}^{T}\mathit{Z}\ge 0)}-\Psi (\mathit{s})$. We first note that the bracketing entropy of the upper half–planes is of the order

$$\mathbb{P}({\mathit{s}}^{T}Z\ge 0>{\mathit{s}}_{max}^{T}Z)+\mathbb{P}({\mathit{s}}_{max}^{T}Z\ge 0>{\mathit{s}}^{T}Z).$$

(7.17)

Note that we may replace the RV **Z** in (7.17) with the RV ** Z′** =

$${\mathbb{E}}^{\ast}\underset{\Vert \mathit{s}-{\mathit{s}}_{max}\Vert <\delta}{\text{sup}}{N}^{1/2}|{\mathbb{P}}_{N}(\overline{\Psi}(\mathit{s})-\overline{\Psi}({\mathit{s}}_{max}))|\le C{\delta}^{1/2}.$$

It now follows that

$${\overline{\Psi}}_{N}({\widehat{\mathit{s}}}_{max})\ge \underset{\mathit{s}\in {S}_{+}^{p-1}}{\text{sup}}{\overline{\Psi}}_{N}(\mathit{s})-{o}_{p}({N}^{-2/3})$$

(7.18)

which implies by Theorem 5.52 in van der Vaart (2000) and Theorem 14.4 of Kosorok (2008) that

$${\widehat{\mathit{s}}}_{max}={\mathit{s}}_{max}+{O}_{p}\left({N}^{-1/3}\right)$$

i.e., *ŝ*_{max} converges to *s*_{max} at a cube root rate. This completes the second part of the proof. The limit distribution is derived by verifying the conditions in Theorem 1.1 of Kim and Pollard (1990), denoted henceforth by KP. First note that (7.18) is condition (*i*) in KP. Since *ŝ*_{max} is consistent condition (*ii*) also holds and condition (*iii*) holds by assumption. The differentiability of the density of **Z** implies that Ψ(*s*) is twice differentiable. The uniqueness of the maximizer implies that −∇^{2}Ψ(*s*_{max}) is positive definite hence condition (*iv*) holds (see also example 6.4 in KP for related calculations). Condition (*v*) in KP is equivalent to the existence of the limit $H(u,v)={\text{lim}}_{\alpha \to \infty}\alpha \mathbb{E}(\overline{\Psi}({\mathit{s}}_{max}+u/\alpha )\overline{\Psi}({\mathit{s}}_{max}+v/\alpha ))$ which can be rewritten as

$$\underset{\beta \to 0}{lim}\frac{1}{\beta}\left[\mathbb{P}({({\mathit{s}}_{max}+\beta u)}^{T}\mathit{Z}\ge 0,{({\mathit{s}}_{max}+\beta v)}^{T}\mathit{Z}\ge 0)-\mathbb{P}({({\mathit{s}}_{max}+\beta u)}^{T}\mathit{Z}\ge 0)\mathbb{P}({({\mathit{s}}_{max}+\beta v)}^{T}\mathit{Z}\ge 0)\right].$$

With some algebra we find that this limit exists and equals

$${\int}_{z\in {S}_{+}^{p-1}}\delta ({\mathit{s}}_{max}^{T}z)({z}^{T}(u+v))h(z)\mathit{\text{dz}}-{\int}_{z\in {S}_{+}^{p-1}}\delta ({\mathit{s}}_{max}^{T}z)({z}^{T}u)h(z)\mathit{\text{dz}}{\int}_{z\in {S}_{+}^{p-1}}\delta ({\mathit{s}}_{max}^{T}z)({z}^{T}v)h(z)\mathit{\text{dz}}$$

where
$\delta ({\mathit{s}}_{max}^{T}z)$ is the usual Dirac delta function; hence integration is with respect to the surface measure on
$\left\{{\mathit{s}}_{max}^{T}z=0\right\}$. It follows that condition (*v*) also holds. Conditions (*vi*)–(*vii*) were verified in the second part of the proof. Thus we may apply Theorem 1.1 in KP to get

$${N}^{1/3}({\widehat{\mathit{s}}}_{max}-{\mathit{s}}_{max})\Rightarrow \text{arg max}\left\{-Q(\mathit{s})+\mathbb{W}(\mathit{s}):\mathit{s}\in {S}_{+}^{p-1}\right\}$$

where by KP *Q*(* s*)=

Proof. Note that

$$\Psi (\mathit{s})=\mathbb{P}({\mathit{s}}^{T}\mathit{X}\le {\mathit{s}}^{T}\mathit{Y})=\mathbb{P}({\mathit{s}}^{T}\mathit{Z}\ge 0)=\mathbb{P}(\frac{{\mathit{s}}^{T}\mathit{Z}-{\mathit{s}}^{T}\delta}{\sqrt{{\mathit{s}}^{T}\sum \mathit{s}}}\ge -\frac{{\mathit{s}}^{T}\delta}{\sqrt{{\mathit{s}}^{T}\sum \mathit{s}}})=1-F(-\frac{{\mathit{s}}^{T}\delta}{\sqrt{{\mathit{s}}^{T}\sum \mathit{s}}}).$$

Now, by assumption the DF *F* is independent of * s*. Therefore Ψ(

$$\chi (\mathit{s})=\frac{{\mathit{s}}^{T}\delta}{\sqrt{{\mathit{s}}^{T}\sum \mathit{s}}}$$

is uniquely maximized on
${S}_{+}^{p-1}$. If Σ = * I* then

$${\mathit{s}}^{\ast}=({\delta}_{1}{\mathbb{I}}_{({\delta}_{1}\ge 0)},\dots ,{\delta}_{p}{\mathbb{I}}_{({\delta}_{p}\ge 0)}).$$
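This closed form is easy to check numerically; a quick sketch with a hypothetical δ (an illustrative value), verifying that the positive part of δ maximizes χ(*s*) over non-negative directions when Σ = *I*:

```python
import numpy as np

rng = np.random.default_rng(4)

def chi(s, delta):
    """chi(s) = s^T delta / ||s||, the Sigma = I case of the criterion."""
    return (s @ delta) / np.sqrt(s @ s)

delta = np.array([1.0, -0.5, 0.3])           # hypothetical mean difference
s_star = np.where(delta >= 0, delta, 0.0)    # keep only non-negative coordinates
best = chi(s_star, delta)                    # equals ||positive part of delta||

# no random non-negative direction should do better
dirs = np.abs(rng.standard_normal((2000, 3)))
assert all(chi(s, delta) <= best + 1e-9 for s in dirs)
```

Intuitively, a non-negative direction gains nothing from coordinates where δ is negative, so the maximizer simply zeroes them out.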

Now let Σ ≠ ** I** and assume that a unique maximizer does not exist, i.e., suppose that

Proof. If *X**= _{st}*

Proof. The functions *ψ*_{1} and *ψ*_{2} defined in the proof of Theorem 3.1 are Donsker (cf. example 19.7 in van der Vaart 2000). Hence by the theory of empirical processes applied to (7.12) we find that

$${N}^{1/2}({\Psi}_{n,m}(\mathit{s})-\Psi (\mathit{s}))\Rightarrow \mathbb{G}(\mathit{s})$$

(7.19)

where G(** s**) is a zero mean Gaussian process and convergence holds for all
$\mathit{s}\phantom{\rule{0.2em}{0ex}}\in \phantom{\rule{0.2em}{0ex}}{S}_{+}^{\mathit{p}-1}$. We also note that (7.19) is a two sample U–process. A central limit theorem for such processes is described by Neumeyer (2004). Hence by the continuous mapping theorem, and under

$$\frac{1}{\lambda}\mathbb{P}({\mathit{u}}^{T}{\mathit{X}}_{1}\le {\mathit{u}}^{T}{\mathit{X}}_{2},{v}^{T}{\mathit{X}}_{1}\le {v}^{T}{\mathit{X}}_{3})+\frac{1}{1-\lambda}\mathbb{P}({\mathit{u}}^{T}{\mathit{X}}_{1}\le {\mathit{u}}^{T}{\mathit{X}}_{2},{v}^{T}{\mathit{X}}_{3}\le {v}^{T}{\mathit{X}}_{2})-\frac{1}{4\lambda (1-\lambda )}$$

(7.20)

where ** X_{1}**,

**Proof.** Suppose that *X* ≺_{l-st} **Y**. Then for some
${\mathit{s}}_{\ast}\phantom{\rule{0.2em}{0ex}}\in \phantom{\rule{0.2em}{0ex}}{S}_{+}^{\mathit{p}-1}$ we have
${\mathit{s}}_{\ast}^{T}\mathit{X}\phantom{\rule{0.2em}{0ex}}{\underset{\_}{\prec}}_{\mathit{s}t}\phantom{\rule{0.2em}{0ex}}{\mathit{s}}_{\ast}^{T}\mathit{Y}$ which implies that
$\mathbb{P}({\mathit{s}}_{\ast}^{T}\mathit{X}\phantom{\rule{0.2em}{0ex}}\le \phantom{\rule{0.2em}{0ex}}{\mathit{s}}_{\ast}^{T}\mathit{Y})\phantom{\rule{0.2em}{0ex}}>\phantom{\rule{0.2em}{0ex}}1/2.$ By definition
$\mathbb{P}({\mathit{s}}_{max}^{T}\mathit{X}\phantom{\rule{0.2em}{0ex}}\le \phantom{\rule{0.2em}{0ex}}{\mathit{s}}_{max}^{T}\mathit{Y})\phantom{\rule{0.2em}{0ex}}\ge \phantom{\rule{0.2em}{0ex}}\mathbb{P}({\mathit{s}}_{\ast}^{T}\mathit{X}\phantom{\rule{0.2em}{0ex}}\le \phantom{\rule{0.2em}{0ex}}{\mathit{s}}_{\ast}^{T}\mathit{Y})$ so Ψ(

$${S}_{n,m}={N}^{1/2}({\Psi}_{n,m}({\widehat{\mathit{s}}}_{max})-1/2)\stackrel{a.\mathit{s}.}{\to}\infty \phantom{\rule{0.2em}{0ex}}\text{as}\phantom{\rule{0.2em}{0ex}}n,m\to \infty .$$

Therefore by Slutzky's theorem

$$\mathbb{P}({S}_{n,m}>{q}_{n,m,1-\alpha};{H}_{1})\to 1\phantom{\rule{0.2em}{0ex}}\text{as}\phantom{\rule{0.2em}{0ex}}n,m\to \infty ,$$

where *q _{n,m,1−}*

Now assume that *X* ≺_{l-st} *Y* ≺_{l-st} **Z** so that

$$\mathbb{P}({U}_{j}\ge k)=\mathbb{P}({\mathit{s}}^{T}{\mathit{Y}}_{j}\ge {\mathit{s}}^{T}{\mathit{X}}_{k})\le \mathbb{P}({\mathit{s}}^{T}{\mathit{Z}}_{j}\ge {\mathit{s}}^{T}{\mathit{X}}_{k})=\mathbb{P}({V}_{j}\ge k)$$

where we use the fact that ${\mathit{s}}^{T}{\mathit{Y}}_{j}{\underset{\_}{\prec}}_{\mathit{\text{st}}}{\mathit{s}}^{T}{\mathit{Z}}_{j}$. It follows that

$$\sum _{i=1}^{n}\sum _{j=1}^{m}{\mathbb{I}}_{({\mathit{s}}^{T}{\mathit{X}}_{i}\le {\mathit{s}}^{T}{\mathit{Y}}_{j})}{\underset{\_}{\prec}}_{\mathit{\text{st}}}\sum _{i=1}^{n}\sum _{j=1}^{m}{\mathbb{I}}_{({\mathit{s}}^{T}{\mathit{X}}_{i}\le {\mathit{s}}^{T}{\mathit{Z}}_{j}).}$$

It follows that ${\Psi}_{n,m}^{\mathit{X},\mathit{Y}}(\mathit{s}){\underset{\_}{\prec}}_{\mathit{\text{st}}}{\Psi}_{n,m}^{\mathit{X},\mathit{Z}}(\mathit{s})$ for all $\mathit{s}\phantom{\rule{0.2em}{0ex}}\in \phantom{\rule{0.2em}{0ex}}{S}_{+}^{\mathit{p}-1}$, where ${\Psi}_{n,m}^{\mathit{X},\mathit{Y}}(\mathit{s})$ and ${\Psi}_{n,m}^{\mathit{X},\mathit{Z}}(\mathit{s})$ are defined in (3.1) and the superscripts emphasize the different arguments used to evaluate them. Thus,

$${\Psi}_{n,m}^{\mathit{X},\mathit{Y}}({\widehat{\mathit{s}}}_{max}^{\mathit{X},\mathit{Y}}){\underset{\_}{\prec}}_{\mathit{\text{st}}}{\Psi}_{n,m}^{\mathit{X},\mathit{Z}}({\widehat{\mathit{s}}}_{max}^{\mathit{X},\mathit{Y}})\le {\Psi}_{n,m}^{\mathit{X},\mathit{Z}}({\widehat{\mathit{s}}}_{max}^{\mathit{X},\mathit{Z}})$$

and as a consequence
$\mathbb{P}({S}_{n,m}^{\mathit{X},\mathit{Y}}\phantom{\rule{0.2em}{0ex}}>\phantom{\rule{0.2em}{0ex}}{q}_{n,m,1-\alpha})\phantom{\rule{0.2em}{0ex}}\le \phantom{\rule{0.2em}{0ex}}\mathbb{P}({S}_{n,m}^{\mathit{X},\mathit{Z}}\phantom{\rule{0.2em}{0ex}}>\phantom{\rule{0.2em}{0ex}}{q}_{n,m,1-\alpha})$ as required. The monotonicity of the power function of *I*_{n,m} and
${I}_{n,m}^{+}$ follows immediately from the fact that
$\Psi \underset{n,m}{\mathit{X},\mathit{Y}}(\mathit{s}){\underset{\_}{\prec}}_{\mathit{\text{st}}}\Psi \underset{n,m}{\mathit{X},\mathit{Z}}(\mathit{s})$ for all
$\mathit{s}\phantom{\rule{0.2em}{0ex}}\in \phantom{\rule{0.2em}{0ex}}{S}_{+}^{\mathit{p}-1}$.
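To make the objects appearing in the proof concrete, the following Python sketch computes the direction-wise count $\Psi_{n,m}(\boldsymbol{s})$ of pairs with $\boldsymbol{s}^{T}\boldsymbol{X}_{i}\le \boldsymbol{s}^{T}\boldsymbol{Y}_{j}$ and approximates its maximum over the nonnegative half-sphere $S_{+}^{p-1}$. This is a minimal sketch, not the authors' implementation: the paper maximizes $\Psi_{n,m}$ with direct-search algorithms (cf. Audet et al., 2008; Price et al., 2008), which are replaced here by crude random search, and the names `psi` and `s_nm` are illustrative only.

```python
import numpy as np

def psi(s, X, Y):
    # Psi_{n,m}(s): number of pairs (i, j) with s^T X_i <= s^T Y_j,
    # i.e. a Mann-Whitney-type count along the direction s (cf. (3.1)).
    px, py = X @ s, Y @ s
    return int(np.sum(px[:, None] <= py[None, :]))

def s_nm(X, Y, n_dirs=2000, seed=0):
    # Approximate S_{n,m} = max of Psi_{n,m}(s) over the nonnegative part
    # of the unit sphere, S_+^{p-1}, by random search over n_dirs draws.
    rng = np.random.default_rng(seed)
    dirs = np.abs(rng.standard_normal((n_dirs, X.shape[1])))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    vals = np.array([psi(s, X, Y) for s in dirs])
    k = int(np.argmax(vals))
    return int(vals[k]), dirs[k]  # statistic and estimated best direction
```

When every component of each $\boldsymbol{Y}_{j}$ dominates every component of each $\boldsymbol{X}_{i}$, the count attains its maximum $nm$ for any $\boldsymbol{s}\in S_{+}^{p-1}$, which gives a quick sanity check of the monotonicity used in the proof.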

The research of Ori Davidov was partially supported by the Israeli Science Foundation Grant No 875/09 and was conducted in part when visiting Shyamal Das Peddada at the NIEHS. This research [in part] was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (Z01 ES101744-04). We thank Grace Kissling (NIEHS), Alexander Goldenshluger, Yair Goldberg and Danny Segev (University of Haifa), for their useful comments and suggestions.

Ori Davidov, Department of Statistics, University of Haifa, Mount Carmel, Haifa 31905 Israel.

Shyamal Peddada, Biostatistics Branch, National Institute of Environmental Health Sciences, Alexander Drive, RTP, NC 27709.

- Abrevaya J, Huang J. On the bootstrap of the maximum score estimator. Econometrica. 2005;73:1175–1204.
- Arcones MA, Kvam PH, Samaniego FJ. Nonparametric estimation of a distribution subject to a stochastic precedence constraint. Journal of the American Statistical Association. 2002;97:170–182.
- Audet C, Bechard V, Le Digabel S. Nonsmooth optimization through mesh adaptive direct search and variable neighborhood search. Journal of Global Optimization. 2008;41:299–318.
- Bekele BN, Thall PF. Dose-finding based on multiple toxicities in a soft tissue sarcoma trial. Journal of the American Statistical Association. 2004;99:26–35.
- Bickel P, Sakov A. On the choice of *m* in the *m* out of *n* bootstrap and confidence bounds for extrema. Statistica Sinica. 2008;18:967–985.
- DasGupta A. Asymptotic Theory of Statistics and Probability. Springer; 2008.
- Davey BA, Priestley HA. Introduction to Lattices and Order. Cambridge University Press; 2002.
- Davidov O, Herman A. Multivariate stochastic orders induced by case-control sampling. Methodology and Computing in Applied Probability. 2011;13:139–154.
- Davidov O, Peddada SD. Order restricted inference for multivariate binary data with application to toxicology. Journal of the American Statistical Association. 2011;106:1394–1404. [PMC free article] [PubMed]
- Davidov O. Ordered inference, rank statistics and combining p-values: a new perspective. Statistical Methodology. 2012;9:456–465.
- Davidov O, Herman A. Ordinal dominance curve based inference for stochastically ordered distributions. Journal of the Royal Statistical Society, Series B. 2012;74:825–847.
- Delgado MA, Rodriguez-Poo JM, Wolf M. Subsampling inference in cube root asymptotics with an application to Manski's maximum score estimator. Economics Letters. 2001;73:241–250.
- Ding Y, Zhang X. Some stochastic orders for Kotz type distributions. Statistics and Probability Letters. 2004;69:389–396.
- Fang KT, Kotz S, Ng KW. Symmetric Multivariate and Related Distributions. Chapman and Hall; 1989.
- Fisher NI, Hall P. Bootstrap confidence regions for directional data. Journal of the American Statistical Association. 1989;84:996–1002.
- Hajek J, Sidak Z, Sen PK. Theory of Rank Tests. Academic Press; 1999.
- Hoeffding W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association. 1963;58:13–30.
- Hu J, Homem-de-Mello T, Mehrotra S. Concepts and applications of stochastically weighted stochastic dominance. 2011 Unpublished manuscript, see http://www.optimization-online.org/DB_FILE/2011/04/2981.pdf.
- Ivanova A, Murphy M. An adaptive first in man dose-escalation study of NGX267: statistical, clinical, and operational considerations. Journal of Biopharmaceutical Statistics. 2009;19:247–255. [PubMed]
- Joe H. Multivariate Models and Dependence Concepts. Chapman and Hall; 1997.
- Johnson R, Wichern D. Applied Multivariate Statistical Analysis. Prentice Hall; 1998.
- Kim J, Pollard D. Cube root asymptotics. The Annals of Statistics. 1990;18:191–219.
- Kosorok MR. Introduction to Empirical Processes and Semiparametric Inference. Springer; 2008.
- Lee S. On a class of *m* out of *n* bootstrap confidence intervals. Journal of the Royal Statistical Society, Series B. 1999;61:901–911.
- Lucas LA, Wright FT. Testing for and against stochastic ordering between multivariate multinomial populations. Journal of Multivariate Analysis. 1991;38:167–186.
- Moser V. Observational batteries in neurotoxicity testing. International Journal of Toxicology. 2000;19:407–411.
- Neumeyer N. A central limit theorem for two-sample U-processes. Statistics and Probability Letters. 2004;67:73–85.
- NTP. NTP toxicology and carcinogenesis studies of citral (microencapsulated) (CAS No 5392-40-5) in F344/N rats and B6C3F1 mice (feed studies). 2003. [PubMed]
- Peddada SD. A short note on the Pitman measure of nearness. American Statistician. 1985;39:298–299.
- Peddada SD, Chang T. Bootstrap confidence region estimation of the motion of rigid bodies. Journal of the American Statistical Association. 1996;91:231–241.
- Pitman EJG. The closest estimates of statistical parameters. Proceedings of the Cambridge Philosophical Society. 1937;33:212–222.
- Price CJ, Reale M, Robertson BL. A direct search method for smooth and non–smooth unconstrained optimization problems. Australian & New Zealand Industrial and Applied Mathematics Journal. 2008;48:927–948.
- Roy SN. On a heuristic method of test construction and its use in multivariate analysis. Annals of Mathematical Statistics. 1953;24:220–238.
- Sampson AR, Whitaker LR. Estimation of multivariate distributions under stochastic ordering. Journal of the American Statistical Association. 1989;84:541–548.
- Sen B, Banerjee M, Woodroofe M. Inconsistency of bootstrap: The Grenander estimator. The Annals of Statistics. 2010;38:1953–1977.
- Serfling RJ. Approximation Theorems of Mathematical Statistics. Wiley; 1980.
- Shaked M, Shanthikumar JG. Stochastic Orders. Springer; 2007.
- Sherman RP. The limiting distribution of the maximum rank correlation estimator. Econometrica. 1993;61:123–137.
- Silvapulle MJ, Sen PK. Constrained Statistical Inference. Wiley & Sons; 2005.
- van der Vaart AW. Asymptotic Statistics. Cambridge University Press; 2000.
