Results are displayed in Tables 1 to 5. As can be seen, the new methods for tests of *H*_{0,C} control type I error rates quite well. The power of the new methods is always higher than or very close to that of the methods for tests of *H*_{0,A} (Wang-Allison tests) and, in some of the simulations, is higher than that of the methods for tests of *H*_{0,B} (Wilcoxon-Mann-Whitney tests and permutation tests for observations above the threshold *τ*).

| **Table 1** Performance (type I error rates) of the tests in simulation 1 under *H*_{0,C} (i.e., both *H*_{0,A} and *H*_{0,B} are true) and yet *f*(*Y*|*Y* ≤ *τ* ∩ *X* = 1) is radically different from *f*(*Y*|*Y* ≤ *τ* ∩ *X* = 0) (see Figure). |

| **Table 5** Performance of the tests in simulation 5, *H*_{0,B} is false, *H*_{0,A} is false and *f*(*Y*|*X* = 1) = 1.2*f*(*Y*|*X* = 0) (see Figure for details of simulation). |

Table 1 shows the type I error rates of the tests (in simulation 1) when the null hypothesis *H*_{0,C} is true (i.e., both *H*_{0,A} and *H*_{0,B} are true) and yet *f*(*Y*|*Y* ≤ *τ* ∩ *X* = 1) is radically different from *f*(*Y*|*Y* ≤ *τ* ∩ *X* = 0). The type I error rates of the new methods are comparable to those of the methods for tests of *H*_{0,A} and of *H*_{0,B}. It is noteworthy that there is a slight but fairly consistent excess of type I errors when the sample 90^{th} percentile is used rather than a fixed cutoff point. This is because the sample 90^{th} percentile is a random variable, and when it falls below its population level, the null hypothesis is no longer strictly true in our simulations. That is, the tests remain valid tests of differences in distributions above the actual value used but should not be strictly interpreted as tests of differences in distributions above the 90^{th} (or any other) percentile. In practice, this distinction is probably trivial.
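The point that the sample 90^{th} percentile is itself a random variable can be checked with a small simulation. The sketch below is only illustrative and is not the simulation design used in this study: the exponential lifespan distribution, the mean of 100, the group size of 50, and the use of the pooled sample to set the threshold are all assumptions made for the example.

```python
import math
import random
import statistics

def sample_percentile(values, q):
    """Return the q-th sample percentile (integer q, 1..99) by linear interpolation."""
    # statistics.quantiles with n=100 returns the 99 cut points for percentiles 1..99
    return statistics.quantiles(values, n=100, method="inclusive")[q - 1]

rng = random.Random(0)        # fixed seed so the run is reproducible
rate = 1 / 100.0              # ASSUMPTION: exponential lifespans with mean 100
n_per_group = 50              # ASSUMPTION: hypothetical group size

# Population 90th percentile of an exponential distribution: -ln(0.10) / rate
population_p90 = -math.log(0.10) / rate

# Recompute the threshold from a fresh pooled sample in each replicate
thresholds = []
for _ in range(2000):
    pooled = [rng.expovariate(rate) for _ in range(2 * n_per_group)]
    thresholds.append(sample_percentile(pooled, 90))

# Fraction of replicates in which the sample threshold falls below the population value
below = sum(t < population_p90 for t in thresholds) / len(thresholds)
print(f"population 90th percentile: {population_p90:.1f}")
print(f"sample thresholds range from {min(thresholds):.1f} to {max(thresholds):.1f}")
print(f"fraction of replicates below the population value: {below:.2f}")
```

In a sizeable share of replicates the threshold actually used lies below the population percentile, which is the situation described above in which the null hypothesis is no longer strictly true at the threshold used.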

In simulation 2 (see Table 2), where *H*_{0,A} is true, *H*_{0,B} is false and *f*(*Y*|*Y* ≤ *τ* ∩ *X* = 1) is radically different from *f*(*Y*|*Y* ≤ *τ* ∩ *X* = 0), the new methods for tests of *H*_{0,C} and the methods for tests of *H*_{0,A} have lower power than the corresponding methods for tests of *H*_{0,B}. However, the new methods for tests of *H*_{0,C} slightly improve on the power of the methods for tests of *H*_{0,A}.

| **Table 2** Performance of the tests in simulation 2, *H*_{0,A} is true, *H*_{0,B} is false and *f*(*Y*|*Y* ≤ *τ* ∩ *X* = 1) is radically different from *f*(*Y*|*Y* ≤ *τ* ∩ *X* = 0) (see Figure for details of simulation). |

Table 3 shows the power of the tests in simulation 3, where *H*_{0,B} is true, *H*_{0,A} is false and *f*(*Y*|*Y* ≤ *τ* ∩ *X* = 1) is radically different from *f*(*Y*|*Y* ≤ *τ* ∩ *X* = 0). The new methods for tests of *H*_{0,C} and the methods for tests of *H*_{0,A} have very similar power, which is much higher than that of the corresponding methods for tests of *H*_{0,B}.

| **Table 3** Performance of the tests in simulation 3, *H*_{0,B} is true, *H*_{0,A} is false and *f*(*Y*|*Y* ≤ *τ* ∩ *X* = 1) is radically different from *f*(*Y*|*Y* ≤ *τ* ∩ *X* = 0) (see Figure for details of simulation). |

In simulation 4 (see Table 4), where *H*_{0,B} is false, *H*_{0,A} is false and *f*(*Y*|*Y* ≤ *τ* ∩ *X* = 1) and *f*(*Y*|*Y* ≤ *τ* ∩ *X* = 0) are identical, the new methods for tests of *H*_{0,C} always have higher power than the corresponding methods for tests of *H*_{0,A}. When *τ* is set to the 90^{th} percentile of the sample, the new methods also have higher power than the corresponding methods for tests of *H*_{0,B}.

| **Table 4** Performance of the tests in simulation 4, *H*_{0,B} is false, *H*_{0,A} is false and *f*(*Y*|*Y* ≤ *τ* ∩ *X* = 1) and *f*(*Y*|*Y* ≤ *τ* ∩ *X* = 0) are identical (see Figure for details of simulation). |

Finally, we conducted a set of simulations under what we perceived to be the most realistic situations. Here both *H*_{0,A} and *H*_{0,B} are false, *f*(*Y*|*Y* ≤ *τ* ∩ *X* = 1) is quite different from *f*(*Y*|*Y* ≤ *τ* ∩ *X* = 0), and the distributions have no discontinuities. In other words, there is just a simple reduction in the hazard rate when *X* = 1. Table 5 presents the power of the tests in simulation 5, where *f*(*Y*|*X* = 1) = 1.2*f*(*Y*|*X* = 0). In this simulation, the methods for tests of *H*_{0,B} have almost no power because the control group has few or no observations above the threshold *τ*. The new methods for tests of *H*_{0,C}, when using a permutation test, have power higher than or equal to that of the methods for tests of *H*_{0,A}.

Illustration with real data

To illustrate the methods, we applied them to two real datasets. In both of these datasets, prior research had shown differences in overall survival rate, and we tested for differences in 'maximum lifespan' herein. The first was a subset of data reported by Vasselli et al. [10]. The subset consists of two groups of Sprague-Dawley rats: those kept on a high-fat diet ad libitum throughout life and becoming obese (EO-HF), and those kept on a high-fat diet ad libitum until early-middle adulthood, becoming obese, and subsequently reduced to normal weight via caloric restriction, but on the same high-fat diet (WL-HF). Each group had 49 rats (see Figure for the histograms for the data). The second dataset was from a study comparing the lifespan of Agouti-related protein-deficient (AgRP(-/-)) mice to wildtype mice (+/+) as reported by Redmann & Argyropoulos [14]. This dataset consists of 16 mice with genotype '+/+' and 21 mice with genotype '-/-' (see Figure for the histograms for this dataset). From Figure , we can see the upper tails of the histograms of the two groups are different. Similar results can be found in Figure .

Results (p values of tests) are shown in Table 6. As can be seen, when *τ* is set to 110 (100) for the first (second) dataset, both the methods for tests of *H*_{0,A} and the new methods for tests of *H*_{0,C} can detect the differences in 'maximum lifespan' between groups at a nominal alpha level of 0.01 (0.05) for the first (second) dataset. But the methods for tests of *H*_{0,B} cannot detect the difference for any of the values of *τ*. The following description may provide some explanation of these results. For the first dataset, when *τ* = 110, the proportions of the observations greater than *τ* in the EO-HF group and the WL-HF group (i.e., estimates of *P*(*Y* > *τ* | *X* = 0) and *P*(*Y* > *τ* | *X* = 1)) are 0.061 and 0.306, respectively. These two proportions are significantly different and, not surprisingly, the methods for tests of *H*_{0,A} can detect the difference in 'maximum lifespan' between the two groups. Second, the sample means of the observations greater than *τ* in the two groups (i.e., estimates of *μ*(*Y* | *Y* > *τ* ∩ *X* = 0) and *μ*(*Y* | *Y* > *τ* ∩ *X* = 1)) are 117.8 and 122.9, respectively, and there is not much difference between these sample means. However, the sample means of the Z-values in the two groups (i.e., the estimates of *μ*(*Z* | *X* = 0) and *μ*(*Z* | *X* = 1)) are 7.210 and 37.633, respectively, and are *greatly* different, where *Z*_{i} = *I*(*Y*_{i} > *τ*)*Y*_{i}. These may explain why the methods for tests of *H*_{0,B} cannot reject the null but the new methods for tests of *H*_{0,C} can detect the difference in 'maximum lifespan' between the two groups. Similarly, for the second dataset, when *τ* = 100, the proportions of the observations greater than *τ* in the group with genotype '+/+' and the group with genotype '-/-' are 0.188 and 0.571, respectively. The sample means of the observations greater than *τ* in the two groups are 109.3 and 110.9, respectively. The sample means of the Z-values in the two groups are 20.5 and 63.4, respectively.
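The three summaries quoted in this paragraph are linked by a simple identity, which may help explain why the Z-values separate the groups so clearly. Because *Z*_{i} is zero whenever *Y*_{i} ≤ *τ*, the sample mean of *Z* within a group equals the proportion of observations above *τ* multiplied by the mean of those observations. The worked arithmetic below uses only the rounded values reported above, pairing each proportion with the conditional mean of the same group; the hat and bar notation is introduced here for illustration, and the small gaps from the reported Z means (7.210, 37.633, 20.5, 63.4) are presumably due to rounding of the inputs.

```latex
\begin{align*}
\bar{Z} &= \frac{1}{n}\sum_{i=1}^{n} I(Y_i > \tau)\,Y_i
         = \hat{P}(Y > \tau)\;\hat{\mu}(Y \mid Y > \tau) \\[4pt]
\text{First dataset } (\tau = 110):\quad
  &\text{EO-HF: } 0.061 \times 117.8 \approx 7.2, \qquad
   \text{WL-HF: } 0.306 \times 122.9 \approx 37.6 \\
\text{Second dataset } (\tau = 100):\quad
  &\text{+/+: } 0.188 \times 109.3 \approx 20.5, \qquad
   \text{-/-: } 0.571 \times 110.9 \approx 63.3
\end{align*}
```

Read this way, the large difference in mean *Z* is driven almost entirely by the difference in the proportions above *τ*, while the conditional means contribute very little.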

| **Table 6** Results (p values of tests) of application to two real datasets. |

From Table 6 we can also see that in almost all situations the p-values of the new methods for tests of *H*_{0,C} are somewhat smaller than those of the methods for tests of *H*_{0,A}. This is consistent with the simulations showing greater power of the new methods.
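For readers who wish to experiment with this kind of comparison, the following is a minimal sketch of a two-sided permutation test on the difference in mean *Z* between two groups, with *Z*_{i} = *I*(*Y*_{i} > *τ*)*Y*_{i}. It is not the code used to produce Table 6. The function names, the number of permutations, the add-one p-value correction, and the lifespans in the example are all hypothetical choices made for illustration.

```python
import random

def z_values(lifespans, tau):
    """Z_i = I(Y_i > tau) * Y_i: keep the lifespan if it exceeds tau, otherwise 0."""
    return [y if y > tau else 0.0 for y in lifespans]

def mean(xs):
    return sum(xs) / len(xs)

def permutation_p_value(group0, group1, tau, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the difference in mean Z between two groups."""
    rng = random.Random(seed)                 # fixed seed for reproducibility
    z0, z1 = z_values(group0, tau), z_values(group1, tau)
    observed = abs(mean(z1) - mean(z0))
    pooled = z0 + z1
    n0 = len(z0)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                   # random reassignment of group labels
        diff = abs(mean(pooled[n0:]) - mean(pooled[:n0]))
        if diff >= observed - 1e-12:          # small tolerance for floating-point ties
            hits += 1
    # ASSUMPTION: add-one correction, so the estimated p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)

# HYPOTHETICAL lifespans, invented for this example (not data from either study)
control = [70, 82, 88, 91, 95, 97, 99, 101, 103, 112]
treated = [75, 85, 96, 104, 111, 113, 115, 118, 121, 124]
tau = 110

p = permutation_p_value(control, treated, tau)
print(f"mean Z (control) = {mean(z_values(control, tau)):.1f}")
print(f"mean Z (treated) = {mean(z_values(treated, tau)):.1f}")
print(f"permutation p-value = {p:.4f}")
```

With a fixed *τ*, shuffling the Z-values is equivalent to shuffling the group labels on the original lifespans, so the Z-values need to be computed only once.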