Bayesian Anal. Author manuscript; available in PMC 2010 July 2.

Published in final edited form as:

Bayesian Anal. 2006 January 1; 1(4): 915–946.

doi: 10.1214/06-BA130

PMCID: PMC2896056

NIHMSID: NIHMS92308


Performance evaluation of health services providers burgeons. Similarly, analyzing spatially related health information, ranking teachers and schools, and identifying differentially expressed genes are increasing in prevalence and importance. Goals include valid and efficient ranking of units for profiling and league tables; identification of excellent and poor performers and of the most differentially expressed genes; and determining “exceedances” (how many, and which, unit-specific true parameters exceed a threshold). These data and inferential goals require a hierarchical, Bayesian model that accounts for nesting relations and identifies both population values and random effects for unit-specific parameters. Furthermore, the Bayesian approach coupled with optimizing a loss function provides a framework for computing non-standard inferences such as ranks and histograms.

Estimated ranks that minimize Squared Error Loss (SEL) between the true and estimated ranks have been investigated. The posterior mean ranks minimize SEL and are “general purpose,” relevant to a broad spectrum of ranking goals. However, other loss functions and optimizing ranks that are tuned to application-specific goals require identification and evaluation. For example, when the goal is to identify the relatively good (e.g., in the upper 10%) or relatively poor performers, a loss function that penalizes classification errors produces estimates that minimize the error rate. We construct loss functions that address this and other goals, developing a unified framework that facilitates generating candidate estimates, comparing approaches and producing data analytic performance summaries. We compare performance for a fully parametric, hierarchical model with Gaussian sampling distribution under Gaussian and a mixture of Gaussians prior distributions. We illustrate approaches via analysis of standardized mortality ratio data from the United States Renal Data System.

Results show that SEL-optimal ranks perform well over a broad class of loss functions but can be improved upon when classifying units above or below a percentile cut-point. Importantly, even optimal rank estimates can perform poorly in many real-world settings; therefore, data-analytic performance summaries should always be reported.

Performance evaluation burgeons in many areas including health services (Goldstein and Spiegelhalter 1996; Christiansen and Morris 1997; Normand et al. 1997; McClellan and Staiger 1999; Landrum et al. 2000, 2003; Daniels and Normand 2006; Austin and Tu 2006), drug evaluation (DuMouchel 1999), disease mapping (Devine and Louis 1994; Devine et al. 1994; Conlon and Louis 1999; Wright et al. 2003; Diggle et al. 2006), and education (Lockwood et al. 2002; Draper and Gittoes 2004; McCaffrey et al. 2004; Rubin et al. 2004; Tekwe et al. 2004; Noell and Burns 2006). Goals of such investigations include valid and efficient estimation of population parameters such as average performance (over clinics, physicians, health service regions or other “units of analysis”), estimation of between-unit variation (variance components) and unit-specific evaluations. The latter includes estimating unit specific performance, computing the probability that a unit's true, underlying performance is in a specific region, ranking units for use in profiling and league tables (Goldstein and Spiegelhalter 1996), identification of excellent and poor performers.

Bayesian models coupled with optimizing a loss function provide a framework for computing non-standard inferences such as ranks and histograms and producing data-analytic performance assessments. Inferences depend on the posterior distribution, and how the posterior is used should depend on inferential goals. Gelman and Price (1999) showed that no single set of estimates can simultaneously optimize loss functions targeting the unit-specific parameters (e.g., unit-specific means, optimized by the posterior mean) and those targeting the ranks of these parameters. For example, as Shen and Louis (1998) and Liu et al. (2004) showed, ranking the unit-specific maximum likelihood estimates (MLEs) performs poorly, as does ranking Z-scores for testing whether a unit's mean equals the population mean. In some situations, ranking the posterior means of unit-specific parameters can perform well, but in general an optimal approach to estimating ranks is needed.

In the Shen and Louis (1998) approach, SEL operates on the difference between the estimated and true ranks. But, in many applications interest focuses on identifying the relatively good (e.g., in the upper 10%) or relatively poor performers, a down/up classification. For example, quality improvement initiatives should be targeted at health care providers that have the highest likelihood of being the poorest performers; geography-specific, environmental assessments should be targeted at the most likely high incidence locations (Wright et al. 2003); genes with differential expression in the top 1% (say) should be selected for further study.

We construct new loss functions that focus on down/up classification and derive the optimizers for a subset of them. We develop connections between the new optimizers and others in the literature; report performance evaluations among the new ranking methods and other candidates; identify appropriate uncertainty assessments including a new performance measure. We evaluate performance for a fully parametric hierarchical model with unit-specific Gaussian sampling distributions and assuming either a Gaussian or a mixture of Gaussians prior. We evaluate performance and robustness under the prior and loss function that was used to generate the ranks as well as under other priors and loss functions. Shen and Louis (1998) showed that when the posterior distributions are stochastically ordered, maximum likelihood estimate based ranks, posterior mean based ranks, SEL-optimal ranks and those based on most other rank-specific loss functions are identical. We report performance assessments for the stochastically ordered case and compare approaches for situations when the posterior distributions are not stochastically ordered. We illustrate approaches using Standardized Mortality Ratio (SMR) data from the United States Renal Data System (USRDS).

We consider a two-stage model with independent identically distributed (*iid*) sampling from a known prior *G* with density *g* and possibly different unit-specific sampling distributions *f _{k}*:

$$\begin{array}{c}\hfill {\theta}_{1},\dots ,{\theta}_{K}\phantom{\rule{thinmathspace}{0ex}}\mathit{\text{iid}}\phantom{\rule{thinmathspace}{0ex}}G({\theta}_{k}),k=1,\dots ,K\hfill \\ \hfill {Y}_{k}|{\theta}_{k}\phantom{\rule{thinmathspace}{0ex}}\sim \phantom{\rule{thinmathspace}{0ex}}{f}_{k}({Y}_{k}|{\theta}_{k}).\hfill \end{array}$$

(1)

From model (1) we can derive the independent (ind) posterior distributions for Bayesian inferences:

$$[{\theta}_{k}|{Y}_{k}]\phantom{\rule{thinmathspace}{0ex}}\mathit{\text{ind}}\phantom{\rule{thinmathspace}{0ex}}{g}_{k}({\theta}_{k}|{Y}_{k})=\frac{{f}_{k}({Y}_{k}|{\theta}_{k})g({\theta}_{k})}{{\displaystyle \int {f}_{k}({Y}_{k}|u)g(u)du}}.$$

For computing efficiency, we assume that the θs are *iid*, though model (1) can be generalized to allow a regression structure in the prior and extended to three stages. Our theoretical results hold for these more general situations.

Let **θ** = (θ_{1} ,…, θ_{K}) and **Y** = (*Y*_{1}, …, *Y _{K}*). For a loss function *L*(**θ**, **a**(**Y**)), with **a**(**Y**) the action (estimate) based on the data, compute the posterior risk

$${\text{Risk}}_{G}(\mathbf{a}(\mathbf{Y}),\mathbf{Y})\phantom{\rule{thinmathspace}{0ex}}=\phantom{\rule{thinmathspace}{0ex}}{\mathrm{E}}_{\mathbf{\theta}|\mathbf{Y}}[L(\mathbf{\theta},\mathbf{a}(\mathbf{Y}))\phantom{\rule{thinmathspace}{0ex}}|\phantom{\rule{thinmathspace}{0ex}}\mathbf{Y}],$$

and thereby the pre-posterior risk

$${\text{Risk}}_{G}(\mathbf{a})\phantom{\rule{thinmathspace}{0ex}}=\phantom{\rule{thinmathspace}{0ex}}{\mathrm{E}}_{\mathbf{Y}}[{\text{Risk}}_{G}(\mathbf{a}(\mathbf{Y}),\mathbf{Y})].$$

Also, for any **a**(**Y**) we can compute the frequentist risk:

$$\text{Risk}(\mathit{\theta},\mathbf{a}(\cdot ))={\mathrm{E}}_{\mathbf{Y}|\mathit{\theta}}[L(\mathit{\theta},\mathbf{a}(\mathbf{Y}))|\mathit{\theta}].$$

Laird and Louis (1989) represented the ranks by,

$${R}_{k}(\mathit{\theta})=\text{rank}({\theta}_{k})={\displaystyle \sum _{j=1}^{K}{I}_{\{{\theta}_{k}\ge {\theta}_{j}\}}},$$

(3)

with the smallest θ having rank 1 and the largest having rank *K*. The non-linear form of (3) implies that, in general, the optimal ranks are neither the ranks of the observed data nor the ranks of the posterior means of the θs. A loss function is necessary to formalize developing estimates and related uncertainties.

Squared error loss (SEL) is the most common loss function used in estimation and is optimized by the posterior mean of the target parameter. For example, under model (1), with the unit-specific θs as the target, the loss function is *L*(θ, *a*) = (θ – *a*)^{2} and the optimal estimator is the posterior mean (PM) ${\theta}_{k}^{pm}=E({\theta}_{k}|\mathbf{Y})$.

When ranks are the target, SEL-optimal ranks are produced by minimizing

$$\widehat{L}=\widehat{L}({\mathbf{R}}^{\mathbf{\text{est}}},R(\mathbf{\theta}))=\frac{1}{K}{\displaystyle \sum _{k}{({R}_{k}^{\mathit{\text{est}}}-{R}_{k}(\mathbf{\theta}))}^{2}}$$

(4)

and setting ${R}_{k}^{\mathit{\text{est}}}$ equal to,

$${\overline{R}}_{k}(\mathbf{Y})={E}_{\theta |\mathbf{Y}}[{R}_{k}(\theta )|\mathbf{Y}]={\displaystyle \sum _{j=1}^{K}\text{pr}({\theta}_{k}\ge {\theta}_{j}|\mathbf{Y}).}$$

(5)

The ${\overline{R}}_{k}$ are shrunk towards the mid-rank (*K* + 1)/2 and are generally not integers, so integer ranks are produced by ranking them:

$${\widehat{R}}_{k}(\mathbf{Y})=\text{rank}({\overline{R}}_{k}(\mathbf{Y})).$$

(6)

See Appendix A for additional details on producing optimal ranks under weighted SEL.
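The SEL-optimal ranks in (5) and (6) are easy to approximate from posterior samples: rank the θs within each draw, average those ranks over draws, then rank the averages. A minimal sketch, assuming posterior draws are stored as an (n_draws × K) array; the function name and array convention are ours, not the paper's:

```python
import numpy as np

def sel_optimal_ranks(theta_draws):
    """Approximate SEL-optimal ranks from posterior draws.

    theta_draws: (n_draws, K) array of posterior samples of (theta_1..theta_K).
    Returns (Rbar, Rhat): posterior mean ranks (eq. 5) and the integer
    ranks obtained by ranking them (eq. 6); rank 1 = smallest.
    """
    n_draws, K = theta_draws.shape
    # Rank the thetas within each draw (1..K, smallest gets 1), then
    # average over draws: Rbar_k approximates E[R_k(theta) | Y].
    per_draw_ranks = theta_draws.argsort(axis=1).argsort(axis=1) + 1
    Rbar = per_draw_ranks.mean(axis=0)
    # Integer candidate ranks: rank the posterior mean ranks.
    Rhat = Rbar.argsort().argsort() + 1
    return Rbar, Rhat
```

Note that the Rbar sum to K(K + 1)/2 in every draw, so the averaged ranks retain that total even though individual Rbar_k shrink toward the mid-rank.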

Henceforth, we drop the dependency on **θ** and omit conditioning on **Y** whenever this does not cause confusion; for example, *R _{k}* stands for *R _{k}*(**θ**) and ${\widehat{R}}_{k}$ for ${\widehat{R}}_{k}(\mathbf{Y})$. Percentiles,

$${P}_{k}={R}_{k}/(K+1);\phantom{\rule{thinmathspace}{0ex}}{\widehat{P}}_{k}={\widehat{R}}_{k}/(K+1),\phantom{\rule{thinmathspace}{0ex}}\text{etc}.$$

(7)

normalize large sample performance and aid in communication. For example, Lockwood et al. (2002) showed that mean square error (MSE) for percentiles rapidly converges to a function that does not depend on *K*; the same normalization strategy applies in the loss functions below.

$\widehat{L}$ (equation 4) evaluates general performance without specific attention to identifying the relatively well or poorly performing units. To address this goal, for 0 < γ < 1 we investigate loss functions that focus on identifying the upper 100(1 − γ)% of the units, with loss depending on the correctness of classification and, possibly, a distance penalty; identification of the lower 100γ% group is similar. For notational convenience, we assume that γ*K* is an integer, so γ(*K* + 1) is not an integer and in the following it is not necessary to distinguish between (>, ≥) or (<, ≤).

For 0 < γ < 1, let

$$\begin{array}{c}{\mathit{\text{AB}}}_{k}(\gamma ,{P}_{k},{P}_{k}^{\mathit{\text{est}}})\phantom{\rule{thickmathspace}{0ex}}=\phantom{\rule{thickmathspace}{0ex}}{I}_{\{{P}_{k}>\gamma ,{P}_{k}^{\mathit{\text{est}}}<\gamma \}}={I}_{\{{R}_{k}>\gamma (K+1),{R}_{k}^{\mathit{\text{est}}}<\gamma (K+1)\}},\hfill \\ {\mathit{\text{BA}}}_{k}(\gamma ,{P}_{k},{P}_{k}^{\mathit{\text{est}}})\phantom{\rule{thinmathspace}{0ex}}=\phantom{\rule{thinmathspace}{0ex}}{I}_{\{{P}_{k}<\gamma ,{P}_{k}^{\mathit{\text{est}}}>\gamma \}}={I}_{\{{R}_{k}<\gamma (K+1),{R}_{k}^{\mathit{\text{est}}}>\gamma (K+1)\}.}\hfill \end{array}$$

(8)

${\mathit{\text{AB}}}_{k}$ and ${\mathit{\text{BA}}}_{k}$ indicate the two types of misclassification: ${\mathit{\text{AB}}}_{k}=1$ when unit *k* is truly above the γ cut point but is estimated to be below it; ${\mathit{\text{BA}}}_{k}=1$ in the reverse case.

For *p, q, c* ≥ 0 define,

$$\begin{array}{cc}\tilde{L}(\gamma ,p,q,c)\hfill & =\frac{1}{K}{\displaystyle \sum _{k}\{|\gamma -{P}_{k}^{\mathit{\text{est}}}{|}^{p}{\mathit{\text{AB}}}_{k}(\gamma ,{P}_{k},{P}_{k}^{\mathit{\text{est}}})+c|{P}_{k}^{\mathit{\text{est}}}-\gamma {|}^{q}{\mathit{\text{BA}}}_{k}(\gamma ,{P}_{k},{P}_{k}^{\mathit{\text{est}}})\},}\hfill \\ {L}^{\u2020}(\gamma ,p,q,c)\hfill & =\frac{1}{K}{\displaystyle \sum _{k}\{|{P}_{k}-\gamma {|}^{p}{\mathit{\text{AB}}}_{k}(\gamma ,{P}_{k},{P}_{k}^{\mathit{\text{est}}})+c|\gamma -{P}_{k}{|}^{q}{\mathit{\text{BA}}}_{k}(\gamma ,{P}_{k},{P}_{k}^{\mathit{\text{est}}})\},}\hfill \\ {L}^{\u2021}(\gamma ,p,q,c)\hfill & =\frac{1}{K}{\displaystyle \sum _{k}\{|{P}_{k}-{P}_{k}^{\mathit{\text{est}}}{|}^{p}{\mathit{\text{AB}}}_{k}(\gamma ,{P}_{k},{P}_{k}^{\mathit{\text{est}}})+c|{P}_{k}^{\mathit{\text{est}}}-{P}_{k}{|}^{q}{\mathit{\text{BA}}}_{k}(\gamma ,{P}_{k},{P}_{k}^{\mathit{\text{est}}})\},}\hfill \\ \hfill {L}_{0/1}(\gamma )& =\frac{{\displaystyle {\sum}_{k}\{{\mathit{\text{AB}}}_{k}(\gamma ,{P}_{k},{P}_{k}^{\mathit{\text{est}}})+{\mathit{\text{BA}}}_{k}(\gamma ,{P}_{k},{P}_{k}^{\mathit{\text{est}}})\}}}{K}=2\frac{{\displaystyle {\sum}_{k}{\mathit{\text{AB}}}_{k}(\gamma ,{P}_{k},{P}_{k}^{\mathit{\text{est}}})}}{K}\hfill \\ \hfill & =\frac{\#(\text{misclassifications})}{K}=\tilde{L}(\gamma ,0,0,1)={L}^{\u2020}(\gamma ,0,0,1)={L}^{\u2021}(\gamma ,0,0,1).\hfill \end{array}$$

(9)

The loss functions confer no penalty if the pair of estimated and true unit-specific percentiles, $({P}_{k}^{\mathit{\text{est}}},{P}_{k})$, are either both above or both below the γ cut point. If they are on different sides of γ, $\tilde{L}$ penalizes by an amount that depends on the distance of the estimated percentile from γ, *L*^{†} by the distance of the true percentile from γ and *L*^{‡} by the distance between the true and estimated percentiles. Parameters *p* and *q* adjust the intensity of the penalties; *p* ≠ *q* and *c* ≠ 1 allow for different penalties for the two kinds of misclassification. *L*_{0/1}(γ) counts the number of discrepancies and is equivalent to setting *p* = *q* = 0, *c* = 1; we use the relation ${\sum}_{k}{\mathit{\text{AB}}}_{k}(\gamma ,{P}_{k},{P}_{k}^{\mathit{\text{est}}})={\displaystyle {\sum}_{k}{\mathit{\text{BA}}}_{k}(\gamma ,{P}_{k},{P}_{k}^{\mathit{\text{est}}}})$ in its definition. In practice, *L*^{†} and *L*^{‡} would be harder to use than $\tilde{L}$ because their penalizing quantities depend on the unknown *P _{k}*. However, including them in our investigation helps to calibrate the robustness of the other estimators.

Our mathematical analyses apply to the general loss functions (9), but our simulations are conducted for *p* = *q* = 2, *c* = 1. Within this setting, we denote the first three loss functions in (9) as $\tilde{L}(\gamma )$, *L*^{†}(γ) and *L*^{‡}(γ).
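With *p* = *q* = 2 and *c* = 1, the loss functions in (9) reduce to indicator-weighted sums of squared distances. A hedged sketch of a helper that evaluates them for given true and estimated percentiles (the function and key names are our illustrative conventions):

```python
import numpy as np

def classification_losses(P, P_est, gamma):
    """Evaluate the loss functions in (9) with p = q = 2, c = 1.

    P, P_est: true and estimated percentiles in (0, 1); gamma: cut point.
    Returns a dict with Ltilde, Ldagger, Ldoubledagger and L01.
    """
    P, P_est = np.asarray(P), np.asarray(P_est)
    AB = (P > gamma) & (P_est < gamma)   # truly above, classified below
    BA = (P < gamma) & (P_est > gamma)   # truly below, classified above
    K = P.size
    return {
        # penalty: distance of the *estimated* percentile from gamma
        "Ltilde": np.sum((gamma - P_est) ** 2 * AB
                         + (P_est - gamma) ** 2 * BA) / K,
        # penalty: distance of the *true* percentile from gamma
        "Ldagger": np.sum((P - gamma) ** 2 * AB
                          + (gamma - P) ** 2 * BA) / K,
        # penalty: distance between true and estimated percentiles
        "Ldoubledagger": np.sum((P - P_est) ** 2 * (AB | BA)) / K,
        # 0/1 loss: fraction of misclassified units
        "L01": (AB.sum() + BA.sum()) / K,
    }
```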

We do not investigate the “all or nothing,” experiment-wise loss function that assigns loss zero when all units are correctly classified as above or below γ and loss 1 if any unit is misclassified. While this loss function is fundamental and underlies many multiple comparison methods, it does not provide a good guide for our ranking problem: finding the optimal classification is computationally challenging and there will be many nearly optimal solutions. Furthermore, in the spirit of the false discovery rate, loss functions that average performance over units or a subset of units are more appropriate in most applications.

We find ranks/percentiles that optimize *L*_{0/1} and $\tilde{L}$, and study an estimator that performs well for *L*^{‡} but is not optimal. First, note that the optimizers for the loss functions in (9) and for $\widehat{L}$ are equal when the posterior distributions are stochastically ordered (the posterior cdfs ${G}_{k}(t|\mathbf{Y})$ do not cross).

*L*_{0/1} loss is minimized by

$$\begin{array}{cc}{\tilde{R}}_{k}(\gamma )\hfill & =\phantom{\rule{thinmathspace}{0ex}}\text{rank}\{\text{pr}({P}_{k}\ge \gamma |\mathbf{Y})\},\hfill \\ \hfill {\tilde{P}}_{k}(\gamma )& =\phantom{\rule{thinmathspace}{0ex}}{\tilde{R}}_{k}(\gamma )/(K+1).\hfill \end{array}$$

(10)

These optimizers are not unique: *L*_{0/1} depends only on the (above γ)/(below γ) group memberships, so any assignment that places the units with the largest pr(*P _{k}* ≥ γ | **Y**) in the upper group attains the minimum.

See Appendix B.
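The optimizer (10) needs only the exceedance probabilities pr(*P _{k}* ≥ γ | **Y**), which are simple Monte Carlo averages over posterior draws. A minimal sketch, assuming draws are stored as an (n_draws × K) array (function name ours):

```python
import numpy as np

def ptilde(theta_draws, gamma):
    """L_{0/1}-optimal percentiles (eq. 10) from posterior draws.

    Ranks units by pr(P_k >= gamma | Y), the posterior probability that
    unit k's true percentile exceeds the cut point gamma.
    """
    n_draws, K = theta_draws.shape
    per_draw_ranks = theta_draws.argsort(axis=1).argsort(axis=1) + 1
    # pr(P_k >= gamma | Y): fraction of draws in which unit k's rank
    # exceeds gamma * (K + 1).
    exceed_prob = (per_draw_ranks > gamma * (K + 1)).mean(axis=0)
    Rtilde = exceed_prob.argsort().argsort() + 1  # ties broken arbitrarily
    return Rtilde / (K + 1)
```

Since the within-group order is not identified by *L*_{0/1}, ties in the exceedance probabilities can be broken arbitrarily without changing the loss.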

The ${\tilde{P}}_{k}(\gamma )$ optimize $\tilde{L}$. In Appendix C, we show in detail that the ${\tilde{P}}_{k}(\gamma )$ are also optimal for more general loss functions with the distance penalties $|\gamma -{P}_{k}^{\mathit{\text{est}}}{|}^{p}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}c|{P}_{k}^{\mathit{\text{est}}}-\gamma {|}^{q}$ replaced by any nondecreasing functions of $|\gamma -{P}_{k}^{\mathit{\text{est}}}|$. The proof has three steps: first, classify the units into (above γ)/(below γ) groups; second, within each group, rewrite the posterior risk as the inner product of the discrepancy vector and the misclassification probability vector; third, repeatedly apply the rearrangement inequality (Hardy et al. 1967) to minimize the inner product.

Normand et al. (1997) proposed using the posterior probability pr(θ_{k} > *t*| **Y**) and ranks based on it to compare the performance of medical care providers. The cut point *t* is an application-relevant threshold. Using this approach, we define ${P}_{k}^{*}(\gamma )$ with a properly chosen cut point *t* and show that ${P}_{k}^{*}(\gamma )$ is essentially identical to ${\tilde{P}}_{k}(\gamma )$.

**Definition of** ${\mathit{P}}_{\mathit{k}}^{*}(\gamma )$: Let

$${\overline{G}}_{\mathbf{Y}}(t)=\frac{1}{K}{\displaystyle \sum _{j=1}^{K}\text{pr}({\theta}_{j}\le t|\mathbf{Y}),}$$

(11)

and define ${P}_{k}^{*}(\gamma )$ as the percentiles produced by ranking the $\text{pr}({\theta}_{k}\ge {\overline{G}}_{\mathbf{Y}}^{-1}(\gamma )\,|\,\mathbf{Y})$. Section 6.2 gives a relation among ${\widehat{P}}_{k}$, ${\tilde{P}}_{k}(\gamma )$ and ${P}_{k}^{*}$. Theorems 5 and 6 show that ${\tilde{P}}_{k}(\gamma )$ is asymptotically equivalent to ${P}_{k}^{*}(\gamma )$.

By making a direct link to the original θ scale, ${P}_{k}^{*}(\gamma )$ is straightforward to explain and interpret. Furthermore, for a desired accuracy, computing ${P}_{k}^{*}(\gamma )$ is substantially faster than computing ${\tilde{P}}_{k}(\gamma )$, since the former requires only accurate computation of the individual posterior distributions and of ${\overline{G}}_{\mathbf{Y}}$.
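One way to approximate ${P}_{k}^{*}(\gamma )$ from posterior draws: the draws pooled across all units estimate ${\overline{G}}_{\mathbf{Y}}$ in (11), so the empirical γ quantile of the pooled draws serves as the threshold *t*. A sketch under that assumption (function name ours):

```python
import numpy as np

def pstar(theta_draws, gamma):
    """Normand-style percentiles P*_k(gamma) via t = Gbar_Y^{-1}(gamma).

    theta_draws: (n_draws, K) array of posterior samples.
    Pooling draws over units approximates the average posterior cdf
    Gbar_Y (eq. 11), so its empirical gamma quantile estimates t.
    """
    t = np.quantile(theta_draws, gamma)          # threshold on the theta scale
    # Rank units by pr(theta_k >= t | Y).
    exceed_prob = (theta_draws >= t).mean(axis=0)
    K = theta_draws.shape[1]
    return (exceed_prob.argsort().argsort() + 1) / (K + 1)
```

This works on the θ scale directly, which is why it avoids the per-draw ranking needed for ${\tilde{P}}_{k}(\gamma )$.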

Appendix D presents an optimization procedure for the case *p* = *q* = 2, −∞ < *c* < ∞. However, other than use of brute force (complete enumeration), we have not found an algorithm for general (*p*, *q*, *c*). As for *L*_{0/1}, performance depends only on optimal allocation into the (above γ)/(below γ) groups. Additional criteria are needed to specify the within-group order.

We have not found a simple means of optimizing this loss function, but Appendix E develops a helpful relation.

Traditional rank estimators include ranks based on maximum likelihood estimates (MLEs), ranks based on posterior means of the θs and ranks based on hypothesis-testing statistics (Z-scores, P-values). MLE-based ranks are invariant to monotone transforms, but the others are not. As shown in Liu et al. (2004), MLE-based ranks tend to give units with relatively large variances extreme ranks, while ranks based on hypothesis-testing statistics tend to place units with relatively small variances at the extremes. Though not an optimal solution to this problem, modified hypothesis-testing statistics moderate this shortcoming by reducing the ratio of the variances (Tusher et al. 2001; Efron et al. 2001).

The ranking estimators ${\tilde{P}}_{k}(\gamma )$ optimize the (above γ)/(below γ) misclassification loss *L*_{0/1}, and the ${\widehat{P}}_{k}$ optimize $\widehat{L}$, which penalizes the distance $|{P}_{k}^{\mathit{\text{est}}}-{P}_{k}|$ between estimated and true percentiles. A hybrid estimator addresses both goals.

**Definition of** $\widehat{\tilde{P}}(\gamma )$: Use the ${\tilde{P}}_{k}(\gamma )$ to classify units into the (above γ)/(below γ) percentile groups. Then, within each percentile group, order the estimates by the ${\widehat{P}}_{k}$.

$\widehat{\tilde{P}}(\gamma )$ minimizes *L*_{0/1} and conditional on this (above γ)/(below γ) classification, produces optimal SEL estimates within the two groups.

See Appendix F.

Furthermore, it is straightforward to show that for ${\widehat{L}}_{0/1}^{w}(\gamma )$ there exists a *w*_{*} > 0 such that for all *w* ≤ *w*_{*}, $\widehat{\tilde{P}}(\gamma )$ is optimal, and there exists a *w** < 1 such that for all *w* ≥ *w**, the ${\widehat{P}}_{k}$ are optimal.
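The two-stage definition translates directly into a sorting key: classify by the exceedance probabilities, then order within groups by the posterior mean ranks. A hedged sketch under the same draws-as-array convention (function name ours; the offset trick simply forces the "above" group to sort on top):

```python
import numpy as np

def hybrid_percentiles(theta_draws, gamma):
    """Two-stage estimator: classify (above gamma)/(below gamma) by the
    L_{0/1} optimizer, then order within each group by posterior mean rank.
    Assumes gamma * K is an integer, as in the paper."""
    n_draws, K = theta_draws.shape
    ranks = theta_draws.argsort(axis=1).argsort(axis=1) + 1
    exceed_prob = (ranks > gamma * (K + 1)).mean(axis=0)   # stage 1 key
    Rbar = ranks.mean(axis=0)                              # stage 2 key
    n_above = round((1 - gamma) * K)
    above = np.argsort(exceed_prob)[-n_above:]   # units classified above
    key = Rbar.copy()
    key[above] += K   # lift the 'above' group; Rbar order kept within groups
    return (key.argsort().argsort() + 1) / (K + 1)
```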

In this section we develop relations among estimators using ranks or percentiles depending on the context for convenience.

Let *ν* = [γ*K*] and define

$${R}_{k}^{+}(\nu )=\frac{K(K+1)}{2(K-\nu +1)}\text{pr}({R}_{k}\ge \nu ).$$

Then the ranked ${R}_{k}^{+}(\nu )$ equal the ranked pr(*R _{k}* ≥ ν) and so each generates the ${\tilde{R}}_{k}(\gamma )$. Furthermore,

$$\sum _{k}\frac{K(K+1)}{2(K-\nu +1)}\text{pr}({R}_{k}\ge \nu )={\displaystyle \sum _{k}{R}_{k}^{+}(\nu )=\frac{K(K+1)}{2}={\displaystyle \sum _{k}{R}_{k}.}}$$

${\overline{R}}_{k}$ is a linear combination of the ${R}_{k}^{+}(\nu )$ with respect to ν and so for any convex loss function the ${\overline{R}}_{k}$ outperform the ${R}_{k}^{+}(\nu )$ for at least one value of ν = 1,…,*K*. For SEL, ${\overline{R}}_{k}$ dominates for all ν. As shown in Appendix A, the ${\widehat{R}}_{k}=\text{rank}({\overline{R}}_{k})$ also dominate $\text{rank}({R}_{k}^{+}(\nu ))={\tilde{R}}_{k}(\gamma )$ for all ν.

See Appendix G.

From (3), (5) and (11), we have that,

$$\begin{array}{ccc}{\overline{G}}_{\mathbf{Y}}({\theta}_{k})\hfill & =\hfill & \mathrm{E}[{R}_{k}|{\theta}_{k}]/K,\hfill \\ \hfill {\overline{R}}_{k}& =\hfill & \mathrm{E}[{R}_{k}]=K\mathrm{E}[{\overline{G}}_{\mathbf{Y}}({\theta}_{k})]=\mathrm{E}\{\mathrm{E}[{R}_{k}|{\theta}_{k}]\}.\hfill \end{array}$$

The ${R}_{k}^{*}(\gamma )$ are generated by ranking the $\text{pr}({\theta}_{k}\ge {\overline{G}}_{\mathbf{Y}}^{-1}(\gamma ))$, which is equivalent to ranking the $\text{pr}({\overline{G}}_{\mathbf{Y}}({\theta}_{k})\ge \gamma )$. By the foregoing, this is equivalent to ranking the $\text{pr}(\mathrm{E}[{R}_{k}|{\theta}_{k}]\ge \gamma K)$.

Assume that ${\theta}_{k}\stackrel{\mathit{\text{iid}}}{\sim}G,\phantom{\rule{thinmathspace}{0ex}}{Y}_{k}|{\theta}_{k}\stackrel{\mathit{\text{ind}}}{\sim}f({Y}_{k}|{\theta}_{k})$ and that the posterior cumulative distribution function (cdf) of each θ_{k} is continuous and differentiable at *G*^{−1}(γ). If *G*_{k}(· |**Y**) has a universally bounded second moment, then for *K* → ∞, ${P}_{k}^{*}(\gamma )$ is equivalent to ${\tilde{P}}_{k}(\gamma )$.

See Appendix H.

Assume that ${\theta}_{k}\stackrel{\mathit{\text{iid}}}{\sim}G,\phantom{\rule{thinmathspace}{0ex}}{Y}_{k}|{\theta}_{k}\stackrel{\mathit{\text{ind}}}{\sim}f({Y}_{k}\phantom{\rule{thinmathspace}{0ex}}|\phantom{\rule{thinmathspace}{0ex}}{\theta}_{k},{\zeta}_{\mathit{k}})$ and that the posterior cumulative distribution function (cdf) of each θ_{k} is continuous and differentiable at *G*^{−1}(γ). Furthermore, assume that the empirical distribution function (edf) of the ζ_{k} converges to a probability distribution. If *G*_{k}(·|**Y**, ζ) has a universally bounded second moment, then for *K* → ∞, ${P}_{k}^{*}(\gamma )$ is equivalent to ${\tilde{P}}_{k}(\gamma )$.

Regard ζ_{k} as part of the observed data and use ${Y}_{k}^{\prime}=({Y}_{k},{\zeta}_{k})$ in Theorem 5.

Theorems 5 and 6 imply that ${P}_{k}^{*}(\gamma )$ is asymptotically optimal for $\tilde{L}$ and provides a loss function basis for the Normand et al. (1997) estimates.

We provide a unified approach to loss function based ranking. To this end, we define a non-negative, nondecreasing **scoring function** *S*(*P*) : (0, 1) → [0, 1]. The function can be regarded as the scores (reward) a unit would get if its percentile was *P*. It relates percentiles, *P*, to “consequences” *S*(*P*). These relations can help in eliciting an application-relevant loss function and in interpreting loss-function based percentiles. We use SEL for *S*(*P*) to produce percentiles and ranks, specifically:

$${L}_{s}=L({\mathbf{S}}^{\mathbf{\text{est}}},S(P(\mathit{\theta})))=\frac{1}{K}{\displaystyle \sum _{k}{({S}_{k}^{\mathit{\text{est}}}-S({P}_{k}(\mathit{\theta})))}^{2}.}$$

(12)

The optimal **S ^{est}** satisfy ${S}_{k}^{\mathit{\text{est}}}=\mathrm{E}[S({P}_{k})|\mathbf{Y}]$ and we use the ranks and percentiles based on them.

When *S*(*P*) = *aP* + *b*, *a* > 0, i.e., the reward is linear in the percentile, we have ${L}_{s}={a}^{2}\widehat{L}$ and so ${\widehat{P}}_{k}$ is associated with linear rewards. When *S*(*P*) = I_{{P>γ}}, i.e., the reward depends only on whether a unit's percentile is beyond the threshold γ, there is no constraint on the rankings within each of the (above γ)/(below γ) groups; there exist many optimizers and ${\tilde{P}}_{k}(\gamma )$ is one of them. For the two-stage ranking estimator ${\widehat{\tilde{P}}}_{k}(\gamma )$, let *S*(*P*) = *aP* + I_{{P>γ}}, so that ${S}_{k}^{\mathit{\text{est}}}=a{\overline{P}}_{k}+\text{pr}({P}_{k}>\gamma )$. When *a* is sufficiently close to zero, $\widehat{\tilde{P}}(\gamma )$ is optimal. More choices of *S*(*P*) are given in Appendix I.

In a Bayesian model, an estimator’s performance can be evaluated by using the posterior distribution (data analytic evaluations) and by using the marginal distribution of the data (pre-posterior evaluations). For example, preposterior SEL performance is the sum of expected posterior variance plus expected squared posterior bias.

We provide evaluations relative to the loss function used to produce the estimate and other potentially relevant loss functions. For example, performance with respect to SEL should be computed for the SEL optimizer and for other estimators. These comparisons help to determine the efficiency of an estimator that optimizes one loss function when evaluated for other loss functions. Procedures that are robust to the choice of loss functions will be attractive in applications.

For (above γ)/(below γ) classification, plots of the posterior probability of exceeding γ versus estimated percentiles are informative (see Figure 4). Such plots can be summarized by the *a posteriori* operating characteristic (OC). For any percentiling method, define,

$$\begin{array}{ccc}\mathit{\text{OC}}(\gamma )\hfill & =\hfill & \text{pr}({P}_{k}<\gamma |{P}_{k}^{\mathit{\text{est}}}>\gamma ,\mathbf{Y})+\text{pr}({P}_{k}>\gamma |{P}_{k}^{\mathit{\text{est}}}<\gamma ,\mathbf{Y})\hfill \\ \hfill & =\hfill & \text{pr}({P}_{k}>\gamma |{P}_{k}^{\mathit{\text{est}}}<\gamma ,\mathbf{Y})/(1-\gamma )=\frac{{\mathrm{E}}_{\theta |\mathbf{Y}}{L}_{0/1}(\gamma )}{2\gamma (1-\gamma )},\hfill \end{array}$$

(13)

with the last equality following from the identity ${\sum}_{k}{\mathit{\text{BA}}}_{k}(\gamma ,{P}_{k},{P}_{k}^{\mathit{\text{est}}})={\displaystyle {\sum}_{k}{\mathit{\text{AB}}}_{k}(\gamma ,{P}_{k},{P}_{k}^{\mathit{\text{est}}}})$. *OC*(γ) is the sum of two misclassification probabilities and so is optimized by ${\tilde{P}}_{k}(\gamma )$. It is normalized so that if the data provide no information on the θ_{k}, then *OC*(γ) ≡ 1 for all γ. Evaluating performance using only one of the probabilities, e.g., $\text{pr}({P}_{k}>\gamma |{P}_{k}^{\mathit{\text{est}}}<\gamma ,\mathbf{Y})$, is analogous to computing the false discovery rate (Benjamini and Hochberg 1995; Storey 2002, 2003).
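Via the last equality in (13), the posterior *OC*(γ) is just the posterior expectation of *L*_{0/1} rescaled by 2γ(1 − γ). A sketch estimating it from posterior draws for any fixed set of estimated percentiles (function name and array convention ours):

```python
import numpy as np

def oc_gamma(theta_draws, P_est, gamma):
    """Posterior operating characteristic (eq. 13): the posterior-expected
    L_{0/1} loss normalized by 2 * gamma * (1 - gamma), so that fully
    uninformative data give OC(gamma) ~= 1."""
    n_draws, K = theta_draws.shape
    ranks = theta_draws.argsort(axis=1).argsort(axis=1) + 1
    P = ranks / (K + 1)                      # true percentiles, per draw
    est_above = np.asarray(P_est) > gamma
    # Per-draw misclassification counts, averaged over posterior draws.
    AB = (P > gamma) & ~est_above
    BA = (P < gamma) & est_above
    L01 = (AB.sum(axis=1) + BA.sum(axis=1)).mean() / K
    return L01 / (2 * gamma * (1 - gamma))
```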

For loss functions that sum over unit-specific components, performance can also be evaluated for individual units and, in a frequentist evaluation, for individual **θ** vectors. These evaluations are in Section 9.3.

We evaluate pre-posterior performance for the Gaussian sampling distribution with *K* = 200 using 2000 simulation replications. We compute the posterior rank probabilities pr(*R _{k}* = *j* | **Y**) needed to produce the rank and percentile estimates.

We evaluate estimators for the Gaussian/Gaussian, two-stage model with a Gaussian prior and Gaussian sampling distributions and allow for varying unit-specific variances. Without loss of generality we assume that the prior mean is µ = 0 and the prior variance is τ^{2} = 1. Specifically,

$$\begin{array}{ccc}\hfill {\theta}_{k}& \mathit{\text{iid}}\hfill & N(0,1)\hfill \\ \hfill {Y}_{k}|{\theta}_{k}& \sim \hfill & N({\theta}_{k},{\sigma}_{k}^{2}).\hfill \end{array}$$

This yields the posterior distributions:

$${\theta}_{k}\phantom{\rule{thinmathspace}{0ex}}|\phantom{\rule{thinmathspace}{0ex}}{Y}_{k}\phantom{\rule{thinmathspace}{0ex}}\mathit{\text{ind}}\phantom{\rule{thinmathspace}{0ex}}N\phantom{\rule{thinmathspace}{0ex}}({\theta}_{k}^{\mathit{\text{pm}}},(1-{B}_{k}){\sigma}_{k}^{2}),$$

where ${\theta}_{k}^{\mathit{\text{pm}}}=(1-{B}_{k}){Y}_{k}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}{B}_{k}={\mathrm{\sigma}}_{k}^{2}/({\mathrm{\sigma}}_{k}^{2}+1)$. When unit-specific variances $({\sigma}_{k}^{2}\equiv {\sigma}^{2})$ are all equal, the posterior distributions are stochastically ordered and all ranking methods we investigate are identical. Evaluation for this case provides a baseline performance with respect to the set of loss functions. In practice, the $\{{\sigma}_{k}^{2}\}$’s can vary substantially and we evaluate this situation using two departures from the ${\sigma}_{k}^{2}\equiv {\sigma}^{2}$ case. In each case, the equal variance scenario is produced by *rls* = 1:
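The conjugate posterior above is closed form, so the shrinkage factors *B _{k}* and posterior moments can be computed directly. A minimal sketch of the Gaussian-Gaussian posterior with general prior mean and variance (the paper's simulations use µ = 0, τ² = 1; function name ours):

```python
import numpy as np

def gaussian_posterior(Y, sigma2, mu=0.0, tau2=1.0):
    """Posterior for the Gaussian-Gaussian model: theta_k | Y_k is
    N(mu + (1 - B_k)(Y_k - mu), (1 - B_k) * sigma2_k), with shrinkage
    factor B_k = sigma2_k / (sigma2_k + tau2)."""
    Y, sigma2 = np.asarray(Y, float), np.asarray(sigma2, float)
    B = sigma2 / (sigma2 + tau2)       # large sigma2 -> heavy shrinkage
    post_mean = mu + (1 - B) * (Y - mu)
    post_var = (1 - B) * sigma2
    return post_mean, post_var
```

Note how a unit with large sampling variance has *B _{k}* near 1 and so is pulled strongly toward the prior mean, which drives the ranking behavior discussed below.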

- **log uniform**: Ordered, geometric sequences of the ${\sigma}_{k}^{2}$ with ratio of the largest σ^{2} to the smallest $\mathit{\text{rls}}={\sigma}_{K}^{2}/{\sigma}_{1}^{2}$ and geometric mean $\mathit{\text{gmv}}=\mathit{\text{GM}}({\sigma}_{1}^{2},\dots ,{\sigma}_{K}^{2})$.
- **two clusters**: For the first half of the units, ${\sigma}_{k}^{2}\equiv {(\mathit{\text{rls}})}^{-\frac{1}{2}}$; for the second half, ${\sigma}_{k}^{2}\equiv {(\mathit{\text{rls}})}^{\frac{1}{2}}$. Here, $\mathit{\text{rls}}={\sigma}_{K}^{2}/{\sigma}_{1}^{2}$ and *gmv* = 1.

In both cases the variance sequence is monotone in *k*, but simulation results would be the same if the indices were permuted. These variance sequences — constant, uniform in the log scale, clustered at the extremes of the range — triangulate possible patterns, though specific applications can, of course, have their unique features.
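The two variance scenarios are easy to generate: a geometric sequence with the stated ratio and geometric mean, and a half/half split at the extremes. A sketch (function names ours; *rls* = 1 recovers the equal-variance case in both):

```python
import numpy as np

def sigma2_log_uniform(K, rls, gmv=1.0):
    """Geometric sequence of sigma_k^2 with largest/smallest ratio rls
    and geometric mean gmv (the 'log uniform' scenario)."""
    # Exponents run symmetrically from -1/2 to 1/2, so the ratio of the
    # endpoints is rls and the geometric mean is gmv.
    return gmv * rls ** (np.linspace(0, 1, K) - 0.5)

def sigma2_two_clusters(K, rls):
    """Half the units at rls**(-1/2), half at rls**(1/2); gmv = 1
    (the 'two clusters' scenario)."""
    return np.repeat([rls ** -0.5, rls ** 0.5], [K // 2, K - K // 2])
```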

This prior is a mixture of two Gaussian distributions with the mixture constrained to have mean 0 and variance 1:

$${\theta}_{k}\stackrel{\mathit{\text{iid}}}{\sim}\phantom{\rule{thinmathspace}{0ex}}(1-\epsilon )N\phantom{\rule{thinmathspace}{0ex}}\left(-\frac{\epsilon \mathrm{\Delta}}{A},\frac{1}{{A}^{2}}\right)+\epsilon N\phantom{\rule{thinmathspace}{0ex}}\left(\frac{(1-\epsilon )\mathrm{\Delta}}{A},\frac{{\xi}^{2}}{{A}^{2}}\right)$$

where

$${A}^{2}={A}^{2}(\epsilon ,\mathrm{\Delta},\xi )=(1-\epsilon )+\epsilon {\xi}^{2}+\epsilon (1-\epsilon ){\mathrm{\Delta}}^{2}$$

We present results for ε = 0.1, Δ = 3.40, ξ^{2} = 0.25, γ = 0.9 and compute the preposterior risk both for estimators computed from the posterior produced by this mixture and for estimators based on a standard, Gaussian prior.
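The *A*² normalization above is exactly what makes the mixture have mean 0 and variance 1, since the component-mean offsets cancel and the within- plus between-component variances sum to *A*². A quick numerical check of that moment calculation (function name ours):

```python
import numpy as np

def mixture_moments(eps, Delta, xi2):
    """Mean and variance of the two-component Gaussian mixture prior
    with the A^2(eps, Delta, xi) normalization of the text."""
    A2 = (1 - eps) + eps * xi2 + eps * (1 - eps) * Delta ** 2
    A = np.sqrt(A2)
    m1, m2 = -eps * Delta / A, (1 - eps) * Delta / A   # component means
    v1, v2 = 1 / A2, xi2 / A2                          # component variances
    mean = (1 - eps) * m1 + eps * m2                   # = 0 by construction
    # var = E[X^2] - mean^2, mixing over the two components.
    var = (1 - eps) * (v1 + m1 ** 2) + eps * (v2 + m2 ** 2) - mean ** 2
    return mean, var
```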

Table 1 documents *SEL* ($\widehat{L}$) performance for the ${\widehat{P}}_{k}$, the optimal estimator, and for percentiles based on the *Y _{k}*, the ${\theta}_{k}^{\mathit{\text{pm}}}$ and the posterior means of the ${e}^{{\theta}_{k}}$.

The posterior mean of ${e}^{{\theta}_{k}}$ is presented to assess performance for a monotone, non-linear transform of the target parameters. For *rls* = 1, the posterior distributions are stochastically ordered and the four sets of percentiles are identical, as is their performance. As *rls* increases, performance of the *Y _{k}*-derived percentiles degrades, while those based on the ${\theta}_{k}^{\mathit{\text{pm}}}$ remain quite competitive with the ${\widehat{P}}_{k}$.

Table 2 reports results for the ${\widehat{P}}_{k}$, the ${\tilde{P}}_{k}(\gamma )$ and the $\widehat{\tilde{P}}(\gamma )$ under the loss functions in (9).

Simulated preposterior risk for *gmv* = 1. All values are 10000×(Loss). The first block is for the Gaussian-Gaussian model; the second for the Gaussian mixture prior assuming the mixture; the third for the Gaussian mixture prior, but with analysis assuming a Gaussian prior.

When *rls* = 1, ${\tilde{P}}_{k}(\gamma )\equiv {\widehat{P}}_{k}\equiv \widehat{\tilde{P}}(\gamma )$ and so differences in the *SEL* results in the first and seventh rows quantify residual simulation variation and Monte Carlo uncertainty in computing the probabilities used in equation (10) to produce the ${\tilde{P}}_{k}(\gamma )$. Results for other values of γ are similar.

Similar relations among the estimators hold for the two component Gaussian mixture prior and for a “frequentist scenario” with a fixed set of parameters and repeated sampling only from the Gaussian sampling distribution conditional on these parameters.

Results in Table 2 are based on *gmv* = 1. Relations among the estimators for other values of *gmv* are similar, but a look at extreme *gmv* is instructive. Results (not shown) indicate that for *rls* = 1, the risk associated with *L*_{0/1} is of the form *a*(*gmv*)γ(1 − γ), where *a*(*gmv*) is a constant depending only on the value of *gmv*. By identity (13), this implies that the expectation of *OC*(γ) is approximately constant. When *gmv* = 0, the data are fully informative, *Y _{k}* θ

Under *L*_{0/1} * _{k}*(γ) is the optimal and the difference between

Figures 1–3 are based on the Gaussian-Gaussian model. Figure 1 displays the dependence of risk on *gmv* for the exchangeable model (*rls* = 1). As expected, risk increases with *gmv*. For *rls* = 1, expected unit-specific loss equals the overall average risk and so the box plots summarize the sampling distribution of unit-specific risk.

When *rls* = 1, pre-posterior risk is the same for all units. However, when *rls* > 1, the ${\sigma}_{k}^{2}$ form a geometric sequence and preposterior risk depends on the unit. We study this non-exchangeable situation by simulation. Figure 2 displays loess smoothed performance of * _{k}*,

Loess smoothed, unit-specific performance of _{k}, _{k}(γ) and $\widehat{\tilde{P}}(\gamma )$ under , *L*^{‡}, and *L*_{0/1} as a function of unit-specific variance (${\sigma}_{k}^{2}$).

These apparent anomalies are explained as follows. If γ is near 1 (or equivalently, near 0) and if the ${\sigma}_{k}^{2}$ differ sufficiently (*rls* ≫ 1), estimates for the high variance units perform better than for those with mid-level variance. This relation is due to the improved classification of high-variance units into (above γ)/(below γ) groups, with substantial shrinkage of the percentile estimates towards 0.5. For example, with γ = 0.8, *a priori* 80% of the percentiles should be below 0.8. Estimated percentiles for the high variance units are essentially guaranteed to be below 0.8 and so the classification error for the large-variance units converges to 0.20 as *rls* → ∞. Generally, low variance units have small misclassification probabilities, but percentiles for units with intermediate variances are not shrunken sufficiently toward 0.5 to produce a low *L*_{0/1}.
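The shrinkage explanation can be illustrated numerically. The sketch below uses one exchangeable Gaussian-Gaussian unit and a simplified per-unit rule (classify "above γ" when the posterior probability of exceeding the prior γ-quantile is above 0.5); it is our illustration, not the paper's group classifier. As the sampling variance grows, the unit is always classified below γ = 0.8 and the error rate approaches 1 − γ = 0.20:

```python
import numpy as np
from math import sqrt

Z_080 = 0.8416  # Phi^{-1}(0.8), hard-coded to avoid extra dependencies

def classification_error(sigma2, n=100_000, seed=0):
    """(above 0.8)/(below 0.8) error rate for theta ~ N(0,1), Y|theta ~ N(theta, sigma2)."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n)
    y = theta + sqrt(sigma2) * rng.standard_normal(n)
    post_mean = y / (1 + sigma2)        # posterior mean under the conjugate model
    # pr(theta > Z_080 | y) > 0.5  <=>  posterior mean exceeds Z_080
    classified_above = post_mean > Z_080
    truly_above = theta > Z_080
    return float(np.mean(classified_above != truly_above))

e_low, e_high = classification_error(0.01), classification_error(1e4)
print(e_low, e_high)  # small error for informative data; error near 0.20 otherwise
```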

As shown in the foregoing tables and by Liu et al. (2004) and Lockwood et al. (2002), even the optimal ranks and percentiles can perform poorly unless the data are very informative. Figure 3 displays average posterior classification probabilities as a function of the optimally estimated percentile for *gmv* = 0.33, 1, 10, 100 and γ = 0.6, 0.8, 0.9, when *rls* = 1. The pattern shown by the three panels should hold for other γ choices and we use the γ = 0.8 panel as the typical example for further comments. Discrimination improves with decreasing *gmv*, but even when *gmv* = 0.33 (the ${\sigma}_{k}^{2}$ are 1/3 of the prior variance), the model-based, posterior probability of

We conducted investigations analogous to all of the foregoing for the Poisson sampling distribution with a Gamma prior, with constant or unequal variances (equivalently, expected values) for the unit-specific MLEs. Results are qualitatively and quantitatively very similar to those we report for the Gaussian sampling distribution.

The United States Renal Data System (USRDS) uses provider specific, standardized mortality ratios (SMRs) as a quality indicator for its nearly 4000 dialysis centers (Lacson et al. 2001; End-Stage Renal Disease (ESRD) Network 2000) and (United States Renal Data System (USRDS) 2005). Under the Poisson likelihood (last line of model (14)), with *Y _{k}* the observed and

In this “profiling” application, the loss function should reflect the end use of the ranks or percentiles. For example, suppose that the following monetary reward (increased reimbursement) and penalty (increased scrutiny) system is in place:

- A provider either does or does not receive the reward depending on whether its percentile is or is not beyond (for SMRs, below) a γ threshold.
- Providers that do get rewards receive varying amounts depending on their position among those receiving rewards.
- Providers not receiving rewards undergo increased scrutiny from an oversight committee.
- The distance from the threshold γ is used to monetize rewards or intensify scrutiny.

The loss function embodies this system, with the values of *p*, *q* and *c* controlling the award/penalty differences within the (above γ)/(below γ) groups. If rewarded providers all receive the same amount, then *L*_{0/1}(γ) can be used. Alternatively, rewards and scrutiny can depend on the posterior probability of exceeding or falling below γ, with both * _{k}*(γ) and ${P}_{k}^{*}(\gamma )$ optimizing the evaluation.

Liu et al. (2004) analyzed 1998 data and Lin et al. (2004) extended these analyses to 1998 – 2001 for 3173 centers with complete data using an autoregressive model. We illustrate the new loss functions and performance measures using 1998 data and the model,

$$\begin{array}{rcl}
\xi \sim N(0,\ 10), & & \lambda = \tau^{-2} \sim \text{Gamma}(0.05,\ 0.2)\\
\left[\theta_1, \dots, \theta_K \mid \xi, \tau\right] & \stackrel{\mathit{iid}}{\sim} & N(\xi, \tau^2), \qquad \theta_k = \log(\rho_k)\\
\left[Y_k \mid m_k, \rho_k\right] & \sim & \text{Poisson}(m_k \rho_k).
\end{array}$$

(14)
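To make the data-generating mechanism concrete, the following sketch simulates from model (14) using hypothetical hyperparameter values and hypothetical expected death counts *m_k* (the fitted values for the USRDS data are not reproduced here), then computes the MLE-based SMRs *Y_k*/*m_k*:

```python
import numpy as np

rng = np.random.default_rng(42)
K = 3173                           # centers with complete data, as in the text
xi, tau = 0.0, 0.3                 # hypothetical values, not fitted posteriors
m = rng.uniform(2.0, 40.0, K)      # hypothetical expected deaths per center
theta = rng.normal(xi, tau, K)     # theta_k = log(rho_k)
rho = np.exp(theta)                # true SMRs
y = rng.poisson(m * rho)           # observed deaths, Poisson(m_k * rho_k)
smr_mle = y / m                    # MLE of rho_k; noisy when m_k is small
corr = np.corrcoef(rho, smr_mle)[0, 1]
print(corr)
```

Centers with small *m_k* produce highly variable MLEs, which is consistent with the poor performance of MLE-based ranks reported in Table 4.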

For these data, ${\overline{G}}_{K}^{-1}(0.8)=0.18\phantom{\rule{thinmathspace}{0ex}}(\rho =1.20)$. Table 4 gives the posterior risks. For all loss functions investigated, the MLE based rank has the poorest performance; methods based on the posterior distributions generally perform well. As Theorem 6 indicates, * _{k}*(γ) and ${P}_{k}^{*}(\gamma )$ have almost identical risk.

Posterior, loss function risk for different ranking methods using the USRDS 1998 data. All values are 10000×(Loss). ${\widehat{P}}_{k}^{\mathit{\text{PM}}}$ and ${\widehat{P}}_{k}^{\mathit{\text{MLE}}}$ are percentiles based on the ${\theta}_{k}^{pm}$ and the *Y*_{k} respectively.

Figure 4 displays pr(θ* _{k}* > 0.18 |

Figure 5 displays the relation between * _{k}*(0.8) and

Effective ranking or percentiling should be based on a loss function computed from the estimated and true ranks, or be asymptotically equivalent to such loss function based estimates. Doing so produces optimal or near optimal performance and ensures desirable properties such as monotone transform invariance. In general, percentiles based on MLEs or on posterior means of the target parameter can perform poorly. Similarly, hypothesis test-based percentiles perform poorly.

Our performance evaluations are primarily for the fully parametric model with a Gaussian sampling distribution, though we do investigate departures from the Gaussian prior. Simulations for the Poisson/Gamma model produce relative performance very similar to those for the Gaussian. The * _{k}* that optimize (SEL) are “general purpose” with no explicit attention to optimize the (above γ)/(below γ) classification. The optimal (above γ)/(below γ) ranks are asymptotically equivalent to the “exceedance probability” procedure proposed in Normand et al. (1997). This near-equivalence provides insight into goals and a route to efficient computation.

We report loss function comparisons and plots based on unit-specific performance. These can be augmented by bivariate and multivariate summaries of properties, for example pair-wise posterior distributions or pair-wise operating characteristics.

When posterior distributions are not stochastically ordered and the choice of ranking methods does matter, our simulations show that though * _{k}*(γ) and $\widehat{\tilde{P}}(\gamma )$ are optimal for their respective loss functions and outperform

Performance evaluations for three-level models with a hyper-prior and robust analyses based on the non-parametric maximum likelihood prior or a fully Bayesian nonparametric prior (Paddock et al. 2006) showed that SEL-optimal ranks perform well over a wide range of prior specifications.

Other loss functions and estimates can be considered. Weighted combinations of several loss functions can be used to broaden the class of loss functions. If an application relevant loss function cannot be optimized, our evaluations provide a framework to compare candidate estimators. Our scoring function approach can help practitioners elicit a meaningful loss function with an intuitive interpretation.

Though there are a wide variety of candidate loss functions and, thereby, candidate estimated percentiles, our investigations show that in most applications one can choose between * _{k}* and ${P}_{k}^{*}(\gamma )$ (equivalently,

Whatever percentiling method is used, plots such as Figure 4 can be constructed with those percentiles on the X-axis. In general, the plot will not be monotone unless the ${P}_{k}^{*}(\gamma )$ are used, but use of the * _{k}* produces a nearly monotone plot and very good

Importantly, as do Liu et al. (2004) and Lockwood et al. (2002), we show that unless data are highly informative, even the optimal estimates can perform poorly. It is thus important to select estimators carefully so that inferences are as good as possible, especially when performance differences between estimators are large. Data analytic performance summaries such as *SEL*, *OC*(γ) and plots like Figure 3 and Figure 4 should accompany any analysis.

This work was supported by grant 1-R01-DK61662 from the U.S. NIH, National Institute of Diabetes, Digestive and Kidney Diseases. The authors are grateful to the editors and referees for helpful comments.

*Under weighted squared error loss*:

$$\sum _{k}{\omega}_{k}{({R}_{k}^{\mathit{\text{est}}}-{R}_{k})}^{2},$$

(15)

*the optimal rank estimates are*

$${\overline{R}}_{k}=E({R}_{k}|\mathbf{Y})={\displaystyle \sum _{j}\mathit{\text{pr}}({\theta}_{k}\ge {\theta}_{j}|\mathbf{Y}).}$$

(We drop conditioning on **Y**)

$$\begin{array}{ccc}\mathrm{E}{\displaystyle \sum _{k}{\omega}_{k}{({R}_{k}^{\mathit{\text{est}}}-{R}_{k})}^{2}}\hfill & =\hfill & {\displaystyle \sum _{k}{\omega}_{k}\mathrm{E}{({R}_{k}^{\mathit{\text{est}}}-{\overline{R}}_{k}+{\overline{R}}_{k}-{R}_{k})}^{2}}\hfill \\ \hfill & =\hfill & {\displaystyle \sum _{k}{\omega}_{k}\mathrm{E}[{({R}_{k}^{\mathit{\text{est}}}-{\overline{R}}_{k})}^{2}+{({\overline{R}}_{k}-{R}_{k})}^{2}]}\hfill \\ \hfill & \ge \hfill & {\displaystyle \sum _{k}{\omega}_{k}\mathrm{E}{({\overline{R}}_{k}-{R}_{k})}^{2}}\hfill \end{array}$$

Thus, the * _{k}* are optimal.

When all *w_k* are equal,

$${\widehat{R}}_{k}=\text{rank of}\phantom{\rule{thinmathspace}{0ex}}({\overline{R}}_{k})$$

optimizes (15) subject to the ${R}_{k}^{\mathit{\text{est}}}$ exhausting the integers (1;…*K*). To see this, if 0 ≤ E(*R _{i}*) =

$$\begin{array}{ccc}\mathrm{E}{({R}_{i}-{r}_{i})}^{2}+\mathrm{E}{({R}_{j}-{r}_{j})}^{2}\hfill & =\hfill & \text{Var}({R}_{i})+\text{Var}({R}_{j})+{({m}_{i}-{r}_{i})}^{2}+{({m}_{j}-{r}_{j})}^{2}\hfill \\ \hfill & <\hfill & \text{Var}({R}_{i})+\text{Var}({R}_{j})+{({m}_{i}-{r}_{j})}^{2}+{({m}_{j}-{r}_{i})}^{2}\hfill \\ \hfill & =\hfill & \mathrm{E}{({R}_{i}-{r}_{j})}^{2}+\mathrm{E}{({R}_{j}-{r}_{i})}^{2}\hfill \end{array}$$

and the * _{k}* are optimal.
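In practice the posterior mean ranks and their integer-valued counterparts are computed from MCMC output. A minimal sketch, assuming a draws-by-units matrix of posterior samples of the θ_k (function and variable names are ours):

```python
import numpy as np

def posterior_mean_ranks(samples):
    """samples: (n_draws, K) posterior draws of theta.
    Returns (Rbar, Rhat) where Rbar_k = E(R_k | Y) = sum_j pr(theta_k >= theta_j | Y)
    and Rhat_k = rank of Rbar_k, the integer ranks minimizing SEL."""
    # rank of each unit within each draw (1 = smallest), then average over draws
    within_draw_ranks = samples.argsort(axis=1).argsort(axis=1) + 1
    Rbar = within_draw_ranks.mean(axis=0)
    Rhat = Rbar.argsort().argsort() + 1        # integer ranks of the Rbar_k
    return Rbar, Rhat

# five well-separated, stochastically ordered units
rng = np.random.default_rng(0)
draws = rng.normal(loc=np.linspace(-2, 2, 5), scale=0.5, size=(4000, 5))
Rbar, Rhat = posterior_mean_ranks(draws)
print(Rhat)  # [1 2 3 4 5]
```

Note that the within-draw ranks of each draw sum to K(K + 1)/2, so the Rbar_k always average to (K + 1)/2.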

For general *w_k* there is no closed form solution, but a sorting algorithm based on

$$\begin{array}{cc}\hfill & {\omega}_{i}{({m}_{i}-{r}_{i})}^{2}+{\omega}_{j}{({m}_{j}-{r}_{j})}^{2}<{\omega}_{i}{({m}_{i}-{r}_{j})}^{2}+{\omega}_{j}{({m}_{j}-{r}_{i})}^{2},\text{if}\phantom{\rule{thinmathspace}{0ex}}{r}_{j}>{r}_{i}\hfill \\ \iff \hfill & ({r}_{j}-{r}_{i})((1-\frac{{\omega}_{j}}{{\omega}_{i}})({r}_{i}+{r}_{j}-2{m}_{j})+2({m}_{j}-{m}_{i}))>0,\text{if}\phantom{\rule{thinmathspace}{0ex}}{r}_{j}>{r}_{i}\hfill \\ \iff \hfill & (1-\frac{{\omega}_{j}}{{\omega}_{i}})({r}_{i}+{r}_{j}-2{m}_{j})+2({m}_{j}-{m}_{i})>0,\text{if}\phantom{\rule{thinmathspace}{0ex}}{r}_{j}>{r}_{i}.\hfill \end{array}$$

(16)

guides the optimization. By the above inequality, reversing any two estimated ranks that do not align with * _{k}* results in a smaller squared error.

*Start from any initial ranks and implement the recursion: If inequality (16) is satisfied, switch the position of unit i and unit j, i, j* = 1,…, *K*. *The unique fixed point will minimize weighted squared error loss (15).*

Since each switch decreases the expected loss and there are at most *K*! possible values of the expected loss, a fixed point will be reached. At the fixed point, no (*i, j*) pair satisfies inequality (16) and so the fixed point gives the SEL minimum.

In the standard sorting problem, the quantities to sort do not depend on the current positions of the units, while the quantity in (16) does. For this reason, the convergence of the algorithm can be very slow. After units *i* and *j* are compared and ordered, if unit *i* is compared to some other unit *k* and a switch happens, then unit *i* should be compared to unit *j* again and so this pairwise-switch optimization algorithm is computationally impractical.
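For completeness, here is a direct implementation of the pairwise-switch recursion based on inequality (16) (inefficient for large *K*, as noted above); the interface is our sketch, with `m` the posterior mean ranks and `w` the unit weights:

```python
import numpy as np

def weighted_sel_ranks(m, w, max_sweeps=10_000):
    """Pairwise-switch search for integer ranks minimizing sum_k w_k (r_k - m_k)^2
    over permutations of 1..K; m = posterior mean ranks, w = positive weights."""
    m, w = np.asarray(m, float), np.asarray(w, float)
    K = len(m)
    r = np.argsort(np.argsort(m)) + 1.0        # start from the ranks of m
    for _ in range(max_sweeps):
        switched = False
        for i in range(K):
            for j in range(K):
                if r[j] > r[i]:
                    cur = w[i] * (m[i] - r[i])**2 + w[j] * (m[j] - r[j])**2
                    alt = w[i] * (m[i] - r[j])**2 + w[j] * (m[j] - r[i])**2
                    if alt < cur:              # the switch lowers expected loss
                        r[i], r[j] = r[j], r[i]
                        switched = True
        if not switched:                       # fixed point reached
            return r.astype(int)
    raise RuntimeError("no fixed point within max_sweeps")

r_eq = weighted_sel_ranks([2.2, 1.1, 3.7, 2.9], [1, 1, 1, 1])
print(r_eq)   # equal weights: simply the ranks of m -> [2 1 4 3]
r_w = weighted_sel_ranks([1.1, 1.2], [1.0, 10.0])
print(r_w)    # the heavy weight pulls unit 2 to the rank nearest its mean
```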

Rewrite the loss function as a function of the number of units that are not classified in the top (1 − γ)*K*, but that should have been. Then, ${L}_{0/1}=\frac{1}{K}(K-|A\cap T|)$, where *A* is the set of indices of the observations classified in the top and *T* is the true set of indices for which rank(θ* _{k}*) > (1 − γ)

$$\begin{array}{ccc}\mathrm{E}|A\cap T|\hfill & =\hfill & \mathrm{E}{\displaystyle \sum I(k\in A\cap T})\hfill \\ \hfill & =\hfill & \mathrm{E}{\displaystyle \sum _{k\in A}I(k\in T)={\displaystyle \sum _{k\in A}\text{pr}({P}_{k}>\gamma |\mathbf{Y}).}}\hfill \end{array}$$

To optimize *L*_{0/1}, for each θ* _{k}* calculate pr(

The * _{k}*(γ) optimize
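The selection just described is a one-liner given the posterior exceedance probabilities; a sketch with hypothetical probabilities (these are not computed from any data in the paper):

```python
import numpy as np

def classify_above(exceed_prob, gamma):
    """Put the units with the (1 - gamma)K largest values of
    pr(P_k > gamma | Y) into the (above gamma) group."""
    exceed_prob = np.asarray(exceed_prob)
    K = len(exceed_prob)
    n_top = int(round((1 - gamma) * K))
    above = np.zeros(K, dtype=bool)
    above[np.argsort(exceed_prob)[::-1][:n_top]] = True
    return above

p = np.array([0.05, 0.90, 0.40, 0.75, 0.10, 0.60, 0.20, 0.35, 0.85, 0.15])
above = classify_above(p, gamma=0.8)
print(np.flatnonzero(above))  # indices of the (1 - 0.8) * 10 = 2 largest probabilities
```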

*If* *a*_{1} + *a*_{2} ≥ 0 and *b*_{1} ≤ *b*_{2}, then

$${a}_{1}{b}_{1}+{a}_{2}(1-{b}_{2})\le {a}_{1}{b}_{2}+{a}_{2}(1-{b}_{1}).$$

$$\begin{array}{ccc}({a}_{1}+{a}_{2}){b}_{1}\le ({a}_{1}+{a}_{2}){b}_{2}\hfill & \Rightarrow \hfill & {a}_{1}{b}_{1}-{a}_{2}{b}_{2}\le {a}_{1}{b}_{2}-{a}_{2}{b}_{1}\hfill \\ \hfill & \Rightarrow \hfill & {a}_{1}{b}_{1}+{a}_{2}(1-{b}_{2})\le {a}_{1}{b}_{2}+{a}_{2}(1-{b}_{1}).\hfill \end{array}$$

*Rearrangement Inequality* (Hardy et al. 1967): *If* *a*_{1} ≤ *a*_{2} ≤ … ≤ *a _{n}*, *b*_{1} ≤ *b*_{2} ≤ … ≤ *b _{n}*, *and* (*b*_{(1)}, …, *b*_{(n)}) *is any permutation of the* *b _{i}*, *then*

$$\sum _{i=1}^{n}{a}_{i}{b}_{n+1-i}\le {\displaystyle \sum _{i=1}^{n}{a}_{i}{b}_{(i)}\le {\displaystyle \sum _{i=1}^{n}{a}_{i}{b}_{i}.}}$$

For *n* = 2 we use the ranking inequality:

$${a}_{1}{b}_{2}+{a}_{2}{b}_{1}\le {a}_{1}{b}_{1}+{a}_{2}{b}_{2}\iff ({a}_{2}-{a}_{1})({b}_{2}-{b}_{1})\ge 0.$$

For *n* > 2, there exists a minimum and a maximum in all n! combinations of sums of products. By the result for *n* = 2, the necessary condition for the sum to reach the minimum is that any pair of indices (*i*_{1}, *i*_{2}), (*a*_{i1}, *a*_{i2}) and (*b*_{i1}, *b*_{i2}) must have the inverse order; to reach the maximum, they must have same order. Therefore, except in the trivial cases where there are ties inside {*a _{i}*} or {

$$\begin{array}{ccc}E({L}_{{R}_{K}^{\mathit{\text{est}}}}(\gamma ,p,q,c))=\hfill & \hfill & {\displaystyle \sum _{i=1}^{[\gamma (K+1)]}|\gamma (K+1)-i{|}^{p}\text{pr}({R}_{(i)}\ge \gamma (K+1))}\hfill \\ \hfill & +\hfill & {\displaystyle \sum _{i=[\gamma (K+1)]+1}^{K}c|i-\gamma (K+1){|}^{q}(1-\text{pr}({R}_{(i)}\ge \gamma (K+1))).}\hfill \end{array}$$

For optimum ranking, the following conditions are necessary:

- By Lemma 1, for any (*i*_{1}, *i*_{2}) satisfying 1 ≤ *i*_{1} ≤ [γ(*K* + 1)] and [γ(*K* + 1)] + 1 ≤ *i*_{2} ≤ *K*, it is required that pr(*R*_{(i1)} ≥ γ(*K* + 1)) ≤ pr(*R*_{(i2)} ≥ γ(*K* + 1)). To satisfy this condition, divide the units into two groups by picking the units with the largest (1 − γ)*K* values of pr(*R _{k}* ≥ γ(*K* + 1)) for the (above γ) group.
- By Lemma 2:
  - For the set {*k* : *R _{k}* = *R*_{(i)}, *i* = 1, …, [γ(*K* + 1)]}, since |γ(*K* + 1) − *i*|^{p} is a decreasing function of *i*, we require that pr(*R*_{(i1)} ≥ γ(*K* + 1)) ≥ pr(*R*_{(i2)} ≥ γ(*K* + 1)) if *i*_{1} > *i*_{2}. Therefore, for the units with ranks (1, …, γ*K*), the ranks should be determined by ranking the pr(*R _{k}* ≥ γ(*K* + 1)).
  - For the set {*k* : *R _{k}* = *R*_{(i)}, *i* = [γ(*K* + 1)] + 1, …, *K*}, since |*i* − γ(*K* + 1)|^{q} is an increasing function of *i*, we require that pr(*R*_{(i1)} ≥ γ(*K* + 1)) ≥ pr(*R*_{(i2)} ≥ γ(*K* + 1)) if *i*_{1} > *i*_{2}. Therefore, for the units with ranks (γ*K* + 1, …, *K*), the ranks should be determined by ranking the pr(*R _{k}* ≥ γ(*K* + 1)).

These conditions imply that the * _{k}*(γ) (
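The displayed preposterior risk can be evaluated directly for any candidate ranking. A sketch, assuming `exceed_prob[i-1]` = pr(*R*_{(i)} ≥ γ(*K* + 1)) for the unit assigned estimated rank *i*, and taking [*x*] as the floor:

```python
import numpy as np

def preposterior_risk(exceed_prob, gamma, p=2, q=2, c=1.0):
    """Risk of a candidate ranking under L(gamma, p, q, c):
    positions i <= [gamma(K+1)] pay |gamma(K+1) - i|^p when truly above the cut;
    positions above the cut pay c|i - gamma(K+1)|^q when truly below it."""
    pr = np.asarray(exceed_prob, dtype=float)
    K = len(pr)
    cut = gamma * (K + 1)
    i = np.arange(1, K + 1)
    below = i <= np.floor(cut)
    terms = np.where(below, np.abs(cut - i)**p * pr,
                     c * np.abs(i - cut)**q * (1 - pr))
    return float(terms.sum())

perfect = [0.0] * 8 + [1.0] * 2          # K = 10, gamma = 0.8: zero risk
print(preposterior_risk(perfect, 0.8))   # 0.0
print(preposterior_risk([0.5] * 10, 0.8) > 0)  # True: uncertainty incurs risk
```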

As in the proof of Theorem 2, we begin with a necessary condition for optimization. Denote by *R*_{(i1)}, *R*_{(i2)} the rank random variables for units whose ranks are estimated as *i*_{1}, *i*_{2}, where *i*_{1} < γ(*K* + 1), *i*_{2} > γ(*K* + 1). Let,

$$\text{pr}({R}_{({i}_{1})}\ge \gamma (K+1))={p}_{1},\text{pr}({R}_{({i}_{2})}\ge \gamma (K+1)={p}_{2}.$$

For the index selection to be optimal,

$$\begin{array}{cl}
& \mathrm{E}[(R_{(i_1)}-\gamma(K+1))^2 \mid R_{(i_1)}\ge\gamma(K+1)]\,p_1 + c\,\mathrm{E}[(R_{(i_2)}-\gamma(K+1))^2 \mid R_{(i_2)}<\gamma(K+1)]\,(1-p_2)\\
\le & c\,\mathrm{E}[(R_{(i_1)}-\gamma(K+1))^2 \mid R_{(i_1)}<\gamma(K+1)]\,(1-p_1) + \mathrm{E}[(R_{(i_2)}-\gamma(K+1))^2 \mid R_{(i_2)}\ge\gamma(K+1)]\,p_2.
\end{array}$$

The following is equivalent to the foregoing:

$$\begin{array}{cl}
& \mathrm{E}[(R_{(i_1)}-\gamma(K+1))^2 \mid R_{(i_1)}\ge\gamma(K+1)]\,p_1 - c\,\mathrm{E}[(R_{(i_1)}-\gamma(K+1))^2 \mid R_{(i_1)}<\gamma(K+1)]\,(1-p_1)\\
\le & \mathrm{E}[(R_{(i_2)}-\gamma(K+1))^2 \mid R_{(i_2)}\ge\gamma(K+1)]\,p_2 - c\,\mathrm{E}[(R_{(i_2)}-\gamma(K+1))^2 \mid R_{(i_2)}<\gamma(K+1)]\,(1-p_2).
\end{array}$$

Therefore, with *p _{k}* = pr(

$$\mathrm{E}[{({R}_{k}-\gamma (K+1))}^{2}|{R}_{k}\ge \gamma (K+1)]{p}_{k}-c\mathrm{E}[{({R}_{k}-\gamma (K+1))}^{2}|{R}_{k}<\gamma (K+1)](1-{p}_{k}).$$

This result is useful and differs from the WSEL result in Appendix A in that we now obtain a quantity that depends only on the unit index *k*. However, as for *L*_{0/1}, optimization of *L*^{†} does not induce an optimal ordering within the two groups. A second-stage loss, for example SEL, can be imposed within the two groups.

As for optimizing WSEL in Appendix A, a pairwise switch algorithm is computationally challenging, since the decision to switch a pair of units depends on their relative position and on their estimated ranks. Thus, in each iteration all pair-wise relations have to be checked. We have not identified a general representation or efficient algorithm for the optimal ranks. However, we have developed the following relation between *L*^{†} and *L*^{‡}. Note that when either $A{B}_{k}(\gamma ,{P}_{k},{P}_{k}^{\mathit{\text{est}}})\ne 0\phantom{\rule{thinmathspace}{0ex}}\text{or}\phantom{\rule{thinmathspace}{0ex}}B{A}_{k}(\gamma ,{P}_{k},{P}_{k}^{\mathit{\text{est}}})\ne 0$ it must be the case that either ${P}_{k}^{\mathit{\text{est}}}\ge \gamma \ge {P}_{k}\phantom{\rule{thinmathspace}{0ex}}\mathrm{\text{or}}\phantom{\rule{thinmathspace}{0ex}}{P}_{k}\ge \gamma \ge {P}_{k}^{\mathit{\text{est}}}$. Equivalently,

$$|{P}_{k}-\gamma |+|{P}_{k}^{\mathit{\text{est}}}-\gamma |=|{P}_{k}-{P}_{k}^{\mathit{\text{est}}}|\phantom{\rule{thinmathspace}{0ex}}\text{or}\phantom{\rule{thinmathspace}{0ex}}\frac{|{P}_{k}-\gamma |}{|{P}_{k}-{P}_{k}^{\mathit{\text{est}}}|}+\frac{|{P}_{k}^{\mathit{\text{est}}}-\gamma |}{|{P}_{k}-{P}_{k}^{\mathit{\text{est}}}|}=1.$$

Now, suppose *c* > 0, *p* ≥ 1, *q* ≥ 1 and let *m* = max(*p,q*). Then, using the inequality 2^{1−m} ≤ *a*^{m} + (1 − *a*)^{m} for 0 ≤ *a* ≤ 1,

Since the (above γ)/(below γ) groups are formed by * _{k}*(γ), $\widehat{\tilde{P}}(\gamma )$ minimizes

$$\mathrm{E}{\displaystyle \sum _{k}{({R}_{k}^{\mathit{\text{est}}}-{R}_{k})}^{2}={\displaystyle \sum _{k}V({R}_{k})+{\displaystyle \sum _{k}{({R}_{k}^{\mathit{\text{est}}}-{\overline{R}}_{k})}^{2}}.}}$$

Nothing can be done to reduce the variance terms. The summed squared bias partitions into,

$$\sum _{k}{({R}_{k}^{\mathit{\text{est}}}-{\overline{R}}_{k})}^{2}={\displaystyle \sum _{k=1}^{\gamma K}{({R}_{k}^{\mathit{\text{est}}}-{\overline{R}}_{k})}^{2}+{\displaystyle \sum _{k=\gamma K+1}^{K}{({R}_{k}^{\mathit{\text{est}}}-{\overline{R}}_{k})}^{2}}}$$

which must be minimized subject to the constraints that $({R}_{1}^{\mathit{\text{est}}},\dots ,{R}_{\gamma K}^{\mathit{\text{est}}})\in \{1,\dots ,\gamma K\}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}({R}_{\gamma K+1}^{\mathit{\text{est}}},\dots ,{R}_{K}^{\mathit{\text{est}}})\in \{\gamma K+1,\dots ,K\}$. We deal only with the (below γ) group; the (above γ) group is handled in the same manner. Without loss of generality assume that ${\overline{R}}_{1}<{\overline{R}}_{2}<\cdots <{\overline{R}}_{\gamma K}$ and compare SEL for ${R}_{k}^{\mathit{\text{est}}}=\text{rank}\left({\overline{R}}_{k}\right)=k,k=1,\dots ,\gamma K$ to any other assignment. It is straightforward to show that switching any pair that does not follow the ${\overline{R}}_{k}$ order reduces SEL. Iterating this and noting that the

Recall that for a positive, discrete random variable the expected value can be computed as the sum of (1 − cdf) at mass points, where cdf is the cumulative distribution function, so

$$\begin{array}{ccc}{\overline{R}}_{k}\hfill & =\hfill & {\displaystyle \sum _{\nu =1}^{K}\nu \text{pr}[{R}_{k}=\nu ]={\displaystyle \sum _{\nu =1}^{K}\text{pr}[{R}_{k}\ge \nu ]}}\hfill \\ \hfill & =\hfill & {\displaystyle \sum _{\nu =1}^{K}\frac{2(K-\nu +1)}{K(K+1)}\frac{K(K+1)}{2(K-\nu +1)}\text{pr}[{R}_{k}\ge \nu ]}\hfill \\ \hfill & =\hfill & {\displaystyle \sum _{\nu =1}^{K}\frac{2(K-\nu +1)}{K(K+1)}{R}_{k}^{+}(\nu ).}\hfill \end{array}$$

(17)

Relation (17) can be used to show that when the posterior distributions are stochastically ordered, _{k}* _{k}*(γ) because the order of pr[
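The tail-sum identity behind (17) is easy to verify numerically for an arbitrary distribution on {1, …, *K*}:

```python
import numpy as np

rng = np.random.default_rng(3)
K = 7
pmf = rng.dirichlet(np.ones(K))                 # arbitrary pmf of a rank on 1..K
support = np.arange(1, K + 1)
mean_direct = float((support * pmf).sum())      # sum_v v * pr[R = v]
tail = np.array([pmf[v - 1:].sum() for v in support])  # pr[R >= v]
mean_tail = float(tail.sum())                   # sum_v pr[R >= v]
print(mean_direct, mean_tail)  # identical up to floating-point rounding
```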

In this proof we use **Y _{K}** rather than

The posterior independence of θ_{k} is straightforward. Denote **θ** = (θ_{1}, …, θ* _{k}*,…, θ

For the * _{k}*(γ)’s generator:

$$\begin{array}{ccc}\text{pr}({P}_{k}\ge \gamma |{\mathbf{Y}}_{\mathbf{K}})\hfill & =\hfill & \mathrm{E}[I({\theta}_{k}\ge {\theta}_{(\gamma )})|{\mathbf{Y}}_{\mathbf{K}})]\hfill \\ \hfill & =\hfill & \mathrm{E}[I({\theta}_{k}\ge {\theta}_{(\gamma )}^{(-k)}|{\mathbf{Y}}_{\mathbf{K}})]+\mathrm{E}[I({\theta}_{k}\ge {\theta}_{(\gamma )})-I({\theta}_{k}\ge {\theta}_{(\gamma )}^{(-k)})|{\mathbf{Y}}_{\mathbf{K}}]\hfill \end{array}$$

(18)

For the second term in (18)

$$\mathrm{E}[I({\theta}_{k}\ge {\theta}_{(\gamma )})-I({\theta}_{k}\ge {\theta}_{(\gamma )}^{(-k)})|{\mathbf{Y}}_{\mathbf{K}}]=\phantom{\rule{thinmathspace}{0ex}}-\text{pr}({\theta}_{(\gamma )}^{(-k)}\le {\theta}_{k}<{\theta}_{(\gamma )}|{\mathbf{Y}}_{\mathbf{K}})+\text{pr}({\theta}_{(\gamma )}\le {\theta}_{k}<{\theta}_{(\gamma )}^{(-k)}|{\mathbf{Y}}_{\mathbf{K}})$$

Suppose $\frac{i-1}{K-1}<\frac{i}{K}\le \gamma <\frac{i+1}{K}$, so that θ_{(γ)} = θ_{(i,K)} by definition. Consider the relation between $\frac{i}{K-1}$ and γ:

- If $\gamma <\frac{i}{K-1}$, then $\frac{i-1}{K-1}<\frac{i}{K}\le \gamma <\frac{i}{K-1}<\frac{i+1}{K}$ and ${\theta}_{(\gamma )}^{(-k)}={\theta}_{(i-1,K-1)}$, so pr(θ_{(i−1,K−1)} ≤ θ_{k} < θ_{(i,K)} | **Y**_{**K**}) = 0 and pr(θ_{(i,K)} < θ_{k} ≤ θ_{(i−1,K−1)} | **Y**_{**K**}) = 0;
- If $\frac{i}{K-1}\le \gamma$, then $\frac{i-1}{K-1}<\frac{i}{K}<\frac{i}{K-1}\le \gamma <\frac{i+1}{K}$ and ${\theta}_{(\gamma )}^{(-k)}={\theta}_{(i,K-1)}$, so pr(θ_{(i,K−1)} < θ_{k} ≤ θ_{(i,K)} | **Y**_{**K**}) = 0 and pr(θ_{(i,K)} < θ_{k} ≤ θ_{(i,K−1)} | **Y**_{**K**}) = 0.

Thus the second term in (18) is zero,

$$\begin{array}{ccc}\text{pr}({P}_{k}\ge \gamma |{\mathbf{Y}}_{\mathbf{K}})\hfill & =\hfill & \mathrm{E}[I({\theta}_{k}\ge {\theta}_{(\gamma )}^{(-k)})|{\mathbf{Y}}_{\mathbf{K}}]=\mathrm{E}[\mathrm{E}[I({\theta}_{k}\ge {\theta}_{(\gamma )}^{(-k)})|{\mathit{\theta}}^{(-k)}]|{\mathbf{Y}}_{\mathbf{K}}]\hfill \\ \hfill & =\hfill & \mathrm{E}[\text{pr}({\theta}_{k}\ge {G}^{-1}(\gamma )|{\mathbf{Y}}_{\mathbf{K}})+{g}_{k}({G}^{-1}(\gamma ))({\theta}_{(\gamma )}^{(-k)}-{G}^{-1}(\gamma ))+{o}_{p}({\theta}_{(\gamma )}^{(-k)}-{G}^{-1}(\gamma ))|{\mathbf{Y}}_{\mathbf{K}}]\hfill \\ \hfill & =\hfill & \text{pr}({\theta}_{k}\ge {G}^{-1}(\gamma )|{Y}_{k})+{g}_{k}({G}^{-1}(\gamma ))\mathrm{E}[({\theta}_{(\gamma )}^{(-k)}-{G}^{-1}(\gamma ))+{o}_{p}({\theta}_{(\gamma )}^{(-k)}-{G}^{-1}(\gamma ))|{\mathbf{Y}}_{\mathbf{K}}]\hfill \end{array}$$

(19)

In (19), ${\theta}_{(\gamma )}^{(-k)}$ is the γth quantile of *K* − 1 non-iid samples from *K* − 1 posterior distributions. By Theorem 5.2.1 of David and Nagaraja (2003) and the large-sample theory of order statistics from iid sampling, we have ${\theta}_{(\gamma )}^{(-k)}\to {G}^{-1}(\gamma )$ in probability as *K* → ∞. Since we assume that θ* _{k}*|

The generator of ${P}_{k}^{*}(\gamma )$ is:

$$\begin{array}{ccc}\text{pr}({\theta}_{k}\ge {\overline{G}}_{{\mathbf{Y}}_{\mathbf{K}}}^{-1}(\gamma )|{\mathbf{Y}}_{\mathbf{K}})\hfill & =\hfill & \text{pr}({\theta}_{k}\ge {G}^{-1}(\gamma )|{\mathbf{Y}}_{\mathbf{K}})+{g}_{k}({\overline{G}}^{-1}(\gamma ))({\overline{G}}_{{\mathbf{Y}}_{\mathbf{K}}}^{-1}(\gamma )-{G}^{-1}(\gamma ))\hfill \\ \hfill & \hfill & +o({\overline{G}}_{{\mathbf{Y}}_{\mathbf{K}}}^{-1}(\gamma )-{G}^{-1}(\gamma ))\hfill \\ \hfill & =\hfill & \text{pr}({\theta}_{k}\ge {G}^{-1}(\gamma )|{Y}_{k})+{g}_{k}({\overline{G}}^{-1}(\gamma ))({\overline{G}}_{{\mathbf{Y}}_{\mathbf{K}}}^{-1}(\gamma )-{G}^{-1}(\gamma ))\hfill \\ \hfill & \hfill & +o({\overline{G}}_{{\mathbf{Y}}_{\mathbf{K}}}^{-1}(\gamma )-{G}^{-1}(\gamma ))\hfill \end{array}$$

(20)

Since ${\overline{G}}_{{\mathbf{Y}}_{\mathbf{K}}}^{-1}(\gamma )\to {G}^{-1}(\gamma )$ by (19) and (20), |pr(*P _{k}* ≥ γ|

For each function *S*(*P*), there will be an optimal SEL ranking estimator. For instance,

$$S(P)=(aP+b)\,{\mathrm{I}}_{\{P>\gamma \}},\quad a>0$$

indicates that the reward or penalty is the same for all units below the threshold γ; for units above γ the reward/penalty is linearly related to the rank.

We study the (above γ)/(below γ) classification, but more than two ordinal categories can be of interest. For example, educational institutions might be classified into three categories, the poor, the average and the excellent. The following two *S*(*P*) capture this goal. Let *J* ≥ 3 be the number of ordered categories, then

$$S(P)={\displaystyle \sum _{j=1}^{J}{a}_{j}I(P\le {\gamma}_{j}),0<{\gamma}_{1}<\cdots <{\gamma}_{J},{a}_{j}\ge 0}$$

or

$$S(P)={\displaystyle \sum _{j=1}^{J}{a}_{j}I(P\le {\gamma}_{j})+{a}_{0}P,0<{\gamma}_{1}<\cdots <{\gamma}_{J},{a}_{j}\ge 0.}$$
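These threshold scoring functions are straightforward to compute. A sketch of the first form, where two illustrative thresholds (our choices, not the paper's) yield three ordered score levels such as poor/average/excellent:

```python
import numpy as np

def step_score(P, gammas, a):
    """S(P) = sum_j a_j * I(P <= gamma_j) for ordered thresholds gammas."""
    P = np.asarray(P, dtype=float)
    return sum(aj * (P <= gj) for aj, gj in zip(a, gammas))

# two thresholds -> three ordered score levels
P = np.array([0.10, 0.50, 0.95])
scores = step_score(P, gammas=[1/3, 2/3], a=[1.0, 1.0])
print(scores)  # [2. 1. 0.]
```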

- Austin PC, Tu JV. Comparing Clinical Data with Administrative Data for Producing Acute Myocardial Infarction Report Cards. Journal of the Royal Statistical Society, Series A: Statistics in Society. 2006;169(1):115–126.
- Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, Methodological. 1995;57:289–300.
- Christiansen CL, Morris CN. Improving the statistical approach to health care provider profiling. Annals of Internal Medicine. 1997;127:764–768.
- Conlon EM, Louis TA. Addressing Multiple Goals in Evaluating Region-specific Risk using Bayesian methods. In: Lawson A, Biggeri A, Böhning D, Lesaffre E, Viel J-F, Bertollini R, editors. Disease Mapping and Risk Assessment for Public Health, chapter 3. Wiley; 1999. pp. 31–47.
- Daniels M, Normand S-LT. Longitudinal profiling of health care units based on continuous and discrete patient outcomes. Biostatistics. 2006;7:1–15.
- David HA, Nagaraja HN. Order Statistics. Third edition. Wiley; 2003.
- Devine OJ, Louis TA. A constrained empirical Bayes estimator for incidence rates in areas with small populations. Statistics in Medicine. 1994;13:1119–1133.
- Devine OJ, Louis TA, Halloran ME. Empirical Bayes estimators for spatially correlated incidence rates. Environmetrics. 1994;5:381–398.
- Diggle PJ, Thomson MC, Christensen OF, Rowlingson B, Obsomer V, Gardon J, Wanji S, Takougang I, Enyong P, Kamgno J, Remme JH, Boussinesq M, Molyneux DH. Spatial modelling and prediction of Loa loa risk: decision making under uncertainty. Technical report, Department of Mathematics and Statistics, Lancaster University; 2006.
- Draper D, Gittoes M. Statistical Analysis of Performance Indicators in UK Higher Education. Journal of the Royal Statistical Society, Series A: Statistics in Society. 2004;167(3):449–474.
- DuMouchel W. Bayesian Data Mining in Large Frequency Tables, With An Application to the FDA Spontaneous Reporting System (with discussion). The American Statistician. 1999;53:177–190.
- Efron B, Tibshirani R, Storey JD, Tusher V. Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association. 2001;96(456):1151–1160.
- End-Stage Renal Disease (ESRD) Network. 1999 Annual Report: ESRD Clinical Performance Measures Project. Technical report, Health Care Financing Administration; 2000.
- Gelman A, Price PN. All maps of parameter estimates are misleading. Statistics in Medicine. 1999;18:3221–3234.
- Goldstein H, Spiegelhalter DJ. League tables and their limitations: statistical issues in comparisons of institutional performance (with discussion). Journal of the Royal Statistical Society, Series A. 1996;159:385–443.
- Hardy GH, Littlewood JE, Polya G. Inequalities. 2nd edition. Cambridge University Press; 1967.
- Lacson E, Teng M, Lazarus JM, Lew N, Lowrie EG, Owen WF. Limitations of the facility-specific standardized mortality ratio for profiling health care quality in dialysis. American Journal of Kidney Diseases. 2001;37:267–275.
- Laird NM, Louis TA. Empirical Bayes ranking methods. Journal of Educational Statistics. 1989;14:29–46.
- Landrum MB, Bronskill SE, Normand S-LT. Analytic methods for constructing cross-sectional profiles of health care providers. Health Services and Outcomes Research Methodology. 2000;1:23–48.
- Landrum MB, Normand S-LT, Rosenheck RA. Selection of Related Multivariate Means: Monitoring Psychiatric Care in the Department of Veterans Affairs. Journal of the American Statistical Association. 2003;98(461):7–16.
- Lin R, Louis TA, Paddock SM, Ridgeway G. Ranking of USRDS, provider-specific SMRs from 1998–2001. Technical Report 67, Johns Hopkins University, Dept. of Biostatistics Working Papers; 2004. http://www.bepress.com/jhubiostat/paper67.
- Liu J, Louis TA, Pan W, Ma J, Collins A. Methods for estimating and interpreting provider-specific, standardized mortality ratios. Health Services and Outcomes Research Methodology. 2004;4:135–149.
- Lockwood JR, Louis TA, McCaffrey DF. Uncertainty in rank estimation: Implications for value-added modeling accountability systems. Journal of Educational and Behavioral Statistics. 2002;27(3):255–270.
- McCaffrey DF, Lockwood JR, Koretz D, Louis TA, Hamilton L. Models for value-added modeling of teacher effects. Journal of Educational and Behavioral Statistics. 2004;29(1):67–101.
- McClellan M, Staiger D. The Quality of Health Care Providers. Technical Report 7327, National Bureau of Economic Research, Working Paper; 1999.
- Noell GH, Burns JL. Value-added assessment of teacher preparation: An illustration of emerging technology. Journal of Teacher Education. 2006;57(1):37–50.
- Normand S-LT, Glickman ME, Gatsonis CA. Statistical methods for profiling providers of medical care: Issues and applications. Journal of the American Statistical Association. 1997;92:803–814.
- Paddock SM, Ridgeway G, Lin R, Louis TA. Flexible distributions for triple-goal estimates in two-stage hierarchical models. Computational Statistics & Data Analysis. 2006;50(11):3243–3262.
- Rubin DB, Stuart EA, Zanutto EL. A potential outcomes view of value-added assessment in education. Journal of Educational and Behavioral Statistics. 2004;29(1):103–116.
- Shen W, Louis TA. Triple-goal estimates in two-stage, hierarchical models. Journal of the Royal Statistical Society, Series B. 1998;60:455–471.
- Storey JD. A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B, Methodological. 2002;64(3):479–498.
- Storey JD. The Positive False Discovery Rate: A Bayesian Interpretation and the q-Value. The Annals of Statistics. 2003;31(6):2013–2035.
- Tekwe CD, Carter RL, Ma CX, Algina J, Lucas ME, Roth J, Ariet M, Fisher T, Resnick MB. An empirical comparison of statistical models for value-added assessment of school performance. Journal of Educational and Behavioral Statistics. 2004;29(1):11–35.
- Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences. 2001;98(9):5116–5121.
- United States Renal Data System (USRDS). 2005 Annual Data Report: Atlas of end-stage renal disease in the United States. Technical report, Health Care Financing Administration; 2005.
- Wolfe R, Gaylin D, Port F, Held P, Wood C. Using USRDS generated mortality tables to compare local ESRD mortality rates to national rates. Kidney International. 1992;42(4):991–996.
- Wright DL, Stern HS, Cressie N. Loss functions for estimation of extrema with an application to disease mapping. The Canadian Journal of Statistics. 2003;31(3):251–266.
