Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2946253

Formats

Article sections

- Abstract
- 1. INTRODUCTION
- 2. METHOD
- 3. DISTRIBUTION OF TEST STATISTICS UNDER THE NULL
- 4.SIMULATIONS
- 5. EXAMPLES
- 6. DISCUSSION
- Supplementary Material
- REFERENCES

Authors

Related links

J Am Stat Assoc. Author manuscript; available in PMC 2010 September 27.

Published in final edited form as:

J Am Stat Assoc. 2010 June 1; 105(490): 612–620.

doi: 10.1198/jasa.2010.tm09386PMCID: PMC2946253

NIHMSID: NIHMS236027

Department of Biostatistics, Vanderbilt University, Nashville, TN 37232.

See other articles in PMC that cite the published article.

We propose a new set of test statistics to examine the association between two ordinal categorical variables *X* and *Y* after adjusting for continuous and/or categorical covariates **Z**. Our approach first fits multinomial (e.g., proportional odds) models of *X* and *Y*, separately, on **Z**. For each subject, we then compute the conditional distributions of *X* and *Y* given **Z**. If there is no relationship between *X* and *Y* after adjusting for **Z**, then these conditional distributions will be independent, and the observed value of (*X, Y*) for a subject is expected to follow the product distribution of these conditional distributions. We consider two simple ways of testing the null of conditional independence, both of which treat *X* and *Y* equally, in the sense that they do not require specifying an outcome and a predictor variable. The first approach adds these product distributions across all subjects to obtain the expected distribution of (*X, Y*) under the null and then contrasts it with the observed unconditional distribution of (*X, Y*). Our second approach computes “residuals” from the two multinomial models and then tests for correlation between these residuals; we define a new individual-level residual for models with ordinal outcomes. We present methods for computing *p*-values using either the empirical or asymptotic distributions of our test statistics. Through simulations, we demonstrate that our test statistics perform well in terms of power and Type I error rate when compared to proportional odds models which treat *X* as either a continuous or categorical predictor. We apply our methods to data from a study of visual impairment in children and to a study of cervical abnormalities in human immunodeficiency virus (HIV)-infected women. Supplemental materials for the article are available online.

Consider the situation where there are two ordinal categorical random variables *X* and *Y*, and we want to examine the association between *X* and *Y* after adjusting for continuous and/or categorical covariates **Z**. This is a common scenario in medical research and social sciences. For example, a recent study collected data on stages of cervical squamous intraepithelial lesions and condom use among human immunodeficiency virus (HIV)-infected women in Zambia (Parham et al. 2006). Cervical dysplastic stage is an ordered categorical variable with four levels according to the commonly used revised Bethesda system: normal, low-grade squamous intraepithelial lesions, high-grade squamous intraepithelial lesions, and squamous cell carcinoma. Condom use was categorized as never, rarely, almost always, and always. Researchers are interested in testing the association between cervix health and condom use after controlling for other variables such as age and CD4 count.

Proportional odds models (also known as ordinal logistic regression) (McCullagh 1980), other cumulative link models (Aitchison and Silvey 1957; Farewell 1982), and continuation ratio models (Läärä and Matthews 1985) are commonly used to examine the association between an ordinal response variable and continuous or categorical predictors. These regression models are useful because they account for the natural ordering of the outcome, but do not treat the outcome as a continuous variable. However, when one of the predictors is ordered categorical, all traditional regression approaches including these for ordered categorical outcomes have to treat the ordinal predictor as either numerical or categorical. The former enforces a linearity assumption and the latter ignores the order information.

When the ordinal predictor is treated as continuous, we assume the effect of moving from level 1 to level 2 is the same as that from level 2 to level 3. Often this assumption is unreasonable. In the previous example, there is little reason to suppose that the effect difference on cervical neoplastic stage between no condom use and rare condom use is the same as that between rare and almost always use. Alternatively, one can assign numbers to the categories or transform the predictor in some manner so that the assigned values reflect a linear relationship with the appropriately transformed expected outcome. The problem with this approach is that such a transformation of the predictor is difficult to choose and may lead to data dredging, ignoring uncertainty in model selection. Splines, a special type of transformation (e.g., Ramsay 1988), have other drawbacks: uncertainty in number and locations of knots, dependence of results on how the categories are coded, nonmonotonic results when a monotonic relationship is expected, and difficulty when there are only three categories.

When the ordinal predictor is treated as discrete, the order information is ignored. When there are many categories, this approach may have low power due to high degrees of freedom. In addition, the effect estimates may be nonmonotonic. Isotonic regression (Barlow et al. 1972) addresses the latter problem by grouping adjacent categories if their relative effect is in reverse of the general trend. However, the categorization is data driven and the results need appropriate adjustment for this source of model selection variability. In addition, the degrees of freedom may still be high, particularly if the original categories already manifest a monotonic relationship.

Another, possibly Bayesian, option can be to perform an analysis using latent variables; the ordinal variables *X* and *Y* can be thought of as some categorization of continuous latent variables, say *U* and *V*. A test of the association between *X* and *Y* conditional on **Z** can then be some test of the correlation between *U* and *V* conditional on **Z**. The primary problem we see with latent variable approaches of this nature is that they require assuming a distribution for the latent variable, which essentially forces one to assign a scale to the ordinal categorical variables.

Some approaches model *X* and *Y* jointly conditional on **Z**. Examples include log-linear, especially linear-by-linear association, models (Goodman 1979; Agresti 2002) and generalized Cochran–Mantel–Haenszel test (Mantel 1963). However, these methods require assigning numeric values to the ordinal categories, and their implementation also requires grouping continuous or multivariable **Z** into strata, generating arbitrary cutoffs on **Z** and losing information.

Other approaches for measuring conditional associations of ordinal categorical data have built on classic two-variable statistics such as Kendall’s concordance–discordance statistic tau, Goodman and Kruskal’s gamma (Goodman and Kruskal 1954), and Spearman’s rank-based correlation (Agresti 2002). Kendall’s partial tau (Kendall 1948) and an extension of Kendall’s partial tau to multivariable **Z** (Hawkes 1971) are a few examples, although the usefulness of these approaches is questionable given that the expectation of these statistics is generally not zero under the null hypothesis that *X* and *Y* are independent conditional on **Z** (Goodman 1959; Agresti 1977; Schemper 1991). Other proposed methods involve stratifying data according to **Z** and computing weighted averages of stratum-specific measures of association between *X* and *Y* (Torgerson 1956; Davis 1967; Agresti 1977). Because these techniques need multiple observations per stratum, these approaches again require collapsing continuous or multivariable **Z** into strata.

Our method is motivated by the following observation. If *X* and *Y* are continuous variables, we can carry out linear regression *Y* = *β*_{0} + *β*_{1}*X* + *γ*_{1}*Z*_{1} + + *γ _{k}Z_{k}* +

Similar to the linear regression setting, our approach is to fit *Y* and *X* on **Z** separately using multinomial models (e.g., proportional odds models), obtain the conditional distributions *Y*|**Z** and *Y*|**Z** for each subject, and use this information to construct test statistics. In Sections 2 and 3 we describe our method. In Section 4 we investigate the performance of our method through simulations. In Section 5 we apply our method to two datasets: one on the association between anisometropia and amblyopia and the other on the association between cervical neoplastic stage and condom use. We discuss our results in Section 6.

Let *Y* and *X* be two ordinal variables with *s* and *t* categories, respectively. The categories are denoted as 1^{Y}, 2* ^{Y}*, …,

$${H}_{0}:P(Y,X\mid \mathbf{Z})=P(Y\mid \mathbf{Z})P(X\mid \mathbf{Z}),$$

that is, conditional on **Z**, *Y* and *X* are independent.

As described in the Introduction, we first carry out a multinomial regression analysis of *Y* on **Z**, and a multinomial regression analysis of *X* on **Z**. A commonly used multinomial regression model for ordinal outcomes is the proportional odds model (McCullagh 1980). Other possible models include cumulative link models (Aitchison and Silvey 1957; Farewell 1982) and continuation ratio models for ordinal outcomes (Läärä and Matthews 1985), and multinomial logit models for categorical outcomes (Mantel 1966). One can choose different multinomial models for the two ordinal variables.

For each subject, the regression analyses of *Y* and *X* on **Z** will provide conditional distributions of *Y* and *X* given **Z**. If there is no relationship between *Y* and *X* after adjusting for the covariates (i.e., under the null), these two conditional distributions will be independent, and the observed value of (*Y, X*) for the subject is expected to follow the product distribution of these two conditional distributions. We consider two ways of testing the null. The first approach adds these product distributions across all subjects to obtain the expected marginal distribution of (*Y, X*) under the null of conditional independence and then contrasts this distribution with the observed marginal distribution for (*Y, X*) (i.e., not conditional on **Z**). Our second approach computes “residuals” from the two multinomial regression models and then tests for correlation between these residuals. We define a new individual-level residual for models with ordinal outcomes. We will describe these two approaches separately.

In this approach, we compare the observed joint distribution between *Y* and *X* with their expected distribution under the null. In general, the joint distribution between *Y* and *X*, *P* = *P(Y, X*) ={*π _{jl}*}, can be written as

$$P(Y,X)={\int}_{z}P(Y,X\mid \mathbf{Z})dP\left(\mathbf{Z}\right).$$

(1)

Under the null of conditional independence, the joint distribution between *Y* and *X* can also be written as

$${P}_{0}(Y,X)={\int}_{z}P(Y\mid \mathbf{Z})P(X\mid \mathbf{Z})dP\left(\mathbf{Z}\right).$$

(2)

Let ${P}_{0}={P}_{0}(Y,X)=\left\{{\pi}_{jl}^{0}\right\}$. Then under the null, *P* = *P*_{0}.

Assume (*X _{i}, Y_{i}*,

To estimate *P*_{0}, we fit separate multinomial models for *P(Y*|**Z**) and *P(X*|**Z**) to estimate the probabilities ${p}_{i}^{j}$ and ${q}_{i}^{l}$, denoted as ${\widehat{p}}_{i}^{j}$ and ${\widehat{q}}_{i}^{l}$. The distribution of **Z** is estimated by its empirical distribution. Plugging these estimates into Equation (2), we obtain the estimate ${\widehat{P}}_{0}=\left\{{\widehat{\pi}}_{jl}^{0}\right\}$, where ${\widehat{\pi}}_{jl}^{0}=\frac{1}{n}{\sum}_{i}{\widehat{p}}_{i}^{j}{\widehat{q}}_{i}^{l}$. Without assuming the null hypothesis, *P* can be estimated empirically as $\widehat{P}=\left\{{\widehat{\pi}}_{jl}\right\}$, where $\left\{{\widehat{\pi}}_{jl}\right\}={n}_{jl}\u2215n$ and *n _{jl}* is the number of subjects with

We then summarize the observed and expected distributions separately. This can be achieved by calculating Goodman and Kruskal’s gamma (Goodman and Kruskal 1954), which for a two-way probability distribution *P* ={*π _{jl}*} is $\Gamma \left(P\right)=\frac{C-D}{C+D}$, where

Note that our test statistic accounts for the order information in *Y* and *X*, whereas a direct goodness-of-fit approach comparing the observed and expected counts using a statistic in the form of ∑_{j,l}(Observed − Expected)^{2}/Expected, ignores the order information in *Y* and *X*.

Another way of constructing test statistics is to mimic the residual-based linear regression analysis described in the Introduction. We wish to calculate “residuals” for multinomial models of ordinal outcomes and test if they correlate. For ordered categorical outcome variables, however, we are not aware of a standard way of calculating individual-level residuals. We first define individual-level residuals for ordinal outcome variables and then present test statistics that are based on these residuals.

In linear regression, to calculate the residual for a subject with outcome *Y* = *y* and input **Z** = **z**, we first obtain the fitted value of the outcome variable given **z**, $\widehat{y}=\mathrm{E}(Y\mid \mathbf{z})$, and then calculate the residual as $y-\widehat{y}$. However, for an ordered categorical outcome variable, we cannot calculate its “fitted” value and need to re-think the derivation of residuals. It should be noted that in addition to providing the fitted value described previously, a linear regression model also gives an estimated distribution of possible outcome values given **z**, say *Y _{fit}* ~

Assume a multinomial model for *P(Y*|**Z**) with model parameters * θ^{Y}*. Let

When the models for *P(Y*|**Z**) and *P(X*|**Z**) are correct, we have $\mathrm{E}({p}_{i,\mathit{high}}\mid {\mathbf{Z}}_{i})=\mathrm{E}({\gamma}_{i}^{{Y}_{i}-1}\mid {\mathbf{Z}}_{i})={\sum}_{j=1}^{s}{p}_{i}^{j}{\gamma}_{i}^{j-1}={\sum}_{{j}_{1}<{j}_{2}}{p}_{i}^{{j}_{1}}{p}_{i}^{{j}_{2}}$, and similarly $\mathrm{E}({p}_{i,\mathit{low}}\mid {\mathbf{Z}}_{i})={\sum}_{{j}_{1}<{j}_{2}}{p}_{i}^{{j}_{1}}{p}_{i}^{{j}_{2}}$. Therefore, E(*Y _{i,res}*|

$$\begin{array}{cc}\hfill \mathrm{E}\left({Y}_{i,\mathit{res}}{X}_{i,\mathit{res}}\right)=& \mathrm{E}\left[\mathrm{E}\right({Y}_{i,\mathit{res}}{X}_{i,\mathit{res}}\mid {\mathbf{Z}}_{i}\left)\right]\hfill \\ \hfill =& \mathrm{E}\left[\mathrm{E}\right({Y}_{i,\mathit{res}}\mid {\mathbf{Z}}_{i}\left)\mathrm{E}\right({X}_{i,\mathit{res}}\mid {\mathbf{Z}}_{i}\left)\right]=0,\hfill \end{array}$$

we have cov(*Y _{i,res}, X_{i,res}*) = E(

Once the residuals for models for *P(Y*|**Z**) and *P(X*|**Z**) are calculated, the sample correlation coefficent between *Y _{i,res}* and

A variation of the previous approach is to compare the observed value of (*Y _{i}, X_{i}*) for subject

We present two approaches to obtaining *p*-values for our test statistics. One is based on empirical distributions generated under the null, the other is based on asymptotic distributions derived from estimating equations.

Let *T* be one of the three test statistics described in the last section. To generate an empirical distribution of *T*, we simulate replicate datasets under the null. To simulate a replicate dataset, we randomly generate one observation from the product distribution ${\left\{{\widehat{p}}_{i}^{j}{\widehat{q}}_{i}^{l}\right\}}_{j,l}$; this is done for *i* = 1,…, *n*. Then we carry out the entire estimating procedure for the replicate dataset to obtain the corresponding statistic, denoted as *T** (i.e., fit separate multinomial models, obtain predicted probabilities, and calculate test statistic). This is then repeated many, say *N ^{emp}*, times, to get an empirical distribution of

$$\#(\mid {T}^{\ast}\mid \ge \mid T\mid )\u2215{N}^{\mathit{emp}}$$

or

$$2\times \mathrm{min}\left\{\#\right({T}^{\ast}\ge T),\#({T}^{\ast}\le T\left)\right\}\u2215{N}^{\mathit{emp}}.$$

From our simulations the results are almost the same for these two *p*-values, so we will present only the first. This procedure is essentially a parametric bootstrap procedure (Efron and Tibshirani 1993).

An alternative approach to computing the *p*-value is to use the asymptotic distributions of our test statistics under the null hypothesis. In general, we define a vector of parameters * θ* of length

$$\sqrt{n}(\widehat{\theta}-\theta ){\to}_{d}\mathrm{N}(0,V(\theta \left)\right),$$

where *V( θ*) =

$$\sqrt{n}\left[g\right(\widehat{\theta})-g(\theta \left){\to}_{d}\mathrm{N}\right(0,{\sigma}^{2}),$$

where ${\sigma}^{2}=\left[\frac{\partial}{\partial \theta}g\right(\theta \left)\right]V\left(\theta \right){\left[\frac{\partial}{\partial \theta}g\right(\theta \left)\right]}^{\prime}$. To estimate *σ*^{2}, we estimate *A( θ*) as $\frac{1}{n}{\sum}_{i}[-\frac{\partial}{\partial \theta}{\Psi}_{i}(\widehat{\theta}\left)\right]$,

We now define * θ, Ψ_{i}(θ*), and

$${\Phi}_{i}\left(\theta \right)=\{\begin{array}{c}\frac{\partial}{\partial {\theta}^{Y}}{l}_{Y}({Y}_{i},{\mathbf{Z}}_{i};{\theta}^{Y}),\hfill \\ \frac{\partial}{\partial {\theta}^{X}}{l}_{X}({X}_{i},{\mathbf{Z}}_{i};{\theta}^{X}),\hfill \end{array}\phantom{\}}$$

where *l _{Y}* and

For ${T}_{1}=\Gamma \left(\widehat{P}\right)-\Gamma \left({\widehat{P}}_{0}\right)$, we define * θ^{T}* = (

$$\psi ({Y}_{i},{X}_{i},{\mathbf{Z}}_{i};{\theta}^{T})=\{\begin{array}{c}{I}_{\{{Y}_{i}=1,{X}_{i}=1\}}-{\pi}_{11},\hfill \\ \vdots \hfill \\ {I}_{\{{Y}_{i}=s,{X}_{i}=t-1\}}-{\pi}_{s,t-1},\hfill \end{array}\phantom{\}}$$

where *I _{a}* is the indicator function of event

For *T*_{2}, the sample correlation between residuals, * θ^{T}* = (

$$\psi ({Y}_{i},{X}_{i},{\mathbf{Z}}_{i};\theta )=\{\begin{array}{c}{Y}_{i,\mathit{res}}-{w}_{1},\hfill \\ {X}_{i,\mathit{res}}-{w}_{2},\hfill \\ {Y}_{i,\mathit{res}}{X}_{i,\mathit{res}}-{w}_{3},\hfill \\ {Y}_{i,\mathit{res}}^{2}-{w}_{4},\hfill \\ {X}_{i,\mathit{res}}^{2}-{w}_{5}.\hfill \end{array}\phantom{\}}$$

By definition, E[*ψ(Y _{i}, X_{i}*,

For ${T}_{3}=\frac{1}{n}{\sum}_{i}({\widehat{C}}_{i}-{\widehat{D}}_{i})$, *θ*^{T} is a single parameter *θ ^{T}* = E(

$$\psi ({Y}_{i},{X}_{i},{\mathbf{Z}}_{i};\theta )=({C}_{i}-{D}_{i})-{\theta}^{T}.$$

By definition, E[*ψ*(*Y _{i}, X_{i}*,

We carried out simulations to investigate the performance of our method and to compare it with four other approaches: (a) proportional odds model with *X* coded as a continuous variable,

$$P(Y\le j\mid \mathbf{Z})={\left[1+\mathrm{exp}(-({\alpha}_{j}^{Y}+\mathbf{Z}{\beta}^{Y}+\eta X\left)\right)\right]}^{-1},$$

and testing if *η* = 0; (b) proportional odds model with *X* coded as a categorical variable (using dummy variables),

$$P(Y\le j\mid \mathbf{Z})={\left[1+\mathrm{exp}\left(-\left({\alpha}_{j}^{Y}+\mathbf{Z}{\beta}^{Y}+{\eta}_{2}{I}_{\{X=2\}}+\cdots +{\eta}_{t}{I}_{\{X=t\}}\right)\right)\right]}^{-1},$$

and testing if *η*_{2} = = *η _{t}* =0; (c) isotonic proportional odds which

We investigated the performance of these approaches under multiple simulation scenarios. The first scenario was under the null to investigate Type I error rate. Next, we constructed various alternatives in manners so that different modeling assumptions would be favored. In our first alternative scenario, we generated data such that the effect of the ordinal categories (1^{X}, …, *t ^{X}*) was linear, in the sense that had we simply done an analysis treating

The specifics of our four data generating scenarios are as follows: We first generated a covariate *Z* using the standard normal distribution. Then we generated *X* with five categories using the proportional odds model

$$P(X\le l\mid \mathbf{Z})={\left[1+\mathrm{exp}(-({\alpha}_{l}^{X}+{\beta}^{X}Z\left)\right)\right]}^{-1},$$

with ${\alpha}^{X}=({\alpha}_{1}^{X},\dots ,{\alpha}_{4}^{X})=(-1,0,1,2)$ and *β ^{X}* = 1. The outcome variable

$$P(Y\le j\mid Z)={\left[1+\mathrm{exp}\left(-\left({\alpha}_{j}^{Y}+{\beta}^{Y}Z+{\eta}_{1}{I}_{\{X=1\}}+\cdots +{\eta}_{5}{I}_{\{X=5\}}\right)\right)\right]}^{-1},$$

with ${\alpha}^{Y}=({\alpha}_{1}^{Y},{\alpha}_{2}^{Y},{\alpha}_{3}^{Y})=(-1,0,1)$, *β ^{Y}* = −0.5, and

= (0, 0, 0, 0, 0) (the null).**η**= (−0.4, −0.2, 0, 0.2, 0.4) (linear effect).**η**= (−0.30, 0.18, 0.20, 0.22, 0.24) (monotonic nonlinear effect).**η**= (−0.2, 0, 0.2, 0, −0.2) (nonmonotonic effect).**η**

For each simulation scenario, we generated 10,000 datasets, each consisting of 500 subjects. To obtain *p*-values using the empirical distribution of our test statistics, for each dataset we generated 1000 replicates. For all simulation scenarios, the null was rejected if the two-sided *p*-value was less than 0.05.

Simulation results are summarized in Table 1. Under the null hypothesis, the Type I error rate was at the nominal 5% level for all analysis methods except isotonic regression; to achieve the nominal level using isotonic regression we would need to account for model-selection using some sort of re-sampling procedure. Under the linear scenario, the highest power was obtained under the properly specified model with *X* treated as a continuous variable. Also as expected, power was lower when *X* was treated as a continuous variable using splines, or as a categorical variable with or without isotonic regression; this lower power was expected because of the additional degrees of freedom employed by each of these methods. Somewhat surprising was the minimal loss in power seen using any of our new test statistics, ranging around 85%–86%, just below the 87% power seen in the properly specified model.

Under the nonlinear monotonic simulations, our methods had better power than simple analyses treating *X* as continuous or categorical. The most powerful approach expanded *X* using restricted cubic splines; power was similar between our method and isotonic regression (although isotonic regression had inflated Type I error rate, so presumably its power would be smaller had we accounted for model-selection uncertainty). Finally, when the true relationship between *X* and *Y* conditional on *Z* was nonmonotonic, our test statistics had poor power. This was expected, as our methods assume monotonicity. Splines and treating *X* as categorical had higher power under this simulation scenario. Isotonic regression also had inflated power, although given that isotonic regression assumes monotonicity and the U-shaped effect of our simulation scenario, we believe many of these are false positives.

It is worth noting that power was comparable between all of our test statistics (*T*_{1}, *T*_{2}, and *T*_{3}) regardless of how we computed the *p*-value (empirically or asymptotically). Figure 1 shows the distribution of asymptotic *p*-values under the null for *T*_{1}, *T*_{2}, and *T*_{3}; all three are highly correlated, the residual-based test-statistics (*T*_{2} and *T*_{3}) particularly so.

Comparison of *p*-values of *T*_{1}, *T*_{2}, and *T*_{3} under the null. Top row are all *p*-values on log-scale. Bottom row contains *p*-values between 0 and 0.1. Sample size is 500.

We also evaluated the Type I error rate of our method at smaller sample sizes (*n* = 50, 100). As expected, empirical-based *p*-values were more accurate than their asymptotic counterparts, although both approaches yielded Type I error rates close to the nominal 5% level (Table 2).

We also performed a simulation where the model for *P(X|Z*) was incorrectly specified. Specifically, we generated data as before (*n* = 500) except that *X* was generated with *β ^{X}* = 0, 1, 2, 3 for

Finally, we also performed additional simulations with different numbers of categories for *Y* and *X*, and saw results similar to those reported here.

Amblyopia is a leading cause of acquired monocular visual impairment. Anisometropic amblyopia typically presents later than other types of amblyopia. Because improvement in visual acuity with amblyopia treatment depends on the age at which treatment begins, earlier detection of children with anisometropic amblyopia is desired. In a photoscreening program of preschool children, anisometropia (≥1D difference in refractive power between the eyes in any meridian) was detected on 974 preschool children (Leon et al. 2008). Anisometropia magnitude (difference in spherical equivalent <1D, 1 to <2D, 2to <4D, and ≥4D) was measured along with age and visual acuity, which is used to define amblyopia levels (severe, moderate, mild, and no amblyopia). There is interest in testing the association between anisometropia and amblyopia while adjusting for the effect of age.

Investigators carried out ordinal logistic regression with amblyopia as outcome (*Y*) and anisometropia (*X*) and age (*Z*) as continuous input variables (Leon et al. 2008) and found highly significant association between amblyopia and anisometropia after adjustment for age (*p* < 10^{−20}). Applying the methods described in this article with ordinal logistic regression as models for *P(Y*|*Z*) and *P(X*|*Z*) led to an even smaller *p*-value. However, *p*-value comparisons at this magnitude are essentially meaningless as they lead to the same conclusion. Such significant results are partly due to the large sample size of the study. The comparison between statistical methods will be more meaningful if the sample size was smaller. Thus, we generated 50 datasets of 50 subjects randomly selected from the 974 children, and compared the results of our method and the method used by the investigators (Figure 2). Computing *p*-values based on the asymptotic distribution of the test statistic, *T*_{1} yielded smaller *p*-values than the method used in the original study analysis. However, when *p*-values were calculated based on empirical distributions, they were similar to the *p*-values of the method used in the original analysis. This is consistent with our simulation results with *n* = 50 where the asymptotic *p*-values for *T*_{1} had slightly inflated Type I error rates. Results were similar for *T*_{2}. In contrast, *p*-values based on the asymptotic distribution of *T*_{3} tended to be slightly larger than their empirical counterparts, also consistent with our simulation results for *n* = 50 (Table 2).

Results of amblyopia data analysis. The *x*-axis is the *p*-value based on ordinal logistic regression with anisometropia treated as a continuous variable. The *y*-axisisthe *p*-value using *T*_{1}. The left and right plots contain *p*-values based on the asymptotic **...**

The residuals of our method using all subjects are plotted in Figure 3. Note that the four levels of anisometropia are still well separated in the residual plot, while the four levels of amblyopia are overlapping. This indicates that age has a relatively weaker association with anisometropia than with amblyopia.

Cervical specimens for 150 nonpregnant HIV-infected women in Lusaka, Zambia were collected (Parham et al. 2006). Based on cytological analysis, 36 specimens were categorized as normal, 35 as low-grade squamous intraepithelial lesions, 49 as high-grade squamous intraepithelial lesions, and 30 as squamous cell carcinoma. The women also reported condom use: 53 women reported never using condoms, 60 rarely, 13 almost always, and 24 always. Researchers were interested in testing for an association between cervical stage and condom use. Kendall’s rank correlation tau was estimated as 0.03 (using R function cor.test with argument method = “kendall”), which was not statistically different from zero (*p*-value = 0.64). However, CD4 T-cell count, age, education, and marital status have been linked to cervical abnormalities and may be associated with condom use, so researchers wanted to test for an association between cervical stage and reported condom use after adjusting for these variables. Our method was applied resulting in *p*-values ranging from 0.76 to 0.91, depending on the test statistic employed, indicating that there was insufficient evidence to conclude that cytological abnormalities were associated with condom use after adjusting for CD4 T-cell count, age, education, and marital status. For the sake of comparison, treating condom use as a continuous variable (1, 2, 3, or 4) and putting it in an ordinal logistic model assuming linearity and expanding using restricted cubic splines with 3 knots yielded *p*-values of 0.52 and 0.48, respectively. When condom use was treated as a categorical variable, the *p*-value was 0.66. The residuals of our method are plotted in Figure 3.

We have developed a new method for testing for associations between two ordinal categorical variables while adjusting for other continuous or categorical variables. In our approach, we separately fit the two ordinal variables on the other covariates using multinomial models such as ordinal logistic regression and then built test statistics based on the predicted probability distributions for these two variables. For our three test statistics, we described approaches to calculate *p*-values based on either empirical or asymptotic distributions. Our methods are simple to implement and simulations showed our new tests are powerful to detect monotonic associations between two ordinal variables while appropriately adjusting for the effects of other covariates.

In the process of constructing test statistics, we defined a new concept of residual for ordinal outcome variables. This residual can be calculated for any multinomial regression model as long as the outcome variable is ordinal. In our definition of residuals, we assigned scores +1, −1, and 0, reflecting the direction of comparison between the observed and expected outcomes; positive (negative) residuals imply the observed level is higher (lower) than the expected. Our residuals are consistent with concordance–discordance statistics such as Kendall’s tau and Goodman and Kruskal’s gamma, which similarly compare the direction between observations, but make no assumption regarding the magnitude of the distance between ordered categories. Our definition results in one residual per subject, which is, therefore, useful for constructing test statistics. Our residuals may also be useful in other ways (e.g., diagnostics), which we are currently studying.

Other types of residuals, such as deviance and Pearson residuals, were defined for logistic regression (Agresti 2002), and they can be extended to multinomial outcomes. However, deviance residuals ignore the order information, and Pearson residuals result in multiple values for each subject when there are more than two levels. Hence, the utility of these residuals for our purposes is not readily apparent. For proportional odds models, McCullagh (1980) described a different concept of residual. However, this residual is defined for each level of the multinomial outcome variable and is “always positive and thus does not indicate the direction of departure of the observed values from the fitted values.”

Our method has some features which may be undesirable in certain scenarios. First, if one of the two ordinal variables can be designated as the outcome variable, a traditional regression analysis with the other variable as a predictor can model interactive effects between the predictor variable and the covariates. In our method, both ordinal variables are treated equally, avoiding the need to pick one as the outcome, but we therefore assume no interaction effects exist between the predictor and other covariates. Second, our method requires explicit modeling of the relationship between *Y* and **Z** and between *X* and **Z**. The consistency of our results, therefore, depends on correct specification of these models. It should be recognized that our method is applicable for any (even different) multinomial regression models of *Y* on **Z** and *X* on **Z**. And our limited simulations suggest results are fairly robust to misspecifications of these models, although this warrants further investigation. Note that a single regression analysis of *Y* on *X* and **Z** does not require modeling the relationship between *X* and **Z**, but requires explicit specification of the pattern of effects of *X* and **Z** on *Y*. Third, our approach is designed for testing the relationship between two ordinal variables while adjusting for other covariates. When the primary interest is the relationship between an ordinal outcome and a continous or categorical variable while adjusting for an ordinal covariate, our methods may not be useful.

Although our presentation focused on hypothesis testing, our test statistics are to some extent interpretable and thus may be used to measure the magnitude of association between *Y* and *X* conditional on **Z**. The test statistic *T*_{1} captures the discrepancy in gamma (the difference in probability of concordant and discordant random pairs) between the observed distribution for (*Y, X*) and the expected distribution under the null. The test statistic *T*_{2} is the correlation coefficient between the residuals of models *Y*|**Z** and *X*|**Z**.

When an ordinal variable has only two levels, it is a binary variable. For a binary outcome variable, logistic regression is often used for association testing. For a binary input variable, treating it as continuous or categorical will result in the same model. It is interesting to evaluate how our method performs when *Y* or *X* or both are binary variables. If *X* is binary, our simulations showed that our approach yields similar results to ordinal logistic regression with *X* treated as a categorical variable (data not shown). If *Y* is binary, our approach yields results consistent with ordinal logistic regression treating *X* as the outcome variable and *Y* as a categorical predictor. Finally, if both *Y* and *X* are binary, our approach yields results consistent with logistic regression with *X* as a dichotomous predictor.

Another possible direction of research is to extend the weighted average of stratum-specific association measures described in the Introduction to continuous or multivariable **Z**. The weighted average approach currently requires grouping a continuous or multivariable **Z** into discrete categories, and computing concordance and discordance only for subject pairs falling in the same category. An alternative is to score concordance and discordance for all subject pairs, but to weight the scores according to the similarity in **Z** between subjects. We are working to evaluate the performance of this approach.

Finally, our methods suggest a potential solution to the general problem in regression when the input variable of interest is ordinal. As we stated in the Introduction, in any regression analysis, ordinal predictor variables have to be treated as continuous or categorical variables, imposing a linearity assumption in the former and ignoring order information in the latter. We defined individual-level residuals for ordinal variables and developed a residual-based method for testing correlation between ordinal *Y* and *X*. We believe a similar approach can be developed when *Y* is another type of variable as long as its individual-level residuals can be calculated. We are also working to evaluate this approach for various variable types for *Y*.

Regression analysis with ordinal input variables has been difficult to deal with appropriately. Our method will be useful for testing for association between two ordinal variables while adjusting for other covariates. Our simulation and data analysis code is available as supplemental materials and also at http://biostat.mc.vanderbilt.edu/OrdinalRegression.

This work was partially supported by the National Institutes of Health grants R01HG004517 (C. L.) and P30AI54999 (B. E. S.). The authors thank Dr. Sean Donahue for providing data on anisometropic amblyopia and Dr. Vikrant Sahasrabuddhe for providing data on cervical neoplastic stage and condom use.

AUTHORS’ NOTE DURING PROOF In our current follow-up work, we found connections between our statistics and traditional ones. When there are no covariates to adjust for, our statistic *T*_{1} is Goodman and Kruskal’s gamma and our statistic *T*_{2} is Spearman’s rank correlation. Proofs will be published with other properties in a separate paper.

SUPPLEMENTAL MATERIALS

**R-functions for data analysis:** Contains R-functions for analyzing data using proportional odds models for *P(Y*|**Z**) and *P(X*|**Z**). COBOT stands for conditional ordinal-by-ordinal test. (COBOT-analysis.r)

**R-functions for simulations:** Contains R code for simulations. (COBOT-simulation.r)

**Other simulation results:** Provides other results not included in article. (ordinal-otherresults.pdf)

- Agresti A. Considerations in Measuring Partial Association for Ordinal Categorical Data. Journal of the American Statistical Association. 1977;72:37–45. [613]
- — — — . Categorical Data Analysis. 2nd ed. Wiley; Hoboken, NJ: 2002. [613,619]
- Aitchison J, Silvey SD. The Generalization of Probit Analysis to the Case of Multiple Responses. Biometrika. 1957;44:131–140. [612,613]
- Barlow RE, Bartholomew DJ, Bremner JM, Brunk HD. Statistical Inference Under Order Restrictions. Wiley; London: 1972. [612]
- Davis JA. A Partial Coefficient for Goodman and Kruskal’s Gamma. Journal of the American Statistical Association. 1967;62:189–193. [613]
- Efron B, Tibshirani RJ. An Introduction to the Bootstrap. Chapman & Hall; London: 1993. [615]
- Farewell VT. A Note on Regression Analysis of Ordinal Data With Variability of Classification. Biometrika. 1982;69:533–538. [612,613]
- Goodman LA. Partial Tests for Partial Taus. Biometrika. 1959;46:425–432. [613]
- — — — Simple Models for the Analysis of Association in Cross-Classifications Having Ordered Categories. Journal of the American Statistical Association. 1979;74:537–552. [613]
- Goodman LA, Kruskal WH. Measures of Association for Cross Classifications. Journal of the American Statistical Association. 1954;49:732–764. [613,614]
- Hawkes RK. The Multivariate Analysis of Ordinal Measures. American Journal of Sociology. 1971;76:908–926. [613]
- Kendall MG. Rank Correlation Methods. Hafner; New York: 1948. [613]
- Läärä E, Matthews JNS. The Equivalence of Two Models for Ordinal Data. Biometrika. 1985;72:206–207. [612,613]
- Leon A, Donahue SP, Morrison DG, Estes RL, Li C. The Age-Dependent Effect of Anisometropia Magnitude on Anisometropic Amblyopia Severity. Journal of AAPOS. 2008;12:150–156. [617] [PubMed]
- Mantel N. Chi-Square Tests With One Degree of Freedom: Extensions of the Mantel-Haenszel Procedure. Journal of the American Statistical Association. 1963;58:690–700. [613]
- — — — Models for Complex Contigency Tables and Polychotomous Dosage Response Curve. Biometrics. 1966;22:83–95. [613] [PubMed]
- McCullagh P. Regression Models for Ordinal Data. Journal of the Royal Statistical Society, Ser. B. 1980;42:109–142. [612,613,619]
- Mosteller F, Tukey JW. Data Analysis and Regression. Addition-Wesley; Reading, MA: 1977. [613]
- Parham GP, Sahasrabuddhe VV, Mwanahamuntu MH, Shepherd BE, Hicks ML, Stringer EM, Vermund SH. Prevalence and Predictors of Squamous Intraepithelial Lesions of the Cervix in HIV-Infected Women in Lusaka, Zambia. Gynecologic Oncology. 2006;103:1017–1022. [612,619] [PMC free article] [PubMed]
- Ramsay JO. Monotone Regression Splines in Action. Statistical Science. 1988;3:425–441. [612]
- Rao CR. Linear Statistical Inference and Its Applications. 2nd ed. Wiley; New York: 1973. [613]
- Schemper M. Non-Parametric Partial Association Revisited. The Statistician. 1991;40:73–76. [613]
- Stefanski LA, Boos DD. The Calculus of M-Estimation. The American Statistician. 2002;56:29–38. [615]
- Torgerson WS. A Non-Parametric Test of Correlation Using Rank Orders Within Subgroups. Psychometrika. 1956;21:145–152. [613]

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |