In this section, we propose two MTMs for longitudinal ordinal data, a simple extension of the MTM (OMTM) and an improved extension of the MTM (IOMTM).

Let *Y*_{i} = (*Y*_{i}_{1}*,* …*, Y*_{ini}) be a vector of longitudinal *K*-category ordinal responses on subject *i* = 1, …, *N* at times *t* = 1, …, *n*_{i} (*n*_{i} ≤ T ). We assume that associated exogenous but possibly time-varying covariates, *x*_{it} = (*x*_{it}_{1}, …, *x*_{itr} ), are recorded for each subject at each time and that the regression model properly specifies the full covariate conditional probability such that *P*(*Y*_{it} = *y*_{it} | X_{it} ) = *P* (*Y*_{it} = *y*_{it} | X_{i}_{1}, …, *X*_{iT} ).

The *simple extension of MTM of order p*, an OMTM(*p*), is specified using the following two regressions,

where

*β* is the

*r ×* 1 vector of regression coefficients,

*β*_{01} <

*β*_{02} < … <

*β*_{0}_{K}_{−1},

,

*z*_{it} is a

*q* × 1(subset of

*x*_{it}),

,

*k* = 1, …,

*K* − 1, and

*Y*_{i,t}_{−}_{m,l} is the set of

*K* − 1 indicators with

*Y*_{i,t}_{−}_{m,l} = 1 if

*Y*_{i,t}_{−}_{m} =

*l; Y*_{i,t}_{−}_{m,l} = 0 otherwise for

*l* = 1, …,

*K* − 1;

*m* = 1, …,

*p*. Model (

3) is a cumulative logit model, a common model for ordinal data. However, the link function in the conditional probabilities (

4) is a multinomial logit because we expect the

*K* − 1 conditional probabilities to have different coefficients

depending on

*k* and if cumulative logits were used as in the marginal probabilities (

3), the monotonicity property of the cumulative conditional probabilities would be difficult to satisfy as it involves

*Δ*_{itk}.

The dependence model (

4) based on the multinomial logit link function models the probability of moving from category

*k* at time

*t* to category

*k*′ at time

*t* + 1 under

*p* = 1. This is a natural and interpretable way to model the dependence for longitudinal data. An alternative approach would be to model dependence using GORs (

Dale, 1986;

Heagerty and Zeger, 1996) as discussed in the introduction, which also allows the interpretation of marginal mean parameters to be invariant with respect to specification of the dependence and allows for parsimonious specification of dependence. However, GOR does not directly represent the change over time for a subject, unlike the OMTM, and that is why we choose to specify the dependence using a Markov transition structure.

Because this model is a straightforward extension of the MTM for binary data, it shares many of the same advantages including having the attractive characterization of serial dependence that a transition model provides combined with a marginal regression structure. However, it does have some drawbacks. First, there are a lot of dependence parameters. For example, if

*p* = 2,

*K* = 4, and

*q* = 1 (

*z*_{it} = 1), we need

*p ×* (

*K* − 1)

^{2} *× q* = 18 dependence parameters. Second, the ordering of the categories is not exploited in the conditional probabilities (

4). For these two reasons, we propose a modified version in which (

4) is replaced with

where

*c*(·) is a smooth function of

*y*_{it}_{−1}, …,

*y*_{it}_{−}_{p} parameterized by

*α*^{(}^{k}^{)}. For our methods here, we chose an additive structure for

*c*(·;

*α*^{(}^{k}^{)}), i.e.,

*c*(

*y*_{it}_{−1};

*α* ^{(}^{k}^{)}) +

*c*(

*y*_{it}_{−2};

*α*^{(}^{k}^{)}) + … +

*c*(

*y*_{it}_{−}_{p};

*α*^{(k)}) and quadratic polynomials for each

*c*(

*y*_{it}_{−}_{l};

*α*^{(k)}):

*l* = 1, …,

*p*. The quadratic polynomials should be adequate to explain the smooth change of the conditional probabilities and we hypothesize that there is no dramatic change in conditional probabilities that would necessitate higher-order polynomials or other smooth functions; we discuss a more general specification in the Discussion. Implicit in this specification is the “scoring” of the response in the dependence model, which is not an issue in the (proportional odds) mean models. For example, a quadratic model could be made more reasonable for a data set by changing the values of the responses (but not changing their order). We replace (

5) by

where

and

. We call the model given by

equations (3) and

(6) an

*improved extension of MTM of order p*, an IOMTM(

*p*). The IOMTM(

*p*) has fewer dependence parameters than the OMTM(

*p*). For example, an OMTM has 18 dependence parameters for

*K* = 4,

*p* = 2, and

*q* = 1, whereas an IOMTM(2) has

*p ×* 2

*×* (

*K* − 1) = 12 dependence parameters. In general, for an OMTM(

*p*), the number of dependence parameters increases quadratically with

*K*, the number of categories, whereas in an IOMTM, the number increases linearly in

*K*. The IOMTM also exploits the ordering of the categorical responses in the conditional model (

6).

Clearly, for models (

3)–(

6) to form a coherent probability model, the

*Δ*_{itk} will be constrained. Constraints are needed for (

3), (

4), and (

6) to be a valid probability model. Specifically,

*Δ*_{itk} is found using the following identity

In this article, we consider only first- and second-order (I)OMTM’s. Higher-order models are possible but require more dependence parameters and larger sample sizes.

3.1 First-Order Models: OMTM(1) and IOMTM(1)

In this subsection, we provide some details for OMTM(1) and IOMTM(1), which are often sufficient for short time series. The dependence models for marginalized first-order transition models [OMTM(1) and IOMTM(1)] are specified using the following two regressions,

Note that (

9) is only valid for

*t ≥* 2 because there are no history data available for the initial state (

*t* = 1).

For both the IOMTM(1) and OMTM(1), mean and dependence parameters are orthogonal as given in the following theorem and corollary.

Theorem 1 The marginal mean parameters, (*β*_{0}, *β*) are orthogonal to the dependence parameters, *α* in the OMTM(1).

Corollary 1 The marginal mean parameters, (*β*_{0}, *β*), are orthogonal to the dependence parameters, *α* in the IOMTM(1) when c(y; *α*) is a polynomial in y.

Fitzmaurice and Laird (1993) showed consistency of marginal mean parameters regardless of the dependence structure. This was used by

Heagerty (2002) in binary MTM’s. The following theorem gives the equivalent result for ordinal MTM’s.

Theorem 2 The maximum likelihood estimate, (

_{0,} ), remains consistent for (

*β*_{0},

*β*) even if the dependence model is incorrectly specified.

Proofs of these results can be found in the Web Appendix.

3.2 Maximum Likelihood for the First-Order Models

The maximum likelihood algorithm for these models is described next. Define the marginal cumulative probabilities,

for

*i* = 1, …,

*N; t* = 1, …,

*n*_{i} ;

*k* = 1, …,

*K* and conditional probabilities,

for

*i* = 1, …,

*N ; t* = 2, …,

*n*_{i} ;

*k* = 1, …,

*K*. The OMTM(1) likelihood function consists of two components, conditional probabilities for time

*>*1 and marginal probabilities at time 1,

where

. We propose a Fisher-scoring algorithm to find the MLE of the parameters of interest in the OMTM(1). For estimation of the

*K* − 1 intercepts

*β*_{0}_{k}(

*k* = 1, …,

*K* − 1),

*r* covariate coefficients

*β*, and (

*K* − 1)

^{2} dependence parameters

*α*, we rewrite the log likelihood as

where

*R*_{i1k} =

*Y*_{i11} + … +

*Y*_{i1k},

and

*g*(

*a*) = log{1 + exp(

*a*)}. Here we have used the notation from

McCullagh (1980) for the log-likelihood function for the cumulative logit model for the ordinal data, log

*L*^{(1)}.

To evaluate the likelihood, we need to evaluate both the contribution from the initial state, *L*^{(1)}, and the subsequent contribution from each transition probability, *L*^{(2)}. Let *θ* = (*β*_{0}, *β*, *α*) be the vector of parameters. We obtain

The

*K* − 1 +

*r* + (

*K* − 1)

^{2}*q*-dimensional likelihood equations and details on the Fisher-scoring algorithm used to solve the likelihood equations can be found in

Supplementary Materials.

Estimation for the IOMTM(1) is similar.

3.3 Second-Order Models: OMTM(2) and IOMTM(2)

For the initial states, we assumed a marginal model for

*t* = 1, as we did for the first order models, and an OMTM(1) and IOMTM(1) for

*t* = 2. The marginal models for OMTM(2) and IOMTM(2) are the same as (3) and the dependence models for the OMTM(2) and the IOMTM(2) are given by (4) and (6), respectively, with

*p* = 2. Unlike the first-order OMTM and IOMTM, the orthogonality of marginal and dependence parameters no longer holds (i.e., cannot write it in canonical log-linear form). Thus, consistent estimation of the marginal mean parameters requires correct modeling of the dependence (though marginalized models of this form often offer some robustness; see

Heagerty and Kurland, 2001). Details on MLE for the second-order models can be found in

Supplementary Materials.

3.4 Missing Data

When the data are incomplete due to dropout that is associated with observed outcomes (i.e., MAR), the orthogonality of marginal mean and dependence parameters in the (I)OMTM(1) no longer holds. Thus, the consistency of the marginal mean parameters under MAR relies on correctly specifying the dependence structure; otherwise, the score functions for the mean parameters can be biased.

3.5 Simulation Study

We conducted several simulation studies to examine robustness to misspecification of dependence model when there is ignorable (MAR) dropout. We set

*T* = 8 and

*N* = 500 with an approximately equal sample size for each of the three (treatment) groups. We considered two different marginal mean models, (

3), one with an unstructured mean over time with design vector of the form,

, where the 1 is in the

*t*th slot, and one with a linear trend over time,

, where groupj

_{i} is an indicator of whether the subject was in group

*j*. The two different mean models were included to see if any bias in the presence of dropout differed between the unstructured and structured (over time) means.

For each simulation, we considered the following MAR dropout model,

We simulated data using (

9) with

,

*m* = 1, 2, (

*α*_{10}*, α*_{11}*, α*_{20}*, α*_{21}) = (−1.0, 0.5, −0.5, 0.1) and fit the model assuming

.

presents the point estimates and the percent relative bias. Percent relative biases in the coefficients were small, the largest being about 3%. The coefficients of time-varying covariates in the structured mean model appeared to be more robust to misspecification of dependence and MAR missingness than in the unstructured mean model.

| **Table 1**Bias of the IOMTM(1) MLEs when data are MAR. Displayed are the average regression coefficient estimates and the percent relative bias, 100 × ( − β)/β |

We performed several similar simulations using the same models for the marginal means, but simulating data under an IOMTM(2) and then fitting an IOMTM(1) with and without MAR dropout. In the presence of dependence misspecification and MAR dropout, we saw considerable bias, with relative biases as large as 74% for the unstructured mean (see

Supplementary Materials for the results of these simulations). Under dependence misspecification with no dropout, the biases were quite small despite the orthogonality of the mean and dependence parameters being lost in the IOMTM(2). These simulations further emphasize the importance of correctly specifying the dependence in the presence of missing data. We also did simulations (not shown) to examine small sample bias in the MLE’s with complete data. The bias in the mean parameters was negligible. These results can be found in

Lee (2007).