PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
Stat Med. Author manuscript; available in PMC 2017 April 30.
Published in final edited form as:
Stat Med. 2016 April 30; 35(9): 1549–1557.
Published online 2016 January 18. doi:  10.1002/sim.6861
PMCID: PMC4821697
NIHMSID: NIHMS750375

A Hidden Markov Model approach to analyze longitudinal ternary outcomes when some observed states are possibly misclassified

Julia S. Benoit, PhD,1,2 Wenyaw Chan, PhD,2 Sheng Luo, PhD,2 Hung-Wen Yeh, PhD,3 and Rachelle Doody, MD PhD4

Summary

Understanding the dynamic disease process is vital in early detection, diagnosis, and measuring progression. Continuous-time Markov chain (CTMC) methods have been used to estimate state change intensities but challenges arise when stages are potentially misclassified. We present an analytical likelihood approach where the hidden state is modeled as a three-state CTMC model allowing for some observed states to be possibly misclassified. Covariate effects of the hidden process and misclassification probabilities of the hidden state are estimated without information from a ‘gold standard’ as comparison. Parameter estimates are obtained using a modified EM algorithm and identifiability of CTMC estimation is addressed. Simulation studies and an application studying Alzheimer Disease caregiver stress levels are presented. The method was highly sensitive to detecting true misclassification and did not falsely identify error in the absence of misclassification. In conclusion, we have developed a robust longitudinal method for analyzing categorical outcome data when classification of disease severity stage is uncertain and the purpose is to study the process’ transition behavior without a gold standard.

Keywords: Longitudinal Data Analysis, Hidden Markov Model, Disease Progression, Misclassification

1. Introduction

Early disease detection is fundamental in improving medical treatment design and intervention aimed at delaying disease progression, subsequently enhancing quality of life and in some cases reducing disease mortality. Unfortunately, disease staging may be subject to misclassification since true events are not directly observable. Proxy variables help explain unobservable phenomena in medical research but introduce biased estimates and can be especially concerning when the misclassified observable outcome is categorical. Solutions for inaccurate continuous outcomes usually include a random effect or the like in longitudinal settings. Remedies for categorical outcomes potentially observed with error continue to be studied. Further challenging is the lack of a “gold standard” for the targeted process (eg. Alzheimer’s disease staging). Multi-state transitional models are useful for quantifying disease staging and focus on the movement from one category where the interest lies in estimating the transition rates or intensities. Transition modeling approaches exist to examine misclassification in longitudinal studies where the outcome is categorical in both continuous and discrete time settings, specifically hidden Markov models. Discrete–time Hidden Markov Models (HMM) to examine misclassification have been considered by several authors ([15]). Focusing on continuous-time setting, two-state (binary outcome) Markov models accounting for misclassification have been studied using Bayesian [67] and classic EM approaches [8], among others [9]. Multi-state HMMs (more than two states) are more complicated and to date a reversible HMM with an analytical solution to simultaneously estimate transition rates and probability of misclassification (or sensitivity) has not been developed. Special cases of the multi-state CTMC approaches have been developed including semi-Markov [10] and irreversible hidden CTMC models [11].

Due to the complexity of the likelihood function of a general multi-state recurrent CTMC, methodology for exact solutions when the target state is observable is in its infancy. Li and Chan [12] developed a likelihood technique to estimate the transition rates of a three-state CTMC with a binary covariate, thus providing comparisons of transition probabilities between states for two groups. The aforementioned likelihood technique was later extended to include multiple covariates and also a practical interpretation of the process with covariates was provided [13, 14]. This research focuses on a possibly misclassified ternary recurrent outcome observed at irregular and varying time intervals among each individual. We propose methodology that accounts for possible misclassification of the outcomes modeled as a CTMC and further generalize the Baum-Welch algorithm to the three-state continuous-time Markov model with covariates.

In circumstances where data is not directly observable, (e.g., misclassified outcomes), unique parameter estimates may be difficult or impossible to identify (i.e. non-identifiable) in the case of ‘blocked’ or ‘hidden’ information. Parameter non-identifiability has been discussed in the literature with regards to HMMs [8, 15, 16], among others and is addressed in this research. The 3-state CTMC with one state subject to misclassification model and its likelihood are described in the following section (section 2). In section 3 we describe the estimation method implementing a modified EM algorithm for parameter estimation and conduct a simulation study to assess its performance. Applications of our method to analyze Alzheimer’s disease (AD) caregiver stress levels are described and results are presented in section 4. We make remarks on identifiability of CTMC estimation in section 5. This paper concludes with a discussion of our findings in section 6.

2. Methods

2.1 Ternary outcome with possible misclassification

Consider a longitudinal study where Zk(tk,1), Zk(tk,2), …, Zk(tk,nk) is a sequence of observed ternary outcomes recorded as 1, 2, or 3 and measured at times tk,1, tk,2, …, tk,nk, on subject k, and nk is the number of observations on subject k. The observed sequence provides information for the ‘hidden’ sequence Yk(tk,1), Yk(tk,2), …, Yk(tk,nk) at time tk,1, tk,2, …, tk,nk for subject k, assumed to be measured with possible misclassification, which will be modeled as a three-state CTMC with state occupancy valued 1, 2 or 3. We assume dependency of the observed state on the state of the ‘hidden’ process solely at matching time points, not on the previous history of either the observed or hidden processes formulated as:

Pr(Zk(tk,s)|Yk(tk,1),,Yk(tk,s),Zk(tk,1),,Zk(tk,s1))=Pr(Zk(tk,s)|Yk(tk,s))=εY(tk,s),Z(tk,s),s=1,nk.
(1)

Furthermore, Equation 1 defines the probability that the observed state correctly classifies the hidden state of the process (or misclassification probability) {e.g., Pr(Zk(tk,s) = j|Yk (tk,s) = i) = εij and when i = j, εii is the probability of correctly classifying (or identifying) the ’hidden’ outcome.} Also, note that εii=1i=1,il3εil.

The three-state CTMC is fully described by the instantaneous transition rates, qij, the rate at which the process transitions from state ‘i’ to state ‘j’, where i, j = 1,2,3 and ij. The infinitesimal matrix R is formed by these parameters:

R=[(q12+q13)q12q13q21(q21+q23)q23q31q32(q31+q32)]
(2)

From the property of a continuous-time Markov chain, the sojourn time or amount of time a process stays in category ‘i’ before exiting follows an exponential distribution with mean (l=1,li3qil)1 and is generally unobservable. At transition time, the probability of transitioning into state ‘j’ given that the process is currently in state ‘i’ is calculated as Qij=qij(l=1,li3qil)1. The transition rates make up the probability mechanism used to derive the probability of transition over a specific interval of time, Pij(t). Explicit algebraic formulas were derived for three scenarios [12] allowing us to write the likelihood function in terms of the transition probabilities, which are functions of the qij parameters to be estimated.

The distribution of the initial observations, denoted as πi = P{Y(0) = i}, where ‘i’ takes on the values 1, 2, or 3, under our model are also assumed indirectly observable as in Equation 1. We transformed the transition rates via a log link function to examine the linear combination of the covariates xr by estimating coefficients βr and each unique intercept parameter αij for each qij, i.e.,

log qij=αij+r=1pβrxr, for ij,i,j=1,2,3 and r=1,,p.
(3)

By the Markov property of the hidden process and the basic probability, we can construct the likelihood function of Zk(tk,1), Zk(tk,2), (...), Zk(tk,nk) as

P(Z(t1)=z(t1),Z(t2)=z(t2),,Z(tnk)=z(tnk))=all possible y'sεy(t1)z(t1)εy(t2)z(t2)εy(tnk)z(tnk)×P(Y(t1)=y(t1))×Py(t1)y(t2)(t2t1)Py(tnk1)y(tnk)(tnktnk1)
(4)

where εy(t1)z(t1) = 1 and Py(tm)y(tm+1) (tm+1tm) represents the likelihood that the hidden process is Y(tm+1) at time tm+1 given that Y(tm) at time tm. The complete likelihood function for all nk observations for all subjects can be written as

L(Θ)=Πk=1m[all y'sP(Y(tk,1)=y(tk,1)){Πi=1nk(εY(tk,i)Z(tk,i)){Πj=1nk1(PY(tk,j)Y(tk,j+1)(tk,j+1tk,j))}}]
(5)

where tk,j = 0, for j = 1, yk(tk,l) denotes the category of the targeted outcome that would have been observed for the kth individual at time tk,l, πl = P(Y(tk,1) = l), l = 1,2,3 are the true initial distribution that will be treated as nuisance parameters although their estimates will be obtained at each iteration in our estimation process for helping performing EM algorithm. Note that Θ = (α12132123313212123)' is the vector of parameters defined prior to Equation 5. Some components of Θ are hidden in probability expression PY(tk,j)Y(tk,j+1) (tk,j+1tk,j).

2.2 Estimation

The expectation of the complete log likelihood given the observed data and the parameter values of the ν th iteration can be expressed as

Q(θ,θ(ν))=E[log(L)|all z's,θ(ν)]=k=1mE[log{all y'sP(Y(tk,1)=y(tk,1)){Πi=1nk(εY(tk,i)Z(tk,i)){Πj=1nk1(PY(tk,j)Y(tk,j+1)(tk,j+1tk,j))}}}|all z's,θ(ν)].
(6)

For calculation reduction during maximization a modified Baum-Welch [17] algorithm is proposed and derived beginning with the following forward-backward variables as

Aj(i)=P(Z(t1)=z(t1),Z(t2)=z(t2),,Z(tj)=z(tj),Y(tj)=i|θ),
(7a)

Bj(i)=P(Z(tj+1)=z(tj+1),Z(tj+2)=z(tj+2),,Z(tnk)=z(tnk)|θ,Y(tj)=i)
(7b)

Equation 7a expresses the forward variable as the probability of the partially observed sequence until time tj and the true state i at time tj and Equation 7b the backward variable as the probability of the partial observation sequence from tj+1 to the end, given the true state i at time tj and the model, respectively. By routine algebra, conditional probabilities in Equation 6 can be expressed as iteratively calculable terms:

P(Y(tj)=w|all z's,θ(ν))=Aj(w)Bj(w)l=13Ank(l)
(8a)

P(Y(tk,j1)=w,Y(tk,j)=l|all z's,θ(ν))=Ak,j1(w)Pwl(tjtj1)εlz(tj)Bk,j(l)i=13Ak,nk(i)
(8b)

, where

A1(i)=πiεi,z(t1)Aj+1(l)=[w=13Aj(w)Pwl(tj+1tj)]εl,z(tj+1),j=1,,nk1
(9a)

Bj(w)=lPwl(tj+1tj)εl,z(tj+1)Bj+1(l),j=1,,nk1, whenj=nk,Bj+1(l)=1 for all l=1,2,3.
(9b)

Note that Equations 8 and 9 provides similar explanation as that of Baum-Welch algorithm. For example, A1(i) represents the probability that the true initial state is i and the initial observed state is z(t1) and Aj+1(i) represents the probability that the true state at tj+1 is i and all the observed states up to time tj+1, and is expressed in Aj(). Similarly, Bj(w) represents the probability that the true state at tj is w and all the observed states from time tj+1, and is expressed in Bj(). Equations 9a and 9b compute numerical weights for the ‘hidden’ probability components used to update Q(θ, θ(ν)) at each iteration, and πi is substituted by [pi]i in the implementation of this EM process.

3. Simulation Study and numeric estimation procedures

To evaluate the performance of the proposed method, a simulation study is conducted where 1,000 data sets were generated with N=1000 individuals. Assuming that transition rates, thus transition time depend on each individual’s covariates, one binary and one continuous, a three-state recurrent CTMC was simulated for each individual in each replicate. We assume the CTMC state is observable only at integer times 0,1,…,10 not at the actual transition times. Conditional on the hidden state of the process at time tl, the observed outcomes follow a multinomial distribution with parameters ε21 and ε23 denoting the probabilities of misclassifying hidden state ‘2’ as observed state ‘1’ or ‘3’. We perform the simulation under two scenarios: 1) assuming misclassification is equally likely (i.e. ε21 = ε23) and denoted as ε ; 2) ε21 ≠ ε23. Thus, when ε21 = ε23 = ε, it should be clear that 1–2ε is the probability of the observed data correctly classifying the hidden state ‘2’. Probability mechanisms imposed and examined in this study were (i) ε21 = 2%, ε23 = 3%; (ii) ε21 = 0%, ε23 = 0%; (iii) ε = 1%, (iv) ε = 5%, and (v) ε = 10% using the proposed hidden Markov model. Noteworthy is that the possibility of misclassification has been restricted to hidden state ‘2’ for two reasons: 1) to ensure our model is identifiable; 2) to relate our proposed model to the real data setting that stage 2 is more likely to be misclassified than the other two states. The EM algorithm was used to update the likelihood for maximization at each iteration via implementation of the modified Baum-Welch algorithm. Additionally data imposed with misclassification mechanism (iii) ε = 1% was estimated naively for comparison. The choice of sample size and number of observations reflect approximately the number of subjects collected in the Alzheimer’s disease study cohort described in Section 4. The number of visits reached up to 14 for some individuals. True process parameter values were chosen as (α12 = 2.9,α13 = 2.7,α21 = 2.5,α23 = 2.2,α31 = 1.8,α32 = 1.1,β1 = 1,β2 = −.5)' and calculated using a subset of data from the aforementioned dataset with disease staging as the outcome.

To improve the computational efficiency, we derived a data based procedure for calculating initial parameter values for the EM estimation. All possible combinations of individual covariates and their transition rate relationship from Equation 3 provided MLEs for qij and were then post-estimated to obtain regression coefficients for each parameter. By the same notion, starting values for misclassification parameters were obtained via post-estimation following our proposed polytomous logistic regression with covariates to predict the true state, and thus the proportion of misclassification from true state 2 was used as the starting parameter. Quasi-Newton optimization was used to find the MLE of the parameter set of the log-likelihood function and then compared to the true parameters using bias and coverage probability measures. For each parameter, empirical means, biases, standard errors, and coverage probabilities for estimation are presented in Table 1a–2b.

For 1,000 replicates, Tables 1a–2a show the bias of all estimated coefficients are negligible and the coverage probabilities are mostly above 90% for our proposed method under both misclassification scenarios (i.e. ε21 = ε23 and ε21 ≠ ε23) and of varying rates (1%, 5%, and 10%). The biases of the misclassification parameter estimates are small and the coverage probabilities range from roughly 75% to 79%. Additionally, coverage probabilities of the estimated misclassification parameter increase with the rate of misclassification (Tables 1a–1b) suggesting that our method can detect sizeable amounts of misclassified outcomes; however, as the misclassification mechanism increases it may be difficult to capture the true parameters. The naïve analysis (Table 1a) with only a 1% imposed error rate yielded larger bias, poor coverage probability (3%–52%) and only 77% estimable data replicates, confirming that when data truly are misclassified using the naïve approach could lead to inaccurate conclusions. The misclassification modeling scenario where ε21 ≠ ε23 performs about the same as when ε21 = ε23 (Tables 1b & 2a) with respect to both estimated coefficients and misclassification rate estimates. Finally Table 2b demonstrates that when categorical outcomes are classified correctly our proposed method does not falsely provide a significant misclassification rate (aka specificity).

We have also conducted (results not presented) simulation studies to test the robustness of heterogeneities in progression rates. Specifically, we switched the direction of the process moving from state 3 to have a higher weight of moving to ‘2’ instead of ‘1’. Additionally, we have conducted a simulation study to assess the method with a small sample size. The results are similar. That means the proposed model is robust to at least some changes of parameters.

4. Longitudinal Alzheimer’s Disease Study-Caregiver Stress Levels

The proposed method’s ability to assess sensitivity of longitudinal outcome data without a “gold standard” is presented here. Data collected from January 1990 to September 2011 were extracted from the Baylor Alzheimer’s Disease and Memory Disorders Center. Patients referred and self-referred to the center with probable AD, defined using criteria from the National Institute of Neurological and Communicative Disorders and Stroke [18], were used in this study. Patients underwent comprehensive evaluation and socio-demographic information such as age, sex and years of education, medical history, and estimates of symptom duration [19] were collected at baseline. Further details regarding the baseline assessment, follow-up, and outcome diagnosis have been described elsewhere [20]. Intervals of time between visits varied among individuals as did the number of visits themselves. Patients were neuropsychologically evaluated at baseline and annually or on an as needed basis for medication management. The Mini-Mental State Examination (MMSE) [21] was among the tests implemented and aids in identifying dementia progression and severity, focusing on memory, attention and language. Scores range from 0 to 30 with lower scores indicating severe dementia. Additionally, the caregiver, a family member or friend spending the most time with the patient, provided information on their health and well-being. In this study, longitudinal self-rated stress levels of each caregiver were modeled as a continuous-time Markov chain with three categories (mild, moderate, and severe). Covariates baseline MMSE and caregiver relationship to the patient (spouse v. other) were examined to better understand the movement between stages of AD patient care giver stress levels. Self-reported stress levels were based on a construct of a caregiver questionnaire. At every time point past baseline, each prior measurement introduces bias to the subsequent measurements. Thus, baseline measures are treated as ‘true’ and all others are subject to misclassification. Patients with complete information on baseline MMSE, relationship to the caregiver, and whose caregivers provided at least two self-rated stress levels (i.e. at least one possible transition) were included in this analysis. Inter-observation time was calculated as the duration between two consecutive observations. This model’s ability to measure transitions over time while accounting for uneven intervals and number of observations allows the inclusion of patients with intermittent and monotone missing data under the assumption of missing completely at random.

A total of 952 patients had at least two caregiver self-rated stress levels and were included in the analysis. The average age of the patient was 74, ranging from 44–93, and the majority (68%) was female. The median number of visits was 3, ranging from 2–14, and baseline stress levels were distributed as 33% mild, 46% moderate, and 21% severe levels of stress. Three analyses were conducted to reflect the three models presented in Section 3. Note that for the proposed Model I we constrained q13, q31 = 0 due to the relative few transitions taking place from 1 (‘mild’) to 3 (‘severe’) and from 3 to 1 between two visits.

Table 3 displays the parameter estimates obtained from the two proposed methods and from the naïve method that ignores the possibility of incorrectly observing a 1 or 3. The main finding in this analysis is the significance of the misclassification parameters and the variations of the intensity parameter estimates between the proposed and naïve methods. The overall estimate of correctly classified data differs between proposed methods M1 and M2 (M1: 1-.05-.10; M2: 1–2*.05), by about 5% (85% and 90%, respectively). When using all three approaches to estimate the movement between stages of AD patient care giver stress levels, parameters reflecting movement from state ‘1’ were not significant. After adjusting for patient relationship and baseline MMSE score, the proposed method suggests that at the time of change a caregiver who is moderately stressed is more likely to reach to increase to a ‘severe’ stress level than revert to mildly stressed.

Table 3
Comparison of Proposed and Naïve approaches for estimating the parameters of movement between stages of AD patient care giver stress levels.

Note that the probability of this type of transition is .74 {exp(−2.43)/exp(−2.43)+exp(−3.48)} and .47 for the naïve approach. It is noteworthy to point out that the frequency of transitions from ‘mild’ to ‘severe’ and ‘severe’ to ‘mild’ is 1.8% and 2.4%%, respectively which is very few. The proposed method-M1 suggests that caregivers accurately report their stress levels 85% of the time and those patients moderately stressed are only slightly less likely to over-report their stress rather than under-report it.

5. Identifiability

Due to the nature of the process, a CTMC model may be non-identifiable when outcomes are recorded at pre-specified times with or without misclassification. If the sojourn time and state of change is recorded, the model will be fully identifiable. If the mean sojourn time is much longer than the inter-observation interval the non-identifiable problem will be almost negligible. A more likely scenario is that the mean sojourn time interval (1/qii) is shorter than the inter-observation time interval and stage changes could be missed between observational periods. In other words, an observed transition (outcome) could be reached by at least two different paths. The complexity of this problem increases when a misclassification parameter is added to the model. Under certain conditions the model specified in this paper can achieve a working level of identifiability. First when state 2 is observed it is not misclassified. So, an observed 2-to-1 transition can be used to compare with a 2-to-2 transition. If the true transited state is 2, the dynamic behavior of the observed 2-to-1 transition should be similar to that of the 2-to-2 transition. A similar comparison can also be applied to the observed 2-to-3 transition. Second, in our model, covariates X1 and X2 are not misclassified and are used to link the transition rates of the hidden states via a regression model. Thus, covariates can help amplify the inter-transition time and assist in identifying misclassification status. In other words, they can improve parameter identifiability. However, if all three states are possibly misclassified, this effect may be eliminated.

6. Discussion

We have proposed a method to estimate parameters of a hidden three-state CTMC with covariates when some observed outcomes are potentially misclassified and to estimate the probabilities of misclassification (sensitivity rates) in the absence of ‘gold standard’ information. We allow movement between all possible states. This estimation method was able to recover the hidden parameters with varying levels of misclassification imposed (up to a total of 20%). Simulation studies revealed the disruption of naively analyzing even a small amount of misclassified data with uneven observational schedules among and within patients (Table 1a). Another attractive feature of the proposed method is that it detected almost negligible error rates when the data were correctly classified and does not falsely provide a significant misclassification rate (aka specificity). Finally, we have shown, without presenting the results, that this method is robust to heterogeneities in the progression rates, to smaller sample sizes (n=350), and shorter chains (reduced to 6 observation times).

The complexity of statistical methodology with potentially misclassified outcomes and the drawbacks of ignoring this scenario are well known. For irreversible multi-state processes, approaches exist to account for potential misclassification but to date no other methods take into account the more general case of recurrent multi-state processes subject to misclassification. Our analytical likelihood approach has been developed to confront this. The impact of this research contributes substantially to the situation where the outcome is truly hidden and the validity of the observed data is suspect yet also provides a flexible approach when the researcher is unsure of the nature of whether the true state is misclassified or not.

We have shown that our proposed method gives very different results from the naïve method in terms of significance and interpretation of covariate effects when using the naïve approach to estimate the movement between stages of AD patient care giver stress levels. One of the major difficulties in analyzing longitudinal categorical data using a Markov model is to mathematically prove the future distribution of the data depends on the present not the past at any time point. In our model, although the hidden process is assumed to be a CTMC, the observed data does not constitute a CTMC and hence, it is not possible to validate the Markov property of the underlying process.

Finally, this method can be applied to any longitudinal discrete data with or without possible misclassification if the purpose is to look at transition behavior. Diseases are often referred to in terms of progression (irreversible disease processes) and is a special case of our method. Thus, our method can handle both general and special cases of the 3-state CTMC with some observed states potentially misclassified.

Acknowledgments

Julia Benoit was supported by NIH grant 2T32GM074902-06.

REFERENCES

1. Yeh H, Chan W, Symanski E. Intermittent missing observations in discrete-time hidden Markov models. Communications in Statistics-Simulation and Computation. 2012;41:167–181.
2. Poskitt DS, Zhang J. Estimating components in finite mixtures and hidden Markov models. Aust. N. Z. J. Stat. 2005;47(3):269–286.
3. Altman RM, Petkau AJ. Applications of hidden Markov models to multiple sclerosis lesion count data. Stat. Med. 2005;24:2335–2344. [PubMed]
4. Le Strat Y, Carrat F. Monitoring epidemiologic surveillance data using hidden Markov models. Stat. Med. 1999;18:3463–3478. [PubMed]
5. Pfeffemann D, Skinner C, Humphreys K. The estimation of gross flows in the presence of measurement error using auxiliary variables. J. R. Statist. Soc. A. 1998;161(1):13–32.
6. Smith T, Vounatsou P. Estimation of infection and recovery rates for highly polymorphic parasites when detectability is imperfect, using hidden Markov models. Stat. Med. 2003;22:1709–1724. [PubMed]
7. Rosychuk RJ, Thompson ME. A semi-Markov model for binary longitudinal responses subject to misclassification. The Canadian Journal of Statistics. 2001;29(3):395–404.
8. Bureau A, Shiboski S, Hughes JP. Applications of continuous time hidden Markov models to the study of misclassified disease outcomes. Stat. Med. 2003;22:441–462. [PubMed]
9. Rosychuk RJ, Sheng X, Stuber JL. Comparison of variance estimation approaches in a two-state Markov model for longitudinal data with misclassification. Stat. Med. 2006;25:1906–1921. [PubMed]
10. Van den Hout A, Matthews FE. Multi-state analysis of cognitive ability data: a piecewise-constant model and a Weibull model. Stat. Med. 2008;27:5440–5455. [PubMed]
11. Jackson CH, Sharples LD, Thompson SG, Duffy SW. Multistate Markov models for disease progression with classification error. The Statistician. 2003;52(2):193–209.
12. Li YP, Chan W. Analysis of longitudinal multinomial outcome data. Biometrical Journal. 2006;48(2):319–326. [PubMed]
13. Mhoon KB, Chan W, Del Junco DJ, Vernon SW. A continuous-time Markov chain approach analyzing the stages of change construct from a health promotion intervention. JP Journal of Biostatistics. 2010;4(3):213–226. [PMC free article] [PubMed]
14. Ma J, Chan W, Tsai C, Xiong M, Tilley B. Analyze Trans-theoretical model of health behavioral changes in a nutrition intervention study- A continuous time Markov chain model with Bayesian approach. Statistics in Medicine. 2015 (in press) [PMC free article] [PubMed]
15. Rosychuk RJ, Thompson ME. Parameter identifiability issues in a latent Markov model for misclassified binary responses. Journal of the Iranian Statistical Society. 2004;3:38–57.
16. Chen B, Yi GY, Cook RJ. Analysis of interval-censored disease progression data via multi-state models under ignorable inspection process. Statistics in Medicine. 2010;29:1175–1189. [PubMed]
17. Baum LE, Petrie T, Soules G, Weiss N. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Statist. 1970;41(1):164–171.
18. McKhann G, Drachman D, Folstein M, Katzman R, Price D, Stadlan EM. Clinical diagnosis of Alzheimer's disease: report of the NINCDS-ADRDA work group under the auspices of department of health and human services task force on Alzheimer's disease. Neurology. 1984;34(7):939–944. [PubMed]
19. Doody RS, Dunn JK, Huang E, Azher S, Kataki M. A method for estimating duration of illness in Alzheimer's disease. Dementia and Geriatric Cognitive Disorders. 2004;17(1–2):1–4. [PubMed]
20. Doody R, Pavlik V, Massman P, Kenan M, Yeh S, Powell S, et al. Changing patient characteristics and survival experience in an Alzheimer’s center patient cohort. Dementia and Geriatric Cognitive Disorders. 2005;20(2–3):198–208. [PubMed]
21. Folstein MF, Folsetein SE, McHugh PR. “Mini-mental state”: A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research. 1975;12(3):189–198. [PubMed]