Multi-State Models for Panel Data : The msm Package for R

Panel data are observations of a continuous-time process at arbitrary times, for example, visits to a hospital to diagnose disease status. Multi-state models for such data are generally based on the Markov assumption. This article reviews the range of Markov models and their extensions which can be fitted to panel-observed data, and their implementation in the msm package for R. Transition intensities may vary between individuals, or with piecewise-constant time-dependent covariates, giving an inhomogeneous Markov model. Hidden Markov models can be used for multi-state processes which are misclassified or observed only through a noisy marker. The package is intended to be straightforward to use, flexible and comprehensively documented. Worked examples are given of the use of msm to model chronic disease progression and screening. Assessment of model fit, and potential future developments of the software, are also discussed.

1. Markov multi-state models for panel data

Definitions
A multi-state model describes how an individual moves between a series of states in continuous time.Suppose an individual is in state S(t) at time t.The movement on the discrete state space 1, . . ., R is governed by transition intensities q rs (t, z(t)) : r, s = 1, . . ., R. These may depend on time t, or, more generally, also on a set of individual-level or time-dependent explanatory variables z(t).The intensity represents the instantaneous risk of moving from state r to state s = r: q rs (t, z(t)) = lim δt→0 P(S(t + δt) = s|S(t) = r)/δt.
The q rs form a R×R matrix Q whose rows sum to zero, so that the diagonal entries are defined by q rr = − s =r q rs .An example is the general model for disease progression (Figure 1), in which individuals can advance or recover between adjacent disease states, or die from any state.

Panel data
The other articles in this issue focus on fitting multi-state models of this type to continuouslyobserved processes, where the state S i (t) of each individual i = 1, . . ., M is known at all times t in the study period.Survival analysis is the simplest such example, a two-state model where individuals remain alive until an observed or censored time of death.
This article focuses on multi-state models for panel data, in which the state S i (t) is only known at a finite series of times t = (t i1 , . . ., t in i ).Fitting multi-state models to panel data generally relies on the Markov assumption, that future evolution only depends on the current state.That is, q rs (t, z(t), F t ) is independent of F t , the observation history F t of the process up to the time preceding t.See, for example, Cox and Miller (1965) for a thorough review of continuous-time Markov chain theory.In a time-homogeneous Markov model, in which the q rs are also independent of t, the sojourn time in each state r is exponentially-distributed with mean −1/q rr .The probability that an individual in state r moves next to state s is −q rs /q rr .

The msm package
This article describes the msm package for R (R Development Core Team 2010), available from http://CRAN.R-project.org/package=msm.msm can be used to fit a Markov model with any number of states and any pattern of transitions to panel data, and includes several extensions such as hidden Markov models and models whose transition intensities vary with individual-specific or time-varying covariates.msm was motivated by studies of chronic diseases in medicine, and is frequently used in this area (Jackson et al. 2003;Sharples et al. 2003;Gani et al. 2007;Sweeting et al. 2006;Buter et al. 2008;Skogvoll et al. 2008), but it has been widely used in other fields, for example geology (Aspinall et al. 2006), zoology (Gautrais et al. 2007) and econometrics (Rummel 2009).

Likelihood for panel data
The Markov model for panel data was first described by Kalbfleisch and Lawless (1985) and Kay (1986).The likelihood for this basic model, used in msm, is calculated from the transition probability matrix P (u, t + u).The (r, s) entry of P (u, t + u), p rs (u, t + u), is the probability of being in state s at a time t + u, given the state at time u is r.P (u, t + u) is calculated in terms of Q using the Kolmogorov differential equations (Cox and Miller 1965).If the transition intensity matrix Q is constant over the interval (u, t + u), as in a timehomogeneous process, then P (u, t + u) = P (t) and the equations are solved by the matrix exponential of Q scaled by the time interval, P (t) = Exp(tQ).
Figure 1: General model for disease progression.Individuals advance between adjacent stages of disease severity, and optionally recover to an adjacent less severe state or die from any state.
X 3 /3!+ ... as the scalar exponential, except that each term X k in the series is defined by matrix products, not element-wise scalar multiplication.For simpler models, an analytic expression for each element of P (t) can be calculated in terms of entries of Q by hand or by using symbolic algebra software.Otherwise, msm uses eigensystem decomposition, or, if there are repeated eigenvalues, the method of Padé approximants (Moler and van Loan 2003).
The full likelihood is then the product of probabilities of transition between observed states, over all individuals i and observation times j: Each component L i,j is the entry of the transition matrix P (t) at the S(t ij )th row and S(t i,j+1 )th column, evaluated at t = t i,j+1 − t ij .The likelihood L(Q) is maximized in terms of log(q rs ) to compute the estimates of q rs , using standard optimization algorithms, as implemented in the optim function in R. Standard errors are computed from the Hessian at the optimum.Some of these optimization algorithms make use of the derivatives of the likelihood, which were given by Kalbfleisch and Lawless (1985).
The likelihood (1) for this and all models in msm assumes that the sampling times are ignorable.That is, the fact that a particular observation is made at a certain time does not implicitly give information about the value of that observation.Sampling times are ignorable if they are fixed in advance, or otherwise chosen independently of the outcome of the process.Grüger et al. (1991) also showed that the sampling times are ignorable under a "doctor's care" sampling scheme, where the next observation time (such as a visit to a doctor) is chosen on the basis of the current state.Basing the current observation time on the current state would be a non-ignorable sampling scheme.To avoid bias, non-ignorable sampling times should be modelled as part of the likelihood (Sweeting et al. 2010).

Exact death times
In observational studies of chronic diseases, it is common that the time of death is known, but the state immediately before death is unknown.If S(t i,j+1 ) = D is such a death state, then the contribution to the likelihood at this time is summed over the unknown state m at the instant before death: Continuously-observed processes msm allows Markov models to be fitted to processes which are continuously-observed.However, the assumption of exponential sojourn times inherent in Markov models is restrictive, and more flexible models can be fitted to such data with other software.For example, proportional hazards models with non-parametric baseline intensities can be fitted using the mstate package (de Wreede et al. 2011Wreede et al. , 2010) ) Generally, msm allows a dataset to be an arbitrary mixture of observations such that states are panel-observed, continuously-observed, or "exact death times".

Using msm for a basic Markov model
The package is illustrated with a set of data from monitoring heart transplant recipients, which is provided with msm.Sharples et al. (2003) studied the progression of coronary allograft vasculopathy (CAV), a post-transplant deterioration of the arterial walls, using these data.The dataset can be made available to the current R session using the command data("cav").

Specifying the Markov model and initial values
We assume that the patient can advance or recover from consecutive states while alive, and die from any state, as in Figure 1 with R = 4 states, giving a transition intensity matrix of −(q 12 + q 14 ) q 12 0 q 14 q 21 −(q 21 + q 23 + q 24 ) q 23 q 24 0 q 32 −(q 32 + q 34 ) q 34 0 0 0 0 As Section 4 will explain, this model is not strictly medically realistic, but we fit it here for illustration.Note that this matrix represents transitions in an instant rather than the transitions over an interval summarized by statetable.msm-the 4 individuals who moved from state 3 to state 1 in successive observations are assumed to have travelled via state 2, therefore q 31 = 0 but q 32 , q 21 = 0.
To tell msm what the allowed transitions of our model are, we define a matrix twoway4.q of the same size as Q, containing zeroes in the off-diagonal positions where the entries of Q are zero.All other off-diagonal positions contain an initial value for the corresponding transition intensity.Any diagonal entries q rr supplied are ignored, as these are constrained to be minus the sum of all the other entries in the row.The rows and columns of twoway4.q are given informative names which will be used when presenting the estimates.

Running msm and interpreting results
The maximum likelihood estimate of Q is computed by the msm() function, as below, starting from the supplied initial values.The argument death=4 indicates that entry times into state 4 are observed exactly but the state on the instant before is unknown (Section 1.5).The optimization in this example takes about 20 seconds on a typical current computer.Printing the object cav.msm returned by msm() displays the estimated transition intensity matrix with 95% confidence intervals.We see patients are about three times as likely to develop CAV than die without CAV (first row).After onset of mild disease, progression to severe CAV is about 50% more likely than recovery, and death from the severe disease state is rapid (mean of 1 / 0.41 = 2.4 years in state 3).
R> cav.msm <-msm(state ~years, subject = PTNUM, data = cav, + qmatrix = twoway4.q,death = 4) R> cav.msmCall: msm(formula = state ~years, subject = PTNUM, data = cav, qmatrix = twoway4.q,death = 4) To display the fitted transition probability matrix P (t) over an interval of t = 1 year, the function pmatrix.msm() is used.This suggests a 9%, 15% and 4% probability that in one year's time, an individual currently free of CAV will have mild CAV, severe CAV or be dead, respectively.The option ci="normal" computes a confidence interval for P (t) by repeated sampling from the asymptotic normal distribution of the maximum likelihood estimates of the log(q rs ).The output below is based on the default 1000 samples, and has converged to within 2 significant figures.Alternatively, intervals can be computed using nonparametric bootstrap resampling (ci="boot").The dataset of M i=1 n i serially-correlated state observations from M individuals is rearranged as a dataset of M i=1 (n i −1) independent transitions between pairs of states.Bootstrap datasets of transitions are drawn with replacement and the model refitted repeatedly to estimate the sampling uncertainty surrounding the estimates.This method is more accurate but much slower due to the need to refit the model for each resample.

Controlling numerical optimization
The optimization may occasionally converge to a local rather than a global maximum of the likelihood surface.Therefore to ensure that the global maximum has been found, it is recommended to run msm() with diverse sets of initial values.However, if values too far from the optimum are chosen then the algorithm may not converge.To improve convergence, the optimization in msm() can be fine-tuned using all the options available to the R function optim().For example, the number of iterations can be increased with maxit, and the loglikelihood can be rescaled during optimization (fnscale).
But if over-complex models are applied with insufficient data, then the parameters of the model will not be identifiable.The fixedpars option to msm() is useful for profiling likelihoods.This allows any parameters to be fixed at their initial values.The model must of course be realistic.In Markov models for panel data, it is not usually feasible to estimate a model where instantaneous transitions are allowed between every pair of states.For example, in chronic disease applications, transitions are generally only plausible between "adjacent" states of a disease -a patient who is observed as "well" at t j , and "severe" at t j+1 must have gone through "mild" in the interval (t j , t j+1 ).

Individual-level covariates
The effect of a vector of explanatory variables z ij on the transition intensity for individual i at time j is modelled using proportional intensities, replacing q rs with q rs (z ij ) = q (0) rs exp(β rs z ij ).
The likelihood is then maximized over the q (0) rs and β rs .In the CAV example, the age of the heart transplant donor (variable dage) and the primary diagnosis, or reason for transplantation (variable pdiag), are suggested to affect the rate of onset and progression of CAV.We fit a model in which the intensities are different according to donor age and a primary diagnosis of ischaemic heart disease (IHD), after creating a binary variable ihd representing IHD from the categorical pdiag.A "formula" in standard R linear modelling syntax, ~dage + ihd, is supplied as the covariates argument to msm().To facilitate convergence, the "BFGS" quasi-Newton optimization algorithm is used (see the documentation for the R function optim()), and the maximum number of iterations is increased to 10000.The -2× log-likelihood is also divided by 4000, since it takes values around 4000 for plausible ranges of the parameters.This ensures that optimization takes place on an approximate unit scale, to avoid numerical overflow or underflow.

Model comparison
Likelihood ratio tests between nested models fitted in msm can be performed conveniently using the function lrtest.msm.Comparing a likelihood ratio statistic of 59 to a χ 2 distribution with 14 degrees of freedom shows that the model with covariates (cav.cov.msm)fits significantly better than the model without covariates (cav.msm).
R> lrtest.msm(cav.msm,cav.cov.msm) -2 log LR df p cav.cov.msm58.5785 14 2.079552e-07 Covariate effects may be constrained to equal between different intensities, using the constraint argument to msm().For example, in a disease progression model, the effect of a covariate on all progression rates may be equal.constraint is a list of vectors, one for each covariate.In the model cav.cov2.msmfitted below, dage = c(1, 2, 3, 1, 2, 4, 2) indicates that the effect of dage on the 1st and 4th intensities are constrained to be equal, as is the effect on the the 2nd, 5th and 7th intensities.The parameters are assumed to be ordered by reading across the rows of the transition matrix, starting at the first row: (q 12 , q 14 , q 21 , q 23 , q 24 , q 32 , q 34 ), so that in the model cav.cov2.msm, the effect on the CAV onset rate q 12 equals the effect on the CAV progression rate q 23 , and the effects on all death rates q 14 , q 24 , q 34 are constrained to be equal.However, a likelihood ratio test indicates that the bigger model cav.cov.msmwithout constraints fits significantly better.

Time-inhomogeneous models
In general, the transition probability matrix P (u, t + u), hence the likelihood for panel data, cannot be calculated in closed form if Q varies over the interval (u, t + u).An exception is if Q is piecewise-constant.The effect of time-dependent variables, including time itself, on the transition intensities can be modelled in msm under this assumption.For example, suppose a covariate varies continuously through time, but is only observed at the same times as the state of the Markov process.The approximate effect of that covariate can be estimated assuming that it is constant in between the times that it is observed, so that P (u, t + u) = P (t).More generally, time-inhomogeneous Markov models can be constructed in which piecewise-constant covariates change at times other than (t i1 , . . ., t in i ).This is accomplished by summing the likelihood over the unknown observed state at the times when the covariates change (Equation 2, Section 3.4).msm provides a convenient facility for constructing time-inhomogeneous models in which intensities change at the same times for every individual.A vector of change points is specified in the pci argument to msm().The following command fits an inhomogeneous model to the CAV data in which all intensities change 5 years after transplantation.This constructs a model with a single binary covariate called timeperiod, a factor in R, with levels (-Inf, 5] (the baseline) representing the first time period, and [5, Inf), representing the second time period.A likelihood ratio test against the time-homogeneous model suggests significant timeinhomogeneity.The estimated hazard ratios from this fitted model show an increased onset rate of mild CAV in the second period, though no significant time effect on other transitions.There is weak information about the effect of time on the death rate from mild CAV.

Censored states
In the CAV example, some patients were known to be alive but in an unknown disease state at the end of the study.We say that the disease state is censored, meaning that the exact value is unknown, but known to be in a certain set.Unlike in survival analysis, here it is the state, not the event time, which is censored.If the patient were alive at the end of the study but with a known state, then the standard likelihood (1) would apply.
msm allows the state observation at any time to be censored, that is, known only to be in an arbitrary subset of the state space.Suppose the 1, 2, . . .n i th observations from individual i are known only to be in the sets C 1 , C 2 , . . ., C n i respectively.The likelihood for this individual is a sum of the likelihoods of all possible paths through the unobserved states.
This likelihood is used in msm to fit general time-inhomogeneous models with piecewiseconstant intensities, as described in Section 3.3, where the state is not observed at times when the intensities change.
Suppose the variable state in the data cav were to contain observations coded 99 on occasions where the patient is alive but in an unknown state, which could be state 1, 2 or 3.The standard Markov model could be fitted to such data using the censor and censor.statesoptions to msm(), as follows.

Hidden Markov models
In a hidden Markov model (HMM), the states of the Markov chain are not observed.The observed data y ij are governed by some probability distribution conditionally on the unobserved state S ij .This class of model is commonly used in areas such as speech and signal processing (Juang and Rabiner 1991) and the analysis of biological sequence data (Durbin et al. 1998), with a discrete-time underlying Markov chain.Applications of HMMs in medicine, where continuous-time processes are usually more suitable, include Satten and Longini (1996); Bureau et al. (2003); Jackson and Sharples (2002); Jackson et al. (2003).These models can represent chronic staged diseases which can only be diagnosed by an error-prone marker.

Likelihood
The msm package can fit continuous-time hidden Markov models to panel-observed data with a variety of distributions for the outcome conditionally on the hidden state.HMMs are fitted in msm by direct maximization of the likelihood, as in Satten and Longini (1996), though Bureau et al. (2000) describe an alternative EM algorithm for fitting the same class of models.
The contribution of individual i to the likelihood is where the sum is taken over all possible paths of underlying states S i1 , . . ., S in i .Assume that the observed states are conditionally independent given the values of the underlying states.Also assume the Markov property, P(S ij |S i,j−1 , . . ., S i1 ) = P(S ij |S i,j−1 ).Then the contribution L i can be written as a product of matrices, as follows.To derive this matrix product, decompose the overall sum in Equation 3 into sums over each underlying state.The sum is accumulated over the unknown first state, the unknown second state, and so on until the unknown final state: where P(y ij |S ij ) is the probability density of the outcome conditional on the hidden state (also called the "emission" distribution), and P(S ij |S i,j−1 ) is the transition probability of the hidden Markov chain, calculated as in Section 1.4.
msm allows most common distributions to be used as HMM outcome models.The modular design of msm allows new outcome distributions to be added easily, as described in the package documentation.These must be univariate, and msm is restricted to situations where only one observation is made conditionally on an underlying Markov process.
In practice, the outcome distribution may vary between individuals and through time, as well as with the hidden state.msm allows one location parameter for each class of outcome distribution to depend on covariates, for example, a linear model for the mean of a normal outcome distribution.The transition rates of the hidden Markov chain may also vary with covariates, just as for non-hidden Markov models (Section 3.1).
The distribution P(S i1 ) of the initial state may be estimated from the data, or fixed at plausible values.This distribution may also depend on covariates through a multinomial logistic regression.

Application of hidden Markov models: FEV 1 after lung transplants
A dataset of repeated measurements of FEV 1 , forced expiratory volume in 1 second, in recipients of lung transplants (Jackson and Sharples 2002) is provided with msm as data("fev").
FEV 1 measurements are used to diagnose bronchiolitis obliterans syndrome (BOS), a chronic deterioration in lung function.FEV 1 is measured as a percentage of a baseline value for each individual, determined in the first six months after transplant, and defined to be 100% baseline at six months.Figure 2 shows a series of FEV 1 measurements from a typical patient from this dataset.BOS is modelled as a staged disease, with stages defined by No BOS (≥ 80% baseline FEV 1 ).

Death.
As FEV 1 is subject to high short-term variability due to acute events and natural fluctuations, the exact state at each observation time is difficult to determine, making it difficult to model the natural history of BOS as defined.Instead, we represent the BOS progression by a hidden Markov model for FEV 1 , conditionally on underlying BOS states.Discrete states are considered to be an appropriate alternative to representing the underlying disease status as continuous, as the onset of BOS is often sudden.q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qqq q q q q q q q q q q q q q q q 0 2 4 6 8 10 Here we describe a three-state "illness-death" hidden Markov model, with states representing no BOS, BOS and death, and a transition intensity matrix of The distribution of percentage of baseline FEV 1 is Normal(µ 1 , σ 2 1 ) in state 1 and Normal(µ 2 , σ 2 2 ) in state 2. State 3, representing death, is observed without error and is given a label of 999 in the data.The death time is known exactly.More sophisticated four and five-state models for the FEV 1 data, using outcome distributions which separate measurement error and natural variation in the response, are described by Jackson and Sharples (2002).

Fitting hidden Markov models with msm
To fit this hidden Markov model using the msm() function, the argument hmodel is used.This is a list of objects representing the outcome distribution for each state, returned by constructor functions.Each constructor function has arguments giving initial values for the parameters of the outcome distribution.In this example, hmmNorm(mean = 100, sd = 16) indicates initial values of 100 for µ 1 and 16 for σ 1 .hmmIdent(999) represents the identity distribution, in other words, state 3 is observed without error, and is indicated by a value of 999 in the data.Initial values for the Markov transition intensities are given in an object called three.q,used as the qmatrix argument to msm() as before.
The most likely true series of states underlying the data can be estimated using the Viterbi algorithm (Viterbi 1967) through the function viterbi.msm.Figure 2 shows the most likely time at which the individual passed from state 1 to state 2 -the time when their decline below 80% of baseline became sustained.

Misclassification models
An important special case of HMMs is the multi-state model with misclassification, where the observed data are states, assumed to be misclassifications of the true, underlying states (Jackson et al. 2003).In the CAV example, it is not medically realistic for patients to recover from a diseased state to a healthy state, as in the model of Section 2. Progression of coronary artery vasculopathy is thought to be an irreversible process.The angiography observations are actually subject to error, which leads to some false measurements of CAV states and apparent improvements in state.Thus a more realistic Markov intensity matrix Q would be as given in Figure 1, but with q r+1,r = 0 for each r, −(q 12 + q 14 ) q 12 0 q 14 0 −(q 23 + q 24 ) q 23 q 24 0 0 −q 34 q 34 0 0 0 0 .
We also assume that true state 1 (CAV-free) can be classified as state 1 or 2, state 2 (mild/moderate CAV) can be classified as state 1, 2 or 3, while state 3 (severe CAV) can be classified as state 2 or 3. Recall that state 4 represents death.Thus the matrix of misclassification probabilities is 1 − e 12 e 12 0 0 e 21 1 − e 21 − e 23 e 23 0 0 e 32 1 − e 32 0 0 0 0 0 where e rs is the probability of observing state s conditionally on occupying true state r.
These are hidden Markov models with a categorical outcome distribution, and as such may be fitted in msm using a hmmCat() outcome distribution for each underlying state.However msm provides a convenient shorthand for fitting models of this form.An ematrix argument to msm() is given a matrix of initial values for the misclassification probabilities, with zero in positions where misclassifications cannot occur.In the CAV example we initialize the four unknown misclassification parameters to 0.1, and set the initial values oneway4.qfor Q to the approximate maximum likelihood estimates from the model without misclassification.obstrue=firstobs specifies that observations indicated by the binary variable firstobs in the data are not misclassifications, but observations of the true state.In the CAV data, these are the dates of transplantation, at which patients are known to be CAV-free, in state 1.
The misclassification probabilities may also be modelled in terms of covariates, using multinomial logistic regression.This is accomplished with the misccovariates argument to msm().
For example, a disease screening test may be more sensitive for different types of individuals.

Model assessment
Titman and Sharples (2010a) reviewed methods for assessing the fit of Markov models to panel data.In particular, the Markov property and homogeneity of transition rates, both between individuals and through time, can be restrictive assumptions.

Diagnostic plots
One simple diagnostic compares model predictions of the entry time into a particular state with nonparametric estimates, for example Kaplan-Meier curves.If the entry time is not observed exactly, then the nonparametric estimate is an approximation.In Figure 3, the fit of four multi-state models to the exactly-observed survival times in the CAV data is assessed in this way.
Another common approach to multi-state model assessment is to compare observed prevalences of states with expected prevalences under the model at a series of times.This can be done in msm using the functions prevalence.msm() and plot.prevalence.msm().To compute observed prevalences precisely, all individuals should be observed at these times.If individuals are observed at different times, this relies on approximations such as assuming transitions occur only at observation times (Gentleman et al. 1994) or at midpoints between observation times.Figure 4 presents a plot of this type for the best-fitting model cav.pci2.msmfor the CAV data.As time elapses, the proportions of individuals predicted to have died appear to be underestimated by the model, and the proportions alive and in states "well" and "mild" are overestimated.However, the Kaplan-Meier estimate in Figure 3 gives a more accurate estimate of the "observed" survival probability in this case.The observed prevalence of a state is simply calculated as the number of individuals known to be in that state, divided by the number of individuals whose state is known at that time, which ignores the information from individuals censored at earlier times.

Formal goodness-of-fit test
The previous plots are informal diagnostics to suggest potential model improvements.A formal goodness-of-fit test for the hypothesis that panel data were generated by a fitted Markov model was developed by Aguirre-Hernandez and Farewell (2002).This test was extended by Titman and Sharples (2008) to handle exactly-observed death times and misclassified states.This is implemented in msm as the function pearson.msm().The test compares observed and expected numbers of transitions between pairs of states for a series of transition starting times, transition time intervals and covariate categories, giving a Pearson-type contingency table test statistic.
The null distribution of the statistic is not exactly χ 2 , with a complex form for general panel data (Titman 2009).For simpler models without covariates, Aguirre-Hernandez and Farewell (2002) showed by simulation that the χ 2 approximation was adequate.The pearson.msm function provides theoretical upper and (unless there are exact death times) lower bounds for the test p value.In general cases, the null distribution of the statistic can be estimated by the parametric bootstrap procedure of repeatedly sampling from the fitted model, refitting the model and recomputing the test statistic, resulting in an accurate p value.If the resulting contingency table is sparse, then the number of observation time, time interval or covariate categories may need to be reduced to improve the χ 2 approximation, though the power of the resulting test may be low.See the pearson.msmhelp page in the package for further details.
The Pearson-type test is performed for the four models illustrated in Figure 3.The upper p value bounds indicate that none of these models give an adequate overall fit.This suggests that even though the time-inhomogeneous model cav.pci2.msmfits well to survival (Figure 3), it discriminates less well between the states of CAV severity (Figure 4).A more complex pattern of time-dependence, or allowing the transition intensities to depend on covariates, would be expected to yield a better fit.Since the method of Titman and Sharples (2008) to handle exactly-observed death times involves multiple imputation of the next scheduled observation time, these statistics and p values include some simulation error.The default 100 imputations in this example ensures the statistics have converged within 2 significant figures and the p values to within an order of magnitude.

Other issues in model assessment
The influence of each individual on the maximized likelihood can be computed and illustrated by score residuals, using the function scoreresid.msm.Titman and Sharples (2010a) also discussed the assessment of multi-state models with misclassification, criticising in particular the assumption of independence of the observed outcome conditionally on the underlying state.

Extensions of Markov models and limitations of msm
The msm package was designed to fit any Markov model structure to panel-observed multistate data.Because of this aim of generality, there are limitations in handling more complex models which are only practicable for specific patterns of observations or allowed transitions.

Continuously-observed processes
For example, if the data are continuously-observed, msm is limited to exponential or piecewiseexponential sojourn times.More flexible models, for example, with Weibull-distributed sojourn times, are relatively easy to fit to such data.The mstate package (de Wreede et al. 2010Wreede et al. , 2011) ) implements multi-state models with nonparametric baseline hazards and proportional hazards regression.
When the model is progressive, for example, a model as in Figure 1 but with all reverse transition rates q r+1,r = 0, the number of possible pathways taken by an individual through the states is finite, so that likelihood calculations are simpler.For example, the "illness-death" model has only one disease state, and no recovery allowed from "well" to "disease".The data for such a model may only be interval-censored, that is, the transition to illness is known to have occurred between two observations, but at an unknown time.Flexible, non-parametric methods are possible in this case (Frydman 1995;Frydman and Szarek 2008).This is simpler than panel data, where both the type and number of transitions occurring between adjacent observations are unknown in general.

Non-Markov models
Relaxing the Markov assumption with panel data presents more difficulties.Semi-Markov models with piecewise-constant intensities are only feasible to estimate for simpler model structures (Titman 2008).Foucher et al. (2010) used numerical integration to compute the likelihood for 3 or 4 state progressive semi-Markov models.Titman (2008) described an Monte Carlo EM algorithm for fitting progressive semi-Markov models to panel data.All these methods would be very difficult to implement for a general Markov model structure.In msm, an approximate non-Markov model might be fitted by creating artificial time-dependent covariates representing aspects of the process history, though this approach would require very frequent observations to be sufficiently accurate.A more promising approach to semi-Markov models is the phase-type model, in which the exponentially-distributed time spent in each state r is replaced by a series of exponential sojourns (or "phases") in hidden states r 1 , . . ., r k (Titman and Sharples 2010b).In principle, these models may be implemented as hidden Markov models in msm, but certain parameter constraints (currently not implemented) may be necessary for identifiability.

Random effects and Bayesian methods
Unexplained heterogeneity in transition intensities between individuals may be represented by random effects models, though these are not implemented in msm.Their likelihood for panel data is intractable, except for specific cases such as the "tracking" model (Satten 1999) in which the random effect acts on all intensities simultaneously, or a discrete random effects distribution (Cook et al. 2004).
The msm package is limited to maximum likelihood estimation.Multi-state models can be fitted to panel data from a Bayesian perspective using MCMC simulation (Sharples 1993), which is particularly suited to hierarchical models with random effects.Random effects Markov models with simple state structures have been implemented using the WinBUGS (Lunn et al. 2000) software for Bayesian analysis (Pan et al. 2007;van den Hout and Matthews 2009).Welton and Ades (2005) describe how to implement general multi-state structures using the WBDiff (Lunn 2004) differential equation solving interface to WinBUGS to calculate P (t), while the JAGS implementation of the BUGS language (Plummer 2003) allows general Markov model structures to be fitted to panel data via a distribution dmstate().

Discrete-time models
msm was designed for continuous-time models, but discrete-time Markov and hidden Markov models can be fitted to discrete-time data using msm, assuming that there is a continuous process underlying the data.The fitted transition probability matrix in one time unit, P (1), is then equivalent to the transition probability matrix P of the discrete-time model.But since a discrete-time Markov model is equivalent to a series of multinomial models for each observation conditionally on the previous observation, these may be fitted more efficiently using software for multinomial logistic regression, for example, the function multinom() in the R package nnet (Venables and Ripley 2002).Currently there are several available R packages which can fit discrete-time hidden Markov models of various forms, for example HiddenMarkov (Harte 2010), hsmm (Bulla et al. 2008) and mhsmm (O'Connell and Hojsgaard 2009).

Further information
This article gives an overview of the msm package for fitting continuous-time Markov and hidden Markov models to panel data.Detailed references for all the functions for model fitting and output presentation are available as help pages in the installed package.The doc subdirectory of the package also contains a user guide in PDF format, which presents much of the material in this article in greater detail.
The examples in this article were run using version 1.0 of msm, available from http://CRAN.R-project.org/package=msm.
agement and bug reports.The author is funded by the UK Medical Research Council (grant U.1052.00.008).

Figure 2 :
Figure 2: Measurements of lung function (FEV 1 ) from a lung transplant recipient and fitted BOS states from a hidden Markov model.

Figure 3 :
Figure 3: Comparison of observed and fitted survival for three multi-state models for the CAV data.

Figure 4 :
Figure 4: Comparison of observed and expected prevalence from the time-inhomogeneous model cav.pci2.msmfor the CAV data.
years gives the time of the test in years since the heart transplant.Data are supplied to msm as a series of observations, grouped by patient.This should be a data frame with variables indicating the observed state of the process (state in the CAV data) and the time of the observation (years in the CAV data) If the data come from more than one individual, then a subject identification variable (PTNUM in the CAV data) must also be supplied.This does not need to be numeric, but observations from the same subject must be adjacent in the dataset, and observations must be ordered by time within subjects.The first eleven rows of the data cav give the observation series from the first two patients.Other variables are either individual-specific or time-dependent covariates (see Section 3).
30 observations from 8 individuals with missing primary diagnosis (reason for transplantation, variable pdiag) are dropped from the data, giving a dataset with 2816 state observations from 614 individuals.Approximately each year after transplant, each patient has an angiogram, at which CAV can be diagnosed.The result of the test is in the variable state, with possible values: 1, representing no CAV.
onset, and 1 year of donor age is associated with a 2% greater risk of CAV onset and a 4% greater risk of death without CAV.We can also use qmatrix.msm() to calculate the transition intensity matrix for specified covariate values as follows, in this case, a donor age of 50 years old and a primary diagnosis of IHD.Compared with the fitted intensities for the "average" person from the model without covariates (Section 2.3), we see an approximately doubled risk of CAV onset and death without CAV.(The average donor is 30 years old and about half of heart transplants are due to IHD).
Time-dependent intensities in msm are restricted to piecewise-constant models.More flexible alternatives are discussed in Section 6.
Hubbard et al. (2008)s may vary with time, depending on either the time since the beginning of the process (a time-inhomogeneous model) or time since the previous transition (a semi-Markov model).Time-inhomogeneous models in msm are restricted to piecewise-constant intensities.The choice of change points is unlimited, though in practice the results may be sensitive to this choice.Continuously-changing intensities, for example with a Weibulldistributed time to the next transition, are generally more scientifically plausible and may be more parsimonious.The resulting Kolmogorov differential equations for obtaining P (u, t+u), hence the likelihood for panel data, are analytically intractable, but can be solved numerically in simpler instances.For example, Chen et al. (2004) andHsieh et al. (2002)modelled only one state with a time-varying sojourn distribution in this way.Hubbard et al. (2008)fitted inhomogeneous models by estimating a time transformation under which the inhomogeneous Markov model is homogeneous, assuming the ratio of transition intensities stayed constant through time.