Using Lexis Objects for Multi-State Models in R

The Lexis class in the R package Epi provides tools for creation, manipulation and display of data from multi-state models. Transitions between states are described by rates (intensities); Lexis objects represent this kind of data and provide tools to show states and transitions annotated by relevant summary numbers. Data can be transformed to a form that allows modelling of several transition rates with common parameters.


Introduction
Multi-state data arise from epidemiological follow-up studies in which a participant may go through different disease states during follow-up.The most common examples are the illness-death model and the competing risks model.Illness-death models are used in studies of mortality when an intermediate state ("ill") may occur between the entry state ("healthy") and the mortality endpoint ("dead").The competing risk model is a model with one "healthy" state and a number of mutually exclusive absorbing disease endpoints (causes of death, for example).
The Lexis class in the Epi package (Carstensen, Plummer, Laara, and Hills 2010), for R (R Development Core Team 2010), which is introduced in a companion paper (Plummer and Carstensen 2011), is designed to facilitate long-term follow-up studies.It includes the following extensions to handle multi-state data: Allowing the use of factors for states, as well as numeric codes.This allows the states to have descriptive labels.
Taking account of the ordering of disease states when splitting follow-up time.
Creating new time scale variables to describe the time spent in a state.
These capabilities are described in the current paper.In this paper we are only looking at multi-state data for which the transition between states is known exactly.Panel studies, for which the disease state is only known at certain dates and transition times are not known exactly, are not covered in this paper.

The example data
The dataset we shall use for illustration is the ebmt3 dataset from the R package mstate (de Wreede, Fiocco, andPutter 2010, 2011).It shows the clinical course of bone marrow transplant for 2204 patients.A more detailed introduction to the dataset is given in a tutorial paper by Putter, Fiocco, and Geskus (2007).The major event of interest is "relapse or death", and an intermediate event is "platelet recovery".The time variables in the embt3 dataset are given in days since transplant, they are time to relapse or death in the variable rfstime, and time to platelet recovery in the variable prtime.It is of interest to compare the rates of relapse or death between the states "transplant" and "platelet recovery", but also to model the transition rate from "transplant" to "platelet recovery".The structure of the multi-state model is illustrated in Figure 1.

Statistical model
For each transition between two states (an "event"), we can describe the process by the transition rate: If transition rates are known for all possible transitions in a multi-state set-up, we have a complete probability model, that is a model that would allow simulation of data of the same structure as the observed.
If rates are assumed constant, the likelihood for the model will be proportional to a Poisson likelihood using the number of transitions as Poisson variate and the log person-years as offset in a log-link.
If we assume that rates are merely constant in smaller time-intervals, the likelihood will still be proportional to a Poisson likelihood, but now with one 0/1 contribution from each person and interval as response, and the log of the interval length as offset (Carstensen 2006).
If we make no assumptions at all for the transition rates we can fit a model for the cumulative rates: λ(s) ds using e.g. the Nelson-Aalen estimator or using a Cox model to incorporate covariate effects.
An extensive overview of these issues is given in Andersen and Keiding (2002).
The purpose of setting up a Lexis object is to facilitate required manipulations of data for these types of models.

A Lexis object
The required data for this type of analysis is for each person the starting time and state and the transitions and -times and the state and time of end of observation.Depending on availability of covariates, further timescales can be defined (current age, current disease duration, etc.).Essentially what is needed is the allocation of follow-up time to states, and timescales.
ebmt3 is merely a data frame with two time-variables in it, one for each of the events "platelet recovery" and "relapse or death".So if we want to put the follow-up recorded in the embt3 data frame into a Lexis object, we must decide on the timescale(s) we want to use (in this case only one).The first thing we do is to only consider the transition from "transplant" to "relapse or death", and then later incorporate the "platelet recovery".
This example illustrates the use of a factor to represent the states in the model.The numeric codes 0,1 of the rfsstat variable are given informative labels "Tx" and "RD".If no entry state is given, then the Lexis function assumes that everyone starts in the first factor level, which is Tx in this case.This is noted by the Lexis function.
The Lexis object has a single time scaletft -measuring time f rom transplant in years.
If no entry time is given by the argument entry, the Lexis function assumes a default entry time of 0. Since follow-up time begins on the day of transplant, this is the correct entry time on the tft scale.This is also noted by the Lexis function.Rates: To From Tx PR RD Total Tx 0 0.48 0.19 0.67 PR 0 0.00 0.12 0.12 We now have 3373 records instead of 2204, because of the 1169 transitions to PR.We still have the same amount of risk time (5612 person-years) and deaths (841) in total, but now subdivided by state (Tx and PR).
When creating a new state with the cutLexis function, it is necessary to specify which of the existing states in the model are precursor states.Precursor states will be overridden by the new state in the newly created Lexis object.In this example, if a person originally exited the study in state Tx (a precursor state), then an entry to the new state (PR) will mean that the person exits in the new state, whereas if the person exited the study in state RD (a non-precursor state) the exit from the study will still be to this (original) state.
In order to understand which states are precursor states, it is helpful to draw a graph of the multi-state model, such as the examples in Figure 1.Most multi-state disease models are progressive models, in which the disease can only increase in severity and never regresses.Such models are represented by directed acyclic graphs, which determine a partial ordering among the states.A state A is a precursor of state B if there is a path from A to B in the graph.Absorbing states, such as RD in Figure 1(b) are never precursor states.By default the newly created state is placed after the precursor states nominated, and before all non-precursor states in the ordering of the levels of the factor describing the states.

Duration as a timescale
So far we have only used one common timescale for follow-up, but often we are interested in the effect of the duration of platelet recovery as timescale.This timescale can be generated using the new.scaleargument.If set to TRUE a new timescale will be generated, named PR.dur;alternatively we can set the argument to the name of the new timescale: R> bmtr <-cutLexis(bmt, cut = bmt$prtime/365.25,precursor.states= "Tx", + new.state= "PR", new.scale= "tfPR") R> timeScales(bmtr) [1] "tft" "tfPR" This representation retains the total amount of risk time in the study; it merely adds an extra timescale to the follow-up time after transition to the PR state.Since this timescale is 0 at the entry of the new state this is sometimes referred to as the "clock back" approach.
Since we now have two timescales we can make a two-dimensional Lexis diagram, as shown in Figure 2. The diagram only shows follow-up after PR, since one of the timescales is time since PR, which obviously is not defined before the event.
R> plot(bmtr, xlab = "Time since transplant", + ylab = "Time since platelet recovery") From the plot we see that most platelet recoveries occur within the first year after transplant.

Subdividing states
We may also be interested in subdividing the states subsequent to PR according to whether this event has occurred or not; this is done with the argument split.states:

Rates:
To From Tx PR RD RD(PR) Total Tx 0 0.48 0.19 0.00 0.67 PR 0 0.00 0.00 0.12 0.12 This creates a Lexis object for the model represented in Figure 1(c).
Subdividing states is useful when drawing multi-state models as box-diagrams as shown in the next section.It can also be of interest if simulation-based inference on occupation probabilities for various states are of interest.

Illustrating the model structure
The model structure can be illustrated in a box diagram as shown in Figure 1 using the interactive function boxes.Lexis, which will prompt you for a click for the position of each of the states, and subsequently draws arrows between those states where transitions occur.
If we put boxpos = TRUE (the default is FALSE), the states are put in a circular arrangement which usually is not very nice: Figure 3: Illustration of the transitions between defined states for the ebmt3 data, using the default layout (left) and a hand-tailored using point-and-click.The numbers on the arrows are the number of transitions, and the numbers in the boxes are person-years.

R> boxes(bmtr, boxpos = TRUE)
That can be improved by using the interactive facility, using the default boxpos = FALSE, which will prompt you to click on the screen to place the various boxes.
Frequently one would need the exact same plot with slight modifications, for example with one or more of the arrows in a different color.To this end it is possible to use the argument file to specify an output file that will contain code generating the display.The code in this file can then be modified.
R> boxes(bmtr, file = "bmt-boxes.r")This is the resulting code, where we modified the coordinates of the boxes to make them nicely aligned.The first argument to the boxes function is a Lexis object, and so the default behaviour is to show the risk time (person-years) in those boxes where they are accumulated, and the number of transitions along the arrows.The function has arguments for scaling the person-years and putting rates instead of number of transitions on the arrows.
The first argument to boxes may also be a transition matrix, in which case only the boxes and arrows are drawn, and requires an explicit call of the function boxes.Lexis.

Analysis of transition rates
Traditionally a Cox model is used for the analysis of rates.We can analyse transitions between boxes separately by selecting the appropriate rows and choosing the right event-type.
There is minimal difference in the code for the models for the different transitions; it is only the failure type (using lex.Xst) and the subset (using lex.Cst) that is different: These are exactly the same estimates as those obtained in Table III of Putter et al. (2007).
If some parameters are assumed to be common between (some of the) transitions, the analysis must be based on a dataset that is made by stacking the subsets of the data frame we used in the three separate analyses above.The stacked dataset must have a common failure indicator and a factor classifying the transitions.This dataset can be constructed by using the stack.Lexis function.The effect of stacking the dataset is illustrated by printing the records for person 2 from the Lexis object and the stacked object: The default model matrix set-up for interactions generates interactions assuming an intercept in the model.But there is none in a Cox model (it is relegated to the baseline hazard), so we get aliased parameters when all three levels of dissub have interaction columns with lex.Tr generated.It gives the correct model fit, but the reference level for the factor dissub is the last rather than the first.This can be remedied by explicitly tampering with the model matrix before fitting the model; we just generate the model matrix and then remove the columns relating to the reference level.This has the advantage that we maintain the annotation of the parameters: R> mm <-model.matrix(~lex.Tr:(dissub + age + drmatch + tcd), data = bmts) R> rm <-grep("AML", colnames(mm)) R> mm <-mm[, -c(1, rm)] R> cx <-coxph(Surv(tft, tft + lex.dur, lex.Fail) ~mm + strata(lex.Tr), + data = cbind(bmts, mm), method = "breslow")

Joint parameters between transitions
Of course it is not of much interest to make a joint model which gives the same results as the models fitted separately to each transition.The point comes from reducing this model to one which is more parsimonious yet sensible.One such reduction would be to assume that the two mortality rates, Tx->RD and PR->RD are proportional -or, in plain words, covariate effects are the same for both sets of mortality rates.This means that the Cox model should be changed so that the baseline hazard is the same for the two transitions, which just means that no strata argument is needed.However to avoid a model where we assume that the rates are identical (which would be nonsensical), we must introduce an indicator of being in state PR, but that is just the variable lex.Cst: R> c2 <-coxph(Surv(tft, tft + lex.dur, lex.Fail) ~dissub + age + + drmatch + tcd + I(lex.Cst == "PR"), data = bmts, method = "breslow") Note that the logical expression identifying platelet recovery is wrapped in an "I()" in order to make coxph create an extra column adjacent to the supplied model matrix (if not the model formula will be interpreted as (tcd + lex.Cst)=="PR", which will cause a crash.
We now get one set of rate ratio parameters as before, plus a rate-ratio between the two transitions to death, replacing the two separate baseline hazards with a pair of proportional ones with proportionality factor: R> ci.lin(c2, subset = "PR", Exp = TRUE) Estimate StdErr z P exp(Est.)2.5% 97.5% I(lex.Cst == "PR")TRUE -0.314 0.0725 -4.33 1.51e-05 0.731 0.634 0.842 We cannot use the usual anova machinery to make a test of proportionality by comparing the two Cox models, because the likelihoods from the stacked datasets only are concerned with the regression parameters and the baseline is profiled out.So for this purpose we must resort to fully parametric models to make a formal likelihood ratio test for this.

Analyzing proportionality of rates by Poisson modeling
It is of course of interest to test whether the two transition rates actually are proportional or not, i.e. whether the dependency on time (since transplant) is the same pre and post platelet recovery.This is an interaction, and can be formulated as such in a Poisson model where we assume that rates are constant in (small) intervals, and then invoke a model for the rates in these intervals that assumes the rates are constant in each interval, but that the magnitude of the rates follow some smooth function.This is a slightly different approach from the one taken in the companion paper (Plummer and Carstensen 2011), where the functional form was assumed piecewise constant in larger intervals.
The Epi package provides the time-splitting function for Lexis objects that facilitates this operation: Rates: To From Tx PR RD RD(PR) Total Tx 0 0.48 0.19 0.00 0.67 PR 0 0.00 0.00 0.12 0.12 Note that the intervals that we use are not equally long.We see that the number of events and amount of risk time is unchanged in bmtx; the only difference is that it is distributed across many more records than in bmtr.
As before we can stack the dataset using stack.Lexis.Note that we cannot use splitLexis on a stacked dataset because it is no longer a Lexis object.Hence stacking must be done after splitting data.Finally we are only interested in the two transitions to the relapse/dead (RD) state, and hence use factor to produce a factor with only those levels that actually do occur.
R> bmtxs <-stack(bmtx) R> bmtxs <-bmtxs[grep("->RD", bmtxs$lex.Tr), ] R> bmtxs$lex.Tr <-factor(bmtxs$lex.Tr) We can now repeat the analysis of the two rates and also get a formal test for the proportionality of the two mortality rates as a simple likelihood-ratio test by fitting the model with and without a time×state interaction.There is also an intermediate model that in the Cox model literature is known as the "stratified model" where there is an interaction with time but not with other covariates.
We fit these three models and compare them by the usual asymptotic likelihood ratio test: There is no significant difference between the two first models, so we may assume that covariate effects are the same for both transitions, but the underlying hazards seem to have a different shape.
If we for a moment forget the interaction we can compare the estimated rate-ratio associated with platelet recovery as estimated from the Poisson model and the Cox model: R> rbind(ci.lin(mp,subset = "lex.Tr", Exp = TRUE), + ci.lin(c2, subset = "lex.Cst", Exp = TRUE)) Estimate StdErr z P exp(Est.)2.5% 97.5% lex.TrPR->RD(PR) -0.296 0.0727 -4.07 4.74e-05 0.744 0.645 0.858 I(lex.Cst == "PR")TRUE -0.314 0.0725 -4.33 1.51e-05 0.731 0.634 0.842 We see that the difference in estimates is negligible (1/4 of the estimated s.e.).The advantage of the Poisson-modeling over the Cox model is that we get direct estimates of the baseline rates, and that visual comparison of them is quite straightforward, so that we can inspect the clinical relevance and not only the statistical significance of the interaction ("non-proportional rates").
In order to fish out the estimated log-rates from the model we must multiply the parameters from the ns-term in the model with a matrix so that the estimated log-rates at some prespecified times are recovered.
To this end we construct a "contrast" matrix to multiply with the relevant spline parameters, by setting up the splines for a predefined set of times.We compare to the rates in the Tx state at 1 year after platelet recovery: This matrix can now be applied to the estimates for the splines for the two transitions.The relevant parameters are selected using the subset argument to ci.lin, which both selects and sorts the parameters of interest.First we use ci.lin to show the total set of parameters: Note that we should formally also have included the intercept among the parameters, but in this case it would just cancel out.
The argument Exp=TRUE to ci.lin computes the exponential of the resulting parameter functions with 95% confidence intervals and puts them in columns 5 to 7 of the result: R> head(rr.Tx) 2.99 2.17 4.13 For the state PR level we want the RR relative to time tft==1 for state Tx.Thus we want both the parameters for each of the transitions, but also the parameter giving the extra contribution for the Pr->RD transition relative to the Tx->RD transition: These must be multiplied by columns that give the RR relative to the rates from Tx at time 1: R> rr.PR <-ci.lin(ms,subset = c("PR", "Tx"), ctr.mat = cbind(1, CM, -C1), + Exp = TRUE) [, 5:7] The estimated rate-ratio between these two RRs is found by subtracting the two sets of parameter functions just derived.This means multiplying the PR parameters by cbind(1, CM), and the Tx parameters by -C1 -(CM -C1) = -CM: R> RR <-ci.lin(ms,subset = c("PR", "Tx"), ctr.mat = cbind(1, CM, -CM), + Exp = TRUE) [, 5:7] We can now plot all three sets of curves (estimates with c.i.s): R> matplot(p.pt,cbind(rr.Tx, rr.PR, RR), log = "y", ylim = c(0.1,5), + type = "l", lty = 1, lwd = c(3, 1, 1), las = 1, + col = rep(c("red", "blue", "black"), each = 3), + xlab = "Time since transplant", ylab = "Rate ratio") R> points(1, 1, pch = 16, col = "red") R> abline(h = 1) Clearly, the difference between the rates is confined to the first year, where the RR relative to year 1 in the Tx group is higher than in the PR group.After the first year there is no discernible non-proportionality of rates between the groups.

Connecting to the mstate package
The Epi package has a function that transforms a Lexis object to a msdata object which can be used in the mstate package (de Wreede et al. 2010Wreede et al. , 2011))

Conclusion
The Lexis objects in the Epi package come with a range of facilities to make the modelling of multi-state data easier.Lexis objects simplify the initial exploration of the data structure by displaying the states and transitions, and by showing the follow-up time along different timescales.They also allow simplification and structuring of the code for analysis of rates that may be analysed separately or jointly.

Figure 2 :
Figure 2: Two-dimensional Lexis diagram of the bone-marrow transplant data from ebmt3.Note that this diagram only shows follow-up after PR, as the timescale tfPR is undefined (=missing) for follow-up prior to PR.

Figure 4 :
Figure4: RR relative to year 1 after transplant for persons in states Tx (red) and PR (blue), as well as the RR between the mortality rates in PR and Tx (gray).Thin lines are estimated 95% confidence intervals.
Two extra variables lex.Tr and lex.Fail have been added; lex.Tr is a factor indicating the transition, and lex.Fail is the (logical) failure indicator for the transition.Since the variables lex.Cst and lex.Xst have lost their original meaning, the resulting object is not a Lexis object.