A SAS Program Combining R Functionalities to Implement Pattern-Mixture Models

Pattern-mixture models have gained considerable interest in recent years. Pattern-mixture modeling allows the analysis of incomplete longitudinal outcomes under a variety of missingness mechanisms. In this manuscript, we describe a SAS program which combines R functionalities to ﬁt pattern-mixture models, considering the cases that miss-ingness mechanisms are at random and not at random. Patterns are deﬁned based on missingness at every time point and parameter estimation is based on a full group-by-time interaction. The program implements a multiple imputation method under so-called identifying restrictions. The code is illustrated using data from a placebo-controlled clinical trial. This manuscript and the program are directed to SAS users with minimal knowledge of the R language.


Introduction
Many experiments are concerned with analysis of longitudinal studies with outcomes suffering from missing values.In clinical trials, a main cause of missingness is subject dropout.Outcome values are missing at random (MAR) if the dropout mechanism is independent of missing outcome values, conditionally on the observed ones.If the covariates are fully observed, additional dependence on covariates is allowed for too.When MAR fails to hold, missing outcome values are said to be missing not at random (MNAR).MNAR means that the probability of dropout depends on an outside variable not in the model or is related to unobserved outcome values at the time of dropout and possibly afterwards.The consequence of MNAR is that missing outcome values cannot be reliably imputed using observed measurements (i.e., covariate and outcome values).More precisely, explicit modeling of the dropout mechanism is needed.Pattern-mixture modeling (PMM) is a framework that can be considered when the dropout mechanism is MNAR; see, e.g., Verbeke and Molenberghs (2000).PMM stratifies the sample of subjects based on outcome patterns and formulates distinct models to estimate parameters within each stratum.The program described in this manuscript implements PMM analysis using a multiple imputation (MI) method under socalled identifying restrictions.Patterns are defined based on dropout at every time point.The estimation models incorporate a full group-by-time interaction for fixed effects and an unstructured error covariance matrix.
As the parameters of the estimation models identify each time point, some of these parameters are unidentified in incomplete patterns.These consequences can be overcome by using information available in other patterns.The identifying restrictions simply indicate from which patterns missing information is borrowed.The program described in this manuscript implements PMM analysis under several identifying restrictions.The complete-case missing values (CCMV) approach, introduced by Little (1993), stipulates that missing information is borrowed from completers (i.e., from subjects with complete outcome profile).In the neighboring-case missing values (NCMV) approach, the closest neighboring pattern is used.In other words, missing information at a time point is borrowed from the nearby pattern for which outcome values are observed at this time point, but unobserved later.The availablecase missing values (ACMV) approach offers a compromise between CCMV and NCMV as missing information is borrowed from all available patterns weighted by occurrence of each pattern.ACMV has a particular status since this is the natural counterpart of MAR in the PMM framework.In practice, analysis under MAR can be a starting point for sensitivity analyses under MNAR.Another type of identifying restrictions, termed non-future missing values (NFMV), offers a relevant alternative because the user has full freedom to choose the distributions of the first unobserved values (which are termed the present values), given the previous measurements.Under NFMV, the missingness mechanism depends on the past and the present, but not on future unobserved outcome values.
The American National Academy of Sciences underscored the need to develop programs of analysis assuming MNAR; see National Research Council (2010, p. 114).New features in version 9.4 of SAS PROC MI (SAS Inc. 2014) allow one to generate imputations in the PMM framework under CCMV and NCMV.In earlier work, I-BioStat (2007) developed a SAS program which generates multiple imputations based on a mixed model for repeated measures (MMRM).However, some computations use approximations to ease matrix operations.The program described in this manuscript takes up the main aspects of algorithm but performs exact calculations.To do so, we use new features in SAS PROC IML (SAS Inc. 2011) that enable to call R functions from within the IML procedure.This automatic link is available from version 9.3 of SAS under Windows and Linux.Nevertheless, if another operating system (OS) is used and/or if SAS PROC IML version 9.3 or later is not available, thorough instructions are provided to execute the program anyway.The case study, which is used to illustrate the code, is a placebo-controlled clinical trial to assess the effect of a test drug in the treatment of macular degeneration.
The program is directed to SAS users with minimal knowledge of the R language (R Core Team 2015b).Different aspects of PMM analysis are summarized in the next section.The algorithm and environment requirements are exposed in Section 3. Information to initiate program execution is provided in Section 4, whereas Section 5 describes the different stages of a case study analysis.Section 6 shows results of analyses under all the identifying restrictions available in the program.Additional information to specify other model parameterizations is given in Section 7.

Identifying restrictions
In our context, patterns are defined based on dropout at every time point, except baseline.More precisely, if the tth outcome value is the last observed one and subject drops out after this point, this subject belongs to pattern t, t = 1, . .., T .The rationale for PMM stems from a particular decomposition of the joint distribution of the outcome variable together with the dropout indicator.
The pattern-mixture distribution of complete outcome values y 1 , . .., y T is given by: where α t denotes the proportion of pattern t and f t (y 1 , . .., y T ) stands for f (y 1 , . .., y T |t).
In (1), the distribution of the whole population is expressed in terms of a mixture of the distributions of pattern populations.These in turn can be decomposed as: (2) The first component in (2) is identified from the observed outcome values.The second component is a product of conditional pattern distributions that are unidentified because the values of y s are unobserved.Identification of unidentified parameters is a crucial aspect as the conditional pattern distributions form the basic framework for the imputation process.
The identifying restrictions enable to overcome this problem: Unidentified parameters of incomplete patterns are set equal to appropriate functions of the parameters describing the distributions of other patterns.
We now describe the identifying restrictions available in the program.Further descriptions can be found in Thijs, Molenberghs, Michiels, Verbeke, and Curran (2002).Under CCMV, identification is based on the pattern of completers, which is pattern T .This identification can be formalized as: Under NCMV, the neighboring pattern is used.We then have: Identification can also be based on all identified patterns as specified in the formulation: ω sj f j (y s |y 1 , . .., y s−1 ), s = t + 1, . .., T. (5) We will use ω s as shorthand for the set of positive ω sj 's.Every ω s that sums to 1 provides a valid identification scheme.Note that (5) results in the special case (3) by setting ω sT = 1 and all other ω sj = 0 and to the special case (4) by setting ω ss = 1 and all other ω sj = 0.
Molenberghs, Michiels, Kenward, and Diggle (1998) determined ω s such that (5) corresponds to ACMV.The coefficients are defined as: Alternatively, non-future dependence is an assumption under which missingness is allowed to depend on the past and the present, but given these not on the future.Kenward and Molenberghs (2003) determined the identifying restriction, named NFMV, which corresponds to non-future dependence.Under NFMV, the conditional pattern distributions of present outcome values are left unconstrained.For the sake of clarity, these are noted g t in what follows.
Our program implements two types of NFMV.Under NFMV CC , the g t functions are set equal to their f T counterparts in the spirit of CCMV.In addition, the user has the possibility to introduce a location parameter ∆.An explicit formulation of this is given by: g t (y t+1 |y 1 , . . ., y t ) = f T (y t+1 +∆|y 1 , . . ., y t ).

Multiple imputation method
The program implements a PMM analysis for Gaussian outcome variables, following the standard three-stage MI way, as described in Rubin (1987).

Pattern parameter estimation
Distinct models are formulated to fit the outcome variable within each pattern.The program uses MMRMs with a full group-by-time interaction at every time point for the fixed effects and an unstructured error covariance matrix.Time is treated as a class variable and no random effects are specified.
Let us denote by Y i = (y i,1 , . . ., y i,T ) the complete outcome vector for the ith subject of pattern t and Y i,obs = (y i,1 , . . ., y i,t ) its observed part.The MMRMs per pattern can be expressed as: where i ∼ N(0, Σ t ), Σ t is unstructured, and the i 's are independent.The matrix X i contains the known fixed-effects covariates whereas β t contains the unknown parameters.
This first stage provides the estimates β t , VAR( β t ), Σ t , and VAR(col( Σ t )), where col( Σ t ) is the vector containing the coefficients of the diagonal and the lower part of Σ t .

Imputation
Imputation of missing outcome values is conducted sequentially by value.We describe herebelow how to obtain a run of M imputed values of y i,t+1 .Multiple imputation of y i,t+2 , . .., y i,T follows the same process by considering the previous imputed values as observed ones.
The imputed values of y i,t+1 are drawn from conditional pattern distributions.The selection of the pattern of imputation is driven by the identifying restriction chosen.In our illustration, we first suppose that pattern r (t + 1 ≤ r ≤ T ) is used.Let us introduce µ i,r for the mean of Y i , which is µ i,r = X i β r .Based on appropriate parts of µ i,r and Σ r , we further define the distributions of Y i components as Y i,obs ∼ N(µ i,r,1 , Σ r,11 ) and y i,t+1 ∼ N(µ i,r,2 , Σ r,22 ).Their covariances are denoted Σ r,12 and Σ r,21 .Using 2|1 as notation for y i,t+1 |y i,1 , . .., y i,t , the conditional pattern distribution of y i,t+1 given y i,1 , . .., y i,t is described by: f r (y i,t+1 |y i,1 , . . ., y i,t ) ∼ N (µ i,r,2|1 , Σ r,2|1 ), where Uncertainty pertaining to the pattern parameters β r and Σ r is incorporated through their Bayesian posterior predictive distributions.On the basis of Gaussian distributions and noninformative Jeffreys' priors, the values of β r , m = 1, . . ., M, are respectively randomly drawn from the posterior distributions N( β r , VAR( β r )) and N(col( Σ r ), VAR(col( Σ r ))).After the derivation of µ (m) i,r , the imputed values of y i,t+1 are drawn from the conditional pattern distributions which are expressed by: Under ACMV and NFMV, ( 5) and ( 7) indicate that imputation of a missing y i,t+1 value is based on a sum of conditional distributions of several patterns weighted by occurrence of each pattern.In the MI setting, this weighted summation is handled via random pattern selection over imputations.For each imputation, we calculate the coefficients ω sj in ( 6) and ( 8), which characterize pattern probabilities.Random pattern selection by imputation is based on coefficient values, noted ω (m) sj , and another value U (m) which is drawn from the uniform distribution.So, pattern p is selected if:

Pooled analysis
The complete data sets are fitted using the modeling strategy described in (9) for pattern parameter estimation.The MMRM approach incorporates a full group-by-time interaction at every time point for the fixed effects and an unstructured error covariance matrix.The analysis model can be expressed as: where * i ∼ N(0, Σ), Σ is unstructured, and the * i 's are independent.The vector β * contains the unknown parameters of fixed effects.
The principle of MI is to combine the inferences made on the M imputations into a single one.

Let us define β * (m)
, m = 1, . .., M , the estimators of β * by imputation and β * the pooled estimator of β.In the usual Rubin's formulation 's whereas the pooled variance V is given by where W is the average of VAR( β * (m) ) and B is the between-imputation variance: The pooled estimator β * is consistent under MNAR.The F -distributions of the tests of fixed effects are given in Li, Raghunathan, and Rubin (1991).

Algorithm
The programming strategy was driven by the respective strengths of SAS and R. The use of SAS to handle and analyze data is widespread in some industries, such as the pharmaceutical industry.Consequently, many statisticians are familiar with pre-programmed SAS procedures such as SAS PROC MIXED to fit continuous longitudinal outcomes and SAS PROC MIANALYZE to combine inferences in the MI setting.On the other hand, R provides an optimized environment for operations on matrices and, in our context, is more indicated for the imputation stage which requires many calculations of this type.
The algorithm is described for each of the three tasks of standard MI.For each task, we specify which functionalities of SAS and R are used.The algorithm is described as follows: 1. Pattern parameter estimation: Use SAS PROC MIXED to fit outcome per pattern and estimate parameters.Impute using the patterns specified by the identifying restriction chosen.

Pooled analysis:
Use SAS PROC MIXED to analyze complete data sets.
Use SAS PROC MIANALYZE to combine inferences.

Two important remarks concern the imputation stage:
Intermittent missing values (i.e., values missing not due to dropout) are multiply imputed by drawing values from the conditional distribution of the subject pattern, given all the other available outcome values.So, if a subject belongs to pattern t, imputation is based on parameters of this pattern.Some cases may require a large number of imputations (information about the number of imputations needed is provided in Section 6).This number strongly impacts program execution time because of data handling with R, which slows down the program process as the number of imputations increases.To gain time, multi-step functionality that enables execution of successive batches of imputations has been introduced.The gain becomes rapidly substantial: In the case study, a total of 500 imputations is set by default; the program execution with five batches of 100 imputations is twice faster than with a unique batch of 500 imputations.
The call of R from SAS PROC IML is available from version 9.3 of SAS under Linux and Windows OS.If the user's environment does not enable an automatic link between SAS and R, the structure of the program nevertheless allows for execution.However, the multi-step functionality for imputation is no longer available.To do so, the three stages of the MI procedure must be executed separately as follows: 1. Pattern parameter estimation: Select lines 1-223 in the SAS code and execute.

Imputation:
Select lines 2-374 in the R script and execute.This selection excludes the statements related to the link with SAS PROC IML which are submit / R; and endsubmit;.

Pooled analysis:
Select lines 251-279 in the SAS code and execute to analyze complete data sets.
Select lines 307-310 and execute to combine inferences.
The combination of the SAS and R functionalities and the call of R from SAS was a deliberate choice, as the program is primarily targeted at SAS users.However, the clear separation between the three stages, i.e., (1) pattern parameter estimation, (2) imputation, and (3) pooled analysis, of the program allows alternative implementation strategies.For example, it is quite feasible to invert the initial user's environment and call SAS from R.

Environment requirements
Some environment parameters must be set-up prior to the first program execution.To call R, the SAS system must be launched with the -RLANG command.This command can definitively be inserted in the file SASV9.cfg, which is stored in the folder \nls\en of the user's SAS environment.The -RLANG command allows the execution of the RLANG option in the SAS program using the syntax: proc options option=RLANG;.Under Windows, a reason of failure may be that the path to the R directory (and not to the file R.exe) is not registered as a Windows environment variable.In this case, the user must enter this information manually (e.g., Variable=R_HOME, Value=C:\Program\Files\R R-2.15.1).
The functionalities of SAS PROC IML which enable to call R are available from SAS 9.3.The interface with R until version 2.15.3 is supported in SAS 9.3.However, the interface with R 3.0.x is not supported in SAS 9.3, but is supported in SAS 9.4.
The part of the program written in SAS is contained in the file PMM.sas whereas PMM.R contains the R script.SAS PROC IML calls the R script via the command: \%include "\&Path2WorkDir/PMM.R"; The SAS data files produced throughout program execution are stored in the working directory.Import/export of data files between SAS and R is done using R functions.Note that such functionalities also exist in SAS PROC IML, but only concern numerical data in matrix format.
Program execution requires the three R packages Hmisc (Harrel 2015), foreign (R Core Team 2015a), and MASS (Venables and Ripley 2002;Ripley 2015) to be installed.If these packages are not available in the user's R environment, these can be downloaded from the Comprehensive R Archive Network at http://CRAN.R-project.org/ by removing the hash sign # in the following three lines at the top of the R script: # install.packages("Hmisc")# install.packages("foreign")# install.packages("MASS") We strongly recommend the user to install the packages in the R environment, i.e., without using SAS to launch R, by executing the three commands here-above in the R editor.It is also important to put back the hash signs once the packages are installed.More generally, any substantial modification in the R script should be done in the R environment, using either the R editor or any R interface such as RStudio or Tinn-R (Faria, Grosjean, Jelihovschi, Pietrobon, and Silva Farias 2015).

Initiation of program execution
To initiate program execution, the user must put the three files PMM.sas, PMM.R, and Data .sas7bdat in a working directory, which is for example C:/WorkDir.The SAS file PMM.sas and the R file PMM.R contain the program code, whereas the SAS data file Data contains the data of the case study.In this section, we describe how to initiate execution of the case study.We also provide information to use the program in other circumstances.

Description of the case study
The case study arises from a randomized clinical trial comparing a test drug with a corresponding placebo in the treatment of subjects with age-related macular degeneration.Subjects with macular degeneration progressively lose vision.In the trial, the subjects' visual acuity was assessed through subjects' ability to read lines of letters on standardized vision charts.These charts display lines of 5 letters of decreasing size, which the patient must read from top (largest letters) to bottom (smallest letters).The subjects' visual acuity is the total number of letters correctly read.
Subjects were asked to undergo a pre-randomization visit (baseline) and four post-randomization visits, i.e., visit 1 at 4 weeks, visit 2 at 12 weeks, visit 3 at 24 weeks, and visit 4 at 52 weeks.Treatment-effect inferences on visual acuity at visit 4 is the primary focus of the statistical analysis.

Data file characteristics
Program execution requires a SAS data file with standard variable characteristics.SAS variables should be standardized as follows: Subject: Subject number (positive integer).
Rep: Repetition or occasion number (positive integer).
It is important to note that the name of covariates should start with 'Cov'.The program does not handle data sets with missing values in the first outcome value and/or in covariates, if any.
In the case study, the contents of Data for subjects 1, 2, 3 take the form:

Specification of paths to access files
The program needs to access files.The path to access the working directory must be specified at the top of the SAS file PMM.sas in the SAS macro variable &Path2WorkDir.In the case study, the default path is specified as: %let Path2WorkDir = "C:/WorkDir"; Then, the paths to access the working directory and the file SAS.exe must be specified at the top of the R file PMM.R.This can be done using the R editor or any other text editor (e.g., WordPad).In the case study, the default paths are specified as: Path2WorkDir = "C:/WorkDir" path2SASexe = "C:/Program Files/SASHome/SASFoundation/9.3" Nothing else needs to be specified in the R script.

Specification of model parameters
The user must indicate the fixed-effect parameters in the SAS macro variable &Covariates.
The full group-by-time interaction should be specified with parameters Int1, Int2, . . .for each time point effect and Group1, Group2, . . . .for between-group differences at each time point.Covariates should be specified with Cov1, Cov2, . . . .as for the SAS variables in the data file.
In the case study, the fixed-effect parameters are specified as: The columns of the design matrix of fixed effects, except covariates, can be derived in the data step DefineColumns.In the case study, these are derived as follows: data  A value of &Delta is required if the identifying restriction chosen is either NFMV CC or NFMV NC .Otherwise, &Delta is ineffective.
In the case study, the default values of imputation parameters are set to: %let nSteps = 5; %let nImputations = 100; %let Restriction = NFMV-CC; %let Delta = 2; %let Rounding = 1; This parameterization means that PMM analysis will be conducted under the identifying restriction NFMV-CC with the location parameter ∆ = 2 based on five batches of 100 imputation, that is 500 imputations totally.Missing outcome values will be imputed with a rounding of one decimal.
Nothing else needs to be specified for program execution.SAS PROC MIXED produces the SAS data files Solution, CovB, CovParms, and AsyCov, that respectively contain the estimates β t , VAR( β t ), col( Σ t ), and VAR(col( Σ t )), t = 1, . .., T .These estimates are then stored in the SAS data file Estimates2R.

Analysis of the case study
At the end of this stage, three SAS data files are exported to R: Estimates2R which contains the pattern parameter estimates.

Outcome2R which contains the outcome values and the derived variables:
-MissInter: Indicator of intermittent missing values.
-MissDrop: Indicator of missing values due to dropout.
-Pattern: Subject pattern number.
-nValues: Number of outcome values before subject dropout.
MIparameters2R contains the parameter values for imputation.

Imputation
Imputation is entirely conducted using R.The file PMM.R consists of four parts which are described here-below.Further details about programming rules are given in the script.

Generation of β t values by imputation
The values of β (m) t , m = 1, . .., M , are randomly generated from distributions N( β t , VAR( β t )), t = 1, . .., T , using rMVNorm().Values are then stored in the data frame BetaImput in which β (m) t coefficients are numbered according to the ordering number of the fixed effects specified in &Covariates.
In the case study, the nine fixed effects specified in &Covariates are Cov1 Int1 Int2 Int3 Int4 Group1 Group2 Group3 Group4.For example, Cov1 Int1 Group1 are the three identified parameters in Pattern 1.These correspond to the first, the second, and the sixth fixed effects in &Covariates.This numbering is kept for β

Generation of Σ t values by imputation
The values of col( Σ (m) t ), m = 1, . .., M , are randomly generated from the distributions N(col( Σ t ), VAR(col( Σ t ))), t = 1, . .., T, using rMVNorm().Values are then stored in the data frame SigmaImput in which col( Σ (m) t ) coefficients are numbered according to their ordering number by column by row after transposition into a virtual T × T -dimensional matrix.
In the case study, the virtual matrix is 4 × 4-dimensional and contains T(T+1)/2 = 10 coefficients in its diagonal and lower triangular part.The values and the numbering of the coefficients of col( Σ  , where 2|1 stands for Y i,miss |Y i,obs , are derived using MVCond().Then, outcome values are randomly drawn from the conditional pattern distributions f ), m = 1, . .., M , using rMVNorm().We now describe how the program imputes missing outcome values due to dropout.We illustrate the process with the case study through one of the imputations, say imputation 1 (m = 1), in subject 11.This subject has only one observed outcome value, y 11,1 = 50, and so, the three outcome values y 11,2 , y 11,3 , y 11,4 need to be imputed.Initial information about subject 11 is shown here-below: After each imputation in each subject, the R script produces: A matrix MatOutcomeSI which contains the random draws of missing outcome values in all available patterns (row number indicates outcome value number and column number indicates pattern number).
A vector YSI which contains the observed and the imputed values of outcome.
In the case study, matrices MatOutcomeSI are 4×4 dimensional.In subject 11, the first row of MatOutcomeSI is not filled as y 11,1 is observed.Then, imputation of missing outcome values is conducted sequentially by value.
If the identifying restriction chosen is either CCMV, NCMV, or ACMV, random values of y 11,2 are first drawn in all available patterns from: Results are stored in the two SAS data files: SolutionTotal which contains the parameter estimates of fixed effects.
CovBTotal which contains the variance matrix coefficients.
In the case study, the contents of SolutionTotal for the first imputation takes the form: Then, the pooled analysis is performed using SAS PROC MIANALYZE with the syntax:
Results exhibits a treatment-effect estimate at visit 4 of −4.45 (2.37) with a significance level of p = 0.06.

Results under different identifying restrictions
Any inference should account for the uncertainty attributable to missing data so that the type I error rate is valid under the assumptions made.In clinical trials, the primary statistical approach is often based on the MAR assumption, under which MMRM provides valid inferences.----------------Variance---------------- As shown in Section 4.1, the case study exhibits a moderate dropout rate of 16.8%.The relative increase in variance due to missingness can be used to quantify how missingness contributes to inferential uncertainty.In the SAS output shown in Section 5, the value 0.146 for the treatment-effect parameter at visit 4 characterizes a moderate contribution.This information can be supplemented by the relative efficiency, introduced by Rubin (1987), which indicates the magnitude of the point estimate's standard error of the current run rather than based upon an infinite number of imputations.In our example, the relative efficiency is greater than 0.999 and tends to demonstrate that the default size of 500 imputations is largely sufficient to obtain accurate inferential results.However, the systematic use of such diagnostic tools to determine the number of imputations raises several criticisms because the true value of these quantities is unknown and varies across distinct runs for the same data and analysis; see, e.g., Bodner (2008).Moreover, one can easily execute several times the case study based on 500 imputations and observe that results exhibit some variation across different runs.
A practical alternative to this approach consists of pre-defining accuracy levels on desired inferential parameters in line with the objectives of the study.On the basis of several runs of different sizes, the user selects the number of imputations that guarantees the stability of results at the pre-defined accuracy levels.Such investigation should be conducted under NCMV or NFMV NC , which generate greater between-imputation variances than CCMV or NFMV CC , respectively.
A simple exploratory analysis requires less accuracy than an analysis directed to regulatory bodies.In the case study, a size of 100 to 500 imputations is perfectly conceivable for a quick exploration, whereas 1000 imputations guarantee an accuracy level of 1 decimal to the treatment-effect estimate at visit 4. A size of 10,000 imputations guarantees accuracy levels of two decimals to the estimate and three decimals to the significance level.In the case study, this model implies that SAS PROC MIANALYZE does not provide pooled inferences for the treatment effect at visit 1 since the between-imputation variance is zero.Such a situation is not uncommon in a more general context as there may always be parameters for which there is no missing information.Anyway, treatment-effect inferences at visit 1 can be retrieved from any of the individual imputations, because all complete data sets will provide the same results for that particular parameter.

Use of different models for imputation and analysis
If different models are used for imputation and analysis, the SAS macro variable &Covariates cannot specify both models anymore.If &Covariates is used to specify the imputation model, the analysis model must be specified manually in SAS PROC MIXED, line 272, and in SAS PROC MIANALYZE, line 309.

Pooling of patterns
In applications with many time points, or when sample size by pattern is too low, it may be more reasonable (or necessary due to sparseness) to just define patterns of early, middle, and late dropouts.However, the program handles patterns that are defined based on dropout at every time point, except baseline.To overcome this, pattern numbers must be decoupled from time point numbers.Next, the instruction to calculate the number of available patterns for the imputation of each missing value must be modified in the R script.The default instruction nPatternsSIv = nPatterns -val + 1 appears twice.Line 274, to calculate the number of available patterns for imputation under CCMV, ACMV, and NCMV, the default instruction must be replaced by the syntax: if (val <= 2) { nPatternsSIv = nPatterns -val + 1 } else nPatternsSIv = nPatterns -val + 2 Line 324, to calculate the number of available patterns for imputation under NFMV, the default instruction must be replaced by the syntax: if (val <= 3) { nPatternsSIv = nPatterns -val + 1 } else nPatternsSIv = nPatterns -val + 2

Extension to linear mixed-effects models
In experiments with many time points, MMRM as specified in (9) may be inappropriate to fit outcome values over time.Alternatively, the so-called random-coefficients model toward extends the regression model to longitudinal outcomes.
Let us assume, in the case study, that the outcome values are described by an average trend in function of the time since randomization.We further base the model on the assumption that, for every subject i who belongs to group g, this trend can be modeled by a quadratic regression, but with subject-specific coefficients.This explicitly allows the outcome trajectories to vary by subject.
Denoting the time since randomization at visits by Time k , k = 1, . . ., t (t < T ), one formally obtains where i = ( gi1 , gi2 , . . ., git ) is assumed to be normally distributed with mean vector zero and some covariance matrix Σ i .Because subjects are randomly sampled from a population of subjects, it is natural to assume that the subject-specific coefficients in b i = (b 0gi , b 1gi , b 2gi ) are normally distributed with mean zero and covariance G.
The above model is a special case of the general linear mixed-effects model which assumes that Y i,obs satisfies where X i contains the known fixed-effects covariates and Z i contains the known subjectspecific covariates.We have b i ∼ N (0, G) and i ∼ N (0, Σ i ), where the b i 's and i 's are independent.The error covariance matrix sometimes simplifies to Σ i = σI ni or other structures.To specify model ( 14), times of measurement must be incorporated into X i and Z i .
Implementation of linear mixed-effects models in the PMM framework is straightforward from Section 2.2.If patterns are based on missingness at every time point, PMM can be specified from (15) by replacing the parameters β, G, and Σ i by the pattern-specific parameters β t , G t , and Σ t .If patterns are based on combinations of visits, the decoupling between pattern number and visit number is accommodated in the program (see indications in the Section 7.4).
Missing outcome values are drawn from conditional pattern distributions where patterns are selected by the identifying restriction chosen.As in Section 2.2, we now suppose that a pattern r is used to impute y i,t+1 whereas y i,1 , . . ., y i,t are observed.The mean and variance of conditional pattern distributions are directly and simply obtained by replacing Σ r by V i = Z i G r Z i + Σ r in the formulas yielding (11).

2.
Imputation: Use R to Generate pattern parameter values by imputation.Draw missing outcome values from conditional pattern distributions of all available patterns.
Parameterization of imputation stage The user must specify parameter values for imputation in several SAS macro variables: &nBatches: Number of batches of imputations (positive integer).&nImputations: Number of imputations by batch (positive integer).

5. 1 .
Pattern parameter estimation Pattern parameters are estimated using SAS PROC MIXED which fits longitudinal outcome values with the fixed effects specified in the SAS macro variable &Covariates and an unstructured error covariance matrix.The syntax used for this is: proc Mixed data = OutcomeSort method = ml noclprint noitprint asycov covtest; class Subject Rep; model Outcome = %str(&Covariates) / noint s covb; ods output solutionf = Solution; ods output covb = CovB; ods output covparms = CovParms; ods output asycov = AsyCov; repeated Rep / subject = Subject type = UN r; by Pattern; run;

Four
R functions are defined at the top of the script: rMVNorm(n, Mu, Cov) generates n random vectors which are drawn from a multinormal distribution with mean Mu and variance matrix Cov.MVCond(Y, X, Beta, S, IndicY1, IndicY2) derives the conditional mean and variance matrix of Y vector values given others.X denotes the design matrix, Beta contains the parameter values, and S is the variance matrix of Y. IndicY1 indicates the observed values of Y and IndicY2 the given ones.fMatrix(Col) produces a complete symmetric matrix from the vector Col which contains the coefficients of the diagonal and the lower part of the matrix.Under ACMV and NFMV, fRdraw(OutcomeS, BetaImput, SigmaImput, IndicColS, nPatterns, nPatternsSIv, imput, val) is used to select the pattern of imputation for the imputth imputation of the valth missing outcome value.OutcomeS is a data frame which contains subject information, BetaImput contains the values of the β (m) t 's and SigmaImput the values of the col( Σ (m) t )'s, whereas IndicColS indicates the design matrix columns among the OutcomeS columns.Last, nPatterns is the total number of patterns and nPatternsSIv is the number of patterns available for imputation.To select the pattern of imputation, fRdraw() produces the vector OmegaCumSIv which contains the ascending values of Σ k j=s ω (m) sj and the scalar U which contains the value of U (m) as specified in (12).
SigmaImput are shown here-below: R> subset(SigmaImput, (pattern == 1 | pattern == 2) & imputation == 1) We now show how to reduce the number of patterns in the case study to three by pooling Patterns 2 and 3.The pooling of patterns can be specified in the SAS data step Outcome2R by incorporating first Pattern 3 into Pattern 2 and then Pattern 4 into Pattern 3.This is done, below line 105, by inserting the syntax: data DataFile.Outcome2R; set DataFile.Outcome2R; if Pattern = 3 then Pattern = 2; if Pattern = 4 then Pattern = 3; run;No other modifications are needed in the SAS code.

Table 1 .
Note that 188 of the 226 profiles are complete, which is a percentage of 83.2%, while 16.8% (38 subjects) exhibit monotone missingness.

Table 1 :
Missingness patterns and the frequencies with which they occur.'O' indicates observed and 'M' indicates missing.

Table 2 :
Molenberghs, Beunckens, Sotto, and Kenward (2008)ith the default program parameterization.However, the sensitivity of inferences to departures from MAR should be thoroughly investigated via sensitivity analyses under plausible MNAR mechanisms, although MNAR cannot be tested.This latter aspect has been discussed, demonstrated and exemplified inMolenberghs, Beunckens, Sotto, and Kenward (2008).If the conclusion assuming MAR differs from conclusions under plausible MNAR mechanisms, then careful scrutiny is necessary.In the PMM framework, ACMV is the natural counterpart to MAR whereas the other identifying restrictions describe MNAR mechanisms.Here-below, we provide results of PMM analyses of the case study data under different identifying restrictions available in the program.Analyses are based on 10,000 imputations, which is a large number compared to what is generally recommended.Arguments to justify the number of imputations are given first.The number of imputations is an important issue as it directly influences result accuracy and program execution time.Several aspects need to be considered and there is no obvious formal answer.Beyond the intuitive rule that the greater the number of imputations, the greater is result accuracy, the identifying restriction chosen and the amount of observations per pattern (e.g., majority completer versus spread over many patterns) are two important aspects to Treatment-effect estimates at visit 4 under CCMV, ACMV, NCMV, NFMV CC , and NFMV NC obtained with 10,000 imputations.be considered.CCMV is more precise if there are a lot of completers, as opposed to some neighboring patterns that are rather lightly filled.
Table2shows the treatment-effect estimates at visit 4 under CCMV, ACMV, NCMV, NFMV CC , and NFMV NC obtained with 10,000 imputations.Under NFMV CC and NFMV NC , ∆ was set to 0, 2 and 4.The difference between the result under ACMV and the results under the MNAR mechanisms is moderate, although this difference may have an impact on the statistical conclusion if the significance level is set to 0.05.NFMV assumes that any dropout is associated with an