A Bayesian Non-Stationary Heteroskedastic Time Series Model for Multivariate Critical Care Data

We propose a multivariate GARCH model for non-stationary health time series by modifying the variance of the observations of the standard state space model. The proposed model provides an intuitive way of dealing with heteroskedastic data using the conditional nature of state space models. We follow the Bayesian paradigm to perform the inference procedure. In particular, we use Markov chain Monte Carlo methods to obtain samples from the resultant posterior distribution. Due to the natural temporal correlation structure induced on model parameters, we use the forward filtering backward sampling algorithm to efficiently obtain samples from the posterior distribution. The proposed model also handles missing data in a fully Bayesian fashion. We validate our model on synthetic data, and then use it to analyze a data set obtained from an intensive care unit in a Montreal hospital. We further show that our proposed models offer better performance, in terms of WAIC, than standard state space models. The proposed model provides a new way to model multivariate heteroskedastic non-stationary time series data and the simplicity in applying the WAIC allows us to compare competing models.


Introduction
Advances in medical instrumentation and technology have given researchers the ability to collect large amounts of patient data.In this paper we study patient data arising from multi-channel monitoring in intensive care units (ICUs).There is increasing amount of research [1] [2] that recognizes the importance and difficulty in analyzing real time data collected in the ICU.Modeling such data accurately would help to ensure better patient care and outcome.Time series analysis of the heart rate (HR) allows for a non-invasive way to study the heart rate variability (HRV), which describes the variability of the heart rate of an individual over a period of time.It is well known that the HRV is a reflection of various physiological factors that help modulate the rhythm of the heart.Many papers [3,4,5,6] discuss the significance of HRV in preventing different cardio-vascular pathologies.This type of data usually resembles long structured time series which may be correlated with other physiological processes.Such data, while opening up many new avenues of research, has complicated the nature of the analysis and there is a need to develop suitable multivariate models for these processes.
The time dependent structure of the data naturally leads to the usage of multivariate time series analyses and state space modeling.Various models have been developed for modeling health time series data [6,7,8,9,10], but a major complication is that many series exhibit non-linear and non-stationary dynamics; there is an increasing amount of research into modeling [11] some of these additional complexities.One way of dealing with non-stationary data is to use switching linear dynamical systems to analyze multivariate data and estimate based on approximately stationary intervals [12].
Predicting various outcomes, such as mortality, as well as future evolution of the time series themselves, is also of great interest.Recently, machine learning researchers have tackled the prediction problem using EHR data and have provided a great deal of flexibility in terms of modeling the data.Deep learning, recurrent neural networks, deep neural networks, support vector machines, random forests, have all been applied for a variety of clustering, prediction and modeling tasks based on vital signs data [13,14,15,16,17].These methods, however, have several limitations and drawbacks, for example when dealing with missing data [15,18].While some machine learning models have had a degree of success in predicting outcomes in some settings, these models are typically hard to interpret and their robustness to variations in the data is unclear.It is also not easy to understand how these models represent uncertainty in the data.More concerning is that there seems to be a large range in the capability of these models to reproduce their results [19].Finally, it is not entirely clear that many of these models respect the chronology of events provided by a time series model, instead converting the time series data into some form of static patient data [20].
Our proposed model and analysis are inspired by data obtained from an intensive care unit (ICU) at the McGill University Health Centre (MUHC), Montreal.This motivated us to investigate non-stationary multivariate models for HR and BP.The data set records a large selection of physiological features collected from patients during their stay in the ICU and the readings of each patient are taken at 5 minute intervals over several days.The volume of measurements for each patient varies based on the duration of their stay at the hospital, and as a result for some patients there is a large volume of data while for others there is relatively little.Additionally, data are missing at random (due to various reasons, such as equipment failure).Figure 1 shows two series, one for the HR and the second for BP, for a particular patient.The HR and BP series in Figure 1 are globally non-stationary and heteroskedastic.We are interested in building an interpretable and flexible model to fit the data.A joint model for this data would require modeling its meanvariance structure as well as the correlation structure.We study the HR and BP data under the framework of dynamic linear models (DLM) proposed by West and Harrison [21], which are a class of multivariate state-space models (SSMs) estimated under a Bayesian framework.SSMs allow for the modeling of non-stationary temporal data.We augment these dynamic models with conditionally heteroskedastic error terms, thus modeling the variance level dynamically together with the mean level of the time series.The main elaboration deploys the generalized autoregressive conditional heteroskedasticity (GARCH) model [22].
The proposed model leverages the conditional nature of both DLMs and GARCH type heteroskedasticity.This allows for simultaneous inference on the dynamic volatility of the process and accommodates the non-stationarity in the data, i.e. we can draw inference on time-evolution of the mean level of the HR and BP series and also get estimates on how the variance of each series is evolving.A natural consequence of our proposed model is that the inference procedure allows us to identify possible structural breaks in the data, places where there may have been an increased volatility (we discuss this in the data analysis part in Section 5).Our proposed model is also able to deal with missing data, which provides a significant advantage when using data sets like the one presented here, where missingness occurs at random.Finally, to further test the validity of our model, we use a 4-dimensional GARCH model to estimate data obtained from the MIMIC-III Waveform Database [23,24].The MIMIC-III Waveform Database contains digitized ICU data for approximately 30,000 ICU patients.In particular we give an example of a joint modeling of HR, systolic BP, diastolic BP and respiration series.
The proposed model is novel in the way that it uses conditional heteroskedasticity to explain the persistence of large variations observed in the patient data and focuses on the reconstruction and prediction of the time series values themselves.In the Bayesian setting, with large time series, prediction has its unique challenges [25].There is a vast set of previous research that looks at the Bayesian prediction problem for both the offline and online settings [26,27,28,29].The proposed model is general enough to allow for the incorporation of such predictions, as one need only specify an appropriately tailored structure for the SSM -see Section 6 for an elaboration of the model designed to handle these cases.
The patient-level analysis incorporated in this paper exclusively looks at a retrospective analysis of the patient level data instead of a predictive analysis so as to investigate the performance of the methodology in its ability to model the non-stationarity and conditional heteroskedasticity present in the data.This investigation is also exclusively built to estimate the individual patient-level data over a population-based model.This, in the case of the ICU, appears more pertinent, as each patient admitted to the ICU is done so for their unique reasons, and the utility of a population-based analysis is not immediately apparent.Nevertheless, our analysis can quite readily be extended to involve the building of a hierarchical population-based model, with an appropriate collection of hyper-prior specifications.We discuss this briefly in Section 6.
Overall, the proposed model is for non-linear and heteroskedastic time series data.It is easily interpretable and allows for predictions and population level estimation.It is able to deal with missing data, in an intuitive manner and does not impede analysis when the data are missing.Further, it is based on a more simple parameter space when compared to other machine learning models.Finally, in modeling the data with the proposed model, the results show the model is able to estimate the dynamic variance and the non-linear mean level of the model at each time period and investigate the big regime changes that occur during the patients' stay at the ICU.
The paper is organized as follows.Section 2 gives a brief review of the theory of DLMs/SSMs and the GARCH processes, extends the standard Gaussian SSM to include GARCH errors and discusses model selection.Section 3 develops an inferential procedure for the model.Section 4 describes a simulation study wherein we show that we are able to recover the values of the parameters.Section 5 applies the multivariate model to the MUHC and MIMIC-III data sets.Section 6 provides some discussion and presents future avenues of research.

Proposed Heteroskedastic Model
Our proposed heteroskedastic model combines the DLM and GARCH components to capture both non-stationarity in mean level and time-varying conditional variance.Suppose that {y t } is a real-valued (n × 1) vector time series measured in discrete time.Let Observation equation: State equation: with w t i.i.d.
∼ N (0, W t ).F t is a known matrix (n × r) for all t.The residual error component, z t (n × 1), is assumed to follow a multivariate GARCH process as defined by [30] (details provided in Section 2.1).The state equation describes a smooth evolution of the state parameter vector, θ t (r × 1), at time, t, as a function of the evolution matrix G t (r × r), the previous values of the state vector plus some measurement error, w t (r × 1).This measurement error is assumed to follow a multivariate normal distribution with zero mean and time-varying covariance matrix W t .Let V t , an (n × n) matrix, be the variance of z t (see Section 2.1 for the formulation for the GARCH process).The residual errors of the observation and state equations, z t and w t respectively, are assumed independent across time.In this setup the unknown parameters are the variance parameters, that is, the components of V t and W t .This formulation keeps the model very flexible and changing F t and G t provides many familiar models, including but not limited to seasonal models, linear growth models, linear regression models, and ARMA models.A simple but surprisingly useful model is the 'random-walk plus noise' (or 'local level') model Observation equation: with each equation in the same dimension, which can capture global non-stationarity.In the state equation with n = 2, we have More generally, the observation and state matrices may contain unknown parameters which may be inferred via the implied marginal likelihood for the observable quantities.When the observation and state level variances do not change with time, they are simply referred to as V and W , respectively.
The dlm package in R can fit the standard Bayesian DLM for many of the models indicated above, including models that exhibit mean-reversion, and by changing the form of the matrices F and G at specific time points, can also fit models with structural breaks.However, it does not include the option to fit with conditional heteroskedasticity, and as is observed in Figure 1, there is often visual evidence that time-varying conditional variability is present in the data.In the next section, we describe the specific heteroskedastic model that we use.

Multivariate GARCH model
In (1), z t is assumed to be a Constant Conditional Correlation GARCH process (CCC-GARCH) which was proposed by [30], as given by the following formulation: we have that with D t = diag(σ 1,t , . . ., σ n,t ) and, for i = 1, . . ., n, p , β q ∈ R n + , and Here chol(V t ) is the Cholesky decomposition of the matrix V t .In this model the correlations, ρ i,j , in the symmetric matrix R, remain constant over time.This correlation ρ i,j is the correlation between each series z it and z jt .
The GARCH parameters, α q also must be placed under some joint constraint in order to maintain stationarity for the sequence {V t }.If all the GARCH parameters are zero, the model reverts back to a standard DLM.The correlation between each of the series given by ρ i,j for i, = 1, . . ., n, is also a parameter of interest.
The multivariate GARCH process z t follows a martingale difference sequence and hence we have that, E[z t ] = 0, E[z t z s ] = 0, t = s.The simplest case for this model is when there is full independence of the time series at both the state level and the observation level and independence between the parameters.In this case each series can be fitted separately.However, when there exists correlation between the series, fitting these time series individually will not be appropriate and a multivariate model should be fitted in order to appropriately account for the correlation present.In (11), the formulation that ρ i,j = 0 for all i, j (R is then the (n × n) dimensional identity matrix) corresponds to the series being independent at the observation level.When ρ i,j is not zero and is unknown, the parameter space increases quadratically.For example if z t is (n × 1) dimensional multivariate GARCH(1, 1) then the unknown parameters are α ,j>i , which gives the total number of unknown parameters as 3n+n(n−1)/2.In the bivariate and trivariate cases, there are 7 and 12 unknown parameters respectively.

Mean-covariance structure and model properties
Let D t = {θ 0:t , y 1:t−1 } be the set that contains the sequence of the state vector up to time t.Then E[y t |D t ] = F t θ t and V ar(y t |D t ) = V t , where V t is given in (3).This formulation is similar to a standard GARCH process, where the variance at time t depends conditionally on the square of the noise and the variance at time t − 1.This allows for a dynamic variance at the observation level, thus extending the standard homoskedasticity assumption used in state-space models.As mentioned above, specifying different design matrices F and G allows for the building of many commonly used stationary and non-stationary state space models and these will be more flexible than the standard DLM due to the presence of a dynamic variance.This will later be seen in the bivariate model for HR and BP.

Bayesian inference procedure
Let y 1 , . . ., y T be a set of observed multivariate time series.Let ψ = {φ, θ 0:T }, where φ is the set of unknown parameters (α 1:q are the collection of all the GARCH parameters α for each i = 1, . . ., n.Under the assumption that the residual errors arise from independent processes, and conditional on θ 0:T , the y t are independent and the likelihood function is given by, (y 1 , . . ., where f Y (y t |ψ) is a multivariate normal density with mean θ t and variance V t .Inference procedure is performed under the Bayesian paradigm, which naturally accounts for the uncertainty in the estimation of the parameters.
The complete specification of the model under the Bayesian framework requires specifying an appropriate prior distribution p(φ) for φ.Following Bayes' theorem the resultant posterior distribution for φ is proportional to, We seek efficient computational approaches to compute the posterior for φ as well as to infer the states in the latent model.

Kalman filter formulation
Despite the addition of the GARCH component, the proposed model can still be handled using techniques for the dynamic linear state-space model; see [21] for further details.We have by the Kalman recursions that f Y (y t |y t−1 , φ) is a multivariate Gaussian, with mean f t and variance Q t , where and we have where the filtered values are, m t = a t + K t e t and C t = V t − K t Q t K t , with the Kalman gain and the forecast error being, t and e t = y t − f t respectively.The final element to complete the computation for (7) is the evaluation of the matrix S 2 t = D t RD t where D t = diag(S 1,t , . . ., S n,t ) and R is as defined in equation ( 11) above.Specifically, we have that Where, (i) represents the i th component of the vector and the (i, i) represents the i th diagonal entry of the matrix.The marginal distributions for the one-step ahead forecasts are given by p(y Since there is no closed form for the posterior distribution we use MCMC methods to sample from it.Using the Forward-Filter-Backward-Sampling (FFBS) algorithm proposed by [31] and [32] we can sample θ = θ 1:T from the full conditional posterior distribution for the latent states.Given the latent states and the prior distributions, the conditional posterior for all other parameters is non standard and to obtain samples from their posterior full conditionals the Metropolis-Hastings algorithm is used.For the GARCH parameters, the stationarity constraint makes efficient sampling complex.Furthermore, in the constant correlation model, there is a positive definiteness constraint on the matrix which also requires attention.

Reparametrization of the correlation matrix, R
In the CCC-GARCH model, the correlation matrix, R, must be positive definite and have diagonal elements equal to 1.A simple yet effective reparametrization of the matrix makes its estimation much simpler as will be seen subsequently in the simulation part of Section 4. The parametrization comes from the upper Cholesky decomposition of R, which allows us to write R = U U , where U is an n × n upper triangular matrix with positive diagonal entries.The restriction that R has diagonal elements equal to one means that the row vectors of U are unit vectors.This gives us the following matrix form for U : Notice that u nn = 1 by our formulation of the rows of U being unit vectors and u ii > 0 for all i = 1, . . ., n.This reparametrization avoids having to estimate the components of R from (7) and instead estimates the elements of U since this will still uniquely determine R. Because of this uniqueness, subsequently we will refer to these u i,j as the correlation components of the GARCH process.For more details on fast sampling of correlation matrices, see [33].The only exception in using this parametrization is when n = 2, the bivariate case.Since in this case the only unknown in R is the single correlation coefficient, ρ 1,2 , which has the restriction of being in the range [−1, 1].

Missing data
It is often the case that health time series data are subject to missingness in one or more of the components.An advantage of the Bayesian method over non-Bayesian or non-probabilistic methods is that missing data can be imputed coherently as part of the inference.MCMC-based inference is especially well suited to overcome missingness in the outcome component of the DLM in (1); any missing y t value, or a component of the observation vector, is sampled as part of the MCMC, usually via a Gibbs sampler step.This step is particularly straightforward in a conditional Gaussian model as in (1).Formally, having a missing value at time t is the same as setting some components of F t to zero.Suppose that y t is a p × 1 dimensional vector.There are two possibilities that we need to specify.The first one is when information is missing across the entire vector at time t, i.e. all the components of y t are missing.In this case, the observation matrix F t is zero.This also results in the Kalman gain matrix to be a matrix of 0s.The filtered updates are given as, The second possibility is that only some of the observations at time t are missing.That is, for y t , only some components are missing while the others are present.In this case, we have that the components in the observation matrix F t that correspond to the missing values, are set to 0. This results in the a Kalman gain matrix, whose columns, corresponding to the missing values, contain zeros and the rest of the filtering process follows straightforwardly as before.

Model selection
A key feature of estimating parameters is also deducing redundancies in the models.Model selection allows us to choose the best model given a choice of competing models.In our case the two competing models, a Gaussian-SSM and a GARCH-SSM.One of the main tools for model selection is the information criterion, in particular Akaike's information criteria (AIC), Bayesian (or Schwarz) Information Criteria (BIC) use the maximum likelihood estimates and the log predictive distribution.Both AIC and BIC penalize for model complexity.We refer the reader to [34] for more details.
Following [35] we use an information criteria proposed by [36], known as the Watanabe-Akaike Information criteria also known as the Widely-Applicable Information Criteria (WAIC) to select the model.As mentioned in [35], the WAIC encompasses a Bayesian approach to the out-of-sample prediction performance and assesses the performance of the model using the samples drawn from the posterior distribution.The WAIC is computed as LPPD − p WAIC , where LPPD is the log pointwise predictive density and p WAIC is a bias correction: A higher value of the WAIC indicates a better fit for the model.In our analysis, we compute the WAIC for the model where the latent state components are marginalized out.

Simulation study
In this section we carry out a simulation study for an illustrative 4-d GARCH-SSM.We simulate a GARCH-SSM data using a sample size of T = 1000 and we estimate the parameters using both the standard state space model approach and our GARCH-SSM formulation.The observation matrix, F , and state evolution matrix, G, are constant throughout time and fixed as F = G = I 4 , where I 4 is the 4 × 4 dimensional identity matrix.With the same parameter values we simulate 100 replicate data sets from a 4-dimensional GARCH(1,1)-SSM, thus allowing us to assess the frequentist behaviour in the parameter estimates.The model parameters are the unknown components of the variance, that is the state variance and the components of the GARCH(1,1) error.We seek to estimate these parameters, the latent states and an estimate of the observation variance.The results show that our proposed model is broadly able to retrieve the unknown parameters, estimate the latent state generating the observations and is able to correctly identify a GARCH-model from a standard-SSM one.The full scope of the simulation studies are detailed in the Supplementary Materials provided online.All codes and results are also available and provided in the online Supplementary Materials section.

Choice of priors
To estimate the model using the Bayesian framework we need to specify the priors.For the GARCH parameter α (i) 0 we choose a half-Cauchy prior; for (α 1 ) we choose a joint half-Cauchy prior restricted to the region such that α 1 < 1; for the correlation parameters, u ii , we choose a half-Cauchy prior, for the remainder of the correlation parameters, u ij , we choose a N (0, 1) prior and for the state covariance matrix, W , we choose an Inverse-Wishart prior, IW (10,10), where, i, j = 1, 2, 3, 4. The choice of priors reflects the positivity restriction that we have made for the GARCH parameters and the diagonal components of the correlation components.The half-Cauchy prior has a thicker tail than other priors such as the half-normal and this allows for larger values for α 0 to occur more frequently.This has the effect of allowing the baseline variance for the GARCH process to be high.We have also made a stationarity restriction for the GARCH parameters as given by the support α For the correlation parameters in the matrix U , inference needs to be carried out on the unnormalized version of U which then can normalized so that R = U U has unit diagonal entries.For the diagonal entries of the unnormalized version of U we choose a half-Cauchy prior, for the remainder of the correlation parameters, u ij , we choose a N (0, 1) prior.Our choice of the Inverse-Wishart prior for the state variance W , allows for a conjugate analysis.However, for all other parameters in the model we obtain samples from the posterior distributions using the Metropolis-Hastings algorithm.

Estimation, prediction and residual analysis
For these models, a key interest, other than the parameter estimates, are the estimation of the latent state vector and the dynamic variance to allow the study of heteroskedasticity.We obtain posterior samples of the latent state vector using the FFBS algorithm.Being able to estimate the latent state allows us to study the mean level of the model.Equation (7) gives, conditional on the posterior samples of the parameters, the posterior estimates of the one-step ahead forecast, f t , and its variance, Q t .Similarly being able to estimate the evolution of the variance over time will allows us to study what led to moments of higher or lower variance, which is of interest to researchers performing a retrospective analysis of the data.For illustrations and more details regarding the simulation study we refer the reader to the Supplementary Materials.We are able to accurately recover the latent state, the dynamic variance and the unknown parameters.We also do a residual analysis in the Supplementary Materials section, showing that the heteroskedasticity adjusted residuals return approximately standard-normal error terms as expected under the specifications given by (3).

Model selection
A key objective of the simulation study is to assess whether, at the selected sequence length, our estimated models are correctly selected using the WAIC when the heteroskedasticity is in fact present in the data generating mechanism.In each simulation we calculated the WAIC using from both the GARCH-SSM estimation and the standard-SSM estimation.Since our models were originally generated using a multivariate GARCH(1,1) we would ideally want that our WAIC is always choosing the GARCH model over the standard-SSM.From our simulations we get that the WAIC for the GARCH model is always higher than the WAIC of the standard model, that is we are always accurately choosing the GARCH model over the standard state space model.More details regarding the calculation of the WAIC are provided in Supplementary Materials.

Application to real ICU Data
In this section we will apply our bivariate GARCH-SSM to multivariate ICU data.This data set is a selected patient observed over T = 2029 data points at a frequency of one measurement every five minutes.This data set has missing values in both BP and HR series and also short segments where there are no observations in either series.

Montreal ICU data
As seen from Figure 1, both these series show non-stationarity properties in mean and a time-varying variance.There are also apparent outliers in both series, probably due to some physiological change occurring; this happens roughly at the same time for both series.Given the sudden change in volatility and what seems to be pockets of volatility clustered together, we believe that the proposed GARCH model would be an appropriate model; our GARCH-SSM is based on a bivariate GARCH(1, 1) structure.
We compare the fit from our proposed model to the fit from a standard DLM, which we use as a benchmark.First we fit a standard DLM, specifically, the random walk plus noise model from equation (2) with a constant conditional variance structure, in two dimensions.We assume that the observation and state matrices are constant through time and we choose F and G both as the 2-dimensional identity matrix.In this model our unknown parameters are the observation variance, V , and the state variance, W .For both V and W we specify a Inverse-Wishart prior IW (10, 10), which are conjugate priors and makes sampling from the posterior easier.Posterior samples were obtained by running the MCMC using four chains, each starting from different values.Each chain had a burn-in sample of 20,000 and subsequently every 50 th sampled value was stored.This resulted in a total sample size of 8,000 from the posterior distribution.We checked for convergence of our chains using the ergodic means.
Next, we fit a bivariate model with GARCH errors.The main difference from the previous model is that the observation error, z t , follows a bivariate CCC-GARCH(1,1) process, more specifically, we assume where, α 1 are the components of the observation variance parameters, where i = 1, 2 (which correspond to HR and BP respectively).These need to be estimated as does, W , the state variance parameter.Define ρ s as the correlation of the state errors, that is it is obtained by dividing the off-diagonal element of W (the covariance between the state errors) by the standard deviations of the state errors.For the GARCH-parameters we choose a half-Cauchy prior for α (i) 0 .For (α 1 ) we choose a joint half-Cauchy prior restricted to the region such that α 1 < 1. Define ρ obs as the correlation between the observation errors.For ρ obs , we choose a Uniform(-1,1) prior and for the state covariance matrix, W , we choose an IW (10, 10).The Inverse-Wishart prior for W in this case is also a conjugate prior.However, for all other parameters, as the posterior full conditionals are unknown we use steps of the Metropolis-Hastings algorithm, with random walk proposals.Posterior samples were obtained from starting 4 different chains from random initial values.We discarded the first 20,000 iterations as burn-in and we stored every 50 th samples and collected a total of 8,000 samples from the posterior distribution for each parameter.We again checked for the convergence of our chains using the ergodic mean.
The WAIC values were -24676.52 for the standard state space model and -23958.42for the GARCH-SSM.Since the GARCH-SSM has a higher WAIC than the standard state space model, thus we have that the GARCH-SSM is the best among the fitted models.Table 6, shows the posterior parameter estimates from a standard bivariate DLM.We see that for ρ obs , the 95% posterior credible interval contains 0. However, there seems to be a positive correlation between the state vectors as given by, ρ s .Table 5 gives the parameter estimates of the bivariate GARCH-SSM.We have that for both the HR and BP series, the β 1 parameter estimate is quite high (the posterior mean is 0.691 for the HR series and 0.747 for the BP series).This shows that there is quite a high persistence in volatility.At the same time in terms of the correlation, at the observation level there seems to be a very small correlation with the posterior estimate being ρ obs = 0.098 with the 95% posterior credible interval being (0.05, 0.14).The state level posterior estimation of the correlation is ρ s = 0.069 and the 95% posterior credible interval is (−0.03,0.19).Figure 2 (top row) shows the posterior estimates of the state for each of the HR and BP series along with the 95% credible interval given by the dashed lines.The posterior estimates of the latent state have been superimposed over the true observations to show how the estimated level of the observations is changing over time.Table 2: Montreal ICU data: Bivariate GARCH(1,1)-SSM parameter estimates.ρ is the estimate of the correlation between the observation errors of the HRT series and the BP series and ρ s is the estimate of the correlation between the state errors of both series.The estimates are the posterior mean values and in the parenthesis is below is their posterior standard deviations, and below that the 95% credible interval.

Observation
Figure 2 (bottom row) provides the posterior estimate of the variance of the two different series over time.We see that in both the HR and BP series the notion of either series having a constant variance is likely inaccurate.We see that, especially for the HR series, there are moments with large jumps in the conditional variance, followed by sharp dips.For both the HR and BP series allowing a dynamic variance seems appropriate given the uneven fluctuations in the data.As a consequence of the posterior estimation of the variance, we can thus also gauge and see if there were any large structural changes in the model.We use this to see that for both the HR and BP series, after about the 700 th observation there is a change in the mean level of the variance (this may be caused by a physiological change which then manifests itself in the HR and BP data).The persistence of volatility in both the series can be seen from the panels in Figure 2.  In the analysis presented, the fit of the random-walk plus noise model was investigated, and the fit appears good.However, it is also plausible that a model with a latent stochastic trend could be useful in explaining variation in critical care data.In this model the componentwise state is augmented from θ j,t = µ j,t (the local stochastic mean) to θ j,t = (µ j,t , λ j,t ) (the local stochastic mean and trend), where for j = 1, 2. This which can provide an additional smoothness to the reconstruction.Further extensions would allow for structural breaks in the model and a time varying correlation between multiple series (recall that the proposed model keeps the correlation constant).However, fitting this results in an inferior WAIC compared to the random-walk noise model.

MIMIC ICU Data
We fit our 4-d GARCH model on patient data obtained from the MIMIC-III Waveform Database [23,24].We fit a joint model for HR, systolic BP, diastolic BP and respiration data for a single patient.Each series is complete, with no missingness, and has T = 647 data points.The model implementation was identical to that from section 5.1, namely the random-walk plus noise model, but now in four dimensions.The MCMC implementation details were also identical.The parameter estimates of the model are presented in Table 3.For this data set, the WAIC value of the GARCH model is -12404.2while the WAIC using a DLM is -12440.92.Thus in this case the GARCH model is preferred, and there is strong evidence for correlation in the GARCH errors, with estimated correlation as large as 0.92.From Figure 3 one can see that there are instances when the observed data seem to have some large fluctuations in variance.However, this data set is much more straightforward, and lacks the persistence of volatility present in the data from the Montreal ICU. Figure 4 shows the posterior estimate of the state variable along with the 95% credible intervals.This can further be seen from the estimates of the variance in this model given in Figure 5 which shows that the variance is fairly stable with some predictable fluctuations over time.This also explains the values of the estimates of the GARCH parameters which are much far away from the non-stationary region for this model.Overall the parameter estimates from the model seem reasonable.

Discussion
We proposed a new model to deal with non-stationary and heteroskedastic types of health time series.To the best of our knowledge this is the first type of model combining heteroskedasticity and non-stationarity for Bayesian state-space models in this manner and applying it to the health time series data.This is done by extending the classical DLM with a multivariate GARCH process.The structure of the mean-level model chosen was a mutlivariate random-walk plus noise model.A draw back of this structure is that it precludes any type of forward prediction estimates.However, with another type of mean-level chosen, the ability to forecast ahead is restored.The Bayesian algorithms developed are able to sample efficiently from the unknown posterior distribution using MCMC and they are able to satisfactorily recover the parameters of the model in simulation, the latent state and estimate the dynamic variance.We fitted the proposed model to the ICU dataset and showed an improvement in the prediction accuracy (as measured by WAIC) obtained using our proposed model over the standard multivariate SSMs currently available.
In this paper, we have focussed on a state-space representation where the latent process is continuous valued, and represents the 'noise-free' version of the patient-specific process for the observable quantities.However, it is reasonably standard in the Bayesian literature to also allow for discrete states, in a hidden Markov model, that might be used to represent categorized health-states.In the simplest case that might be useful for an ICU setting, we might propose a binary switching HMM where the states represent 'stable' and 'unstable' status of the patient, on top of which further latent and observable structures may be built.The computation of the Bayesian posterior for such a model is more involved, but still tractable.See the comprehensive coverage in Chapter 13 of [37], which provides several options for Markov switching formulations.
We have studied only individual-level patient data as this is the most urgent issue in ICU care, but the Bayesian framework can easily be extended to a hierarchical analysis of a cohort of patients with different individual characteristics (and state series) but with common structure in terms of heteroskedasticity parameters or state crosscorrelations.The relevance of such an analysis would be most pressingly felt in units where there was a degree of patient homogeneity, such as neo-natal intensive care, and be useful as it would allow common parameters to be more effectively inferred.
We present a simulation study illustrating the performance of the model for a 4-d GARCH-SSM.We simulate a GARCH-SSM data using a sample size of T = 1000 and we estimate the parameters using both the standard state space model (SSM) approach and our GARCH-SSM formulation.The observation matrix, F , and state evolution matrix, G, are constant throughout time and fixed as F = G = I 4 , where I 4 is the 4 × 4 identity matrix.With the same parameter values we simulate 100 replicate data sets from a 4-dimensional GARCH(1,1)-SSM, thus allowing us to assess the frequentist behaviour in the parameter estimates.The model parameters are the unknown components of the variance, that is the state variance and the components of the GARCH(1,1) error.We seek to estimate these parameters, the latent states and an estimate of the observation variance.The structure of F and G in this case represent a random-walk plus noise structure for the mean level.

Choice of priors
To estimate the model using the Bayesian framework we need to specify the priors.For the GARCH parameter α (i) 0 we choose a half-Cauchy prior; for (α 1 ) we choose a joint half-Cauchy prior restricted to the region such that α 1 < 1; for the state covariance matrix, W , we choose an Inverse-Wishart prior, IW (10,10), where, i, j = 1, 2, 3, 4. The choice of priors reflects the positivity restriction that we have made for the GARCH parameters and the diagonal components of the correlation components.The half-Cauchy prior has a thicker tail than other priors such as the half-normal and this allows for larger values for α 0 to occur more frequently.This has the effect of allowing the baseline variance for the GARCH process to be high.We have also made a stationarity restriction for the GARCH parameters as given by the support α 1 < 1 for i = 1, 2, 3, 4. For the correlation parameters in the matrix U described below, we choose priors for the unnormalized version of U and then normalize the proposed quantity so that R = U U has unit diagonal entries.For the diagonal entries of the unnormalized version of U we choose a half-Cauchy prior, for the remainder of the correlation parameters, u ij , we choose a N (0, 1) prior.Our choice of the Inverse-Wishart prior for the state variance W , allows for a conjugate analysis.However, for all other parameters in the model we obtain samples from the posterior distributions using the Metropolis-Hastings algorithm.

Model specification and Bias Estimation:
The 4d-GARCH(1,1)-SSM is given by the following specification.
Observation equation: GARCH error specification : (10) with D t = diag(σ 1,t , . . ., σ n,t ) and, for i = 1, . . ., n, 1 ∈ [0, 1], and and the parametrization U represents the upper Cholesky decomposition of R. In the simulation we used F = G = I n×n so that we have a random walk plus noise model.Below we provide the estimation results from the 100 simulations summarized in the form of bias.The bias is computed based on the posterior median estimates from each of the 100 simulations.First, reported are the bias of the GARCH parameters; next, we report the bias for the estimates of the unknown state Variance matrix, W ; finally, we report the boxplot based on the posterior median estimates for the parameters of U .

Series
The true value of W used to generate the model was W = diag(0.1,0.1, 0.1, 0.1).From (12) we can see that with respect to estimating the unknown state variance W the model is doing well.Next, for the correlation coefficient matrix, U , we see from the boxplots in Figure 6 that the model is also able to estimate this parameter well.Table 4 shows that the GARCH parameters are being estimated reasonably well.Overall owing to the latent structure and the multidimensionality of the model it becomes difficult to estimate the GARCH parameters.It must be noted again that the latent state also must be estimated in order to calculate the GARCH paramaters.
Finally, it still remains to be checked that for each of these simulated data, the GARCH-SSM is being selected as the most appropriate model based on our WAIC selection criteria.Figure 7 shows the boxplots of the WAIC estimated when each of the simulated data sets are estimated using a GARCH-SSM versus a standard-SSM.We see that the WAIC is correctly selecting the GARCH-SSM over the standard-SSM.This can be understood from the fact that the WAIC of the GARCH model is greater than that of the standard-SSM.The panel on the right in Figure 7 is a histogram of the difference in the GARCH and the standard-SSM based WAICs computed for each data set.This also reinforces the finding that the WAIC is correctly selecting the appropriate model, since for almost all of the simulated data sets we have that the difference in the WAICs is positive, thus showing that the GARCH based WAIC is greater than the standard SSM based WAIC.8 Model Estimation and Residual Analysis

Model Estimation
For illustration, the following results are for a single 4-dimensional series of length T = 1000.Figure 8 shows the posterior estimates of the one-step-ahead forecast, f t .We can see that this basically traces out the mean level of the observational series for each series.Figure 9 shows the posterior estimates of the latent state for each of the series in thick black and the true state vector in the broken lines.We can see that we are able to estimate and recover the unobserved states accurately and the true value of the unobserved state almost always lies in the 95% credible interval.In the data analysis of real data, when we apply this model to the HR and BP data, the posterior estimates of the latent state will allow us to study the variability of the HR and BP over the time period of the data.
The panels in Figure 10 shows the posterior estimates of the standard deviation in black and in gray we have the true standard deviation.This plot shows flexibility our proposed model provides over the standard Gaussian state-space model.We know that for the data the variance is not constant and being able to estimate it properly allows us to draw better inference on the process at a particular time, t.We can see that our estimate for the dynamic standard deviation is quite close to the actual standard deviation of the observed series and the true values are recovered.
The 95% credible intervals of both the estimated state vector and the variances contains the true value of the state vector and the dynamic variances.This shows that our model is able to accurately capture both latent state and the unknown dynamic variance.

Residual Analysis
Next, we carry out a residual analysis.Recall that by assumption from (10) that the GARCH errors when normalized with respect to the dynamic variance should result in a standard normal error distribution.Replicating this result empirically would be evidence that our model is able to capture the correct heteroskedastic structure of the errors.From Figure (10) we can see that for the second series, there is quite a bit of heteroskedasticity.Thus one would suspect that compared to the quantiles of the standard normal distribution, this series would show large deviations.And this is exactly what can be seen in Figure 11 (top right panel).Consequently, normalizing all the series with their true heterskedasticity should result in normalized errors that adhere strongly to the quantiles of the standard normal distribution.This second result can be seen in Figure 12.The following is an analysis from a third patient.The estimates and plots are provided below.In this case, based on the WAIC we see that the GARCH model is the most appropriate.

Figure 1 :
Figure 1: Montreal ICU data for a single patient.Left: Heart rate series of a patient; Right: Blood pressure series of same patient.Time axis is number of seconds since start of monitoring

Figure 2 :
Figure 2: Montreal ICU data.Top row: posterior estimates of the mean level of each series (HR left, BP right), with the credible intervals given by the dashed lines and the observed data in gray.Bottom row: posterior mean estimates of the dynamic standard deviation (HR left, BP right).

Figure 3 : 4 Figure 4 :
Figure 3: MIMIC data: In black, the posterior estimates of the one step ahead forecasts of each series.In gray are the observed data.

Figure 5 :
Figure 5: MIMIC data: The dynamic standard deviation in black.The shaded gray area is the 95% posterior credible interval of the estimated standard deviation.

Figure 6 :Figure 7 :
Figure 6: Boxplot: Distribution of the correlation components of U based on posterior median estimates of the 100 simulated data sets.The bold black solid circles represent the true values.

Figure 8 : 4 Figure 9 :Figure 13 : 2 Figure 14 :
Figure 8: Simulated data: the posterior estimates of the one step ahead forecasts of each series.In gray are the observed data.

Figure 15 :
Figure 15: Posterior mean estimates of the dynamic standard deviation (HR left, BP right).

Table 1 :
Montreal ICU data: Parameter estimates of the bivariate Gaussian-SSM.Here ρ obs is the estimate of the correlation between the observation errors of the HRT series and the BP series and ρ s is the estimate of the correlation between the state errors of both series.The values displayed are the posterior means of the parameters and the values in the parenthesis are the posterior standard deviations.

Table 3 :
MIMIC data: 4-d GARCH(1,1)-SSM parameter posterior median estimates (Std.Error)The posterior estimate of the state variance matrix W and the GARCH correlation matrix R are respectively,

Table 4 :
GARCH Parameter Estimates and Biases

Table 7 :
Montreal ICU data: Bivariate GARCH(1,1)-SSM parameter estimates.ρ is the estimate of the correlation between the observation errors of the HRT series and the BP series and ρ s is the estimate of the correlation between the state errors of both series.The estimates are the posterior mean values and in the parenthes is below is their standard error and below that the 95% credible interval.

Table 8 :
Montreal ICU data: Parameter estimates of the bivariate Gaussian-SSM.Here ρ obs is the estimate of the correlation between the observation errors of the HRT series and the BP series and ρ s is the estimate of the correlation between the state errors of both series.The values displayed are the posterior means of the parameters and the values in the parenthesis are the standard errors.