1 Introduction

Time series data arise in a variety of areas including finance (Taylor 2007), biology (Bar-Joseph et al. 2003) and energy (Alvarez et al. 2011; Doucoure et al. 2016). The collection and recording of time series can be interrupted for various reasons, including human error or technical faults with the recording equipment, inducing missingness within the series. Little and Rubin (2002) describe the occurrence of missing values in data through a number of “missingness mechanisms”:

  1. Missing completely at random (MCAR)—The probability of missingness is the same for all units, i.e. missingness does not depend on any other variables, observed or unobserved.

  2. Missing at random (MAR)—The probability of missingness depends only on available information, i.e. missingness depends on other observed variables.

  3. Not missing at random (NMAR)—The probability of missingness depends on the variable itself, i.e. missingness depends on the unobserved value that would have been recorded.

Regardless of the type of missingness present, further analysis of the time series such as autocovariance or spectral estimation can be difficult without first replacing the missing data with appropriate estimates. This estimation process is called imputation.

There exists a rich literature dedicated to the imputation of missing values within stationary time series; see Pratama et al. (2016) for a recent review. For univariate time series, explicit consideration of the temporal dependence (autocorrelation properties) is key to successful imputation. Moritz and Bartz-Beielstein (2017) discuss various univariate time series imputation approaches, ranging from simple methods that replace missing values with the mean to more advanced approaches that involve spline interpolation or model fitting combined with the use of a Kalman filter.

Popular techniques suitable for multivariate data such as multiple imputation (see, for example, Rubin (1987) or Audigier et al. (2016)), hot-deck (Ford 1983) and expectation–maximisation (EM) (Dempster et al. 1977) make use of inter-variable correlations to estimate missing data. Most variants of the EM approach within the literature are based on assumptions of Gaussianity (see, for example, Junger and de Leon (2015), Honaker and King (2010) or Honaker et al. (2011)). Alternative methods have also been developed, combining EM with other modelling procedures such as PCA fixed-effects models (Caussinus 1986) and Gaussian mixture models (Ghahramani and Jordan 1994). Other model-based methods for imputation within time series make use of restrictive classes of statistical models to infer missing values, such as those based on purely autoregressive processes (Sridevi et al. 2011).

Other approaches to imputation include those based on heuristics such as genetic algorithms (Lobato et al. 2015; Tang et al. 2015), or those incorporating machine learning methodology, ranging from support vector machines (Wu et al. 2015) and random forests (Stekhoven and Bühlmann 2011), to more advanced techniques such as recurrent neural networks (Cao et al. 2018; Che et al. 2018) or adversarial networks (Luo et al. 2018; Yoon et al. 2018). The drawback of many of these approaches is that they often require training on previously seen complete data. However, in many contexts, we only have access to a single observed multivariate series.

An alternative strategy for coping with missingness is to estimate the spectral information of the series in some way. A range of methods has been developed for spectral estimation in stationary time series with missing values or irregularly sampled observations within the time series and signal processing literature. The Lomb–Scargle periodogram (Lomb 1976; Scargle 1982) estimates the Fourier spectrum from the irregularly sampled data but can be subject to strong bias which hinders its ability to describe slopes within the spectrum. Variants of this approach have been applied in a range of different fields including astronomy (Wen et al. 1999), biology (Van Dongen et al. 1999) and biomedical engineering (Laguna et al. 1998). Other widely used techniques involve fitting time series models directly to the unequally spaced data and using this to estimate spectral information for stationary processes (Jones 1980; Bos et al. 2002; Broersen 2006).

In practice, however, the assumptions imposed by modelling an observed time series as stationary can be restrictive and unrealistic. Nonstationary time series, i.e. series with time-varying second-order structure, are observed in various fields including finance (Stărică and Granger 2005; Fryzlewicz et al. 2006), life sciences (Cranstoun et al. 2002; Hargreaves et al. 2019) and oceanography (Killick et al. 2013). Hence, the ability to impute missing values in multivariate, nonstationary time series has potential widespread benefits. Many techniques have been developed for modelling and analysing complete multivariate, nonstationary data, including time domain approaches (Molenaar et al. 1992; Wu and Zhou 2019), the locally stationary Fourier model (Dahlhaus 2000), the smooth localised complex exponential (SLEX) model (Ombao et al. 2005) and the multivariate locally stationary wavelet (mvLSW) framework (Park et al. 2014), but the literature on how to deal with missingness within such data is sparse. In the univariate setting, Knight et al. (2012) propose a method for estimating spectral information of an LSW process containing missing values, where information is estimated at the observed time locations. However, the problem of spectral estimation within multivariate, nonstationary time series has not been widely studied.

In this article, we address the challenging problem of imputation in the multivariate locally stationary time series setting and, more specifically, where additional training data are unavailable. We adopt a wavelet-based approach to this challenge. Whilst there are many time domain imputation techniques as noted above, we prefer a frequency domain approach due to the ability of wavelet-based models to capture a range of localised nonstationary behaviour (via the wavelet spectrum) and provide more accurate estimation and detection of complex structures within series, see, for example, Nason et al. (2000); Barigozzi et al. (2018). Our strategy involves first estimating the local wavelet spectral matrix of a mvLSW process with missing observations before forecasting and backcasting the missing values of the time series using a multivariate extension of the wavelet forecasting approach of Fryzlewicz et al. (2003). As a final step, we average the estimates obtained from the forward and backward pass to obtain an overall estimate of the time series. Through the use of simulated examples and a case study, we demonstrate that our method performs well for a range of realistic missingness scenarios in both the stationary and nonstationary setting.

This work is organised as follows. In Sect. 2, we review existing methods for modelling locally stationary time series and forecasting within this context. In Sect. 3, we introduce the proposed imputation method. Section 4 contains a simulation study evaluating the performance of the proposed imputation method using synthetic examples. In Sect. 5, we describe a case study using accelerometer data arising from a health-related motion analysis experiment. Finally, Sect. 6 contains some concluding remarks.

2 Background

In this section, we provide a brief overview of recent work modelling locally stationary time series. For a comprehensive review of nonstationary time series more generally, see Dahlhaus (2012). The section is organised as follows: we first review existing methods for modelling locally stationary time series using the LSW framework, in both the univariate and multivariate contexts, in Sect. 2.1, before looking at one-step ahead forecasting in this setting in Sect. 2.2.

2.1 Modelling locally stationary wavelet processes

Within the univariate setting, the locally stationary wavelet (LSW) framework introduced by Nason et al. (2000) provides a flexible model for nonstationary time series that captures the changing second-order structure of such series. Nason et al. (2000) define an LSW process \(\left\{ X_{t,T}\right\} _{t=0,\ldots ,T-1}\), \(T=2^{J}\ge 1\), to be a sequence of (doubly-indexed) stochastic processes that can be represented as

$$\begin{aligned} X_{t,T}=\displaystyle \sum _{j=1}^{\infty } \displaystyle \sum _{k} W_{j}(k/T)\psi _{j,t-k}\xi _{j,k}. \end{aligned}$$
(1)

The vectors \(\psi _{j} = \left\{ \psi _{j,0},\psi _{j,1},\ldots , \psi _{j,N_{j}-1}\right\} \) are discrete non-decimated wavelets associated with a low-/high-pass filter pair, \(\left\{ \mathcal {H},\mathcal {G}\right\} \). The elements of \(\psi _j\) can be calculated using the following expression:

$$\begin{aligned} \psi _{1,n}&= \sum _{k} g_{n-2k}\delta _{0,k} = g_{n}&\text {for}~n = 0,1,\ldots ,N_{1}-1,\\ \psi _{j+1,n}&=\sum _{k} h_{n-2k}\psi _{j,k}&\text {for}~n =0,1,\ldots ,N_{j+1}-1. \end{aligned}$$

In the equations above, \(\delta _{0,k}\) is the Kronecker delta function and \(N_{j}=\left( 2^{j}-1\right) \left( N_{h}-1 \right) +1\) where \(N_{h}\) is the number of nonzero elements of the filter \(\mathcal {H}=\left\{ h_{k}\right\} _{k\in \mathbb {Z}}\). \(\left\{ \xi _{j,k}\right\} _{j,k}\) is a sequence of zero-mean, orthonormal random variables and \(\left\{ W_{j}(k/T)\right\} \) is a set of amplitudes on which a number of assumptions are imposed to control the behaviour of the LSW process (see Nason et al. (2000) for further details).
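To make the recursion concrete, the following sketch (our own illustrative R helper, not code from any package cited in this article) computes the vectors \(\psi _{j}\) for the Haar filter pair, for which \(h=(1/\sqrt{2},1/\sqrt{2})\) and \(g=(1/\sqrt{2},-1/\sqrt{2})\):

```r
# Discrete non-decimated wavelets via the filter recursion above (Haar case).
haar_psi <- function(J) {
  h <- c(1, 1) / sqrt(2)   # low-pass filter H
  g <- c(1, -1) / sqrt(2)  # high-pass filter G
  Nh <- length(h)
  psi <- vector("list", J)
  psi[[1]] <- g            # psi_{1,n} = g_n
  if (J > 1) for (j in 1:(J - 1)) {
    Nj1 <- (2^(j + 1) - 1) * (Nh - 1) + 1   # N_{j+1}
    psi[[j + 1]] <- sapply(0:(Nj1 - 1), function(n) {
      k <- 0:(length(psi[[j]]) - 1)
      idx <- n - 2 * k                      # filter index n - 2k
      keep <- idx >= 0 & idx < Nh           # restrict to the filter support
      sum(h[idx[keep] + 1] * psi[[j]][k[keep] + 1])
    })
  }
  psi
}
```

For example, haar_psi(2) returns \(\psi _{1}=(1/\sqrt{2},-1/\sqrt{2})\) and \(\psi _{2}=(1/2,1/2,-1/2,-1/2)\), the familiar non-decimated Haar wavelets.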

Park et al. (2014) introduced multivariate locally stationary wavelet (mvLSW) processes as an extension of the LSW framework. Following Park et al. (2014), a P-variate locally stationary wavelet process \(\left\{ \mathbf {X}_{t,T}\right\} _{t=0,1,\ldots ,T-1}\), \(T=2^{J} \ge 1\), has the following representation

$$\begin{aligned} \mathbf {X}_{t,T}=\displaystyle \sum _{j=1}^{\infty } \displaystyle \sum _{k} \mathbf {V}_{j}(k/T)\psi _{j,t-k}\mathbf {z}_{j,k} \end{aligned}$$
(2)

where \(\{\psi _{j,t-k}\}_{j,k}\) is a set of discrete non-decimated wavelets constructed according to Nason et al. (2000) and \(\mathbf {V}_{j}(k/T)\) is the transfer function matrix. The random vectors \(\mathbf {z}_{j,k}\) are uncorrelated, with mean vector \(\mathbf {0}\) and variance–covariance matrix equal to the \(P\times P\) identity matrix. The transfer function matrix consists of Lipschitz continuous functions with Lipschitz constants \(L_{j}^{(p,q)}\) that satisfy \(\sum _{j=1}^{\infty } 2^{j}L_{j}^{(p,q)} < \infty \) for each pair of channels \((p,q)\). The conditions imposed upon the transfer function matrix control the time-varying contribution to the variance made by each channel at a particular scale.

The local wavelet spectral (LWS) matrix is an important quantity within the mvLSW framework as it provides a scale-dependent decomposition of the variance and cross-covariance between channels at a particular (rescaled) time z. Given a mvLSW signal \(\mathbf {X}_{t}\) with transfer function matrix \(\mathbf {V}_{j}(k/T)\), the LWS matrix is given by

$$\begin{aligned} \mathbf {S}_{j}(z)=\mathbf {V}_{j}(z)\mathbf {V}_{j}(z)^{\top } \end{aligned}$$
(3)

for scale j and rescaled time z. Following Park et al. (2014), the local auto-covariance and cross-covariance between channels p and q are defined as

$$\begin{aligned} c^{(p,p)}(u,\tau )&= \displaystyle \sum _{j=1}^{\infty } S_{j}^{(p,p)}(u)\varPsi _{j}(\tau ), \end{aligned}$$
(4)
$$\begin{aligned} c^{(p,q)}(u,\tau )&= \displaystyle \sum _{j=1}^{\infty } S_{j}^{(p,q)}(u)\varPsi _{j}(\tau ) \end{aligned}$$
(5)

where \(S_{j}^{(p,p)}(u)~\text {and}~S_{j}^{(p,q)}(u)\) denote the spectra and cross-spectra, respectively, of the series, and \(\varPsi _{j}(\tau )\) is the discrete autocorrelation wavelet defined by \(\varPsi _{j}(\tau )=\sum _{k}\psi _{j,k}\psi _{j,k-\tau }\) for \(j \in \mathbb {N}\) and \(\tau \in \mathbb {Z}\) (Eckley and Nason 2005).
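To illustrate Eqs. (4) and (5), the discrete autocorrelation wavelets can be obtained from the wavethresh R package via PsiJ, and the local (cross-)covariance then follows as a weighted sum over scales. In the sketch below, the \(J \times P \times P \times T\) array S holding the spectral estimate, and the function name local_cov, are our own conventions for illustration:

```r
library(wavethresh)

# Local (cross-)covariance c^{(p,q)}(u, tau) at integer time point t (u = t/T).
local_cov <- function(S, p, q, t, tau,
                      filter.number = 1, family = "DaubExPhase") {
  J <- dim(S)[1]
  Psi <- PsiJ(-J, filter.number = filter.number, family = family)
  sum(sapply(1:J, function(j) {
    mid <- (length(Psi[[j]]) + 1) / 2    # position of lag tau = 0
    val <- if (abs(tau) < mid) Psi[[j]][mid + tau] else 0
    S[j, p, q, t] * val                  # S_j^{(p,q)}(u) * Psi_j(tau)
  }))
}
```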

In practice, the LWS matrix and local auto- and cross-covariance are unknown for an observed multivariate series and need to be estimated. In the complete data case, where there are no observations missing, the LWS matrix of a multivariate signal can be estimated by first calculating the empirical wavelet coefficient vector \(\mathbf {d}_{j,k} = \sum _{t} \mathbf {X}_{t} \psi _{j,t-k}\). The raw wavelet periodogram matrix is then defined as \(\mathbf {I}_{j,k} = \mathbf {d}_{j,k} \mathbf {d}_{j,k}^{\top }\).

Park et al. (2014) show that the raw wavelet periodogram is a biased and inconsistent estimator of the true LWS matrix, \(\mathbf {S}_{j}(z)\). The bias within the raw wavelet periodogram can be removed using the inverse of the inner product matrix of discrete autocorrelation wavelets, \(\mathbf {A}\), whose elements are given by \(\mathbf {A}_{j,\ell } = \sum _{\tau } \varPsi _{j}(\tau ) \varPsi _{\ell }(\tau )\) (see Eckley and Nason (2005) or Nason et al. (2000) for further details). To obtain consistency, the resulting estimate must be smoothed in some way, for example using a rectangular kernel smoother (Park et al. 2014). The local auto- and cross-covariance can then be estimated by substituting the estimated LWS matrix \(\hat{\mathbf {S}}\) into Eqs. (4) and (5), respectively. The LWS matrix and the local auto- and cross-covariance structure are important quantities within the imputation method we propose in Sect. 3 as they are used within the prediction step to estimate missing values. See Taylor et al. (2019) for further details and an introduction to software that implements the mvLSW estimation scheme.
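In the complete-data case, this pipeline (raw periodogram, correction by \(\mathbf {A}^{-1}\), smoothing) is available through the mvLSW package; a minimal hedged example is given below, noting that the default smoothing and bias-correction settings may differ between package versions (see ?mvEWS):

```r
library(mvLSW)

set.seed(1)
X <- as.ts(matrix(rnorm(3 * 256), ncol = 3))  # a complete trivariate series, T = 256
# mvEWS computes the raw wavelet periodogram and returns a smoothed,
# bias-corrected estimate of the LWS matrix under its default settings.
EWS <- mvEWS(X, filter.number = 1, family = "DaubExPhase")
```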

In the missing data setting, there exist some time points for which we do not have an observed value \(\mathbf {X}_{t}\) for one or more channels of the P-variate series. As a result, we cannot directly apply the above approach for estimating the wavelet periodogram and must modify the method to obtain estimates of the wavelet coefficients at all scales and locations. This is discussed further in Sect. 3.

2.2 Forecasting multivariate locally stationary wavelet processes

As outlined in Sect. 1, a key element of the imputation scheme that we introduce in Sect. 3 is the ability to perform one-step ahead forecasts in a locally stationary setting. Fryzlewicz et al. (2003) introduce a one-step ahead predictor in the univariate LSW setting which uses the autocovariance structure of an LSW process to form generalised Yule–Walker equations. Our approach is a straightforward extension of this work to the multivariate setting that makes use of the mvLSW model outlined in Sect. 2.1.

Due to the separable structure of the mvLSW model, we can form one-step ahead prediction equations for each channel combination \((p,q)\) using the local auto- and cross-covariance defined in Eqs. (4) and (5), respectively. The multivariate prediction equations are defined by

$$\begin{aligned} \sum _{s=0}^{t-1} b_{t-1-s,T}^{(p,q)}~c^{(p,q)} \left( \frac{s+n}{2T},s-n\right) = c^{(p,q)}\left( \frac{n+t}{2T},t-n\right) . \end{aligned}$$
(6)

As Fryzlewicz et al. (2003) describe in the univariate setting, the coefficients \(\mathbf{b} _{t}^{(p,q)}\) that solve the prediction equations (one equation for each \(n = 0, \ldots , t-1\)) can be shown to minimise the mean square prediction error

$$\begin{aligned} \text {MSPE} \big (\hat{X}_{t,T}^{(p,q)}, X_{t,T}^{(p,q)} \big )&= \text {E} \big (\hat{X}_{t,T}^{(p,q)} - X_{t,T}^{(p,q)} \big )^2 \\&= \mathbf{b} _{t}^{(p,q) \top }\varvec{\varSigma }^{(p,q)}_{t,T} \mathbf{b} _{t}^{(p,q)}, \end{aligned}$$

where \(\mathbf{b} _{t}^{(p,q)} = (b_{t-1,T}^{(p,q)},\ldots ,b_{0,T}^{(p,q)},-1)\) and \(\varvec{\varSigma }^{(p,q)}_{t,T}\) is the covariance matrix of \(X_{0,T}^{(p,q)}, \ldots , X_{t,T}^{(p,q)}\).

The one-step ahead predictor of \(\mathbf {X}_{t,T}\) given previous multivariate observations \(\mathbf {X}_{0,T}, \ldots , \mathbf {X}_{t-1,T}\) is then given by the \(m\)-observation clipped predictor

$$\begin{aligned} \hat{X}_{t,T}^{(p)} = \sum _{q \in \{1,\ldots ,P\}} \sum _{s=t-m}^{t-1} b_{t-1-s;T}^{(p,q)} X_{s,T}^{(q)} \end{aligned}$$
(7)

for \(p \in \{1,\ldots ,P\}\), where \(m\) is the number of recent observations used in prediction.
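As an illustration, the sketch below solves the clipped form of the prediction equations (6) for a single channel pair and then applies the predictor (7). Here lacv(u, tau) is assumed to be a user-supplied function returning the estimated local (cross-)covariance \(c^{(p,q)}(u,\tau )\) from Eqs. (4) and (5); all names are our own placeholders:

```r
# Solve the m-observation clipped form of Eq. (6) for one channel pair.
clipped_coeffs <- function(lacv, t, Tlen, m) {
  s <- (t - m):(t - 1)                        # clipped index set
  LHS <- outer(s, s, Vectorize(function(si, ni)
    lacv((si + ni) / (2 * Tlen), si - ni)))
  rhs <- sapply(s, function(ni) lacv((ni + t) / (2 * Tlen), t - ni))
  solve(LHS, rhs)                             # coefficients b_{t-1-s;T}, s ascending
}

# Forecast for channel p (Eq. (7)): sum the contributions over channels q, e.g.
# sum(clipped_coeffs(lacv_pq, t, Tlen, m) * X[(t - m):(t - 1), q]) for each q.
```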

Our proposed imputation approach (introduced in the next section) uses the one-step ahead prediction in the mvLSW setting outlined above to replace missing values in a multivariate locally stationary time series.

3 Imputation for multivariate locally stationary wavelet processes

In this section, we introduce our multivariate imputation method which uses the local auto- and cross-covariance structure of a nonstationary time series to estimate missing observations. The key challenge in this context is that the usual mvLSW spectral estimation process cannot be used due to the presence of missingness. For this reason, the first step of the algorithm involves estimating the wavelet periodogram of a mvLSW process with missing observations; this is discussed in Sect. 3.1. Using the estimate of the LWS matrix, we then form the local auto- and cross-covariance structure and carry out a forward pass of the data in which we forecast missing values. To obtain more accurate estimates of the time series at missing locations, we also implement a backward pass of the data in which we backcast the missing values. We then average the series obtained from the forward and backward passes to get an overall estimate. The forecasting and backcasting steps are described in Sects. 3.2 and 3.3, respectively. A complete overview of the steps carried out in one iteration of the method can be found in Appendix A.

3.1 Spectral estimation

Suppose we have a P-variate time series of length \(T=2^{J}\) containing missing values which we wish to impute. The first step of the mvLSWimpute algorithm involves estimating the LWS matrix of the time series. The presence of missing values means that we cannot use the usual estimation procedure and have to modify our approach.

First, we calculate the empirical wavelet coefficient vector, \(\mathbf {d}_{j,k}\), for the time series, ensuring that any wavelet coefficients at scales and locations affected by the initial missing data are also recorded as missing. The Haar wavelet is used within this calculation since its short filter length limits the propagation of missingness across coefficients, ensuring that more levels of the wavelet transform contain usable information. From this, we can form the raw wavelet periodogram matrix as described in Sect. 2.1. Since the raw wavelet periodogram will also have entries missing, we need to perform an intermediate step to fill in these missing values before smoothing and correcting as described in Sect. 2.1 to obtain an estimate of the LWS matrix. We note here that other wavelets could be used in the calculation of the coefficients; for further discussion of the choice of wavelet in the LSW context, we refer the reader to Gott and Eckley (2013).

In order to fill in the missing values, for each spectrum and cross-spectrum, we linearly interpolate the periodogram values at those levels which feature missing values but still contain observed periodogram values available to use in the interpolation. The coarsest levels of the periodogram, where the coefficients are all missing, are replaced in a different manner: we take the coarsest level of the periodogram that contains information and recursively apply the wavelet filter equations. This process generates coefficients that allow us to replace the values in the coarsest levels of the periodogram.

To obtain an estimate of the LWS matrix, we correct the periodogram by multiplying by \(\mathbf {A}^{-1}\) and then smooth the result using a running mean smoother with window length \(\lfloor \sqrt{T} \rfloor \), as implemented in the mvLSW R package (Taylor et al. 2017). This estimate can then be substituted into Eqs. (4) and (5) to form the local auto- and cross-covariance, which are used in the forecasting and backcasting steps of the algorithm.
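To illustrate the in-filling and smoothing steps described above for a single level of one (cross-)periodogram sequence, a sketch follows; fill_and_smooth is our own helper, not the packaged implementation:

```r
fill_and_smooth <- function(I_jk) {
  Tlen <- length(I_jk)
  obs <- which(!is.na(I_jk))
  # Linear interpolation across missing locations; rule = 2 extends the end
  # values so that gaps at the series boundaries are also filled.
  filled <- approx(obs, I_jk[obs], xout = seq_len(Tlen), rule = 2)$y
  w <- floor(sqrt(Tlen))                          # running mean window length
  smoothed <- stats::filter(filled, rep(1 / w, w), sides = 2)
  as.numeric(smoothed)                            # edge values are NA in this sketch
}
```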

3.2 Forecasting

In order to replace missing data in the time series, we first carry out a forward pass of the series where we use the one-step ahead multivariate wavelet forecasting approach outlined in Sect. 2.2. A missing index is defined to be a time point at which one or more channels of the P-variate time series have missing values present.

For each missing index i, we forecast the missing values sequentially in the following way. First, calculate the local auto- and cross-covariance using the estimated spectra from time 1 to time \(i-1\) and Eqs. (4) and (5).

For each channel combination \((p,q)\) where \(p, q \in \{1,\ldots , P\}\), form the prediction equations using the local auto- and cross-covariance at the appropriate locations and lags, as in Eq. (6). Solving the prediction equations yields the \(\mathbf{b} ^{(p,q)}\) vectors used to predict the values of the series at time i via the one-step ahead predictor defined in Eq. (7). The channels of the multivariate time series that contain missing data at time i are then replaced by the corresponding predicted values from the forecasting step.

It is important to note that, for efficiency, we use a clipped predictor in the forecasting step in which only the most recent \(m\) observations are used in the prediction, similar to Fryzlewicz et al. (2003) in the univariate setting.

Fig. 1 Example realisations of generating processes for the different scenarios used in the simulation study: (a), (c) slowly evolving dependence, with class changes at times 150 and 300; (b) rapidly changing dependence structure, with class changes at times 100, 200, 300 and 400; (d) stationary signal, with no changes in the generating coefficient matrices of the process

3.3 Backcasting

After carrying out the forward pass of the data, the next step is to backcast the missing values sequentially. This backcasting step is included in order to improve the accuracy of the imputation method, since it allows us to incorporate information from both sides of a missing observation in our estimation. Similar to the approach of Trindade (2003), we can form backward Yule–Walker equations in the mvLSW setting by beginning at time T and again using the multivariate wavelet forecasting approach from Sect. 2.2, taking care to reverse the ordering of the spectral values in this case. Note that the backward pass is carried out independently of the forward pass and does not depend on the imputed time series obtained in the previous step.

For each missing index i (considered in descending order), we proceed as in the forecasting case and form the local auto- and cross-covariance using the estimated spectra from time T to \(i+1\). As in the forward pass, for each channel combination \((p,q)\), we solve the prediction equations using the local auto- and cross-covariance to obtain the \(\mathbf{b} ^{(p,q)}\) vectors. However, the one-step ahead predictor has a slightly different form in the backcasting step:

$$\begin{aligned} \hat{X}_{t,T}^{(p)} = \displaystyle \sum _{q \in \{1,\ldots ,P\}} \displaystyle \sum _{s=t+1}^{t+m} b_{s-t-1;T}^{(p,q)} X_{s,T}^{(q)} \end{aligned}$$
(8)

for \(p \in \{1,\ldots ,P\}\). The one-step ahead predictor in Eq. (8) is used to backcast the value of the time series at index i; any missing entries within channels are then replaced by their corresponding predicted values.

After carrying out the forward and backward passes independently, we obtain two imputed time series which are then averaged to give an overall estimate of the time series. The process can then be iterated; from the second iteration onwards, the spectral estimation step no longer requires linear interpolation and the LWS matrix can be estimated using the complete-data procedure of Sect. 2.1. The forecasting and backcasting steps remain the same, and we again average to obtain an updated estimate of the time series.
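The overall structure of one iteration can thus be summarised in the following schematic, in which estimate_spectrum, forward_pass and backward_pass are hypothetical placeholders for the steps of Sects. 3.1–3.3 rather than real package functions:

```r
mvLSWimpute_sketch <- function(X.mis, n.iter = 2) {
  X.cur <- X.mis
  for (it in seq_len(n.iter)) {
    S.hat <- estimate_spectrum(X.cur)    # Sect. 3.1; interpolation at it = 1 only
    X.f   <- forward_pass(X.mis, S.hat)  # Sect. 3.2: forecast at missing indices
    X.b   <- backward_pass(X.mis, S.hat) # Sect. 3.3: backcast at missing indices
    X.cur <- (X.f + X.b) / 2             # average the two passes
  }
  X.cur
}
```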

4 Simulated performance of mvLSWimpute

We now assess the performance of our proposed multivariate imputation method through a range of simulated data examples. The generating series used within the simulation study exhibit varying degrees of nonstationarity and dependence. These have been chosen to test the ability of our method to impute missing values in multivariate, nonstationary time series. A number of different scenarios have been chosen for the missingness structure in order to mimic situations arising in practice.

For datasets containing missing entries, traditional analysis based on complete cases has proven to be reasonably accurate provided that the proportion of missing values is small (Graham 2009). However, such methods yield poor results as the proportion of missing entries increases. To evaluate the performance of the proposed method as the amount of missingness increases, we remove 10, 20, 30 and 40% of values from the generating series at random, either from all channels simultaneously or from one channel independently. In practice, time series obtained from industrial applications can contain gaps that may extend over hours or even days due to faults in the recording equipment or human error. In order to reflect this, we also consider the case where information is missing from one or more variables of the time series for a period of 20 consecutive time points. As a third scenario, we include the situation where the missingness occurs in bursts of up to length 20 before the signal returns to normal for a set period of time; a sketch of how such patterns can be generated is given below.
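For illustration, the random and consecutive-chunk scenarios can be generated along the following lines, where make_missing is a hypothetical helper; the burst pattern of Appendix B is analogous, repeating short gaps at set spacings along the series:

```r
make_missing <- function(X, prop = 0.1, type = c("random", "chunk"), len = 20) {
  type <- match.arg(type)
  Tlen <- nrow(X)
  idx <- if (type == "random") {
    sample(Tlen, floor(prop * Tlen))     # e.g. prop = 0.1, ..., 0.4
  } else {
    start <- sample(Tlen - len + 1, 1)   # one chunk of `len` consecutive points
    start:(start + len - 1)
  }
  X[idx, ] <- NA    # missingness occurring simultaneously across all channels
  X
}
```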

For all missingness scenarios, the coefficients of the generating series randomly switch at set times in order to test the ability of the imputation methods to deal with slowly and rapidly evolving dependence within a signal. The time series used in each case have the following forms:

  • Slowly changing structure: Trivariate signal of length \(T=512\), two changes in the generating coefficients of the series, occurring at time 150 and 300.

  • Rapidly changing structure: Trivariate signal of length \(T=512\), four changes in the generating coefficients of the series, occurring at time 100, 200, 300 and 400.

The first example we consider is a mvLSW process with changing spectral structure, chosen in such a way that there is strong coherence between channels of the signal. In this case, the example consists of two underlying classes with differing LWS matrices as defined below

$$\begin{aligned} \text {Class}~1:\qquad S^{(1,2)}_{j}(z)&= {\left\{ \begin{array}{ll} 5 &{} \text {for}~j=1, \\ -6 &{} \text {for}~j=2, \\ -4 &{} \text {for}~j=5.\\ \end{array}\right. }\\ S^{(1,3)}_{j}(z)&= {\left\{ \begin{array}{ll} 2 &{} \text {for}~j=3, \\ -4 &{} \text {for}~j=4.\\ \end{array}\right. } \\ S^{(2,3)}_{j}(z)&= -6~\text {for}~j=2.\\ \text {Class}~2:\qquad S^{(1,2)}_{j}(z)&= {\left\{ \begin{array}{ll} -5 &{} \text {for}~j=1, \\ 6 &{} \text {for}~j=2, \\ 4 &{} \text {for}~j=5.\\ \end{array}\right. }\\ S^{(1,3)}_{j}(z)&= {\left\{ \begin{array}{ll} 8 &{} \text {for}~j=3, \\ 4 &{} \text {for}~j=4.\\ \end{array}\right. }\\ S^{(2,3)}_{j}(z)&= 6~\text {for}~j=2. \end{aligned}$$

Figure 1a displays a dataset simulated from this process using the different dependence structures described above. The second example we examine is a time-varying vector autoregressive moving average process with three different classes defined by the following coefficient matrices

$$\begin{aligned} \text {Class}~1:~\mathbf {X}_{t} =&\begin{pmatrix} 0.4 &{} 0.1 &{} -0.2 \\ 0.1 &{} 0.3 &{} -0.3 \\ -0.2 &{} -0.3 &{} -0.2 \end{pmatrix} \mathbf {X}_{t-1} + \mathbf {Z}_{t} + \\&\begin{pmatrix} 1 &{} 0.8 &{} 0.4 \\ 0.8 &{} 1 &{} 0.1 \\ 0.4 &{} 0.1 &{} 1 \end{pmatrix} \mathbf {Z}_{t-1}\\ \text {Class}~2:~\mathbf {X}_{t} =&\begin{pmatrix} -0.3 &{} -0.2 &{} 0.3 \\ -0.2 &{} -0.3 &{} 0.1 \\ 0.3 &{} 0.1 &{} 0.2 \end{pmatrix} \mathbf {X}_{t-1} + \mathbf {Z}_{t} + \\&\begin{pmatrix} 1 &{} -0.6 &{} 0.3 \\ -0.6 &{} 1 &{} -0.3 \\ 0.3 &{} -0.3 &{} 1 \end{pmatrix} \mathbf {Z}_{t-1}\\ \text {Class}~3:~\mathbf {X}_{t} =&\begin{pmatrix} -0.6 &{} 0.4 &{} 0.1 \\ 0.4 &{} 0.2 &{} 0.3 \\ 0.1 &{} 0.3 &{} 0.5 \end{pmatrix} \mathbf {X}_{t-1} + \mathbf {Z}_{t} + \\&\begin{pmatrix} 1 &{} 0.2 &{} -0.7 \\ 0.2 &{} 1 &{} 0.6 \\ -0.7 &{} 0.6 &{} 1 \end{pmatrix} \mathbf {Z}_{t-1} \end{aligned}$$

where \(\mathbf {Z}_{t}~\text {and}~\mathbf {Z}_{t-1}\) are zero-mean multivariate normal realisations, distributed with class-dependent covariances

$$\begin{aligned} \varSigma _{1}= \begin{pmatrix} 1 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 1 \end{pmatrix}, \qquad \varSigma _{2} = \varSigma _{3} = \begin{pmatrix} 3 &{} 0 &{} 0 \\ 0 &{} 3 &{} 0 \\ 0 &{} 0 &{} 3 \end{pmatrix} \end{aligned}$$

for classes 1–3, respectively. The signal switches randomly between each of the three classes at different times depending on whether we are considering slowly or rapidly evolving dependence. This ensures that the intra- and cross-channel dependence of the process changes over time (see Fig. 1b).

The third example we consider is a time-varying vector autoregressive process with two classes defined by the following coefficient matrices

$$\begin{aligned} \text {Class}~1:~\mathbf {X}_{t} =&\begin{pmatrix} 0.3 &{} 0.2 &{} -0.2 \\ 0.2 &{} 0.4 &{} -0.2 \\ -0.2 &{} -0.2 &{} -0.1 \end{pmatrix} \mathbf {X}_{t-1} + \\&\begin{pmatrix} 0.4 &{} -0.2 &{} 0.3 \\ -0.2 &{} -0.4 &{} 0.1 \\ 0.3 &{} 0.1 &{} -0.2 \end{pmatrix} \mathbf {X}_{t-2} + \epsilon _{1}\\ \text {Class}~2:~\mathbf {X}_{t} =&\begin{pmatrix} 0.2 &{} -0.2 &{} 0 \\ -0.2 &{} 0.4 &{} -0.2 \\ 0 &{} -0.2 &{} 0.2 \end{pmatrix} \mathbf {X}_{t-1} + \\&\begin{pmatrix} -0.1 &{} 0 &{} 0 \\ 0 &{} -0.4 &{} 0.3 \\ 0 &{} 0.3 &{} 0.3 \end{pmatrix} \mathbf {X}_{t-2} + \epsilon _{2} \end{aligned}$$

where the innovation vectors \(\epsilon _{i}\) are zero-mean multivariate normal realisations, distributed with per-class covariances

$$\begin{aligned} \varSigma _{\epsilon _{1}}= \begin{pmatrix} 1 &{} 0.2 &{} 0 \\ 0.2 &{} 1 &{} 0.1 \\ 0 &{} 0.1 &{} 1 \end{pmatrix}, \qquad \varSigma _{\epsilon _{2}} = \begin{pmatrix} 5 &{} 1.2 &{} 2 \\ 1.2 &{} 5 &{} 1.5 \\ 2 &{} 1.5 &{} 5 \end{pmatrix}. \end{aligned}$$

A realisation of such a process can be seen in Fig. 1c.

Within the simulation study, we compare our method to a range of multivariate imputation approaches, some of which assume that the data follow a (time-constant) multivariate normal distribution. For this reason, we also include a stationary example where the coefficients of the moving average process do not change over time. The coefficient matrices for this process are defined as follows

$$\begin{aligned} \mathbf {X}_{t} =&\ \mathbf {Z}_{t} + \begin{pmatrix} 1 &{} 0.5 &{} -0.2 \\ 0.5 &{} 1 &{} 0.3 \\ -0.2 &{} 0.3 &{} 1 \end{pmatrix} \mathbf {Z}_{t-1} + \begin{pmatrix} 1 &{} -0.4 &{} 0.2 \\ -0.4 &{} 1 &{} -0.6 \\ 0.2 &{} -0.6 &{} 1 \end{pmatrix} \mathbf {Z}_{t-2} \\&+ \begin{pmatrix} 1 &{} 0.1 &{} -0.5 \\ 0.1 &{} 1 &{} -0.3 \\ -0.5 &{} -0.3 &{} 1 \end{pmatrix} \mathbf {Z}_{t-3} \end{aligned}$$

where \(\mathbf {Z}_{t-1}, \mathbf {Z}_{t-2}~\text {and}~\mathbf {Z}_{t-3}\) are zero-mean, multivariate normal realisations, with covariances given by

$$\begin{aligned} \varSigma = \begin{pmatrix} 2 &{} 0 &{} 0 \\ 0 &{} 2 &{} 0 \\ 0 &{} 0 &{} 2 \end{pmatrix}. \end{aligned}$$

An example of this stationary process can be found in Fig. 1d.
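For completeness, this stationary example can be simulated directly; the following is our own minimal sketch (column-major matrix filling is safe here since the coefficient matrices are symmetric):

```r
Tlen <- 512; P <- 3
Theta <- list(
  matrix(c(1, 0.5, -0.2,   0.5, 1, 0.3,   -0.2, 0.3, 1),  P, P),
  matrix(c(1, -0.4, 0.2,  -0.4, 1, -0.6,   0.2, -0.6, 1), P, P),
  matrix(c(1, 0.1, -0.5,   0.1, 1, -0.3,  -0.5, -0.3, 1), P, P))
Z <- matrix(rnorm((Tlen + 3) * P, sd = sqrt(2)), ncol = P)  # Sigma = 2 * I
X <- Z[4:(Tlen + 3), ]                                      # Z_t term
for (l in 1:3)                                              # add Theta_l Z_{t-l}
  X <- X + Z[(4 - l):(Tlen + 3 - l), ] %*% t(Theta[[l]])
```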

Fig. 2 Time series plots of the accelerometer readings for the three sensors (axes) over the same time period: (a) original series; (b) series with induced missingness—red triangles show locations of missing datapoints. (Color figure online)

4.1 Competitor methods

In the simulation study, we compare our method with a number of alternative multivariate imputation approaches suitable for a single realisation of a multivariate process. Firstly, we consider the modified expectation–maximisation (EM) approach of Junger and de Leon (2015) implemented in the R package mtsdi (Junger and de Leon 2018). Within this method, cross-channel correlations are captured by the multivariate normal modelling structure, while inter-time behaviour is accounted for through a level estimation step applied to each univariate component series. The mtsdi package implements a number of different methods for estimating the level of each univariate series. For all simulated examples, we fit a cubic spline to each univariate component, where the number of degrees of freedom of each spline is chosen by cross-validation.

Secondly, we compare against the multiple imputation method that combines expectation–maximisation with bootstrapping proposed by Honaker and King (2010), available in the Amelia II R package (Honaker et al. 2015); see also Honaker et al. (2011) for implementation details. As this is a multiple imputation approach, the method produces \(m\) completed datasets which are then averaged to obtain a final imputed dataset. Within the simulations, we choose \(m=5\) as this is suggested to be suitable unless the rate of missingness is very high (Schafer and Olsen 1998; Honaker et al. 2011). As both of these methods assume that the data can be modelled using a multivariate normal distribution, we would expect them to perform poorly in cases where the underlying time series is highly nonstationary.

As we wish to impute missing values in a multivariate, nonstationary time series, it is important to compare our method to a range of other model-based approaches available in the literature. Specifically, we apply the iterative PCA multiple imputation method of Audigier et al. (2016), which is available in the missMDA R package; see Josse and Husson (2016) and Husson and Josse (2018) for more details. Within the simulations, we apply the regularised iterative PCA algorithm with the number of random initialisations set to 10 and the default parameters otherwise. In addition, we compare to the nonparametric random forest imputation method (Stekhoven and Bühlmann 2011) implemented in the R package missForest (Stekhoven 2013), where again we use the default parameters.

Fig. 3 Imputation results for the first period of missingness in the trivariate accelerometry signal from the case study: missing values denoted by black dots; imputed values for the mtsdi, VAR-fb and mvLSWimpute methods shown by red crosses, blue plusses and green triangles, respectively. (Color figure online)

Since our method involves a one-step ahead forecasting and backcasting step within the mvLSW framework, as a direct comparison we also apply the vector autoregressive prediction approach from the R package MTS (Tsay 2015), described in Tsay (2013). For each missing index, the approach fits a vector autoregressive process to the available observations and then produces one-step ahead forecasts to predict the missing values. To ensure a fair comparison with our proposed method, we also perform a backward pass of the data with the vector autoregressive prediction and combine the estimates from the forecasting and backcasting steps by averaging, similar to the mvLSWimpute method. This is denoted VAR-fb. For completeness, we also include the results from applying one-step ahead forecasting only within this setting (denoted VAR-f). Our proposed mvLSWimpute method was implemented using modifications to the code in the wavethresh (Nason 2016) and mvLSW (Taylor et al. 2017) R packages, which perform estimation of multivariate LSW process quantities.
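For reference, illustrative calls to the competitor implementations are sketched below. Here X.mis denotes a data frame with columns X1, X2, X3 containing NA entries and X.obs the available observations preceding a missing index; these names, and the choice ncp = 2, are our own, and argument details should be checked against each package's documentation:

```r
library(mtsdi); library(Amelia); library(missMDA); library(missForest); library(MTS)

fit.em <- mnimput(~ X1 + X2 + X3, X.mis)  # EM with spline level estimation (mtsdi)
fit.am <- amelia(X.mis, m = 5)            # EM with bootstrapping, m = 5 imputations
fit.pc <- MIPCA(X.mis, ncp = 2)           # regularised iterative PCA (missMDA)
fit.rf <- missForest(X.mis)               # nonparametric random forest imputation
fit.va <- VAR(X.obs, p = 2)               # VAR fit to the available data (MTS),
fc1    <- VARpred(fit.va, 1)              # followed by a one-step ahead forecast
```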

Fig. 4 Example time series containing bursts of missingness; missing time locations are represented by the filled red triangles. (Color figure online)

4.2 Evaluation measures

For each of the missingness scenarios and dependence structures described in Sect. 4, \(K = 100\) realisations of the test signals are simulated and four different evaluation measures are considered. In order to assess the performance of the imputation methods, we consider modified versions of the root-mean-square error (RMSE) and mean absolute error (MAE). The majority of the simulated examples we consider contain changes in variability over time; this volatility affects the standard RMSE and MAE and makes it difficult to directly compare results for slowly and rapidly evolving dependence. For this reason, we scale the results over time using the true standard deviation.

Let \(\sigma _{t,p}\) denote the true standard deviation of the signal at time t for channel p. The calculations of the modified RMSE and MAE include only the predicted values at the missing time points, not the full time series. Let N denote the total number of missing values across all channels and time points, \(t_{\text {mis}}=\{t_{1}, t_{2}, \ldots \}\) contain the time points where observations are missing and \(P_{\text {mis}}=\{P_{t_{1}}, P_{t_{2}}, \ldots \}\) denote the corresponding channels which are affected. Then, the modified RMSE can be defined by

$$\begin{aligned} \text {RMSE} = \frac{1}{N}\displaystyle \sum _{b\in P_{\text {mis}}}\displaystyle \sum _{s \in t_{\text {mis}}} \dfrac{(X_{s,T}^{b} - \hat{X}_{s,T}^{b})^{2}}{\sigma _{s,b}^{2}}, \end{aligned}$$
(9)

and the modified MAE as

$$\begin{aligned} \text {MAE} = \frac{1}{N}\displaystyle \sum _{b\in P_{\text {mis}}}\displaystyle \sum _{s \in t_{\text {mis}}} \dfrac{|X_{s,T}^{b} - \hat{X}_{s,T}^{b}|}{\sigma _{s,b}}. \end{aligned}$$
(10)

In addition, we rank the imputation methods based on the modified RMSE and MAE results. For each of the \(K=100\) realisations, we track which of the imputation methods gives the lowest scaled RMSE and MAE and sum the results.
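A direct transcription of Eqs. (9) and (10) as printed is given below, assuming the truth, the imputed values and the true standard deviations are stored as \(T \times P\) matrices and mis is a logical matrix flagging the missing entries (all names are ours):

```r
modified_errors <- function(X, Xhat, sigma, mis) {
  N <- sum(mis)                # total number of missing values
  err <- (X - Xhat)[mis]       # errors at the missing (time, channel) pairs only
  sd <- sigma[mis]             # matching true standard deviations
  c(RMSE = sum(err^2 / sd^2) / N,
    MAE  = sum(abs(err) / sd) / N)
}
```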

Table 1 Performance of the imputation methods over \(K=100\) realisations of the time-varying vector autoregressive moving average series with rapidly changing dependence structure (Fig. 1b) for different missingness scenarios occurring simultaneously across all channels, using the evaluation measures described in the text
Table 2 Performance of the imputation methods over \(K=100\) realisations of the stationary vector moving average series (Fig. 1d) for different missingness scenarios occurring simultaneously across all channels, using the evaluation measures described in the text

The imputation results for each of the examples described above (as in Fig. 1) can be seen in Tables 1, 2, 3, 4 and 5. For each example, we consider 10, 20, 30 and 40% missingness at random as well as chunks of 20 consecutive time points missing and bursts of missingness of up to length 20. A description of how the bursts of missingness are generated can be found in Appendix B. Note that we include the results for the situation where the missingness occurs in all channels simultaneously. In each case, we record the modified RMSE, modified MAE and rankings over the \(K=100\) realisations based on these errors (as described above); the numbers within brackets represent the standard deviation of these quantities.

When we consider rapidly evolving dependence within the time-varying vector autoregressive moving average setting (Table 1), it can be seen that the mvLSWimpute method performs well both in terms of the modified error measures and the rankings when percentages of the data are missing at random. In contrast, the competitor methods which rely on the assumption of an underlying stationary model cannot cope with the changing dependence structure. Note that the addition of a backcasting step to the vector autoregressive prediction approach (VAR-fb) provides an improvement in performance. However, the results weaken when we look at more extreme missingness scenarios such as bursts or chunks missing. Perhaps unsurprisingly, all methods struggle to accurately reconstruct the dependence behaviour in regions of a signal where consecutive time points are missing.

We next turn to the stationary moving average process (Table 2). The mvLSWimpute approach again produces the most accurate results, followed by both VAR-fb and mtsdi. Despite the competitor methods being designed for imputation within stationary time series, the mvLSWimpute method outperforms them both in terms of the modified error measures and the rankings.

As expected, for the mvLSW process exhibiting slowly varying spectral structure, the mvLSWimpute method performs strongly across all evaluation measures. We report the full results for this scenario in Table 3 of Appendix B. For the slowly evolving vector autoregressive series, the mvLSWimpute method consistently performs better than the competitors. However, the VAR-fb approach also performs well in this setting, since the underlying model used for the one-step ahead predictions is designed for this scenario; its results are comparable to mvLSWimpute in terms of the modified RMSE and MAE. Again, we report the full results for this example in Table 4 of Appendix B.

Note that in nearly all cases across the examples, mvLSWimpute performs consistently well in terms of the error measures considered, despite some of the competitor methods being designed specifically for imputation within those scenarios. We also observe that the use of a backcasting step within both the mvLSW and VAR imputation methods improves their performance, justifying its inclusion. However, the VAR-f and VAR-fb methods struggled to fit models when missingness occurs near the start (respectively, end) of the series, due to the limited number of observations available for parameter estimation; this limits the practical use of these techniques in many contexts and reflects similar experiences remarked upon by other authors, see, for example, Knight et al. (2020). For the scenarios discussed above, missingness in one channel only was also considered and similar performance was observed. The results for the vector autoregressive moving average setting are included in Table 5 of Appendix B.

5 Case study

In the previous section, we evaluated the performance of our multivariate imputation method against a range of alternatives across a variety of simulated scenarios. We now consider an application arising in health-related motion analysis.

The reduction in cost and size of accelerometers over the past decade has led to the use of these devices in many areas of scientific research, e.g. sport science (Troiano et al. 2014), engineering component calibration (Yin and Huang 2014) and computer security (Mayrhofer and Gellersen 2009). Accelerometer data are particularly useful in cases when the measurement of other meaningful physiological signals is difficult or obtrusive. As such, they have gained popularity in disease-related assessment, for example, analysis of sleep disorders (Van Hees et al. 2015), obesity and other cardiometabolic biomarkers (Brocklebank et al. 2015; Augustin et al. 2017) and post-diagnosis changes in physiology (Sekine et al. 2004; McDonald et al. 2019).

The benefits of wavelet-based accelerometry analysis have been well established in the scientific literature, see Bidargaddi et al. (2007) or Godfrey et al. (2008), since accelerometry data often represent movements at different frequencies over time. It is also acknowledged that accelerometry signals are nonstationary in nature (Preece et al. 2009a, b).

It has been suggested in many studies that attempts to draw conclusions from accelerometer signals can be hindered by the presence of missingness within the data, and missing entries should be replaced with suitable estimates before further analysis can take place, particularly in the case of such data collected from smartphones (Barnett et al. 2018). Within this section, we focus on the problem of imputing missing values in accelerometer data arising from an experiment in which subjects were asked to perform a sequence of predetermined activities or postures. Such gait and postural transition analyses are important in assessing patients’ balance after neurological episodes such as strokes (Janssen et al. 2008). In particular, we consider recordings in a period of high activity for the first subject in the HAPT smartphone dataset (Reyes-Ortiz et al. 2016), obtained from the UCI data repository (Dua and Graff 2017). In what follows, we first difference the data to remove any trend, as is commonplace prior to secondary analysis (Ahrabian et al. 2017).

The resulting data we analyse are a trivariate signal of length \(T=2048\). Accelerometer data are often blighted by missingness, which usually occurs in bursts or chunks across all axes (see, for example, Ae Lee and Gill (2018) for a discussion of missingness patterns and imputation of accelerometry in a different modelling setting). To this end, we induce bursts of missingness of length \(l=30\) with spacings of \(d=350\) according to the procedure in Appendix B. Figure 2a shows the accelerometer measurements over time for the three sensors; Fig. 2b shows the same series with induced missingness, with missing locations marked by red triangles.

We apply the mvLSW-based imputation approach with \(m=2\) points considered in the clipped predictor for both the forecasting and backcasting steps. For comparison, we apply the mtsdi method and the VAR-fb approach (with \(p=2\)), as these performed best among the existing methods in the simulation study.

The imputation results for each of the methods are shown in Fig. 3 for the first burst of missingness (between time indices \(t=50\) and \(t=80\)). Missing values are denoted with black dots, whilst imputed values for mtsdi are denoted by red crosses, those for the VAR-fb method are given in blue “plusses” and those for the mvLSWimpute method are denoted by green triangles.

It can be seen that whilst the imputation results for all three methods are quite similar, the mvLSWimpute method produces the most visually reliable estimate for the missing data. In particular, the mtsdi method does not track any variation in the volatility of the series, the estimates being essentially constant over time. The VAR-fb method, whilst better, tends to produce some significantly over-/underestimated values for the series. Our mvLSWimpute technique is able to strike a balance between accurate imputation and the changing dynamics of the data. We observed similar behaviour of the methods for different axes and gaps.

Above, we considered the VAR-fb and mvLSWimpute methods with \(p=2\) and \(m=2\), respectively. However, in some settings, accelerometer data will feature longer temporal dependence; see, for example, the study in Khan et al. (2013). When modelling the HAPT dataset with \(m > 2\) in our mvLSWimpute approach, we observed similarly accurate results; on the other hand, the VAR-fb method suffered from drastically variable estimates for differing p, with the error often increasing 100-fold.

The overall aim of an accelerometer data analysis is typically to perform activity recognition and analysis that can be used to inform health interventions. It is therefore important to be able to replace missing values with reasonable estimates which then allow further analysis to be carried out. Our mvLSWimpute approach can be used as a first step to infill any missing values before attempting to predict activity levels or carry out other secondary analysis tasks of interest.

6 Concluding remarks

In this work, we have introduced a wavelet-based imputation method that can be used to replace missing values within a multivariate, nonstationary time series. We compared the performance of our method against existing imputation approaches using simulated data examples and a smartphone accelerometer dataset. The simulated data examples demonstrate that the use of a backcasting step within imputation can improve the performance of the prediction methods. The case study shows that our method can be used to successfully impute missing values within time series containing both nonstationarity and seasonality, resulting in a more reliable imputation estimate compared to existing approaches.

In practice, we have found that, as with the competitor methods, the performance of our approach suffers in extreme scenarios such as chunks of consecutive time points missing or bursts of missingness. A natural avenue for future research is to investigate ways of improving imputation performance in these settings.