1 Introduction

The atmospheric pollution is a great concern for many countries of the world, as a result of studies that have verified the negative effects on human health. Santiago de Chile is one of the most polluted cities in the world due to its geographical location and emission sources.

In particular, the ozone \(\mathrm{O}_{3}\) is a highly reactive gas that can irritate lungs and also cause eye, nose and throat irritation. Long exposure to ozone pollutant can seriously worsen these effects. In Chile, the \(\mathrm{O}_{3}\) concentrations are continuously recorded by the national air quality information system (SINCA) in order to check the critical levels and take preventive measures. For this reason, it is very important to develop methods which can provide a good predictions.

Many approaches have been proposed in the literature; most of these are based on the use of neural networks and ARIMA models. Duenas et al. [1] propose ARIMA models for estimating the ground-level ozone concentrations in air at an urban and rural sampling points in Southeastern Spain, [2] analyzes hourly ozone concentrations with multiple regression and multilayer perceptron models on observations of an urban area of the east coast of the Iberian Peninsula and [3] use ARIMA models for forecasting daily maximum surface ozone concentrations at the airport in Brunei Darussalam. A neural network approach has been used by [4,5,6,7,8] for predicting the intraday ozone concentrations. A comparison of neural network models with ARIMA and regression models has been provided by [9] for prediction of Houston’s daily maximum ozone concentrations.

In this work, we propose a new methodology for predicting the ozone concentrations in Santiago de Chile given by the combination of the wavelet analysis with the accumulated ARIMA approach. In particular, we apply the wavelet transform using the Haar cumulated wavelet function to the original time series; then, we estimate an ARIMA model using the transformed series. Finally, the inverse wavelet transform is applied to the predicted wavelet coefficients obtained by the estimated ARIMA model. Thus the reconstructed time series should represent the future cumulated behavior of the ozone pollutant.

Wavelet analysis is a recent tool which allows to decompose a time series into time–frequency space (see [10]). The multiresolution analysis for discrete wavelet transform proposed by [11] permits to reconstruct a signal until a certain level of resolution. Although the wavelet analysis is a very efficient tool that can be used in several applications (time series analysis, signal processing, image processing, etc.), few works deal with time series forecasting. Soltani [12] proposed to combine the wavelet approach with the artificial neural networks (ANN) for predicting the sun spot and MacKey–Glass time series. Mabrouk et al. [13] used the autoregressive models and the wavelet decomposition for forecasting the sun spot time series. The idea of the latter works is to predict time series by applying linear (ARMAX) or nonlinear (ANN) models on simplified signal given by the reconstructed time series in each level of resolution. From the point of view of this approach, our work is novel since we propose to apply linear models to the scaling and wavelet coefficients obtained by the discrete wavelet decomposition using the wavelet Haar filter.

The paper is organized as follows. First we introduce some basic concepts about the Haar wavelet and the discrete wavelet transform (Sect. 2); then, we show theoretical results about the AR(1) process and the wavelet coefficients (Sect. 3). A simulation study is implemented in Sect. 4 to show the ability of the wavelet approach to estimate the AR(1) parameters. Results and conclusions are reported in Sects. 5 and 6, respectively.

2 Haar wavelet and discrete wavelet transform

2.1 Haar wavelet

The development of wavelets connects with the work of Alfred Haar in the early twentieth century. According to [14], the Haar mother wavelet is a mathematical function defined by

$$\begin{aligned} \psi (x) = {\left\{ \begin{array}{ll} 1, &{} x \in \left[ 0,\frac{1}{2}\right) ,\\ - 1, &{} x \in \left[ \frac{1}{2},1\right) , \\ 0, &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(1)

Two relevant characteristics are the oscillation (the Haar wavelet “goes up and down”; more mathematically this can be expressed by the condition that \(\int _{-\infty }^{\infty } \psi (x){\hbox{d}}x = 0\), a property shared by all wavelets) and the compact support (not all wavelets have compact support, but they must decay to zero rapidly). Hence, wavelets are objects that oscillate but decay fast and hence are “little.” Haar wavelets are defined over the [0, 1] interval, but we will consider their shifted versions on the intervals \([i-1, i]\), \(i=1, \ldots , M\). The application of the Haar wavelets in modeling continuous functions stems from the result that any continuous function \(f \in \mathcal{C}([0,1])\) can be approximated as

$$\begin{aligned} f(x) \approx f_n(x) & =\langle \xi _0,f\rangle \xi _0(x)+ \langle \xi _1,f\rangle \xi _1(x)\nonumber \\ & \quad + \cdots +\langle \xi _n,f\rangle \xi _n(x), \end{aligned}$$
(2)

with \(\langle \xi _i, f\rangle =\int \xi _i(x)f(x){\hbox{d}}x\),

$$\xi _0(x)= {1}\{0 \le x \le 1\}$$
(3)
$$\xi _1(x)= {1}\{0 \le x \le 1/2\} - 1\{1/2 \le x \le 1\}$$
(4)
$$\xi _2(x)= \sqrt{2}[1\{0 \le x \le 1/4\} - 1\{1/4 \le x \le 1/2\}]$$
(5)
$$\begin{aligned}&\vdots \\ &\xi _n(x)= 2^{j/2}[{1}I_k - {1}J_k] \end{aligned}$$
(6)

where \(I_k:=\{k 2^{-j} \le x \le (k+1/2)2^{-j}\}, ~~J_k:=\{(k+1/2) 2^{-j} \le x \le (k+1)2^{-j}\}.\) We take \(n=2^j+k\), \(j \ge 0, 0 \le k \le 2^j-1\) and it can be seen that \(\xi _n(x)=\xi _{jk}(x)=2^{j/2}\xi _1(2^jx-k)\). In this paper, we will use \(\xi _n(x)\) for simplicity of notation. It is worth observing that \(\xi _0 (x)\) (scaling function) describes the average behavior of the function f(x), whereas the functions \(\xi _n(x), n>0\) represent the details of f(x). The function \(\xi _1(x)\) is the wavelet.

We consider the interval \([i-1, i)\), \(1 \le i \le M\) and the Haar wavelets up to the \((N-1)\)th level. Here we use \(j=0\) to denote the scaling function, whereas \(j=1, \ldots , N\) denotes the wavelet function at level \(j-1\). Given any \(x \in [i-1, i)\), there is just one wavelet at each level of containing point x. Therefore, we can write

$$\begin{aligned} f(x) = \sum _{j=0}^N d_j^{(i)}(x) I_j^{(i)}(x), \end{aligned}$$
(7)

where \(d_0^{(i)}(x)\) is the coefficient of the scaling function, \(I_0^{(i)}(x)\) is the set function of \([i-1, i)\), \(I_j^{(i)}(x)\), \(j=1, \ldots , M\), is the set function of the interval x belongs to, at the \((j-1)\)th level, and \(d_j^{(i)}(x)\) is the corresponding wavelet coefficient multiplied by \(2^{(j-1)/2}\) (here we have considered this factor, part of the definition of the wavelet at \((j-1)\)th level, within the coefficient).

2.2 Haar Discrete Wavelet Transformation (HDWT)

For more details about general discrete wavelet theory, see [15]. Here we look for two vectors u and v such that the set of their translates by even integers forms an orthonormal basis.

Definition 21

Suppose N is an even integer, say \(N=2M\) for some \(M\in \mathbb {N}\). An orthonormal basis for \(\ell ^2(\mathbb {Z}_N)\) of the form \(\mathcal {H}=\{R_{2k}u\}^{M-1}_{k=0}\cup \{R_{2k}v\}^{M-1}_{k=0}\) for some \(u, v\in \ell ^2(\mathbb {Z}_N)\) is called a first-stage wavelet basis for \(\ell ^2(\mathbb {Z}_N)\). We call u and v the generators of the first-stage wavelet basis. We sometimes also call u the father wavelet and v the mother wavelet.

There is a simple condition in terms of the DFT that characterizes orthonormal bases; see [15]. Here \(\hat{w}_n:=\sum _{m=0}^{N-1}w_m\,\mathrm {e}^{-\frac{2\pi mn}{N}}.\)

Lemma 21

Let\(\mathbf{w}\in \ell ^2(\mathbb {Z}_N)\). Then\(\{R_k\mathbf{w}\}_{k=0}^{N-1}\)is an orthonormal basis for\(\ell ^2(\mathbb {Z}_N)\)if and only if\(|\hat{w}_n|=1\)for all\(n\in \mathbb {Z}_N.\)

3 Estimation and transformation of AR(1) models

Suppose that the data are observed at discrete time points \(t_i\) and are corrupted by noise, so that the observed data follow AR(1) model,Footnote 1 i.e.,

$$\begin{aligned} x_t=\varphi \, x_{t-1}+\varepsilon _t, \end{aligned}$$
(8)

where “near zero” \(x_0\) is given, \(\varepsilon _t\) is a Gaussian white noise process with diagonal variances \(\sigma ^2_\varepsilon\) and \(\varphi\) is a parameter, which has to be estimated. Notice that we admit only such \(\varphi\) for which \(|\varphi |<1\) holds (it implies that \(x_t\) is stationary and ergodic). It is well known that \({\mathbb {E}}[x_t]=0\) and \({\mathbb {V}}ar[x_t]=\frac{\sigma ^2_\varepsilon }{1-\varphi ^2}.\)

Let

$$\begin{aligned} u=\left( \frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}},0,0,\dots ,0\right) \quad \text {and} \quad {\textit{v}}=\left( \frac{1}{\sqrt{2}},-\frac{1}{\sqrt{2}},0,0,\dots ,0\right) . \end{aligned}$$

Then, the first-stage Haar basis for \(\ell ^2(\mathbb {Z}_N)\) is \(\mathcal {H}\) and one can trivially check (one can simply verify Lemma 21) that this wavelet basis is an orthonormal basis. Define the Haar transform matrix as \(\mathbf{H}_{2N}\) as

$$\begin{aligned} (H_{jk})=\left\{ \begin{array}{ll} (R_{2j}u)_k, &{}j=0,\dots ,N-1\\ (R_{2j}v)_k, &{}j=N,\dots ,2N-1\\ \end{array} \right. \end{aligned}$$
(9)

Suppose that observations \(\mathbf{x}\in \mathbb {Z}_{2N+1}\) follow model (8). The Haar matrix is in fact of \(2N\times 2N\) order with the shortened form

$$\begin{aligned} \mathbf{H}_{2N}=\frac{1}{\sqrt{2}}\left[ \begin{array}{c} \mathbf{I}_N\otimes [1,1]\\ \mathbf{I}_N\otimes [1,-1] \end{array}\right] , \end{aligned}$$
(10)

where \(\mathbf{I}_N\) is the identity matrix and \(\otimes\) denotes Kronecker product. After obtaining the measurements \(\mathbf{x}\) for model (8), we transform the data by HDWT

$$\begin{aligned} \mathbf{y}=\mathbf{H}_{2N}\,\mathbf{x}. \end{aligned}$$
(11)

Notice that we can shift N in both directions sufficiently far.

Lemma 31

HDWT (11) transforms model (8) into

$$\begin{aligned} y_t=\varphi ^2\,y_{t-1}+\sqrt{1+\varphi ^2}\,\varepsilon _t \end{aligned}$$
(12)

with\(\mathbb {E}[y_t]=0\) and \(\mathbb {V}ar[y_t]=\frac{\sigma ^2_\varepsilon }{1-\varphi ^2}.\)

Proof First of all, it is well known that an affine transformation of \({\displaystyle \mathbf {x} \ \sim {\mathcal {N}}({\varvec{\mu }},{\varvec{\Sigma }}),}\) i.e., \(\mathbf{y} =\mathbf{c} + \mathbf{Bx}\) has a multivariate normal distribution with expected value \(\mathbf{c} + \mathbf{B}{\varvec{\mu }}\) and variance \(\mathbf {B} \varvec{\Sigma }\mathbf {B}^\mathrm{T}\). Therefore, \(\mathbf{H}_{2N}{\varvec{\varepsilon }}_{2N}^0\) has a multivariate normal distribution with expected value \(\mathbf{0}\) and variance \(\mathbf{H}_{2N} (\sigma ^2_\varepsilon \mathbf{I}_{2N}) \mathbf{H}_{2N}^\mathrm{T}=\sigma ^2_\varepsilon \mathbf{I}_{2N}\) due to orthogonality of \(\mathbf{H}_{2N}\). Thus it is sufficient to transform values \(x_t\). One can see that

$$\begin{aligned} y_t=\left\{ \begin{array}{cc} \frac{\varphi }{\sqrt{2}}(x_{2t-2}+x_{2t-1})+\varepsilon _t,&{\text {for}}\,\,t=1,\dots ,N,\\ \frac{\varphi }{\sqrt{2}}(x_{2t-2}-x_{2t-1})+\varepsilon _t,& {{\text {for}}}\,\,t=N+1,\dots ,2N. \end{array}\right. \end{aligned}$$
(13)

From (8), we have

$$\begin{aligned} y_t=\left\{ \begin{array}{cc} \frac{\varphi }{\sqrt{2}}(\varphi \,x_{2t-3}+\varepsilon _{2t-2}+\varphi \,x_{2t-2}+\varepsilon _{2t-1})+\varepsilon _t,&{} {\text {for}}\,\,t=1,\dots ,N,\\ \frac{\varphi }{\sqrt{2}}(\varphi \,x_{2t-3}+\varepsilon _{2t-2}-\varphi \,x_{2t-2}-\varepsilon _{2t-1})+\varepsilon _t,&{} {\text {for}}\,\,t=N+1,\dots ,2N, \end{array}\right. \end{aligned}$$
(14)

On the other hand, we know from (11) that

$$\begin{aligned} y_{t-1}=\left\{ \begin{array}{cc} \frac{\varphi }{\sqrt{2}}(x_{2t-3}+x_{2t-2}),&{} {\text {for}}\,\,t=2,\dots ,N,\\ \frac{\varphi }{\sqrt{2}}(x_{2t-3}-x_{2t-2}),&{} {\text {for}}\,\,t=N+1,\dots ,2N, \end{array}\right. \end{aligned}$$
(15)

which implies

$$\begin{aligned} y_{t}=\left\{ \begin{array}{cc} \varphi ^2\,y_{t-1}+\frac{\varphi }{\sqrt{2}}(\varepsilon _{2t-2}+\varepsilon _{2t-1})+\varepsilon _t,&{} {\text {for}}\,\,t=2,\dots ,N,\\ \varphi ^2\,y_{t-1}+\frac{\varphi }{\sqrt{2}}(\varepsilon _{2t-2}-\varepsilon _{2t-1})+\varepsilon _t,&{} {\text {for}}\,\,t=N+1,\dots ,2N. \end{array}\right. \end{aligned}$$
(16)

Thus (12) is proved.

Now, since \(\varphi ^2<1\) one can show that \(\{y_t\}\) is stationary and ergodic. Therefore, \(\mathbb {E}[y_t]=0\) and from \(y_t=\sqrt{1+\varphi ^2}(\varepsilon _t+\varphi ^2\varepsilon _{t-1}+\varphi ^2\varepsilon _{t-2}+\dots )\) we have

$$\begin{aligned} \mathbb {V}ar[y_t]=\mathbb {E}[y_t^2]=(1+\varphi ^2)\mathbb {E}[(\varepsilon _t+\varphi ^2\varepsilon _{t-1}+\varphi ^4\varepsilon _{t-2}\dots )^2]= \frac{1+\varphi ^2}{1-\varphi ^4}\sigma _\varepsilon . \end{aligned}$$

The proof is complete. □

Notice that we do not know whether the distributions of \(x_t\) and \(y_t\) coincide. We only know that first two moments are equal and we can assume that errors stay unchanged in the sense of the next model. Therefore, we also assume model

$$\begin{aligned} y_t=\varphi ^2\,y_{t-1}+\varepsilon _t \end{aligned}$$
(17)

(conditional) MLE estimation of \(\varphi\) for model (8) with \(t\in \{1,\dots ,N\}\) is

$$\begin{aligned} \hat{\varphi }=\frac{\sum _{t=2}^Nx_{t-1}\,x_t}{\sum _{t=2}^Tx^2_{t-1}} \end{aligned}$$

(notice here that the conditional MLE of \(\varphi\) is also the OLS estimator of \(\varphi\), but this does not hold for the exact MLE). This implies that for model (12) with \(t\in \{1,\dots ,N\}\) we have

$$\begin{aligned} \hat{\varphi }^2=\frac{\sum _{t=2}^Ny_{t-1}\,y_t}{\sum _{t=2}^Ty^2_{t-1}} \end{aligned}$$

if we neglect the term \(\sqrt{1+\varphi ^2}\approx 1\). It is important to note that for a sample of size N in order to estimate an AR(1) process by conditional MLE, we will use only \(N - 1\) observations of this sample.

Then, we transform the data \({\hat{\mathbf{y}}}\) given by model (12) with estimated \(\hat{\varphi }\) back by IHDWT to obtain

$$\begin{aligned} {\hat{\mathbf{x}}}=\mathbf{H}^{-1}_{2N}\,{\hat{\mathbf{y}}}=\mathbf{H}^{T}_{2N}\,{\hat{\mathbf{y}}}, \end{aligned}$$

since matrix \(\mathbf{H}_{2N}\) is orthogonal. The following diagram summarizes the main idea (Fig.  1).

Fig. 1
figure 1

Transformations and estimation

Here we define the MSE of an estimator \(\hat{\theta }\) with respect to an unknown parameter \(\theta\) in a standard way as \({MSE} ({\hat{\theta }})={\mathbb {E}} \left[ ({\hat{\theta }}-\theta )^{2}\right]\).

These theoretical results can be extended to other linear processes such as the ARMA (p,q) and ARIMA (p,d,q) (see [16] for details on the autocorrelation functions, estimation and forecasting of such models).

4 Simulation study

We have calculated estimates for different parameter values \(\phi\) and \(x_0\) (for fixed value of \(\sigma _\varepsilon\)) for data generated by model (8) in the following way. For each realization, we have computed (conditional) MLE for the original data (here we have used fixed times \(t\in \{1,\dots ,N\}, ~N=100\)), which has been repeated K times. Afterward average value \(\hat{\varphi }_\mathbf{x}\) and estimation of \(\displaystyle {\text {MSE}} ({\hat{\varphi }})\) as average value of \(\left[ ({\hat{\varphi }}-\varphi )^{2}\right]\) were computed (i.e., computing empirical version of \({\text {MSE}} ({\hat{\varphi }})\)). The same procedure has been done using transformed data \(\mathbf{y}\).

Here we have used the assumption that errors stay unchanged. Therefore, we have used (conditional) MLE for model (17). We have obtained estimates \(\hat{\varphi }_\mathbf{y}\). Tables 1, 2, 3 and 4 show achieved results.

From the results, it is obvious that for large numberFootnote 2 of repetitions the estimate \(\hat{\varphi }_\mathbf{x}\) is better in average than the estimate \(\hat{\varphi }_\mathbf{y}\). Nevertheless, when the number of repetitions, e.g., \(K=20\) in our case, is small, the (conditional) MLE estimation \(\hat{\varphi }_\mathbf{y}\) has better behavior; see Tables 3,4 and values of “average” MSE. This results suggest that in practice might be helpful to use transformed data in order to obtain an estimation of specific parameter, especially when we realize that we have typically only one realization measured.

Table 1 Estimation of parameter \(\varphi\), \(\sigma _\varepsilon =1, ~K=1000, ~N=100, ~x_0=1\)
Table 2 Estimation of parameter \(\varphi\), \(\sigma _\varepsilon =1, ~K=1000, ~N=100, ~x_0=0.1\)
Table 3 Estimation of parameter \(\varphi\), \(\sigma _\varepsilon =1, ~K=20, ~N=100, ~x_0=1\)
Table 4 Estimation of parameter \(\varphi\), \(\sigma _\varepsilon =1, ~K=20, ~N =100, ~x_0=0.1\)

On the other hand, if \(\hat{\mathbf{x}}\) is a vector of predictions, and \(\mathbf{x}\) is the vector of observed values corresponding to the inputs to the function which generated the predictions, then the MSE of the predictor can be estimated by \({\text {MSE}}={\frac{1}{n}}\sum _{{t=1}}^{n}({\hat{x_{t}}}-x_{t})^{2}\). We have used the mean squared prediction error, which measures the expected squared distance between what our predictor predicts for a specific value and what the true value is: \({\text {MSPE}(L) }= \mathbb {E}\left[ \sum _{i=1}^n\left( x_i - \hat{x}_i\right) ^2\right]\). See results in Tables 5, 6, where two types of errors were computed. Clearly the inversion of \(\mathbf{H}_{2N}\) complicates whole situation. Nevertheless, we have computed mean variances for specific models (8), (12) and (17); see Table 7. Surprisingly the nearest estimation has model (17).

Table 5 MSPE for \(\sigma _\varepsilon =1, ~K=20, ~N=100\) and \(\ell _2\) error
Table 6 MSPE for \(\sigma _\varepsilon =1, ~K=20, ~N=100\) and \(\ell _\infty\) error
Table 7 Estimation of variance for \(\sigma _\varepsilon =1, ~K=1000, ~N=100\)

5 Application to Ozone data

In this section, we apply the HDWT to the ozone time series; then, we estimate ARIMA model on the resulting coefficients (detail and scaling coefficients); finally, we apply the IHDWT on the predicted coefficients by ARIMA models in order to reconstruct the future behavior of the considered cumulated time series. Description of used models follows in the next subsection.

5.1 Models

5.1.1 Autoregressive AR(p)

According to [17], an autoregressive model of order p, abbreviated AR(p), has the form

$$\begin{aligned} x_{t} = \phi _{1} x_{t-1} + \phi _{2} x_{t-2} + \ldots {} + \phi _{p}x_{t-p} + \varepsilon _{t}, \end{aligned}$$
(18)

where \(x_{t}\) is stationary, and \(\phi _{0}, \phi _{1}, \phi _{2},\ldots {}, \phi _{p}\) are constants \((\phi _{p} \ne 0)\). Although it is not necessary, we assume for the sake of simplicity that \(\varepsilon _{t}\) is a Gaussian white noise with zero mean and variance \(\sigma _{w}^2\), unless otherwise stated.

5.1.2 Moving average model MA(q)

According to [17], a moving average model of order q, abbreviated MA(q), has the form

$$\begin{aligned} x_{t} = w_{t} + \theta _{1} w_{t-1} + \theta _{2} w_{t-2} + \ldots {} + \theta _{q}w_{t-q}, \end{aligned}$$
(19)

where there are q lags in the moving average and \(\theta _{1}, \theta _{2}, \ldots \theta _{q}\)\((\theta _{q}\ne 0)\) are parameters. Although it is not necessary yet, we assume that \(w_{t}\) is a Gaussian white noise series with mean zero and variance \(\sigma _{t}^2\), unless otherwise stated.

5.1.3 Autoregressive integrated moving average ARIMA(p,d,q)

According to [18], in order to make a time series stationary, one must differentiate it d times and then apply an ARMA (p,q) model. Thus the original series is ARIMA (p,d,q), where p denotes the number of autoregressive terms, d is the number of times the series must be differentiated and q is the number of terms of the invertible moving average.

The ARIMA model is given by

$$\begin{aligned} X_{t}^d & = c +\phi _{1} X_{t-1}^d, +\ldots {}, \phi _{p}X_{t-p}^d + \varepsilon _{t}\nonumber \\ & \quad + \theta _{1} \varepsilon _{t-1}^d,\ldots {}, \theta _{q}\varepsilon _{t-q}^d + \varepsilon _{t}^d \end{aligned}$$
(20)

Expressed in terms of the backshift operator, we have

$$\begin{aligned} \phi (L)(1-L)^d X_{t} = c + \theta _{L} \varepsilon _{t} \end{aligned}$$
(21)

where \(X_{t}^d\) is the series of differences of order d, \(\varepsilon _{t}^d\) is a white noise process, and \(c, \phi _{0}, \phi _{1}, \phi _{2},\ldots {}, \phi _{p}, \theta _{1}, \theta _{2},\ldots {}, \theta _{q}\) are the parameters of the model.

In the present article, we use ARMA and ARIMA models on the different vectors of scale coefficients and wavelets of the DWT.

5.2 Data analysis

The data used in this work have been recorded by the SINCA (Chile). In particular, we consider the hourly ozone concentrations (\(O_ {3}\)) gathered in the providence monitoring station of the Metropolitan Region of Santiago, Chile, during the period from October, 1, 2002, to December 19, 2002.

The methodology is illustrated in Fig. 2. Firstly the time series from SINCA database are transformed by the DWT using the Haar filter; then, an AR or ARIMA model is identified through the partial and total autocorrelation coefficients (ACF and PACF) for each sequence of wavelet coefficients corresponding to each level of resolution. Then we apply the ARIMA models to the resulting coefficients in order to predict the future coefficients, namely 1, 2, 3, 7 and 9 levels. Finally we implemented the IHDWT on the predicted wavelet coefficients in order to obtain the future concentrations of the \(O_ {3}\) pollutant.

Since the wavelet algorithm works on time series which length is power of two, we extracted 16 hours for each day (from 6:00 to 21:00) (see Fig. 3).

The determination of the level of decomposition is fundamental since it provides the number of vectors with scaling coefficients and wavelets. For example, if we use three levels of decomposition, we obtain three vectors of scale coefficients and three vectors of wavelet coefficients, where the vectors of wavelet coefficients represent the detail of the series (see Fig. 4). For the reconstruction of the original series, the detailed vectors and the first scale in correspondence with the level of decomposition are used.

Fig. 2
figure 2

Methodology

Fig. 3
figure 3

Hourly Ozone concentrations from October, 1, 2002 to December 19, 2002, from 6:00 to 21:00

Fig. 4
figure 4

Wavelet decomposition

Different levels of resolution have been used in the reconstruction step. In particular, to reconstruct the series two levels of the DWT have been applied.

With respect to the two levels, we used the prediction of the coefficients in a cumulative way, that is, step by step, updating the time series and obtaining the predicted values with the ARIMA model in correspondence with the appropriate time interval.

In Figs. 5 and 6, we show the time series with the HDWT for a number of nine and two levels of decomposition, respectively. The first sequence of coefficients represents the scaling coefficients and the others the detailed coefficients of the HDWT. The original time series is at the bottom of Figs. 5 and 6.

Fig. 5
figure 5

Nine levels of decomposition

Fig. 6
figure 6

Two levels of decomposition

Then we performed the prediction of the wavelet coefficients and the reconstruction of the ozone time series for the next 10 days, that is, from December 20, 2002, to December 29, 2002. By using three and seven levels of decomposition, we obtain the prediction represented in Fig. 7.

We did not observe the significant difference between the two reconstructed series. Now let us consider one level in cumulative way, i.e., performing the prediction one step ahead and updating the data at each step. Figure 8 shows that the prediction is much better when using only one accumulated level of resolution.

Fig. 7
figure 7

Ozone prediction using seven and three levels of resolution

Fig. 8
figure 8

Ozone prediction using one cumulated level of resolution

Figure 9 compares the ozone prediction obtained by applying the ARIMA models on the transformed series (with two and three levels of resolutions) cumulated on the original ozone series. Differently to the classical application of the ARIMA models, the wavelet approach allows to predict the extreme events in cumulated ozone time series. Such extremes are very important from environmental and population health points of view.

Fig. 9
figure 9

Application of ARIMA models on original and transformed time series: The purple line illustrates the original time series, the pink line the prediction using two cumulated levels of resolution, the line green using 3 cumulated levels of resolution, and the blue line is the resulting of the ARIMA prediction without the application of the HDWT (color figure online)

Figure 10 plots the reconstructed time series with level 2 of accumulated reconstruction accompanied with the upper and lower limits of the 95 % confidence interval. Such confidence interval is important for further statistical analysis of the series.

Fig. 10
figure 10

Predicted ozone time series with its confidence interval and original ozone time series

6 Conclusion

We shown in this paper that the predictions obtained by proper combination of HDWT and ARIMA models are better than those obtained by the simple application of the ARIMA models to the original data. This is mainly because of complexity and non-linearity intrinsically present in ozone pollution data. Moreover the 95% confidence interval contains all the points of the reconstructed series. Thus the proposed methodology provides a new tool for improving the prediction of the ozone concentrations which could allow the competent authorities to take corrective and preventive actions in order to decrease the pollutant levels and consequently reduce the negative impact on the human health.