D-STEM : A Software for the Analysis and Mapping of Environmental Space-Time Variables

This paper discusses the software D-STEM as a statistical tool for the analysis and mapping of environmental space-time variables. The software is based on a flexible hierarchical space-time model which is able to deal with multiple variables, heterogeneous spatial supports, heterogeneous sampling networks and missing data. Model estimation is based on the expectation maximization algorithm and it can be performed using a distributed computing environment to reduce computing time when dealing with large data sets. The estimated model is eventually used to dynamically map the variables over the geographic region of interest. Three examples of increasing complexity illustrate usage and capabilities of D-STEM, both in terms of modeling and implementation, starting from a univariate model and arriving at a multivariate data fusion with tapering.


Introduction
The understanding of complex environmental phenomena usually requires the analysis of multiple variables observed over space and time, resulting in possibly large and complex data sets.When multivariate space-time data sets are considered, it is common to rely on statistical spatio-temporal models able to exploit the correlation across variables and to provide spacetime predictions over the geographic region of interest (Cressie and Wikle 2011).
This paper introduces the D-STEM (distributed space time expectation maximization) software as a statistical tool for the analysis of environmental space-time data sets and the prediction, uncertainty included, of the observed variables.D-STEM is developed in the MATLAB (The MathWorks, Inc. 2010) language and it is available at https://code.google.com/p/d-stem/.The modeling capabilities of D-STEM are detailed in this paper by introducing three case studies of increasing complexity.The reader can download the D-STEM_v4.7.11_Full.ziparchive -either from the journal web page or from the link above -including source code and demo folders with replication materials.Please follow the instructions given in the ReadMe.txtfile of the demo folder to reproduce the case studies.D-STEM requires the Statistics toolbox, the Optimization toolbox and the Mapping toolbox (The MathWorks, Inc. 2010).D-STEM is the evolution of the R (R Core Team 2014) package Stem (Cameletti 2012) which provides space-time data modeling capabilities by means of hierarchical space-time models within the frequentist paradigm.Excluding the many packages for spatial data, only few R packages can handle space-time data and even fewer are suitable for multivariate spacetime data.The R package spTimer (Bakar and Sahu 2014b,a) implements space-time models similar to those implemented by Stem but model estimation is performed within the Bayesian setting.Compared to the packages Stem and spTimer, D-STEM allows to estimate a larger class of univariate and multivariate hierarchical space-time models and it is optimized for large data sets.The gstat package (Pebesma and Gaeler 2013) can deal with multivariate space-time data but data interpolation is based on variogram modeling.When modeling environmental space-time variables, D-STEM is an alternative to the R package INLA (Rue, Martino, Lindgren, Simpson, Riebler, and Krainski 2014) which implements the integrated nested Laplace approximation (INLA) and the stochastic partial differential equation (SPDE) modeling approaches.Although D-STEM and the INLA package are based on hierarchical models and latent variables, the space-time models they implement overlap only partially and the user may benefit from using them both depending on the specific application (Cameletti, Lindgren, Simpson, and Rue 2013).
D-STEM has been tested by the authors in various real-data applications.At the urban scale, it has been used for assessing the space-time impact of traffic policies in Milan city (Fassò 2013).At the country scale, it has been used for evaluating multi-variable air quality indexes and for assessing the airborne pollutant exposure distribution in Scotland (Finazzi, Scott, and Fassò 2013).At the continental scale, considering a large data set of both ground level and remote sensing data, it has been used for air quality dynamic mapping over Europe (Fassò and Finazzi 2013).
The rest of the paper is organized as follows.Section 2 describes the capabilities of the software in terms of data modeling and data handling in general terms.Section 3 introduces the software classes at the basis of D-STEM.Sections 4, 5 and 6 illustrate software usage and capabilities considering the three case studies implemented in the above mentioned demo, which are based, respectively, on univariate, multivariate and data fusion models.Section 7 describes three options for handling large data sets and in particular tapering, distributed computing and software configuration to reduce the computational burden.Conclusions are given in Section 8.

Modeling capabilities
The parametric statistical model implemented in D-STEM is based on latent space-time random variables and space-time varying coefficients.The varying coefficients can be either observed covariates or the loadings derived from some basis functions.The model, thus, can reach a high level of flexibility and it is suitable for modeling variables over large geographic regions.
The latent spatial random variables are modeled as Gaussian random fields with a Matérn correlation function and, in the case of multiple variables, the spatial cross-correlation is modeled through the linear coregionalization model (LCM).On the other hand, time is assumed to be discrete and it is modeled through latent temporal random variables with Markovian dynamics.
In many applications, the observations of a variable must be calibrated using the observations of a second variable or a given variable is observed using more than one instrument and/or technique.For instance, remote sensing data are often calibrated using ground level data.D-STEM allows to jointly solve the calibration and the data fusion problems.In particular, point data and pixel/block data can be handled in a multivariate setting.
Model parameters are estimated following the maximum likelihood approach by means of the expectation maximization (EM) algorithm.When large data sets are considered, the tapering approach can be used in order to obtain sparse variance-covariance matrices reducing the computing time.If a computer cluster is available, model estimation can be performed in a distributed manner exploiting all the available CPU as well as CPU cores.
Details on the mathematical structure of the model at the basis of D-STEM are given in the following sections while model estimation formulas and their derivation can be found in Fassò and Finazzi (2013) and references therein.

Data handling
Multivariate space-time data sets are challenging as, in general, each variable can be observed at different spatial locations and missing data are the rule rather than the exception.D-STEM is able to handle heterotopic data sets where each variable is observed at possibly different sets of spatial locations.The sets of spatial locations or the grids of pixels are assumed to be time invariant.As a consequence, the single observation is considered to be missing if it is not observed at a given time step.Missing data, however, are automatically handled without the need of data imputation or interpolation.

Model output
The result of model estimation consists of the values of the estimated parameters, their variance-covariance matrix and the observed data log-likelihood.Moreover, cross-validation mean squared error can be obtained for each variable following a 2-fold cross-validation approach.The estimated model is eventually used to dynamically map each variable at high spatial resolution over the geographic region.

Software structure
D-STEM is based on the object oriented paradigm.Data handling and analysis are thus performed by creating objects from the D-STEM classes and by calling the appropriate methods.The following list describes the classes that the end user should manage.

Data handling
-'stem_varset' -the class contains the observed data of all the variables and the loading coefficients; -'stem_grid' -the class contains all the information related to the sampling locations of a single variable; -'stem_gridlist' -the class is the collector of the stem_grid objects for all the variables; -'stem_datestamp' -the class contains the information related to the date and time of the observations; -'stem_data' -the class is the collector of the 'stem_varset', 'stem_gridlist' and 'stem_datestamp' objects and it provides methods for preliminary data manipulation.
Model and model estimation -'stem_par' -the class contains the structure and the values of the model parameters; -'stem_model' -the class is the collector of the 'stem_data' and the 'stem_par' objects and it provides methods for model estimation; -'stem_EM_options' -the class includes the options of the EM algorithm used for model estimation; -'stem_crossval' -the class contains the information needed for cross-validation and the cross-validation result; -'stem_sim' -the class is used to simulate a data set from a given model.

Model estimation result
-'stem_EM_result' -the class contains the result of the EM estimation; -'stem_kalmansmoother_result' -the class contains the output of the Kalman smoother implemented within the EM algorithm.

Kriging
-'stem_krig' -the class includes all the information needed for mapping a variable over space and time using a dynamic kriging technique; -'stem_krig_result' -the class contains the result of kriging.

Auxiliary
-'stem_misc' -the class provides miscellaneous methods used by the mother classes.
All the methods of the class 'stem_misc' are static which implies that they can be called without creating an object of class 'stem_misc'.
Details on all class constructors, properties and input/output arguments can be displayed using the command doc <class_name> in the MATLAB environment.

Univariate model
The first case study concerns mapping the concentration of daily average nitrogen dioxide (NO 2 ) over Northern Italy for 2009, which is measured by n 1 = 194 ground level monitoring stations irregularly located over the geographic region.Three covariates are considered: wind speed, x wind (s, t), which is a space-time covariate, land elevation, x land (s), which is a purely spatial covariate and the dummy x sun (t) for Sunday, which is purely temporal.
The following model for the response variable y N O 2 (s, t) observed at spatial location s and time t is considered: where β j , j = 1, . . ., 3 are coefficients to be estimated, while z(t) is a stochastic time trend.Note that, x land (s) has a stochastic varying coefficient β 2 (s, t) = β 2 + αw(s, t) where β 2 is the global effect of land elevation on NO 2 while αw(s, t) is the random "variation" of β 2 specific to location s and time t.Since w(s, t) has unit variance, α is a scale parameter that can be tested to assess whether β 2 (s, t) is constant or not.Finally, z(t) is a scalar Markovian process modeling the temporal persistence of the pollutant while ε(s, t) is the measurement error.

Model description
The model in Equation 1 is a special case of the following general univariate model implemented in D-STEM: where y(s, t) is the scalar observation at time t ∈ {1, . . ., T } and spatial location s ∈ D.
Depending on the coordinate system of the data, two options are available, namely D ⊂ 2 or D ⊂ S 2 , where S 2 is the sphere in 3 .
In Equation 2, µ(s, t) represents the following fixed effect model: where x β (s, t) is a 1 × b dimensional vector of known coefficients and β is to be estimated.Moreover ω(s, t) represents the following random effect model: where x j (s, t), j = 1, . . ., c, and x z (s, t) are scalars and a 1 × p dimensional vector of known coefficients, respectively, while α j , j = 1, . . ., c, are to be estimated.
The p-dimensional latent component z(t) has the following Markovian dynamics: with transition matrix G assumed to have eigenvalues smaller than 1 in absolute value and innovations η(t) ∼ N p (0, Σ η ).Eventually the variables w j (s, t) are zero-mean and unitvariance independent Gaussian processes uncorrelated over time but correlated over space with Matérn spatial covariance function where s − s is the distance between two generic spatial locations, θ j > 0 and ν > 0 are parameters, Γ is the gamma function and K ν is the modified Bessel function of the second kind.Finally, ε(s, t) is a zero-mean Gaussian process uncorrelated over space and time with variance σ 2 .For this model setup, the parameter set to be estimated is where α = {α 1 , . . ., α c } , θ = {θ 1 , . . ., θ c } and v η is the p(p+1)/2 dimensional vector of unique elements of Σ η .Note that, for each Matérn correlation function used, only the parameter θ j is estimated while the smoothing parameter ν is fixed and can be chosen as 1/2, 3/2 or 5/2.
The model in Equation 2extends the model developed in Fassò and Finazzi (2011) by allowing the interaction between the latent spatial variables w j (s, t) and the loading coefficients x(s, t) and by allowing y(s, t) to be missing.The structure of the model in Equation 2is quite general and special cases thereof have already been used.For instance, Katzfuss and Cressie (2011) consider a similar model that includes both covariates and loading coefficients from basis functions.Although the software considers space-time varying x j (s, t), and allows to implement space-time varying coefficients such as the β 2 (s, t) in our case study, the simpler setup given by x j (s, t) = 1 is quite common to model a spatial trend, see for example Fassò (2013) where spatial correlation is considered a nuisance parameter.In general, we suggest to use coefficients x j (s, t) which are fixed in space and/or time.This is because w j (s, t) is itself space-time variant and identifiability issues may occur.
In our case study, the vector x β (s, t) includes all the covariates in Equation 1, x z (s, t) = 1 while x 1 (s) is equal to the land elevation.Moreover, ν = 1/2, namely the exponential correlation function is considered.

Software implementation
This paragraph describes the relevant lines of code of the demo_section4.m script related to the case study previously introduced.The script can be executed choosing option number one from the dstem_demo.m script.
It is assumed that observations and covariates are stored in MATLAB format files.In general, the user has to take care of loading the data from external sources and formatting them as requested by the class constructors.
Although not mandatory, the temporary data structure ground will be used to pass the data to the class constructors.In the following lines of code, data related to the NO 2 concentration are loaded into the structure ground along with the variable name.Note that the term "ground", referring to the monitoring network data, is used to contrast them with "remote" sensing data of Section 6.
Similarly, the loading coefficients are loaded into the same ground structure.Note that all the loading coefficients are supposed to be observed without error and/or missing data for each day and each spatial location.The loading coefficients related to β are directly obtained from the covariates as detailed below.
>> load ../Data/no2_ground_covariates.mat >> ground.X_beta{1} = X; >> ground.X_beta_name{1} = {'wind speed', 'elevation', 'sunday'}; The variable X is a n 1 × b × T array, with b = 3 the number of loading coefficients.Note that, since wind speed is time-variant, the other covariates are replicated T times in order to fill the third array dimension.If all the loading coefficients are time-invariant, however, X is simply a n 1 × b matrix.
The loading coefficients related to z(t) are constant and equal to 1 and they are defined in the following way.
>> ground.X_p{1} = ground.X_beta{1}(:, 2, 1); >> ground.X_p_name{1} = {'elevation'}; The suffix "_p" in X_p and X_p_name refers to ground data, which are assumed to be point data, and it is necessary to differentiate them from the remote data of Section 6, which are assumed to be block or pixel data.In general, X_p is a n 1 × 1 × T × c array.Since, in this case study, c = 1 and land elevation is time invariant, then X_p is simply a n 1 × 1 vector.
At this point, the obj_stem_varset_p object of class 'stem_varset' can be created using the class constructor as follows.
>> The constructor of the class 'stem_grid' requires to specify some information about the grid and in particular the unit of measure (degrees, kilometers or meters), the configuration of the spatial locations (sparse or regular) and the grid type (points or pixels).
The temporal information of the observed data is provided as follows and it is used when model output is displayed.
The objects so far created are necessary for the constructor of the class 'stem_data' to produce the obj_stem_data object as follows.
>> shape = shaperead('../Maps/worldmap'); >> obj_stem_data = stem_data(obj_stem_varset_p, obj_stem_gridlist_p, [], ... [], obj_stem_datestamp, shape); The third and fourth input arguments are empty as they are not required for this case study and will be discussed in Section 6.A custom map of the geographic region can be loaded from a shape file and passed as input argument to the constructor.A map of the world country boundaries is provided along with the case study data.
Along with the information about the type of the spatial correlation function (exponential in this case), the obj_stem_data object is needed to create the obj_stem_par object of class 'stem_par'.
>> obj_stem_model = stem_model(obj_stem_data, obj_stem_par); In order to improve the numerical stability of the model estimation algorithm, observations and loading coefficients are standardized using the standardize method of the class 'stem_data' as follows.
>> obj_stem_model.stem_data.log_transform;>> obj_stem_model.stem_data.standardize; The log_transform method only acts on the response variable y(s, t) and it is used here to reduce the distribution asymmetry.
The EM algorithm requires the model parameters to be initialized to some starting values.The estimation result may depend on the starting values and they must be chosen carefully.In its current version, D-STEM can automatically provide starting values only for the β parameter vector.The following lines of code describe the initialization of the model parameters.
At this point, model estimation can be performed by calling the method EM_estimate of the class 'stem_model' which requires as input argument an object of class 'stem_EM_options'.
>> exit_toll = 0.001; >> max_iterations = 100; >> obj_stem_EM_options = stem_EM_options(exit_toll, max_iterations); >> obj_stem_model.EM_estimate(obj_stem_EM_options); >> obj_stem_model.set_varcov;>> obj_stem_model.set_logL; The variance-covariance matrix of the estimated model parameters and the observed data loglikelihood are evaluated after model estimation using the methods set_varcov and set_logL of the class stem_model.All the relevant information about model estimation can be found in the internal object stem_EM_result which can be accessed as a property of the obj_stem_model object.After model estimation, the obj_stem_model object is saved in the subfolder Output of the Demo folder.
Using the print method of class 'stem_model', the following output is obtained.The standard deviations related to each estimated model parameter are directly obtained from the diagonal elements of the variance-covariance matrix.
The estimate of the latent temporal variable z(t) is stored in the stem_kalmansmoother_result object which is a property of the stem_EM_result object.The graph of Figure 1 shows the estimated temporal variable and it has been obtained calling the method plot of class 'stem_kalmansmoother_result'.
The estimated model is eventually used to map the NO 2 concentration over the geographic region following the kriging approach.Since the loading coefficients of this case study consist of a set of covariates, the same covariates must be available for the entire region as a regular grid with the proper spatial resolution.
The first step toward mapping is to create the obj_stem_krig object of class 'stem_krig' in the following way.

>> obj_stem_krig = stem_krig(obj_stem_model);
Since the kriging output is evaluated over a regular grid, the obj_stem_krig_grid object of class 'stem_grid' is created as follows.
Figure 1: Estimated latent variable z(t) and 95% confidence interval for the univariate model.
At this point, two important aspects must be considered.The first one is related to the grid pixels.If the variable should not be predicted over some areas of the geographic region, then it is possible to provide a mask of the pixels that must be excluded.This allows to reduce computing time.
The second aspect is related to the dimension of the grid and memory usage.If the grid is large and/or very dense, the number of pixels can be high and the loading coefficients may require a lot of memory when loaded.In order to avoid memory problems, the loading coefficients (related to the non-masked pixels) can be saved on disk within different blocks.Kriging is then executed block by block without the need of loading the entire data set of loading coefficients.On the other hand, when pixels are low in number, the user can implement kriging providing all the coefficients at once.All the details about the two approaches are found within the help of the class 'stem_krig'.In this paper, the first approach is considered as more complex in terms of data structure.
Kriging is executed by calling the method kriging of the class 'stem_krig' as it follows.
The kriging result is saved in the obj_stem_krig_result object and the plot method can be used to display the result on a map.For example, the estimated NO 2 concentration and the respective standard deviation for April 10, 2009 are depicted in Figure 2. Note that, since w(s, t) and x land (s) are interacted, the spatial pattern of the standard deviation does not reflect the monitoring network, that is, the standard deviation is not necessarily lower near the monitoring stations.

Multivariate model
In order to demonstrate the multivariate capabilities of D-STEM, a simple bivariate case is introduced.A more complex case study with three variables can be found in Finazzi et al. (2013).
In addition to the NO 2 data of the previous section, measurements of particulate matters concentration PM 2.5 coming from n 2 = 44 monitoring stations over the same geographic region are considered.Note that only a subset of the PM 2.5 measurements are co-located to the NO 2 measurements.D-STEM, however, allows for fully or partially heterotopic networks.In this way, the spatial information of the more dense NO 2 monitoring network may be used to improve PM 2.5 mapping.
The response variable is now y(s, t) = (y N O 2 (s, t), y P M 2.5 (s, t)) and the model in Equation 1is extended by introducing a bivariate temporal component z(t) and a bivariate space-time component w(s, t) modeled through an LCM.The same covariates of the previous section are considered for both NO 2 and PM 2.5 .

Model description
Multivariate models are tackled considering the following straightforward extension of the model in Equation 2: which unifies the modeling approaches developed in Zhang (2007), Fassò and Finazzi (2011) and Finazzi et al. (2013).
In particular, extending fixed and random effect models of Equations 3 and 4, we have µ(s, t) = X β (s, t)β and where the symbol represents the element by element or Hadamard product; moreover y(s, t), α j , x j (s, t), w j (s, t), j = 1, . . ., c and ε(s, t) are q × 1 vectors, while where blockdiag is the block diagonal building operator.The vectors of loading coefficients x β,i (s, t) have dimensions 1 × b i for i = 1, . . ., q while the vectors x z,k (s, t) have dimensions 1 × a k for k = 1, . . ., p.In Equation 7, each multivariate spatial latent variable w j (s, t), for each fixed t, is modeled as a LCM with the following spatial variance-covariance matrix functions where V j is a valid q × q correlation matrix and j = 1, . . ., c. On the other hand, the elements of ε(s, t) are independent and normally distributed with variances σ 2 i , i = 1, . . ., q.It follows that the parameter set for the model in Equation 6is where σ 2 = (σ 2 1 , . . ., σ 2 q ) , α is the cq × 1 dimensional vector obtained by stacking α 1 , . . ., α c and v is the cq(q − 1)/2 × 1 dimensional vector obtained by stacking the unique and non diagonal elements of V 1 , . . ., V c .

Software implementation
This paragraph describes the relevant lines of code of the demo_section5.m script.The script can be executed choosing option number two from the dstem_demo.m script.Since demo_section4.m of Section 4 and demo_section5.m are similar, only the differences induced by the multivariate setting are detailed here.
The additional data related to the PM 2.5 variable are loaded into the temporary structure ground in the following way.>> load ../Data/pm25_ground.mat >> ground.Y{2} = pm25_ground.data;>> ground.Y_name{2} = 'pm2.5ground'; >> n2 = size(ground.Y{2}, 1); Note that the second cell of the cell arrays Y and Y_name is used.The same strategy is followed for X_beta, X_beta_name and so further.The coordinates of each variable must be provided separately and added to the obj_stem_gridlist_p object as follows.
In Equation 9, the fixed and random effects have a structure similar to Equation 6 with c = 1, namely The resolution change between point and block data is defined by where w(s, t) is a zero-mean Gaussian process with variance-covariance matrix function as defined in Equation 8 with parameters V B and θ B .Similarly to point data, the measurement error vector ε B (B, t) is assumed to be uncorrelated over space and time and across variables with variance vector σ 2 B = (σ 2 B,1 , . . ., σ 2 B,q ) .In Equation 10, the additional remote-ground link term is given by ω(s, t) = α BP x BP (s, t) w(s, t), where, the parameter vector α BP gives the intensity of the correlation between pixel and point variables, which can be modeled by the q × 1 vector of loading coefficients x BP (s, t).
Note that z B (t) is an additional Markovian component as the temporal dynamics may differ beween remote and ground data.
The parameter set for the data fusion model is where v B is the q(q−1)/2×1 dimensional vector obtained by stacking the unique, non-diagonal elements of V B .
In many environmental applications, the grid of pixels is regular (all the pixels have the same shape and dimension), the pixels are small compared to the geographic region and the process w(s, t) is smooth.In this case the approximations w B (B, t) w(s * , t) and w(s, t) w(s * , t), where s * is the center of B, are reasonable.D-STEM implements these approximations avoiding the time-consuming computation of the integral in Equation 12for each pixel B and each iteration of the EM algorithm.Possible errors induced by this approximation are covered by ε B (B, t).
The model in Equations 9 and 10 is similar to the Gaussian Markov random field smoothed downscaler developed in Berrocal, Gelfand, and Holland (2012), with the main difference that D-STEM handles multivariate data and use Gaussian processes instead of Gaussian Markov random fields.In fact Gaussian processes, thanks to the EM algorithm, are more suitable for handling extensive missing data which often arise in remote sensing.

Software implementation
This paragraph describes the demo_section6.m script which can be executed choosing option number three from the dstem_demo.m script.Only the code related to the pixel variables and the downscaler are discussed here, the reader being referred to the previous Section 5.2.
A simple version of the model in Equation 9 is considered here.In particular, remote sensing data are described by the equation and, similarly, x BP (s, t) = 1 is used in Equation 13.Hence a constant vector is added to the temporary data structure ground of Section 5.2 as follows.
>> ground.X_bp{1} = ones(n1, 1); >> ground.X_bp_name{1} = {'constant'}; >> ground.X_bp{2} = ones(n2, 1); >> ground.X_bp_name{2} = {'constant'}; The observations related to the pixel variables are loaded from disk as detailed in the following lines of code.The resulting maps are displayed in Figure 4.The observed pixel data are characterized by large areas of missing data but the latent variable w B (B, t) allows to reconstruct the missing data and to filter the observed data corrupted by noise.Moreover, since w B (B, t) is used to model both point data and pixel data, the reconstruction of the missing pixel data benefits from the observed point data.

Large data sets handling
The statistical models that D-STEM implements are separable with respect to space and time.If N is the total number of spatial locations where all the point variables are observed and T is the total number of time steps, then the largest variance-covariance matrix that D-STEM handles is only N × N .If pixel variables are also considered and the total number of pixels where all the variables are observed is M , then the largest variance-covariance matrix is D × D, where D = max(N, M ).In many applications, however, N and D can be large and both computing time and memory usage can increase drastically.In the following paragraphs, three strategies that D-STEM provides for reducing computing time are discussed.Even if these strategies are intended for large data sets, in order to keep the computing time feasible, the same case studies of the previous sections are considered.

Tapering
The tapering approach consists of adopting a sparse variance-covariance matrix characterized by a high percentage of zero elements (possibly higher than 90%).The idea behind tapering is that spatial locations at great distance should not exhibit spatial correlation.Hence, tapering forces to zero the covariances of observations at distances higher than a threshold in such a way that the positive definiteness of the variance-covariance matrix is preserved.
When the generic variance-covariance matrix A is used to solve matrix equations in the form Ax = b, the computing time is greatly reduced if A is sparse.The higher the matrix sparseness the lower the computing time.In order to model spatial correlation, however, the above mentioned threshold cannot be too low and there exists a trade-off between low computing time and good approximation of the latent Gaussian processes used to describe the spatial correlation.
In order to implement tapering, D-STEM applies the so called one-taper tapering of Kaufman, Schervish, and Nychka (2008) to each LCM component w j (s, t).To do this, in the spatial correlation matrix function of Equation 8, the Matérn correlation function ρ is substituted by where Φ s−s φ is the compactly supported radial Wendland function (Wendland and Mathematik 1995), with φ the width of the radial function.
Although the asymptotic theory of Kaufman et al. (2008) is proven for the purely spatial univariate case, T = 1 and q = 1, in light of the results of Ruiz-Medina and Porcu (2014), we conjecture here that it holds true also for the purely spatial multivariate case, T = 1 and q > 1, and, a fortiori, for the space-time case with large T .Note that, unlike the so called two-tapers tapering of Kaufman et al. (2008), the one-taper may give a biased estimate of θ in

Computing load distribution
Thanks to the separability between space and time, most of the matrix algebra operations at the basis of the EM algorithm only involve the data of a single time step.Moreover, each time step is independent from the previous and the next time steps.The Kalman filter itself is implemented in such a way that some operations executed at time t are independent from the result of the operations at time t − 1.
If a cluster of computing nodes is available, then model estimation can be performed in a distributed manner.In particular, if n nodes are available and T ≥ n, then the data set is split into n temporal frames and distributed to the nodes on the basis of the node speed (evaluated at each EM iteration) and the number of missing data in each time frame.Indeed, time steps characterized by a high missing data rate imply faster computing.
The computing nodes can be any number of heterogeneous machines connected through a local area network (LAN) and no additional parallel and/or distributed software libraries are required.All the nodes must be able to read and write to a common shared folder and each node must run at least one MATLAB process.One node, usually the fastest or the one with the highest quantity of RAM, is designated to be the master while all the other nodes are considered slaves.The number of slaves can change during model estimation but the master node must always run.
In order to estimate the model in a distributed manner, a script that calls the daemon.mfunction must first be executed on each slave node possibly in batch mode.The function requires as input argument the path of the shared folder to use.The master runs the usual main script but the name of the shared folder, as well as additional parameters, must be given as properties of the obj_stem_EM_options object.
Choosing option number five from the dstem_demo.m script, the case study of Section 6 is reproduced and model estimation is carried out in distributed manner.To avoid the complication of setting up a distributed environment, the dstem_demo.m script starts a second MATLAB process in which the demo_runslave script is executed and that, in turn, executes the daemon.mscript.The original MATLAB process executes the demo_section7_2.mscript which differs from the demo_section6.m script with respect to the following lines of code.The timeout input argument is the time in seconds that the master waits when listening for the slave nodes.It is worth knowing that the content of NFS shared folders on UNIX distributed environments is not always updated in real time.If the user cannot change the updating time of NFS folders, then the timeout input argument must be increased in order to ensure that master and slaves can always read the files written in the shared folder.
The output of model estimation is equal to the output already reported and discussed in Section 6.

Observed data log-likelihood evaluation
By default, D-STEM compares the estimated parameters and the observed data log-likelihood between two consecutive EM iterations.If the relative norm of the difference between the parameter vectors or between the log-likelihoods is lower than the tolerance specified by the property exit_toll of class 'stem_EM_options' (see Section 4.2), the EM algorithm stops.In the case of large data sets, computing the log-likelihood at each EM iteration is time consuming.In order to speed up model estimation, the evaluation of the log-likelihood can be avoided and the exit condition is only based on the model parameters.
The demo_section7_3.m script, which can be executed choosing option number six from the dstem_demo.m script, implements the same case study as in Section 4 but model estimation is carried out without computing the observed data log-likelihood at each iteration.
The following lines of code describe how the obj_stem_EM_options object is created in order to avoid the evaluation of the log-likelihood at each iteration.Due to the different exit condition, the model parameters estimated without computing the log-likelihood at each iteration differ from those estimated in Section 4. In particular, the θ parameter decreased from 31.42 to 22.45 km.Nonetheless, the observed data log-likelihood, evaluated after model estimation, is only slightly higher (−13797.357 vs. −13889.064).This is due to the fact that, using a non-large data set as in this case study, the θ parameter of Equation 5 is poorly identifiable and it monotonically changes from one iteration to the next even if the observed data log-likelihood does not change significantly.
Although the computing time of each iteration is reduced, thus, a possible drawback is that the total number of iterations required to estimate the model might be higher.The user should be careful when adopting this strategy as, if poor identifiability of θ is not detected, the EM algorithm might not converge or it might take more iterations than necessary.In the above example, the EM algorithm takes 78 iterations to converge with respect to the 32 iterations required when the log-likelihood is evaluated at each iteration.Again, this strategy to reduce computing time is intended for large data sets.

Conclusions
In this paper, the use of D-STEM has been illustrated for three different case studies involving a univariate model, a bivariate model and a data fusion model.The model at the basis of D-STEM is general enough to accommodate many environmental data sets, nonetheless, both model and software can be extended with respect to many aspects.From the modeling point of view, additional spatial correlation functions could be introduced as well as more flexible "coregionalization models" (Apanasovich and Genton 2010; Gneiting, Kleiber, and Schlather 2010).Moreover, Markov random fields could be introduced in order to model pixel data.Indeed, Gaussian random fields easily handle data sets with extensive missing data but they are more computationally expensive even under tapering.Finally, it could be useful to extend the model to accommodate for time-varying grids and irregularly spaced sampling times.
Regardint the software side, some time consuming procedures such as the estimation of the variance-covariance matrix of the model parameters and kriging could also be implemented in a distributed manner.Furthermore, the handling of the model parameters could be improved by introducing constraints on the parameter vectors and matrices, widening the range of models that can be estimated.
D-STEM is constantly updated and improved and new versions are released on https:// code.google.com/p/d-stem/.Google Code runs a project hosting service that provides revision control and an issue tracker.The users of D-STEM are welcome to notify bugs and to submit extensions or improvements of the code.

Figure 2 :
Figure 2: Estimated NO 2 concentration [µg • m −3 ] below 800 m of elevation (top) and standard deviation (bottom) for April 10, 2009 over Northern Italy using the univariate model.Monitoring stations are depicted by the '+' symbol.