A method to generate fully multi-scale optimal interpolation by combining efficient single process analyses , illustrated by a DINEOF analysis spiced with a local optimal interpolation

We present a method in which the optimal interpolation of multi-scale processes can be expanded into a succession of simpler interpolations. First, we prove how the optimal analysis of a superposition of two processes can be obtained by different mathematical formulations involving iterations and analysis focusing on a single process. From the different mathematical equivalent formulations, we then select the most efficient ones by analyzing the behavior of the different possibilities in a simple and well-controlled test case. The clear guidelines deduced from this experiment are then applied to a real situation in which we combine largescale analysis of hourly Spinning Enhanced Visible and Infrared Imager (SEVIRI) satellite images using data interpolating empirical orthogonal functions (DINEOF) with a local optimal interpolation using a Gaussian covariance. It is shown that the optimal combination indeed provides the best reconstruction and can therefore be exploited to extract the maximum amount of useful information from the original data.


Introduction
Optimal interpolation (in the following noted OI) is well established (e.g., Gandin, 1965;Delhomme, 1978;Bretherton et al., 1976) and a reference tool when analyzing satellite images.The method has therefore been applied in a large number of scientific studies (e.g., Kawai et al., 2006) and operational setups (e.g., Stark et al., 2007;Donlon et al., 2012;Nardelli et al., 2013).To be optimal, the method requires the correct specification of covariance matrices, most of the time given by parametric functions (e.g., Reynolds and Smith, 1994).Another approach for analyzing a set of satellite images (data interpolating empirical orthogonal functions -DINEOF) uses the data to create a truncated empirical orthogonal function (EOF) representation of the data set to fill in missing data (e.g., Beckers and Rixen, 2003;Alvera-Azcárate et al., 2005, 2007).The latter method has been favorably compared to OI and has been exploited in a series of applications (e.g., Sheng et al., 2009;Ganzedo et al., 2011;Nikolaidis et al., 2014;Wang and Liu, 2014), including operational setups (e.g., Volpe et al., 2012).In some situations, however, the truncation of the EOFs series can reject some interesting small-scale features that only give a small contribution to the total variance, and that can often be split into several modes (Sirjacobs et al., 2008).This is due, on one hand, to the fact EOF truncation is related to the percentage of variance that would be associated with noise and, on the other hand, to the limits of EOF decomposition itself in identifying evolving mesoscale features in a single mode (actually, any feature propagating across the spatial domain is split into several modes) and under clouds.Hence, there might remain small-scale information not fully exploited.One natural approach would be to first analyze the large scales with DI-NEOF and then add an analysis of the residuals using a local optimal interpolation.We will show that this simple approach provides good results but is still suboptimal.The purpose of the present paper is thus to recover additional pieces of Published by Copernicus Publications on behalf of the European Geosciences Union.
information in the reconstruction process by optimally combining the DINEOF approach with a local small-scale optimal interpolation approach.The method we will propose is however more general in the sense that we will show how to deal with a situation in which a combination of processes at different scales needs to be analyzed, possibly using different software optimized to analyze a specific process.We will then illustrate the approach in the particular situation with DINEOF analyses for larger scales and OI for smaller scales.
Section 2 will present the theory of optimally combining the analyses of two processes.Then a synthetic example in Sect. 3 will serve as a test bed to provide guidelines in selecting an analysis order among a series of possible options identified.The algorithm called DINEOFOI is then spelled out for the combination of DINEOF and OI in Sect. 4. Finally, the application of the method to real data with a validation exercise will be presented in Sect. 5.

Multi-scale optimal interpolation
We will present the method in an optimal interpolation framework and use the following notations and assumptions.Without loss of generality, data are considered anomalies with respect to a reference field and are stored in vector d.The observations are considered imperfect or as including representativity errors so that the observational error is characterized by covariance matrix R. The analysis will generally be performed on regular grids, hence in locations not necessarily coinciding with the observation points.The so-called observing operator H (e.g., Kalnay, 2002) allows the definition of how to retrieve the analysis values at the observational points from the analysis grid.For filling pixels in satellite images, the application of this matrix to the analysis containing all pixels would simply retrieve the pixels where data were originally present.
The field we try to recover is considered to be created by different independent processes with each process characterized by a specific covariance.On the analysis grid, each process therefore defines a different background covariance matrix.We note these matrices B s for a specific process s.Since we are in the presence of different independent processes, each one leading to a different covariance matrix B s , the covariance of the total field is B = s B s and includes the contribution of processes at all scales (e.g., Wackernagel, 2003).Here we start with the premise that the covariances have already been chosen as there is a wide range of literature on how to model and calibrate covariances (using parametric functions, reduced rank representation, ensemble approaches, smoothness or diffusion operators, see for example Gasparini andCohn, 1999 andWeaver andMirouze, 2013).Hence, the purpose of the present paper is not to defend or propose one particular version.
In this case, with covariances defined, the best analysis φ of data d including all scales in the sense of optimal interpolation (e.g., Cushman-Roisin and Beckers, 2011) would be given by φ = Kd with gain matrix At this stage, we observe that the optimal interpolation demands the inversion of i HB i H T + R .If this can be achieved for the chosen representation of the different B s , then the problem of the multi-scale analysis is solved since the optimal interpolation is provided by Eq. ( 1).
However, sometimes some of the individual matrices B s are not explicitly calculated (for example, in spline methods, e.g., Brasseur et al., 1996, or 3DVar-NMC implementations, e.g., Parrish andDerber, 1992;Fisher, 2003) and cannot be added to other matrices before inversion.In other cases, using a single B s can lead to very efficient inversions (for example, when reduced rank approaches are possible, e.g., Kaplan et al., 2000;Beckers et al., 2006, or localization is used, e.g., Reynolds and Smith, 1994) while when trying the inversion with the sum of B s , one cannot exploit the particular structure of the individual B s anymore.In such situations, we can therefore only suppose that we have efficient tools to calculate K s x, i.e., we are able to apply an analysis tool with a single specific background covariance B s to any data array x.Formally, the Kalman gain matrix for the sole process s is 1 Hence, the problem is to find a way to calculate K s x when we can only calculate K s x.
Here we present the solution for two processes, but by recursion more scales can be taken into consideration.If we have only two processes, we can provide the optimal analysis for each one of them as K 1 d and K 2 d and also the overall optimal interpolation K 1 d +K 2 d exploiting the following matrix identities (proof in Appendix A): Similar relationships hold for K 2 (as we can just interchange indices 1 and 2).We therefore now have a way to calculate 1 Carefully note the absence of to distinguish it from Eq. (2).
the optimal interpolation exploiting the individual analysis tools K 1 and K 2 .All formulations are mathematically equivalent but they have the problem that (I − K 2 HK 1 H) −1 or similar inversions are needed.These are not accessible or lead to an expensive inversion for the same reasons we did not want to invert a matrix involving the sum of B s .So it seems we have only displaced the problem.However, the important point to observe is that gains K s H or HK s are "smaller than 1" 2 and the matrix inversion can therefore be implemented by a series expansion: for smaller than 1, we have the convergent series (e.g., Young, 1981): When applied to a vector x, this immediately provides the algorithm to calculate y = (I − ) −1 x as follows: Algorithm 1. Approximate matrix inversion.
This will converge to the desired vector y.In our case, the calculation of y only involves successive analyses using K 1 and K 2 , which by hypothesis can be done efficiently.Then of course, if we limit the number of iterations, the four formulations (4)-( 7) might lead to different results.In fact, it is easy to show that n iterations used in Eqs.(4) and (5) will yield the same results since with 1 = K 2 HK 1 H and 2 = HK 1 HK 2 .Similarly, iterative versions of Eqs. ( 6) and (7) will also lead to the same analysis.The only difference is possible in terms of computational load.Since generally the number of observations is much smaller than the number of points on the output grid of the analysis, using Eqs.( 5) and ( 7) during iterations will work on smaller files/arrays and also could benefit from quicker analysis if the analysis tool is faster when asked to output values only at data locations.Hence, Eqs. ( 5) and ( 7) are two formulations which we will retain and which will differ if iterations on the inversion of (I − HK 1 HK 2 ) or (I − HK 2 HK 1 ) are not pursued until convergence.For K 2 , 2 A formal definition of "smaller than 1" for the matrices demands the module of their eigenvalues being all smaller than 1; see page 127 in Daley (1993) for a proof that HK, which is also sometimes called the hat matrix or influence matrix, has eigenvalues between 0 and 1.
we now write out the two equivalent relevant versions: The cost for calculating an analysis with K 1 or K 2 using n iterations is 3 + 2n analyses using K 1 and K 2 in Eqs. ( 5) or (11), whereas the use of Eqs. ( 7) or (12) demands 2 + 2n standard analyses.In the following, P1a and P2a will refer respectively to the application of Eqs. ( 5) and ( 11) with iterations, and P1b and P2b refer respectively to the use of Eqs. ( 7) or (12).
We also note that even when using iterated versions, using Eqs.( 5) and ( 12) leads to which will be exploited in our final algorithm.K 1 and K 2 allow us to calculate the best analysis for each of the processes.A natural but suboptimal estimate for each process would of course be obtained when using K 1 for process 1 and K 2 and for process 2. A slightly more elaborated idea would rather be to use K 1 (I − HK 2 ) and K 2 (I − HK 1 ) so as to get the analysis of one process only using data from which we tried to subtract the information from the other process.Therefore, we can also design several simple ways to sub-optimally estimate the overall analysis using the following gain matrices: These simple versions (S * * ) can also be compared later to the iterated versions.

Error covariance fields
The estimation of uncertainties is very valuable information that is to be added to the analysis itself.The formal derivation of it is well established for the general case: the analysis error covariance matrix P a when using any gain matrix K is given by (e.g., Cushman-Roisin and Beckers, 2011) When the gain is optimal, the last term vanishes and we get the minimal error of the optimal interpolation approach.Hence, we can also assess the errors of the different analyses we propose.
www.ocean-sci.net/10/845/2014/Ocean Sci., 10, 845-862, 2014 For example, if we are interested in the analysis of process 1 and use the exact form of K 1 , the expected error covariance on the analysis of process 1 reads Similarly, the error of the analysis of process 2 alone when using the exact form of K 2 reads If we are interested in the overall analysis including both processes, the error on the sum of the two analyses is since we used K = K 1 +K 2 ; we note that the error on the full analysis is not the sum of the errors on each process but will be lower.In all cases, since we have a way to replace K 1 and K 2 as functions of K 1 and K 2 , the error covariance fields can also be assessed in any desired location by applying our new method to a pseudo-data set containing, in fact, covariances.

Synthetic example
The objective of the synthetic test case is see which formulations are the most interesting, and we do so by using a Monte Carlo approach.In order to allow for a large number of realizations, a simplified framework is used in which only spatial interpolation will be performed instead of a full time-space analysis.This approach is not limiting since the time interpolation amounts to add a dimension to the system.As long as the same types of correlation functions are also used over time, the timescale will play exactly the same role as correlation length, and the conclusions of the spatial interpolation will also hold for the higher dimensional case (see also Nardelli, 2012).Also, the conclusion we will derive from the synthetic case will be shown a posteriori to perform well in the real application including the time dimension.
Focusing on spatial interpolation, we generate two random fields of different covariances ("true states") on a domain 100 grid points wide, sample the fields, sum them up, add noise and then analyze these pseudo-data with the different formulations.Then we can calculate the squared norm of the difference between the analysis and the true fields (each field individually and the overall field).This exercise is repeated a large number of times (100 000) to have a statistically significant estimate and also error distribution.To simplify the interpretation, the error estimates are scaled by the expected analysis errors Eqs. ( 20)-( 22) so that if the analysis is indeed optimal, the average scaled error should be 1.
We can then look at the quality of the analysis for each individual process and the overall analysis for a series of different values of the parameters describing the covariance structures of each process.These parameters will characterize the correlation length of the processes and the signal-to-noise ratio (e.g., Troupin et al., 2010): the signal-to-noise ratio for process 1 is defined as λ 1 = trace (B 1 ) /trace (R) and similar definitions hold for the signal-to-noise ratio for process 2 and the total signal-to-noise ratio.
We made a series of tests with different length scales and signal-to-noise ratios for both processes the results of which are provided in Appendix C for completeness.Interestingly, very clear and robust guidelines emerged.

Resulting guidelines
From what we saw, the best we can do is to label the process with the highest signal-to-noise ratio as 1 and the other process as 2 and apply P1a + P2b if we want to achieve the best analysis.
If both processes have a similar signal-to-noise ratio, we should label the larger-scale process as process 1 so that the same formulas are still the best.
For simpler analyses Eq. ( 15) defining S 01 remains a good option with this choice of numbering.
For individual processes, P1a should be used for the iterated versions, whereas K 1 is the best choice for a simple analysis for process 1; for process 2 version P2b and K 2 (I − HK 1 ) are indicated respectively for the iterated approach or a simple approach.
For iterative methods, if scales are well separated or at least one of the processes has a small signal-to-noise ratio, only very few iterations are needed.In this case, the simple formulation can also be competitive.
On the other hand, in difficult situations with overlapping scales and strong signals, the iterated versions can always be made convergent by using more iterations and the optimal interpolation will result.Alternatively if scales are very similar, a single analysis with average correlation length could be used.

DINEOF + OI
We now come to the application of our ideas to the better reconstruction of satellite images.DINEOF has proven to be efficient, particularly when looking at larger scales since the interpolation exploits dominant large-scale EOF structures to reconstruct missing data under clouds.Those EOF structures generally capture a very high percentage of the data variance and in this sense we can assume that the signal-to-noise ratio is high.
If we try to add a local covariance structure with an optimal interpolation approach, the signal-to-noise ratio for this additional process we want to extract is probably much lower.Hence, we are in the situation covered in synthetic case (cases 3 and 4 of Appendix C) and the iterative version of P1a + P2b is the natural choice, where K 1 stands for the DINEOF-based analysis and K 2 for the local OI application.
The analyzed field φ can be easily obtained from P1a + P2b, called hereafter DINEOFOI, programmed as Algorithm 2. Combination of two analysis tools for an optimal interpolation.
φ ← K 1 d w 1 ← Hφ (Use only analysis at data points without clouds) Calibrate or check OI parameters at this point using w 1 w 2 ← w 1 (Start of iterative matrix inversion) Loop n times w 2 ← HK 2 w 2 (Apply tool 2 and retrieve solution at data points) w 2 ← HK 1 w 2 (Apply tool 1 and retrieve solution at data points) If scale 1 solution is requested, subtract saved scale 2 solution from φ and save it This shows that the simple application of existing analysis tools to data arrays and a few working arrays/files (w 1 , w 2 , ω 1 ) will solve our problem and provide the analysis φ.This demands 2 + n applications of K 1 and 1 + n applications of K 2 .
For the calibration of OI parameters it is probably safe to work on residuals after step 3 (available in intermediate results w 1 ) for fitting or verification of the correlation length and signal-to-noise ratio since we showed in the synthetic case that if scale separation is good, K 2 (I − HK 1 )d is a good proxy of the analysis for process 2.
There remains to formulate how K 1 and K 2 are defined.If during step 1 of the algorithm φ could be interpreted as the DINEOF result (and hence K 1 the DINEOF tool applied to the data), in the subsequent part of the algorithm we cannot just replace K 1 by "apply DINEOF" since the EOFs calculated are data dependent.Hence, when working with residuals in the later steps, we would not use the same covariances as we would generate new covariances from the residuals instead of using the covariances from the large-scale processes.So instead of choosing "apply DINEOF", we must find a way to exploit the large-scale covariances recovered by the original DINEOF application.For this purpose we can use the EOFs calculated by DINEOF on the raw data to specify covariance matrices as in Beckers et al. (2006).For such an optimal interpolation using EOFs from DINEOF, we recall that DINEOF provides the singular value decomposition (SVD) of the data3 stored in a m×n matrix X with m pixels for each cloud-free image and n images as The N spatial EOFs (stored in the m × N matrix U) and temporal EOFs (stored in the n × N matrix V) can be used together with the singular values (stored in the diagonal matrix ) to estimate the spatial covariances These spatial covariances then allow the spatial interpolating/analysis of each n image: for a spatial analysis of image j , the gain matrix reads indeed Note that for each image j , another observation operator H is used as the cloud coverage changes.In order not to overload notations we do not add an index but remember that H is image dependent.The computational efficiency of this method stems from the fact that the application of the Woodbury matrix identity (e.g., Golub and Van Loan, 2012) translates this into a least square fitting of N modes (e.g., Beckers et al., 2006).Assuming independent and identical observational errors of variance 2 , this leads to with L = U / √ n and the corresponding version for the data present in each image L p = HL (note that here H still changes for each image).The computational efficiency is now understood as for each image the matrix to invert is of size N × N when using Eq. ( 26) instead of p × p when using Eq. ( 25), with the number N of retained EOFs much smaller then the number p of pixels in an image.So from the EOFs found by DINEOF, we can perform spatial interpolations at low cost yet retaining the original spatial covariances.
By symmetry we can also choose to perform a temporal interpolation of each of the m pixels by using for pixel i the time interpolation from covariance and use the computationally efficiently calculated gain matrix with T = V / √ m and T p = HT (here H changes for each of the pixels and selects the images in which the pixel is not clouded).
Then we can further improve the covariance model exploiting both temporal and spatial correlations as follows: instead of choosing one or the other method (spatial or temporal interpolation alone) we can again apply the basic idea of the present paper allowing the combination of two analysis www.ocean-sci.net/10/845/2014/Ocean Sci., 10, 845-862, 2014 tools.We can indeed use as a covariance the combination of the spatial and temporal covariances and for a pixel i of image j , the covariance with a pixel i of image j is modeled as jj δ ii with the standard Kronecker4 symbol δ ii .In other words, as we are in the presence of the combination of two covariances, without actually creating the sum and full gain matrix, we can use again the iterated versions P1a + P2b where process 1 is now provided by the spatial interpolation tool and process 2 by the temporal interpolation tool.This "inner" iterative combination provides then our EOF-based analysis tool K 1 for the "outer" combination method with a local OI.
So if it is clear that in the later steps of Algorithm 2 we use K 1 as just described (the combination of spatial and temporal EOF-based covariances), the first step could still be taken as the DINEOF analysis as it could be regarded as some kind of best first guess.This option will be looked at later.
For K 2 , a standard optimal interpolation tool is used.In practice, the OI application for satellite images repeated at the same locations can be strongly optimized since it is easy to collect the data points which are within reach of the covariance function at any given point in which the analysis is demanded.As the correlation length will be small, the number of data points involved will also be small and the matrix inversion very quick for each analysis point.The typical parametric correlation function c used in optimal interpolation is a Gaussian where δx, δy and δt are the difference in space and time coordinates between the two points for which we want to calculate the covariance.L x and L y are then the correlation length scale in x and y direction and T the correlation timescale.The correlation function has to be multiplied by variance σ 2 2 to provide B 2 .For the implementation of this optimal interpolation we used the open-source code optiminterp available at http://octave.sourceforge.net/optiminterp/overview.html5 .Its use was computationally optimized by exploiting the fact that for each point in which the analysis is demanded the gridded nature of the data points allows a direct selection of possibly involved data points and a rejection of all others.So it means one can feed the analysis only with data in a box of size 4L x × 4L y × 4T centered around the analysis point, which greatly reduces sorting and selecting time and also allows for a parallel execution.

Test case
For the real case application, we use an interesting sea surface temperature (SST) hourly data set derived from Spinning Enhanced Visible and Infrared Imager (SEVIRI) onboard the Meteosat Second Generation (MSG) geostationary satellite and produced operationally by the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT) Ocean and Sea Ice Satellite Application Facility (OSI-SAF) at Meteo-France/Centre de Meteorologie Spatial (CMS) in Lannion (EUMETSAT, 2011;Le Borgne et al., 2011).Because of its geostationary orbit, high-frequency sampling is available and analyses resolving the daily cycle appeared (e.g., Marullo et al., 2010Marullo et al., , 2014)).Some high-frequency content is expected to be filtered by standard DINEOF applications and therefore the data should provide a good test bed for our new method.Furthermore, capturing the diurnal cycle in using satellitederived sea surface temperature is considered an important challenge (Stuart-Menteth et al., 2003) and several studies have been performed recently using hourly data (Karagali et al., 2012;Le Borgne et al., 2012;Eastwood et al., 2011).
Excellent results when using SEVIRI data have been obtained (Marullo et al., 2014): by combining a dynamical model resolving the daily cycle with the SEVIRI data, their reconstruction of hourly data covering the whole Mediterranean Sea during the 3-month summer period of 2011 showed rms (root mean square) errors compared to in situ data of 0.64 K with offshore moorings and 0.47 K with drifter data, with a well-resolved daily cycle.A synthetic cloud cross-validation exercise also provided an rms difference of only 0.16 K between the artificially clouded SEVIRI points and their reconstruction.In their case, the artificial clouds were constructed by moving a 200 km wide meridional band of "clouds" westward across the domain over around 24 h so that no persistent artificial clouds were included.
For our study focusing on the methodology rather the reconstruction itself, we selected a smaller data set, covering the period of 1-16 August 2013.This data set has some clear images with daily cycles on which artificial clouds can be added for validation purposes.The spatial resolution of the data is 0.05 • , available at an hourly frequency, leading to n = 16×24 = 384 images.In all cases, we used all data with quality flags 4 or 5 (acceptable and best quality).In order to assess the robustness of the method and the conclusions of this experiment, we look at four different cases.
Two different spatial coverages were considered, one looking only at the western Mediterranean Sea, the other to the whole Mediterranean Sea.This approach permits checking the robustness of the statistical results and the effect of different EOF structures, and also shows the effect of larger data variability.For the western Mediterranean Sea, we retrieved images of 327 × 217 = 70 959 pixels from which 29 380 fall on land, so that the maximum number of pixels in each image useful for analysis is m = 41 579.Without additional artificial clouds, this data set presents an average cloud coverage of 29.70 %.For the whole Mediterranean Sea, images have 850 × 320 pixels, from which 120 343 fall onto the sea so that m = 151 566 (see Fig. 1).The average cloud coverage is 23.53 % in this case.We also note that for the setup over the whole Mediterranean Sea, data from the Black Sea and the Atlantic Ocean are included, making the test more challenging as the EOF structures will need to cope with different, mostly unconnected, variabilities.
For cross-validation purposes, additional clouds will be added to the original data.The so-clouded cross-validation data are never used in the calculations (neither in the DI-NEOF decomposition nor the different optimal interpolations) but allow for the computation of metrics characterizing the quality of the analysis under clouds.The rms error δ D of the DINEOF reconstruction under these additional clouds can be used as a reference error which we want to improve.We can then define a skill-score S as which measures the relative increase in precision (a 0 skillscore value indicates no better results than DINEOF whereas a value of 1 means a perfect reconstruction).A second standard metric for the validation is the correlation coefficient between the analysis and the cross-validation points noted r.
Two different kinds of artificial cloud sets allows one to check the effect of more or less persistent cloud coverage on the reconstruction quality and the improvements brought by our new method.In one situation, clouds from the last 2 weeks of July 2013 are superimposed onto the already present clouds and the pixels masked in this way taken aside as cross-validation data.In the other situation, we generate artificial clouds as in Marullo et al. (2014): the slab of clouds moves from east to west across the domain in about 1 day and then starts again from the east.The fraction of overlap between successive images was taken similar to the one in Marullo et al. (2014).
From these two different spatial coverages and two different cloud coverages we create our four test cases:

Setup 1: western Mediterranean Sea, added July clouds
For cross-validation purposes, clouds from the last 2 weeks of July were superimposed, resulting in total average cloud coverage which increases from 29.70 to 42.06 % The data set is then analyzed with DINEOF, providing N = 38 EOFs, the singular values and the DINEOF reconstruction explaining 91 % of the variance.As expected, DINEOF captures a large part of the variability.A first diagnostic can be performed by calculating the root mean square difference between this reconstruction and the artificially clouded pixels (the crossvalidation points, here we have 1 973 438 of them), and we obtain an associated standard deviation δ D = 0.496 K close to the noise level in the data, as the SEVIRI SST standard deviation is around 0.5 K (Brisson et al., 2002).For DINEOF we obtain r = 0.9371, which is therefore the baseline to improve upon in this case.
For this case and the following cases, the metrics of the standard DINEOF application just mentioned are summarized in Table 1 and for the following cases we only describe the cross-validation approach.

Setup 2: western Mediterranean Sea, added banded clouds
In the first setup, analyses under persistent large-scale clouds can of course not be improved by the local optimal interpolation.To check if we can reconstruct data under fast-moving clouds, we take a band of westward-moving "clouds".

Setup 3: whole Mediterranean Sea, added July clouds
To check how the reconstructions behave when a larger range of observations are present, we look at the whole Mediterranean Sea.The full Mediterranean Sea case exhibits generally less clouds (23.53 %); with the July clouds added to the original data, the cloud coverage increases to 35.29 %.
www.ocean-sci.net/10/845/2014/Ocean Sci., 10, 845-862, 2014 Table 1.Skill scores and correlation coefficients for the four SEVIRI setups and the different methods.DINEOF stands for the direct use of standard DINEOF results.DINEOF + K 2 uses the simple approach of adding a local optimal interpolation of residuals.DINEOF + 10 it stands for the method in which we start from DINEOF results and then apply the iterative version using K 1 based on the EOFs.Version K 1 uses only the EOF-based covariances, S 01 stands for the simple approach when we add the local optimal interpolation of residuals to the previous solution and finally the last two lines correspond with the full method with no iterations or 10 iterations (for inner iterations within K 1 combining space-and time-EOF-based interpolations, iterations are maintained in all cases).
Setup Finally, we complete the comparisons by using the whole Mediterranean Sea, but with a fast-moving westward band of "clouds".
To proceed, we now need to specify the observational error variance and the variance σ 2 2 for the local optimal interpolation in each case.As we will show later, the conclusions remain robust in all cases and we only detailed results of the most challenging Setup 3 in Appendix D. The parameter optimization of the other cases was performed in a similar way with similar results.
From the calibration we note in Fig. 2 that the time autocorrelation spectrum exhibits an anticorrelation at 6 h and maximum autocorrelation at 12 h.This reflects, in fact, the asymmetry of the diurnal cycle.When analyzing the raw data instead of residuals, the spectrum and autocorrelation (not reproduced here) indeed shows an anticorrelation at 12 h and correlation at 24 h.Thus, the residuals will allow for following the higher-frequency content.
The observational noise variance was fixed as 0.25 K 2 .A sensitivity analysis was performed, changing this value into the range of 0.1 to 10 K 2 .In all cases, the conclusion in terms of ranking of the different methods remained the same; only the relative part of the solution picked up by the EOF-based covariances or local optimal interpolation was affected.As the value of 0.25 K 2 is the typical value associated with SEVIRI data (Brisson et al., 2002), we kept this value even if better skill scores were obtained for other values used in sensitivity analyses.
With all parameters defined, we proceed to the optimal interpolation and compare the different techniques to the standard DINEOF solution for the four different setups.As seen from Table 1, robust conclusions can be drawn: -Results for cases working on a larger domain had better correlations then the cases working on the western Mediterranean Sea alone because of the larger data range, but ranking between the methods remain the same, independently of the domain covered.
Conclusions are therefore robust with respect to the domain size.
-Results with fast-moving slabs of cloud are always better than with more realistic clouds, but again the ranking of the methods is not changed between the two situations and conclusions robust with respect to the type of cross-validation clouds added.
-In all cases, our new iterated method provides the best skill scores and correlation coefficient.Using 0 iterations instead of 10 iterations only slightly degrades results and keeps the non-iterated method ranked second in terms of skill scores and correlation coefficient.
-The simple method of using DINEOF followed by a local optimal interpolation of residuals (DINEOF + K 2 ) also provides good skill scores.In cases with fastmoving artificial clouds, the skill score is close to the one of our new version, but adding real clouds using the new method clearly increases the skill further.
-Starting from DINEOF and then applying iterations with K 1 based on the DINEOF EOFs (called DINEOF + it) is not a good idea since in all cases negative skills and lower correlations are observed.Hence, in this case the inconsistency of the first step in our algorithm with the rest of the steps deteriorates results.
-Finally, the simple method S 01 also yields better results than DINEOF + K 2 .The higher correlation found with the banded fast-moving clouds is also seen in the correlation plots of Figs. 3 and 4. When July clouds are used, some persistent large-scale clouds do not permit improving DINEOF reconstructions in these regions.This leads to larger dispersion.When fastmoving clouds are used, the local optimal interpolation can infer missing information from the surrounding pixels and correlations are much higher.We also notice that even in the July cloud case, the new method filters the extreme values less than the original DINEOF reconstruction (the Sshaped tendency of DINEOF reconstructions dampening the extreme variations is reduced with the new method).
Since the computational load of the new method is a fraction of the computational cost of the DINEOF decomposition (typically 10 % for the iterated version) it is the natural choice to improve DINEOF results when good skill scores and correlations are sought.If one does not want to program  K 1 and iterations6 , the simple approach of using DINEOF followed by the local optimal interpolation of residuals is a simple alternative.
Skill scores and correlation coefficients are not the only criteria to select a method, and we can further check if the new method also leads to better feature reconstruction.
Figure 5 shows the analysis, residuals and the y autocorrelation for the standard DINEOF approach and our new iterated solution DINEOFOI.While DINEOF residuals exhibit patterns identified in the autocorrelation function, the residuals of the new method are almost pure spatially uncorrelated noise.For the other methods, not shown here, K 1 and DI-NEOF + it both contain a similarly autocorrelated signal as DINEOF.All other methods are able to retrieve this structure and leave mostly spatially uncorrelated residuals.We also see that the amplitude of the daily cycle is now closely following the data (Fig. 1).
Here the comparison of the reconstructions with the crossvalidation data not used also reveals that some features are now better visible in the reconstruction with the new method, in particular the form of the cold patch in the center of the domain.
Instead of looking at the full analysis, we can also focus on individual processes 1 and 2. For example, the time evolution at latitude 34.375 • N in the eastern Mediterranean Sea (Fig. 7) shows clearly how scale 1 solution is dominated by the daily cycle (we even see the earlier heating in the eastern part during the day).Scale 2, on the other hand, has a 12 h period component whose structure could lead to interesting interpretations.

Conclusions
We presented a general framework for multi-scale analysis using optimal combinations of analysis tools focusing on different scales and processes.It provides this multi-scale analysis in an efficient way by combining the analyses obtained with different simpler tools based on the assumption that only a single process is present.The method also allows one to focus on individual scales by isolating the analysis of a single process in the presence of another one.
With a synthetic test case, we were able to provide guidelines for which formulations to choose among a series of possible algorithms.In particular, we showed that by naming process 1 as the process which has the highest signal-to-noise ratio (or largest correlation length when the other process has a similar signal-to-noise ratio), process 1 can then be efficiently analyzed by a well-identified iterative method7 .The other process then is optimally reconstructed with another iterative formula8 and the combination of the two provides an analysis which is close to the theoretical optimal interpolation even with a reduced number of iterations.Then we showed how we can exploit large-scale covariances together with small-scale covariances in a feasible way without the need to explicitly invert huge matrices that would arise from a direct application of the OI formula.This was possible through the particular choice of the covariance model (EOF-based large-scale covariance and localized parametric covariance) and the new iterative method we proposed.
This was then implemented in a real application where it was shown that with the new approach, we can indeed retrieve fine-scale features previously filtered out.The iterative version with a combination of DINEOF-based EOF covariances and local optimal interpolation called DINEO-FOI behaved best, but an alternative cheaper version is using the DINEOF analysis itself augmented by the analysis of residuals with a local optimal interpolation.But we have shown that the latter, quite natural approach was suboptimal in all situations.Other simple methods have also been identified: a time-space interpolation using EOFs from DINEOF alone9 possibly augmented by an additional local optimal analysis of residuals10 .Both yield good results, but they also remain suboptimal.Hence, DINEOFOI provided the optimal solution at a still reasonable cost.
Further fine-tuning of the real application could be performed but is out of the scope of the present paper, focusing on how to combine two or more analysis tools, each one exploiting different covariance specifications.To provide the best possible SST products, the following improvements could be included: we can adopt more complicated expressions of the observational error covariances, exploiting the preliminary outliers detection offered by DINEOF and the quality-flag value of the original data.In this case, suspect pixels (near clouds or with statistically too-large residuals) can be flagged and see their observational error increased.This should lead to further improvement in noise filtering.The temporal covariance functions used in the local optimal interpolation could be modulated by a cosine function to take into account the 12 h cycle identified in the residuals and provide better estimates for clouds present over several hours.Table C1.Average error for different analysis formulations of process 1. Column 5 of the upper part shows the same result as column 3 since it can be shown that without iterations, the formulations are equivalent.So column 5 of the upper part of the table essentially gives confidence that the implementation is correct.The average relative error variance is based on 100 000 independent realizations.therefore clearly P1a for the larger scales, equation P2b for the shorter scales and their sum for the overall analysis.

Case
For case 2, we use the same parameters but examine the behavior when the scales are less well-separated and use a correlation length of 16 grid points instead of 4.5 for process 2. Compared to case 1, the conclusions remain mostly unchanged, but we observe that more iterations are needed to reach the same quality in the iterated version.
For case 3 and the following cases, we only use observations in parts of the domain.In one region, sampling is done at grid resolution and in other regions with larger parts void of data (up to 20 grid points).This mimics the situation we encounter with satellite images.Otherwise, the case is We see that for process 1, it is better to use the full data set, whereas for process 2 it is better to work with residuals, i.e., data from which we subtract the analysis of the process 1.For reference, the distribution of the optimal solution is also shown, proving that not only is the average error of the new method combining P1a and P2b optimal, but also that the error distribution falls along the optimal one.identical to case 111 .Conclusions for case 3 are similar to case 1, and we see that even with few iterations we capture the maximum information from the data available.
In case 4 we take the same parameters as in case 3 (partial observation, very different scales), but assume that process 1 has a much higher signal-to-noise ratio: we use a value of 20 instead of 1.In this case, P1a again provides the best analysis for process 1 and P2b, for process 2 alone.For the simple approaches for the total field (Table C3), now the selection of the field used to create residuals is important, and as one might have guessed, one has first to analyze the large-scale process with the high signal-to-noise ratio and then add the small-scale analysis of the residuals (meaning S 01 is used for the global analysis).For the iterated version, it is now clearly the combination P1a and P2b which outperforms the others.
Since this case is also similar to our original question, we can now also have a look at the error distribution of the 100 000 realizations (Figs.C1 and C2).They confirm that when the average error of the formulations is close to the error of the truly optimal solution, the histogram of errors also almost coincide, providing evidence that we are indeed very close to the optimal interpolation.
To complete the analysis, some additional cases have been examined: in case 5 we use the same parameters as in case 4 but use the short correlation length for process 1 and the larger one for process 2. Despite this inversion, the same conclusions as in case 4 remain.This means that it is the process with the highest signal-to-noise ratio which should be labeled as process 1 so that P1a and P2b remain optimal.
For case 6 we use the same situation as in case 3, but now both processes have the same high signal-to-noise ratio of 20.Similar conclusions still hold, but the simpler approaches now degrade.
Finally, case 7 uses the same situation as case 6 (two high signal-to-noise ratios), but in addition the correlation scales are now closer to each other (30 grid points for process 1 and 16 grid points for process 2).This is the most difficult situation and requires more iterations to converge.In this case, one could anyway question whether a scale separation approach is meaningful or if a single correlation length scale should be used.

Figure 1 .
Figure 1.Mean spatial structure of standard DINEOF analysis (left panel) and spatially averaged time evolution (right panel) of the analyzed field (solid blue line), the original data (dashed green line) and the DINEOF analysis at pixels with data present (red line).The lower temperatures in the DINEOF reconstruction are simply due to the fact that mostly cold regions were covered by clouds and the data average only used available (warmer) pixels.More importantly, we see however that the DINEOF reconstruction has a lower-amplitude cycle than the data.With the new methods DINEOFOI explained later, we recover the amplitudes found in the data (dotted line coinciding with the dashed line).

Figure 2 .
Figure 2. Spectrum analysis leading to autocorrelation functions, for longitude, latitude and time (from left to right).Autocorrelation is given as a function of distance expressed in number of grid points or as a function of time-delay expressed in number of time steps (hours).The continuous line corresponds with the fitted Gaussian correlation function.

Figure 3 .
Figure 3. Correlation at cross-validation points using DINEOF alone and the new method DINEOFOI in Setup 1 (top) and Setup 2 (below).

Figure 4 .
Figure 4. Correlation at cross-validation points using DINEOF alone and the new method DINEOFOI in Setup 3 and Setup 4.

Figure 6 .
Figure 6.Data used (upper left panel), data including cross-validation data (upper right), DINEOF reconstruction (lower left) and new approach DINEOFOI with 10 iterations (lower right).

Figure 7 .
Figure 7. Hovmöller diagram in the eastern Mediterranean at latitude 34.375 • N. Temperature as a function of longitude (in degrees) and time (in days) for process 1 (left panel) and process 2 (right panel).

Figure C1 .
Figure C1.Histogram of scaled errors for reconstruction of process 1 (left)and process 2 (right).We see that for process 1, it is better to use the full data set, whereas for process 2 it is better to work with residuals, i.e., data from which we subtract the analysis of the process 1.

Figure C2 .
Figure C2.Histogram of scaled errors for simple reconstructions of the total field (left) and with 0 iterations in the new formulations (right).For reference, the distribution of the optimal solution is also shown, proving that not only is the average error of the new method combining P1a and P2b optimal, but also that the error distribution falls along the optimal one.

Table C2 .
Average error for different analysis formulations of process 2. As in TableC1, column 5 of the upper part gives the same result as column 3, as it should.The average relative error variance is based on 100 000 independent realizations.

Table C4
), we see a similar pattern; adding up formulations of the same type (P1a + P2a or P1b + P2b) does not perform as well as adding up formulations of different types (P1a + P2b or P1b + P2a), but convergence is observed in all cases.The natural choice for this case is

Table C3 .
Average error for different simplified analysis formulations for the overall analysis.The average relative error variance is based on 100 000 independent realizations.

Table C4 .
Average error for iterative versions for the overall analysis: 0, 2 and 20 iterations.The average relative error variance is based on 100 000 independent realizations.