A deep learning approach for spatial error correction of numerical seasonal weather prediction simulation data

ABSTRACT Numerical Weather Prediction (NWP) simulations produce meteorological data at various spatial and temporal scales, depending on the application requirements. In the current study, a deep learning approach based on convolutional autoencoders is explored to effectively correct the error of NWP simulations. An undercomplete convolutional autoencoder (CAE) is applied as part of the dynamic error correction of NWP data. This work is an attempt to improve the accuracy of seasonal forecast (3-6 months ahead) data for Greece using a global reanalysis dataset of higher spatial resolution (which incorporates observations, satellite imaging, etc.). More specifically, the publicly available Meteo France Seasonal dataset (Copernicus platform) and the National Centers for Environmental Prediction (NCEP) Final Analysis (FNL) dataset (NOAA) are utilized. In addition, external information is used as evidence transfer, concerning the time conditions (month, day, and season) and the simulation characteristics (initialization of the simulation). It is found that convolutional autoencoders help to improve the resolution of the seasonal data and successfully reduce the error of the NWP data for 6-month-ahead forecasting. Interestingly, the month evidence yields the best agreement, indicating a seasonal dependence of the performance.


Introduction
Meteorology and climate science use computer models, based on mathematical equations, to represent realistically the phenomena that take place in the atmosphere, both for Numerical Weather Prediction (NWP) and for climate prediction. Hence, for a skilful forecast, they require the initial conditions to be known as accurately as possible. The models can be categorized either by the size of the area under study, as (a) global and (b) regional forecast models, or by the timespan of prediction, as (i) nowcasting (hours), (ii) forecast (days), (iii) subseasonal (weeks), (iv) seasonal (months) and (v) climate (years). As long as the state of the atmosphere is known at an initial time via observations, and given that the atmosphere is a fluid, the models employ discretised equations of fluid dynamics and thermodynamics to estimate the state of the fluid at some time in the future. A set of continuous equations is used to estimate the air density, pressure, water vapour mixing ratio, potential temperature scalar field and air velocity (wind) vector field of the atmosphere (Kalnay, 2003). The nonlinear partial differential equations that describe the system are impossible to solve exactly through analytical methods (Strikwerda, 1990); thus, numerical methods are used to obtain approximate solutions. Different models use different numerical methods: some global models and almost all regional models use finite-difference methods for all three spatial dimensions, while other global models and a few regional models use spectral methods for the horizontal dimensions and finite-difference methods in the vertical (Terrell, 2019).
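As a toy illustration of the finite-difference methods mentioned above (a sketch for intuition only, not taken from any operational model), the snippet below advances the 1D linear advection equation du/dt + c du/dx = 0 with a first-order upwind scheme:

```python
import numpy as np

def upwind_step(u, c, dx, dt):
    """Advance u one time step with upwind differencing (valid for c > 0).

    Periodic boundary conditions are applied via np.roll, so the point
    to the "left" of index 0 is the last grid point.
    """
    return u - c * dt / dx * (u - np.roll(u, 1))

# A smooth temperature-like bump advected to the right on a periodic domain.
x = np.linspace(0.0, 1.0, 100, endpoint=False)
u = np.exp(-200.0 * (x - 0.3) ** 2)
c, dx, dt = 1.0, x[1] - x[0], 0.005   # CFL number c*dt/dx = 0.5, stable

for _ in range(60):                    # integrate to t = 0.3
    u = upwind_step(u, c, dx, dt)

# The bump has moved from x = 0.3 towards x ~ 0.6; first-order upwind
# diffusion has damped and smeared it along the way.
peak = x[np.argmax(u)]
```

Operational models apply discretisations of this kind (in far more elaborate, multi-dimensional form) to the full set of fluid-dynamics and thermodynamics equations.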
Single forecasts may generate results that diverge from reality because of the chaotic growth of forecast errors, linked to inevitable uncertainties in the knowledge of the initial state of the atmosphere, and because of the necessary numerical model approximations. Consequently, ensembles have been used to provide forecasts of higher confidence levels during approximately the last two decades. They have been designed to try to simulate all relevant sources of forecast error. Despite the rapid progress in NWP modelling, the raw ensemble forecasts exhibit systematic errors in both magnitude and spread (Buizza, 2018). Systematic errors in calibration and accuracy are routinely corrected before issuing a weather forecast using statistical post-processing (Thorarinsdottir et al., 2018). A fundamental challenge is to preserve physical consistency across space, time and variables (Heinrich et al., 2021). On forecasting time scales beyond 2 weeks, the error that arises from the growth of the initial uncertainty becomes large (Palmer, 2019). Sources of predictability at long time scales, spanning a wide range from months to decades, are usually associated with the existence of slowly evolving components of the earth system (e.g. the El Niño Southern Oscillation (ENSO), whose two phases last several months each and typically recur every few years with varying intensity; the North Atlantic Oscillation (NAO), with year-to-year variability; the Madden-Julian Oscillation (MJO), an intraseasonal (30- to 90-day) variability; etc.) (Hoskins, 2013). Thus, in order to obtain an accurate prediction, it is important that the models are able to reproduce these slowly evolving components, and to interpret the relationship between the atmospheric modes of low-frequency variability and the prediction variables required by users (Smith et al., 2020).
Apart from the timespan prediction (forecast lead time), the spatial resolution correlated with the topography of the region under study can also affect the prediction accuracy. Reliable NWP and climate prediction in domains of complex topography require high spatial resolution to better resolve the bottom boundary forcing and important valley-scale processes (Bonekamp et al., 2018).
In the last few years, owing to the increased availability of large datasets and computational power, meteorology and climate science have begun to benefit from substantial advancement in the development of Artificial Intelligence (AI), in order to fully exploit spatial and temporal structures in the data and overcome shortcomings in dynamical models (Cohen et al., 2019). In this context, Reichstein et al. (2019) described the opportunities opened by deep learning for earth system science problems. Deep learning is a subset of Machine Learning (ML) that is based on multi-layered neural networks (Lecun et al., 2015). Deep learning is able to exploit vast amounts of data effectively for climate forecasting, extracting hidden spatio-temporal features and predicting a future state from a measurable initial condition by exploiting temporal-dynamical properties as well as spatial structures.
There have been attempts to mimic NWP simulations as a way to produce time series of weather data more efficiently, but with limited success. One of the most effective approaches is the work of Ouala et al. (2018), which investigates the relevance of recently introduced bilinear residual neural network representations, mimicking numerical integration schemes such as Runge-Kutta, for the forecasting and assimilation of geophysical fields from satellite-derived remote sensing data. It demonstrates that the proposed patch-level neural-network-based representations outperform other data-driven models, including analog schemes, both in terms of forecasting and missing-data interpolation. The most common architectures used for time-series forecasting in the deep learning literature are recurrent and convolutional neural networks. These networks use parameter sharing by repeating a set of fixed architectures with fixed parameters over time or space (Dabrowski et al., 2020).
In the work of Hewamalage et al. (2021), an extensive empirical study was performed alongside a review of an open-source software framework of existing recurrent neural network (RNN) architectures for forecasting. The latter led to a series of guidelines and best practices for their use. It was concluded that RNNs can model seasonality directly if the series in the dataset possess homogeneous seasonal patterns. In addition, the Recurrent Neural Filter (RNF) has been introduced as a recurrent autoencoder architecture that learns distinct representations for each Bayesian filtering step, captured by a series of encoders and decoders (Lim et al., 2020). On the other hand, Salinas et al. (2020) proposed DeepAR, a methodology for producing accurate probabilistic forecasts, based on training an auto-regressive recurrent network model on a large number of related time series. More complex approaches are also available, such as DeepTrends from Xu et al. (2020), a neural network for multivariate time-series trend prediction based on a tensorized long short-term memory (LSTM) with adaptive shared memory (TLASM), or the work of Zhu et al. (2020), which proposed a Generative Adversarial Network (GAN) model. The latter study argued that a good generative model for time-series data should preserve temporal dynamics, so that new sequences emulate the original temporal relationships between variables. However, current methods that apply GANs to sequential settings insufficiently handle the temporal correlations distinctive to time-series data, while supervised models for sequence prediction (which allow finer control over network dynamics) are inherently deterministic. Hence, a framework has been proposed for realistic time-series data generation by combining the flexibility of the unsupervised model with the control offered by supervised training.
Lately, the Transformer deep learning architecture has been applied to time-series datasets. Lim et al. (2021) introduced the Temporal Fusion Transformer (TFT), an attention-based architecture which combines high-performance multi-horizon forecasting with interpretable information about temporal dynamics. TFT uses recurrent layers for local processing and interpretable self-attention layers for learning long-term dependencies. A combination of recurrent and convolutional networks can be found in the paper of Shi et al. (2015), who propose a convolutional LSTM (ConvLSTM) and use it to build an end-to-end trainable model for the precipitation nowcasting problem.
Recent works try to predict the spatial distribution of a weather variable instead of forecasting the next step of a time series. In the work of Ham et al. (2019), a statistical forecast model is introduced employing a deep-learning approach to produce skilful ENSO forecasts for lead times of up to one and a half years. The study uses transfer learning to train a convolutional neural network (CNN) first on historical simulations and subsequently on reanalysis data, to overcome the limited amount of available observational data. The CNN model is found to be better at predicting the detailed zonal distribution of sea surface temperatures, overcoming a weakness of dynamical forecast models, and thus proved to be a powerful tool both for the prediction of ENSO events and for the analysis of their associated complex mechanisms. In addition, Scher (2018) applied a convolutional autoencoder (CAE) neural network to demonstrate the complete emulation of the dynamics of a simple global circulation model (GCM) of low spatial and temporal resolution, successfully predicting the model state several time steps ahead. The latter also pointed out the need to develop neural networks for climate studies that correctly represent external forcing, as in more complex and realistic GCMs. Similar conclusions are reached in the review study of Schultz et al. (2021). On the other hand, Weyn et al. (2019) used CNN models to create a weather forecast model predicting one or two fundamental meteorological fields at a specific height (500 hPa), which outperforms persistence, climatology, and the dynamics-based barotropic vorticity model, but does not beat an operational full-physics weather prediction model. The latter points out the attractiveness of CNNs for data-driven applications involving weather variables.
The current study utilizes the knowledge gained from previous works and takes a different approach aiming to effectively correct the error of global NWP simulation results based on a deep learning approach of convolutional autoencoders. More specifically, this work is an attempt to improve the accuracy of global seasonal forecast (3-6 months ahead) data with a more reliable reanalysis dataset of finer spatial resolution that incorporates observations. The method is applied to the area of Greece, which presents a particularly complex topography. The outcome can be regarded as a regional forecasting dataset or it can be used as initial and boundary conditions in NWP simulations, thus, enabling more accurate long-term prediction.

Material and methods
In the current section, the datasets and the methodology used are described in detail.

Dataset description
In meteorological or climate sciences, there are a series of platforms that provide data to the scientific community, for example, Copernicus (COPERNICUS, 2022), GEOSS (GEOSS, 2022), EMODnet (European Marine Observation and Data Network, 2022), ESA (European Space Agency, 2022), RDA (Research Data Archive, 2022), etc. The datasets consist of model data, fusion data (models and observations), satellite data, station observational data, statistical analysis of raw data, etc., in various formats and timespans. The two major and most popular platforms are the European data storage platform of Copernicus and its equivalent USA system, the Research Data Archive (RDA), which is managed by the Data Engineering and Curation Section (DECS) of the Computational and Information Systems Laboratory (CISL) at the National Center for Atmospheric Research (NCAR). Both platforms contain a large and diverse collection of meteorological and oceanographic observations, operational and reanalysis model outputs, and remote sensing datasets to support atmospheric and geosciences research.
The available datasets are categorized depending on (a) their applicability to the time-ahead simulation and (b) whether they assimilate observations. Accordingly, they fall within the following categories:
• Forecast: global gridded meteorological data suitable for prognosis of the state of the atmosphere.
• Reanalysis: global gridded past short-range weather forecasts combined with observations through data assimilation.
Different terms are used to describe the data that refer to different forecasting time periods. So, at a specific location, weather reflects the short-term atmospheric conditions (minutes to hours), while climate is the average daily weather for an extended period of time, usually of 30 years or more. Consequently, weather forecasting is suitable for the short-term (2-5 days), seasonal forecasting for the medium-term weather forecast (3-6 months) and climate prediction for the long term (years ahead). These two types of categories can be combined to characterize a weather dataset in terms of applicability and quality.
In the current study, two platforms and two different kinds of data were used to build the dataset. Monthly initialized model data were obtained from the Copernicus platform, covering the period from January 2017 to June 2020 and predicting 6 months ahead. The 6-month prediction timespan categorizes the data as seasonal forecasting. The source data agency was METEO FRANCE (French National Meteorological Service). More specifically, the spatial resolution of the gridded global dataset was 1° × 1° (~111 km × 111 km), while the weather variable obtained was the temperature at 2 m height. This METEO FRANCE Seasonal (MFS) dataset is the NWP information used as input, whose error the method attempts to correct.
As a target dataset, the National Centers for Environmental Prediction (NCEP) Final Analysis (FNL) operational global reanalysis data, available from the RDA, were used. The data have a finer spatial resolution (0.25° × 0.25° grid, ~28 km × 28 km) and are prepared operationally every 6 hours by the Global Data Assimilation System (GDAS), which continuously collects observational data from the Global Telecommunications System (GTS) and other sources. The same parameter (temperature at 2 m height) was used for the same time period.

Methodology
At first, the data had to be treated appropriately in order to be used in a deep learning algorithm. The following steps were taken: (1) The area of Greece (34°N-42°N × 19°E-29°E) was extracted from both data sources.
(2) The MFS input resolution was upscaled to match the FNL resolution by applying a Gaussian regression model (PyKrige, 2022) (Figure 1(a)), for the same area as in step 1.
(3) The MFS and FNL frames were ordered according to their timestamps.
(4) The information of four properties (Forecast Frame Number, month, season, and hour) was kept to be used as evidence transfer.
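Step 2 can be sketched in code. The paper uses PyKrige's Gaussian regression (kriging); as a minimal stand-in, the snippet below upsamples a coarse 1° grid to a 0.25° grid (a factor of 4 per horizontal dimension) with cubic-spline interpolation via SciPy. The grid shapes here are illustrative, not the paper's exact (30, 48) domain:

```python
import numpy as np
from scipy.ndimage import zoom

# Synthetic coarse 1-degree temperature field (illustrative values in deg C).
coarse = np.random.default_rng(0).normal(loc=15.0, scale=5.0, size=(8, 10))

# Upsample by a factor of 4 in each dimension (1 deg -> 0.25 deg).
# order=3 gives cubic-spline interpolation; kriging would additionally
# model the spatial covariance with a Gaussian variogram.
fine = zoom(coarse, 4, order=3)
```

The interpolated grid matches the target resolution in shape, but, as the paper notes later, such interpolation alone yields a much lower-quality field than data-driven correction.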
It is noted that, regarding step 3, the MFS data are initialized on the 1st day of each month, with each initialization consisting of 720 frames into the future (6 months at a temporal resolution of 6 h). As such, there are overlapping time frames due to the different initializations. The FNL data, on the other hand, are produced continuously in a daily cycle of 6 h (00, 06, 12, 18); in our case, the 00 cycle is taken for each daily FNL record, so the FNL dataset is a continuous time series without overlaps. In order to create the dataset, each initialization was matched with the FNL dataset and the process was repeated by appending the next MFS initialization alongside the matching FNL. A representation of the data preparation and target output is illustrated in Figure 1.

In addition, each evidence representation is a class number relative to the evidence:
(I) Forecast Frame Number (FFN): the id number of the simulation frame saved in the dataset for each forecast time series → 720 classes
(II) Month: the month that each frame belongs to → 12 classes
(III) Season: the season that each frame belongs to → 4 classes
(IV) Hour: the hour of the day that each frame belongs to → 4 classes

At the end of the data preparation process, three arrays were produced with the following dimensions (timeframe, width, and height):
(A) input: (30240, 30, 48)
(B) output: (30240, 30, 48)
(C) evidence transfer: (30240, 4)

Convolutional autoencoders (CAE) (Masci et al., 2011) are well suited to image processing, as they utilize the full capability of convolutional neural networks to exploit image structure (Martinez-Murcia et al., 2020). In the current study, an under-complete CAE is used as part of the dynamic error correction of NWP data. The goal is to learn the most salient features of the gridded input data in order to reconstruct an output of better resolution.
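The derivation of the four evidence labels from each frame's timestamp can be sketched as follows for a single 720-frame initialization. The timestamp handling and class mappings are illustrative assumptions, chosen only to be consistent with the class counts stated above (720/12/4/4):

```python
import numpy as np
import pandas as pd

# One MFS initialization: 720 six-hourly frames spanning 6 months.
timestamps = pd.date_range("2017-01-01 00:00", periods=720, freq="6h")

ffn = np.arange(720)                          # Forecast Frame Number: 720 classes
month = timestamps.month.values - 1           # 12 classes (0-11)
season = (timestamps.month.values % 12) // 3  # 4 classes: DJF=0, MAM=1, JJA=2, SON=3
hour = timestamps.hour.values // 6            # 4 classes for 00/06/12/18

# One evidence row per frame, mirroring the (timeframe, 4) evidence array.
evidence = np.stack([ffn, month, season, hour], axis=1)
```

Stacking such rows over all 42 initializations would yield the (30240, 4) evidence-transfer array described above.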

Evidence transfer
Evidence transfer is the process by which the outcome of an external, auxiliary task is exploited to improve a primary task (Davvetas et al., 2018). It is by definition a transfer learning configuration in which the weights of a pre-trained autoencoder are trained to integrate the external evidence into the task of reconstruction. In the case of CAE evidence transfer, the following steps are followed: (1) Initialization: introduction of a baseline method by training the CAE, as per standard practice, whereby the model learns to reconstruct the input X as the output X', generating a bottleneck and learning, by association, useful and reusable latent representations h.
(2) Evidence preparation: (i) prepare the evidence to be used in raw format, or (ii) train an additional autoencoder (AE) for each source of evidence to produce latent variables. In case (ii), a model is trained whereby a latent representation h_e is learned for the evidence.
(3) Input evidence: additional layers (one for each source of evidence) are added to the output of the CAE in order to predict the raw or latent (h_e) value of step 2. It is noted that there is no direct manipulation of the latent space h, but rather a new training that permits the adjustment of the latent-space weights (indirect manipulation). Depending on the quality of the evidence, the weights decay in the case of low-quality evidence.
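The evidence-transfer fine-tuning can be summarised as optimising the reconstruction loss jointly with one auxiliary prediction loss per evidence source. A minimal sketch of the combined objective follows; the function names and the trade-off weight are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

def evidence_transfer_loss(x_rec, x_target, ev_pred, ev_true, weight=1.0):
    """Joint objective: reconstruction MSE plus auxiliary evidence losses.

    x_rec / x_target : reconstructed and target temperature grids
    ev_pred / ev_true: per-evidence-head predictions and raw labels
    weight           : assumed trade-off factor (not specified in the paper)
    """
    recon = mse(x_rec, x_target)
    aux = sum(mse(p, t) for p, t in zip(ev_pred, ev_true))
    return recon + weight * aux

rng = np.random.default_rng(1)
x_rec = rng.normal(size=(30, 48))
x_target = rng.normal(size=(30, 48))
ev_pred = [rng.normal(size=4)]                # e.g. one head over 4 season classes
ev_true = [np.array([0.0, 1.0, 0.0, 0.0])]    # raw-format season evidence

loss = evidence_transfer_loss(x_rec, x_target, ev_pred, ev_true)
```

Because the evidence enters only through these extra output heads, the latent space h is adjusted indirectly through backpropagation, matching the description in step 3.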
The workflow of training is presented in Figure 2. These steps are followed for the dataset produced as described above, and a comparison study is performed between (i) the simple CAE, illustrated in Figure 3(a), and (ii) the CAE with evidence transfer, illustrated in Figure 3(b). It is noted that in step 3 the evidence is used in raw format, and that the evidence preparation step is only part of the training process; the outputs of the final models (CAE with evidence transfer) are assessed in Section 4. Such approaches are also called autoencoder-like architectures (Scher, 2018; Weyn et al., 2019), but in the current study we follow the terminology of Davvetas et al. (2018).

Results and discussion
In the current study, a CAE is trained with a series of evidence, and a qualitative and quantitative comparison is carried out. The optimization, training and validation strategy is based on the following stages:
• A greedy search for the optimum combination of loss function and optimizer (50 epochs). The winning combination was the Mean Square Error (MSE) loss function alongside the Stochastic Gradient Descent (SGD) optimizer. The latter was chosen not only for its better performance but also for its less aggressive changes to the weights of the network; a milder change of the weights enables the network to distinguish the quality of the evidence and essentially not use it.
• A greedy search of the optimizer's hyperparameters (50 epochs): the learning rate of SGD was assessed and the value of 0.01 was chosen.
• Training of the CAE without any evidence on 80% of the dataset (500 epochs) (X: MFS input, X': FNL, see Figure 3(a)).
• Use of the pretrained CAE as a basis to train a CAE with evidence transfer (one per evidence) on 80% of the dataset (200 epochs) (X: MFS input, X': FNL, e_i: evidence i, see Figure 3(b)).
• Assessment of the different results by: (a) examining the spatial distribution of the RMSE and comparing with a bias correction applied to the MFS input data (grid analysis); (b) extracting the test data series for 10 different points (10 major cities) and calculating the Root Mean Square Error (RMSE) and Index of Agreement (IOA) between the target data (FNL) and the prediction (CAE output). Both metrics are commonly used in assessing weather time-series data.
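The two point-analysis metrics can be computed as below. The RMSE is standard; for the IOA, the common formulation by Willmott is used here as an assumption, since the paper's exact formula is given only in its ANNEX:

```python
import numpy as np

def rmse(pred, obs):
    """Root mean square error between predicted and observed series."""
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def ioa(pred, obs):
    """Willmott's index of agreement: 1 is perfect, lower is worse."""
    obs_mean = np.mean(obs)
    num = np.sum((pred - obs) ** 2)
    den = np.sum((np.abs(pred - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    return float(1.0 - num / den)

# Tiny worked example with an illustrative 2 m temperature series (deg C).
obs = np.array([12.0, 14.0, 15.5, 13.0, 11.0])
perfect_rmse = rmse(obs, obs)   # 0.0 for a perfect prediction
perfect_ioa = ioa(obs, obs)     # 1.0 for a perfect prediction
```

A constant warm bias of 1 °C, for instance, yields an RMSE of exactly 1.0 while the IOA drops below 1, which is why the two metrics can rank the evidence cases differently.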
It is noted that the IOA formula, the CAE architectures, alongside the performance assessment, are presented in the ANNEX section.

Grid analysis
In order to provide a baseline against which to assess the results of the CAE models built in the current study, a simple calculation was performed on the MFS input data, resulting in a bias-corrected (BC) MFS dataset (Nourani et al., 2021). The latter consists of subtracting the average temperature value of each grid point and adding the average value of the same FNL grid point. The original MFS average temperature values, alongside the difference between the average MFS and MFS BC (the same as the difference between the average MFS and average FNL), are presented in Figure 4. It is found that the bias-corrected MFS differs between −10 and +10°C. To assess the performance of the CAE cases, the percentage difference between the spatial mean temperature over time of the CAE outputs and MFS BC was calculated and is presented in Figure 5(a-d).
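The per-grid-point bias correction described above amounts to the following (array shapes illustrative, matching the paper's (time, lat, lon) layout):

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic stand-ins for the MFS forecast and FNL target stacks.
mfs = rng.normal(loc=16.0, scale=4.0, size=(100, 30, 48))   # (time, lat, lon)
fnl = rng.normal(loc=15.0, scale=4.0, size=(100, 30, 48))

# Subtract each grid point's own MFS time-mean, add the FNL time-mean
# at the same point.
mfs_bc = mfs - mfs.mean(axis=0) + fnl.mean(axis=0)
```

By construction, the corrected time-mean matches FNL's at every grid point, so MFS BC is a fair climatological baseline for the CAE outputs.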
In all cases except the FFN evidence case, the CAE provides a good data-driven approach to the downscaling process, with slightly better results compared to MFS BC, especially in areas with high altitude and rough terrain. The process can be described as a denoising procedure: even if the MFS data are sufficient for the NWP simulation to forecast the weather conditions, the CAE solution seems to behave like a denoising CAE. Moreover, the CAE outputs show better results in areas with complex terrain and surface compared to the bias correction method. For the case of season evidence (Figure 5(c)), the percentage difference seems higher overall, which may be an indication of a different performance of this specific evidence. In addition, the spatial RMSE difference between the bias-corrected MFS and the CAE outputs can be found in the ANNEX section (Figure A3.3), where the same conclusion is reached. Moreover, to quantify the performance of the CAE outputs that behaved as expected (all except FFN), the mean spatial RMSE over time was calculated and plotted (Figure 6). The CAE outputs show similar performance, with RMSE values between ~2 and 7. The illustrations of the grid analysis section cannot conclusively answer which case performs better or which evidence is more useful; therefore, a point analysis has been performed, as shown in the next section.
It is noted that in the case of FFN, the CAE underperformed, producing noisy images. As such, the FFN case was not used in the point analysis. An illustration of the mean spatial RMSE over time for FFN can be found in the ANNEX section (Figure A3.2).

Point analysis
In order to assess the performance of the CAE as an error correction tool for seasonal forecast data, a point analysis has been performed by extracting the datapoints of 10 major cities of Greece (Alexandroupoli, Athens, Thessaloniki, Ioannina, Irakleio, Kefalonia, Lamia, Naxos, Rodos and Tripoli). The most common statistical indices for weather data are the RMSE and the IOA, and these are used in the current study. Figure 7(a) and Figure 7(b) illustrate the distribution of the log RMSE between the target value and the model predictions for the 10 cities, alongside the baseline distribution (the error between the MFS input and the target FNL).
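The point extraction can be sketched as a nearest-grid-cell lookup over the Greek domain. The coordinate axes below assume a north-to-south latitude ordering and a uniform grid matching the paper's (30, 48) array shape, and the city coordinates are approximate; all of these are illustrative assumptions:

```python
import numpy as np

# Assumed axes for the 34-42N, 19-29E domain on the (30, 48) grid.
lats = np.linspace(42.0, 34.0, 30)
lons = np.linspace(19.0, 29.0, 48)

# Approximate coordinates for two of the ten cities (illustration only).
cities = {"Athens": (37.98, 23.73), "Thessaloniki": (40.64, 22.94)}

# One synthetic temperature frame standing in for a CAE output.
field = np.random.default_rng(3).normal(loc=15.0, scale=4.0, size=(30, 48))

series = {}
for name, (lat, lon) in cities.items():
    i = int(np.argmin(np.abs(lats - lat)))   # nearest latitude row
    j = int(np.argmin(np.abs(lons - lon)))   # nearest longitude column
    series[name] = field[i, j]
```

Repeating the lookup over every test frame yields the per-city time series on which the RMSE and IOA are computed.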
In general, the data-driven approach for the resolution augmentation and error correction of the temperature field appears to be successful in terms of improving the input. Examining the RMSE distribution plots, there is a significant improvement in the case of season evidence, which outperforms all the others.
On the other hand, the hour evidence and the step from initialization show the worst performance. The hour (with a 6-h step in the current case, i.e. four frames per day) represents the fluctuation of the temperature during the day, which has one peak and one low (two states/classes). As such, the four-frames-per-day representation acts negatively on the CAE and lowers the performance significantly.
The above observations are consistent with the average RMSE and IOA over all cities (Table 1). All cases yield a relatively low RMSE, with no major deviation compared to the input. The IOA values improve significantly (values closer to 1 indicate better agreement (Willmott, 2013)), with the no-evidence and month-evidence cases appearing the best.
More interesting are the monthly performance results of the CAE, presented in Figure 8 (data available in Table A.3.1 in the ANNEX section). The net plot illustrates the monthly RMSE and IOA results, where a line closer to 0 (RMSE) or closer to 1 (IOA) is better, respectively. In both metrics, the no-evidence and month-evidence cases outperform, or perform as well as, the other cases in the monthly analysis, whereas the hour case is outperformed even by the MFS input in some cases (IOA in May). The similar results of the no-evidence and month-evidence cases may indicate either a neutral effect of the evidence on the model, or that month is the only evidence that can potentially provide better results.
From the monthly distribution of the IOA, some months in autumn and spring are the most difficult cases. This is obvious in the season-evidence case (October to March), which shows a binary behavior: season evidence performs as well as no evidence and month evidence during half of the year (April to September) and poorly during the other half (October to March). This result is to be expected, since spring and autumn are the most variable seasons, with extreme diurnal changes, hence it is hard to produce a better reconstruction. It is noted that even NWP models produce results with high error for spring and autumn compared to meteorological station observations. Overall, the CAE gives a very good approximation for error correcting and downscaling the temperature field over the area of Greece, with no evidence and month evidence improving the results for spring and autumn. The RMSE seems to be a less useful metric for the current case study, as it does not vary enough between cases to conclude which evidence yields the best results. On the other hand, the IOA highlights the differences between types of evidence and enables us to draw useful conclusions about the seasonality of the data agreement, which may lead to new strategies for improving the performance.

Conclusions
In the current study, a data-driven approach was used to dynamically correct the error of a global NWP model for Greece, for the case of temperature. The approach was a CAE with evidence transfer, aiming to capture the most significant factors that may affect the simulation results and thus enable the CAE to reconstruct a better approximation.
Overall, the CAE proved to be a very good approach, with some types of evidence indicating that it can potentially provide better results than the no-evidence case. The validation was performed on the test part of the dataset and was based on a grid and a point analysis. The grid analysis comprised the comparison of the spatial mean over time against the MFS BC method and RMSE statistics of temperature contour plots, while the point analysis comprised the RMSE and IOA metrics for 10 major cities. The grid analysis revealed that the CAE approach provides similar, but not identical, results to the bias-corrected calculations. The point analysis, on the other hand, was performed both over the whole period covered by the test dataset and on a monthly basis. The types of evidence describe the monthly, seasonal, hourly and simulation-related fluctuation of the error appearing in the forecasting models; as such, they were expected to improve the performance of the CAE, especially the distance of the frame from the initialization of the simulation (FFN). The results revealed a different behavior of the CAE models between the stable (winter and summer) and variable (spring and autumn) seasons of the year: winter and summer gave very good results (IOA closer to 1), whereas spring and autumn presented an average performance. In contrast to the original assumption, the step from initialization (FFN) is the worst case and seems to add noise to the model. The latter is probably the result of a small dataset compared to the number of classes the model needs to learn (720 classes with 42 unique seasonal NWP simulations).
The current approach has significant advantages compared to other well-established machine-learning methods. There is a series of regression models that belong to the family of model output statistics (MOS) (Glahn & Lowry, 1972). The most common approaches use multiple linear regression (with forward selection), polynomial or logistic regression schemes in order to improve the forecasting ability of the numerical weather models by relating model outputs to observational or additional model data (Holmstrom et al., 2016). It is noted that, commonly, the input and output are of the same resolution, especially in the case where the observation data are not gridded but point-based. Interestingly, the FNL dataset used as the target in the current study is the result of the GFS model corrected with satellite, weather station and other data sources via the aforementioned procedure, produced a day after the GFS model run. The advantage of the CAE approach is the production of higher-resolution gridded data that approximates the FNL quality for future time steps (e.g. 6 months ahead) rather than for past time periods.
Moreover, simpler approaches exist that are used as preprocessing tools for NWP simulations. They create a higher-resolution grid from the global input model by interpolating along the x, y and z axes using a filter (e.g. Gaussian). The resulting domain is used as the initial step for the NWP simulation to perform weather predictions and, at the same time, to produce a higher-resolution grid of the area under study (dynamic downscaling). A similar method was used in the current study with a similar goal: the MFS input grid resolution was upscaled to match the FNL resolution by applying a Gaussian regression model. Such methods have hitherto been used as pre-processing tools, resulting in gridded domains of much lower quality compared to other analytical, statistical or data-driven methods; the performance of the current work may be an indicative justification for using these tools in the same context.
Due to the promising results of the CAE approach with evidence transfer, the reconstructed domains can be used either as a decision support system for assessing more reliably the weather conditions 6 months ahead (temperature in this case) from global data, or as better initial and boundary conditions for performing more reliable NWP seasonal simulations. Future work in terms of validation and analysis may include a bigger dataset for assessing the evidence, combinations of evidence (two or three in the same CAE), especially the month and FFN evidence, or grouping the FFN classes to diminish their number. The use of other weather variables besides temperature, such as precipitation and wind, can also be investigated. Furthermore, considering the sequence of the frames, hence the dimension of time, and combining the CAE with RNN architectures could constitute another challenging future task.

Notes on contributors
Diamando Vlachogiannis is a Research Director at NCSR "Demokritos", working on and coordinating national and EU-funded projects. Her interests revolve around weather, seasonal and climate data projections and air pollution modelling. She is currently a member of EREL at NCSR "Demokritos".

Data availability statement
The data that support the findings of this study are openly available in https://cds.climate.coperni cus.eu/#!/home for MFS dataset and in https://rda.ucar.edu/ for FNL dataset.