Reduced order digital twin and latent data assimilation for global wildfire prediction

. The occurrence of forest fires can impact vegetation in the ecosystem, property, and human health, but also indirectly affect the climate. JULES-INFERNO is a global land surface model, which simulates vegetation, soils, and fire 10 occurrence driven by environmental factors. However, this model incurs substantial computational costs due to the high data dimensionality and the complexity of differential equations. Deep learning-based digital twins have an advantage in handling large amounts of data. They can reduce the computational cost of subsequent predictive models by extracting data features through Reduced Order Modelling (ROM) and then compressing the data to a low-dimensional latent space. This study proposes a JULES-INFERNO-based digital twin fire model using ROM techniques and deep learning 15 prediction networks to improve the efficiency of global wildfire predictions. The iterative prediction implemented in the proposed model can use current-year data to predict fires in subsequent years. To avoid the accumulation of errors from the iterative prediction, Latent data Assimilation (LA) is applied to the prediction process. LA manages to efficiently adjust the prediction results to ensure the stability and sustainability of the prediction. Numerical results show that the proposed model can effectively encode the original data and achieve accurate surrogate predictions. Furthermore, the 20 application of LA can also effectively adjust the bias of the prediction results. The proposed digital twin also runs 500 times faster for online predictions than the original JULES-INFERNO model without requiring High-Performance Computing (HPC) clusters. The implementation code of this study and the developed models are available at https://github.com/DL-WG/Digital-twin-LA-global-wildfire.


Introduction
Every year, unwanted wildland fires result in significant economic, social, and environmental impacts on a global scale (Grillakis et al., 2022).In addition, forest fires are especially sudden, extraordinarily destructive, and notably difficult to timely respond (Grillakis et al., 2022).Globally, an average of more than 200,000 forest fires occurs each year, burning more than 1% of the world's forested area (Pais et al., 2021).Wildfires not only destroy vegetation but can also pose risks to life and property, impact human health due to smoke pollution, and feedback on climate change through the release of stored carbon (Kim, 2015;Marlier et al., 2015;Jauhiainen et al., 2012;Lasslop et al., 2019;Ward et al., 2012).Studying the influence of vegetation and climatic factors on wildfire occurrence is critical for anticipating future wildfires and preparing for their possible impacts on ecosystems and society.
Global fire frequency is related to land use, vegetation type, and meteorological factors.Arid land, hot and dry weather, and combustible vegetation are all wildfire risk factors.Therefore, data from validated natural environment models can predict forest fire risk.Numerous valid and relevant models exist, such as Earth System Models (Claussen et al., 2002) and Dynamic Global Vegetation Models (DGVM) (Prentice and Cowling, 2013).The Joint UK Land Environment Simulator -INteractive Fire and Emissions algorithm for Natural envirOnments (JULES-INFERNO) (Best et al., 2011;Clark et al., 2011) is an example of a DGVM.JULES-INFERNO simulates fire burnt area and emissions over time based on geographic features such as population, land usage, and meteorological conditions (Burton, 2019;Mangeon et al., 2016).However, implementing such a DGVM simulation is time consuming due to the complexity of the physical model and the number of geographical features associated.For example, predicting wildfire occurrence over a 100-year climate change scenario takes approximately 17 hours runtime using 32 threads on the JASMIN High-performance Computing (HPC) service (JASMIN, 2022;Lawrence, 2013) (100-year runtime estimated as average from 4 simulations, range 14.3 -19.7 hours, using Intel Xeon E5-2640-v4 "Broadwell" or Intel Xeon Gold-5118 "Skylake" processors with ~7 GB RAM available per thread).This is further limited by availability and queuing priority of the compute resource.Therefore, building an efficient digital twin as a surrogate model is necessary for accelerating the prediction process.
The digital twin paradigm combines process-based information and data-driven approaches to best estimate complex systems.
An extensive range of scientific problems, including medical science (Quilodran-Casas et al., 2022), nuclear engineering (Gong et al., 2022a), and earth system modelling (Bauer al., 2021), use the digital twin paradigm for modelling.Since the 1990s, researchers have studied the relationship between vegetation and wildfire using machine learning and artificial intelligence (Nadeem et al., 2020).Recent research shows that machine-learning-based Reduced Order Modelling (ROM) can effectively reduce the computational cost by constructing surrogate models (Mohan and Gaitonde, 2018).ROM efficiently compresses the raw data into a low-dimensional space using an encoder and then decompresses the data using a decoder.Thus, predictions and simulations can be performed in a low-dimensional latent space before being decoded into a real physical space.However, predictive models trained using large amounts of data do not necessarily guarantee long-term prediction accuracy.In fact, iterative applications of Sequence-to-Sequence (Seq2seq) forecasting models can lead to error accumulation, resulting in incorrect long-term predictions (Cheng et al., 2022a;Cheng et al., 2022b).Researchers have applied data assimilation (DA) methods to address this challenge.DA methods correct the predicted data by combining simulated data with observations through specific weighting (Gong et al., 2020).Applications of DA in high-dimensional dynamical systems include weather forecasting (Lorenc et al., 2000;Bonavita et al., 2015;Ma et al., 2014), hydrology (Cheng et al., 2020), and nuclear engineering (Gong et al., 2022b).However, wildfire data from the increasing number of meteorological satellites have high data dimensions (Jain et al., 2020).Applying DA operations on full-size data is computationally expensive, if not prohibitively so.Because there are some major challenges in high dimensional dynamical system models, such as ROM, dynamical system identification, and model error correction (Cheng et al., 2023).
Latent data Assimilation (LA), which combines ROM, ML surrogated models, and DA was recently proposed (Peyron et al., 2021) and applied to a wide range of engineering problems, including air pollution modelling (Amendola et al., 2021), multiphase fluid dynamics (Cheng et al., 2022a) and regional wildfire predictions (Cheng et al., 2022b).In LA, data compression happens before the DA operation (Peyron et al., 2021), significantly reducing the computational cost.Iteration of this prediction assimilation process improves the starting point of the next time-level forecast and leads to more robust longterm predictions.
The main objective of this study is to propose a digital-twin model with the same burnt area output as the JULES-INFERNO system.The digital twin model combines ROM, machine learning predictive models, and LA.Data from the JULES-INFERNO simulations are used to train the machine-learning-based digital-twin model.Time series of wildfire-related data is given as input to predict the occurrence of wildfires in following years.The proposed model is tested on unseen initial conditions to predict subsequent fire conditions.This digital twin can significantly improve prediction efficiency compared to the physical model.
Two ROM approaches, Convolutional Autoencoder (CAE) and Principal Component Analysis (PCA), are chosen to reduce data dimensionality.Long Short-Term Memory (LSTM) forms the main structure for wildfire occurrence probability prediction because LSTM is suitable for dealing with dynamic simulations with long-term temporal correlations (Graves and Schmidhuber, 2005a).LA is applied to correct the model results as soon as 'observation' data become available.The 'observation' data used here for LA is the encoded data from the original JULES-INFERNO simulations, which we use as a proof-of-concept substitute for high-quality observation data.Compared to the traditional DA in the entire physical space, LA can considerably improve computational efficiency thanks to the ROM (Graves and Schmidhuber, 2005a).This research aims to use the initial months' output from JULES-INFERNO as input to implement a global-scale wildfire predictive model that combines ROM, recurrent neural networks (LSTM), and LA for efficient wildfire forecasting.
To summarize, the main contributions of this study are as follows.
• This research implements a deep learning digital twin for wildfire prediction based on the JULES-INFERNO landsurface model.The digital twin can greatly improve the computational efficiency of the physical model, by embedding the input into a low-dimensional space before prediction.
• We tested and compared two ROM approaches: Convolutional Autoencoder (CAE) and Principal Component Analysis (PCA), in terms of reconstruction and prediction accuracy over the simulation period of 1961 to 1990.The objective is to achieve a significant reduction in data dimensionality to improve the efficiency of subsequent predictions but maintain the original characteristics of the data.The digital twin achieves long-term predictions by making iterative predictions by using the current predicted results as the next-level input to predict the situation in the next time step.
• LA is applied to avoid error accumulation.LA compares the predicted values against the original JULES-INFERNO outputs (considered as observations in this study) and periodically adjusts the predictions.When applying unseen scenarios to the developed model, LA adjusts the predicted values to improve the accuracy of subsequent predictions.
Therefore, LA implemented in this project can effectively stabilize prediction results.
The paper is organized into the following sections: Sect. 2 explains the original physical model, JULES-INFERNO, and the data set of this project.Sect. 3 describes the dimensionality reduction method used in the study (PCA and the constructed CAE structure), explains the surrogate model used in this research for wildfire prediction, and outlines the principles and applications of LA.Sect. 4 discusses the experimental results and analysis of the study.Finally, Sect. 5 concludes the most important findings of this research and suggests future directions for enhancement.

JULES-INFERNO
The Joint UK Land Environment Simulator (JULES) is simulating land surface vegetation, carbon stores, and hydrology, primarily using physical models to simulate the processes of land-use, water and carbon fluxes with the atmosphere, and climate interactions with vegetation dynamics (Best et al., 2011;Clark et al., 2011).The INFERNO fire scheme was constructed based on a simplified parameterization of fire ignition and vegetation flammability (Pechony and Shindell, 2009).
Coupled with JULES, the INFERNO scheme predicts fire burnt area and carbon emissions based on the simulated vegetation, as well as vegetation mortality due to fire, which feeds back on the vegetation distribution (Burton, 2019;Mangeon et al., 2016).More precisely, the model calculates flammability based on soil moisture, fuel density, temperature, humidity, and precipitation, and then combines lightning strikes to estimate the average area burnt.Finally, emitted atmospheric aerosols and trace gases can be calculated from vegetation-dependent emission factors (Mangeon et al., 2016).Figure 1 illustrates the input variables and critical components of JULES-INFERNO.However, this model involves a large amount of data and parameters, leading to high computational cost.In the present paper, we propose the use of a surrogate model that substitutes the original high-precision simulation model.The surrogate model aims to use the same input fields as the original model and outputs a result that approximates the original INFERNO burnt area output but is less computationally intensive to solve.
This research considered four meteorological boundary conditions and predicted burnt area (temperature (), humidity (), rainfall (), and lightning ()) and the field of burnt area fraction ()) from the JULES-INFERNO model as input data.We used an ensemble of five JULES-INFERNO simulations which were forced using meteorological conditions from the FireMIP last glacial maximum (LGM) scenario (Rabin et al., 2017), which uses a detrended 1961-1990 re-analysis timeseries, uniformly shifted to match LGM global average climate, as the atmospheric boundary conditions.Monthly-mean values were collected for each meteorological boundary condition, nominally from 1961-1990 for each simulation.Thus 360 snapshots for each variable are available, denoted as temperature (   ,  = 1, 2, ⋯ ,360 ), humidity (   ,  = 1, 2, ⋯ ,360 ), rainfall (   ,  = 1, 2, ⋯ ,360), and lightning (  ,  = 1, 2, ⋯ ,360).JULES-INFERNO was run with a resolution of 1.25° latitude × 1.875° longitude, giving each snapshot a latitude span of 144 units and a longitude span of 192 units, so the size of each snapshot is 144 × 192.We denote the wildfire burned area predicted by JULES-INFERNO from each of the 5 ensemble members as Although each ensemble member simulated the same nominal period (1961 to 1990) and was forced by the same detrended meteorological boundary conditions, we applied different initial internal states to sample a range of model internal variability.As shown in Table 1,  1 was randomized, and the initial internal states of the subsequent experimental results were all the last internal states of the previous one.The burnt area model output is only diagnosed over latitudes with non-zero land cover, and so ( , ,  = 1, 2, ⋯ ,5,  = 1, 2, ⋯ ,360), only spans 112 latitude units, meaning the fire snapshot size is112 × 192 rather than 144 × 192.
In subsequent experiments,  1 ,  2 and portion of  3 are the training set and another portion of  3 is the validation set during training. 4 and  5 were used as the test set for all models built in the study.In addition, to eliminate the adverse effects of odd sample data in training, this research standardized all training and test sets (climates and fire variables) by applying Eq. ( 1) to normalize the data to [0, 1]:

PCA
PCA extracts essential information from the data and eliminates redundant information by analysing principal components.
PCA uses the idea of dimensionality reduction to form new variables by linearly combining multiple parameter indicators, mapping the original n-dimensional features to k-dimensional features ( > ), k is known as the truncation parameter in PCA (Bianchi et al., 2015).
The compressed variable  ̃ can be represented by a linear combination of  1 ,  2 , … ,   as shown in Eq. ( 2), to make the global variance after the transformation maximal.
The covariance matrix C is shown in Eq. ( 3), where the formula for the covariance is shown in Eq. ( 4). (3) where, (  ) represents the mathematical expectation of   .
The contribution rate of each principal component can be calculated using the principal component explained variance formula (Eq.( 5)), the principal components corresponding to the special eigenvalues  1 ,  2 , ⋯ ,   are selected in the descending order to construct  ̃.The cumulative explained variance is referred to Eq. ( 6).

CAE
An autoencoder (AE) is a self-supervised deep learning method that minimizes the reconstruction error between its model input and output.The autoencoder optimizes parameters to obtain a low-dimensional data representation of high-dimensional features (Huang et al., 2017).The feature extraction method used in standard AE is fully connected and straightforward to implement.However, each neuron connects to all the neurons in the next layer.Thus, standard AE generates massive parameters, making the computation more expensive while ignoring some spatial patterns in the image (Masci et al., 2011).
Convolutional Autoencoder (CAE) (Masci et al., 2011) The encoder designed for this study uses three convolutional layers, three max-pooling layers and two fully connected layers.
The decoder includes four convolutional layers, three up-sampling layers and one fully connected layer.Figure 2 shows the CAE structure of this study with a latent space of dimension 100, and Conv2d, MaxPooling2D, Dense, UpSampling2D representing the 2D convolutional layer, max pooling layer, fully connected layer, and up-sampling layer respectively.
Firstly, the sample data,  = {  } =1,2,⋯, need to be converted into a two-dimensional matrix of  rows and  columns ( *  = ) according to the latitude and longitude range of the original data, and then each snapshot could be interpreted as an image.Given the input , features are extracted by multiple convolutional kernels to obtain the output ∁  of the  ℎ layer, then the ∁  ′ is obtained by down-sampling through the subsequent pooling layer to retain the main features of the inflow data and prevent over-fitting.The operational methods are shown in Eq. ( 7) and Eq. ( 8). Figure 3 shows the basic structure of the whole surrogate model.[ 1,1 ,  ]. , Thanks to the gate structure, LSTM networks are efficient in dealing with long-term temporal correlations that standard RNN cannot handle (Graves and Schmidhuber, 2005b).More precisely, the essence of the gate structure of the LSTM is to use the sigmoid activation function so that the fully connected network layer outputs a value between 0 and 1, describing the proportion of the information quantity passed.The forgetting gate indicates the proportion of the output information quantity forgotten at the last moment, and the input gate represents the proportion of the input information quantity retained at the current moment, both updating the state value together.Finally, the output gate represents the proportion of the new state output.Suppose   ( 0 = [ , ,  ,+1 , ⋯ ,  ,+  −1 ],  = 1,2,3,  = 1,2, ⋯ ,349) are the current inputs, ℎ −1 is the output of last moment,   is the value of new state,  −1 is the state value of last moment.Equations (12,13,14,15,16,17)

Latent Data Assimilation
The basic principle of DA is the combination of numerical models with observation data to improve the forecast of the system under study (Vallino, 2000).When applying the developed predictive model to unseen initial conditions, we apply the DA method periodically to enhance the current prediction results with the help of the observation data.The observation data in this research is considered the original output of the JULES-INFERNO model so that we can simulate the situation where high-quality real-time observation data are available.As periodic corrections are made to the forecast data, DA allows for a more stable and accurate long-term prediction.In this study, LA was applied, which combines ROM with DA, i.e., reducing the dimensionality of the data before applying the DA method (Peyron et al., 2021).In addition, ROM could reduce the parameters for subsequent operations, which allows LA to have a significant advantage in terms of computational efficiency compared to the classical full-space DA.
Both DA and LA can be summarized as a problem of solving the minimization of an objective function   (•) characterizing the deviation between the analysis field (optimal state  ̇ ) and the observation field (actual state   ) as well as the background field (predicted state   ′ ), as shown in Eq. ( 18).
where  indicates the temporal index (the  ℎ predicted year),   ̇ is the analysis variable of DA,   ′ is the background variable,   is the observation (original) variable.Β and Ο represent the background and observation error covariance matrix.The modelling of these covariances determines the weight of prediction and observation in the objective function, which is crucial for DA approaches (Lawless et al., 2005).ℋ(•) is the state-observation function in latent space, which maps the analysis variable to the observation variable.In this study, since we are assimilating the compressed output of the original JULES-INFERNO model, ℋ(•) is the identity function.
The latent analysis field  ̇, could be obtained by optimizing the objective function (Eq.( 18)), The Eq. ( 18) can be solved by the Best Linear Unbiased Estimator (Fulton, 2000), since the transformation operator ℋ = Η is a linear function in the latent space, The obtained analysis state  ̇, can be used as an initial point for the next level prediction, as shown in Figure 4, the predicted field in the latent space could be  ̇, (when applying LA at  time index) or  , ′ ( LSTM prediction at  time index).
These processes can be repeated periodically to consistently improve the prediction performance.Furthermore, in Figure 4,  In this section, we evaluate the proposed digital twin model's performance regarding prediction accuracy and computational efficiency when the digital twin model faces unseen initial conditions.

ROM results
To set an optimal dimension of the latent space, we examined the explained variance of PCA for the training dataset.The training dataset consists of P1, P2, and P3 with different numbers of principal components, as shown in Figure 5.According to the work of (Li et al., 2021), 85% to 95% of the explained variance can reflect most data information while effectively reducing the original data dimensionality.Therefore, to reduce the latent space dimension and the computational cost of subsequent processing, we chose in this research the 100-dimension latent space with 87% explained variance, as shown in Figure 5.In addition, we also chose the 20-dimensional latent space for both PCA and CAE for comparison purposes.we first input the first year (12 snapshots) data and then use the current predicted result for the next-level prediction, until the 30 ℎ year's total burned area is predicted.The MSE for different combinations of ROM and LSTM are shown in Table 3.  Comparing the results shown in Table 2 and Table 3, there is a gap between the MSE of the predictive model and the ROM reconstruction, mainly because of the accumulation of errors during the iterative prediction process.Therefore, we periodically used the observations (raw data) and LA to adjust the forecast results to stabilize subsequent forecasts.LA is implemented every five years during the prediction phase in this study.According to Table 3, there is a steady reduction in forecast MSE after correction by LA.The most significant reduction of MSE is obtained with 100-dimensional CAE compared to other approaches.
To better illustrate the evolution of the prediction error, we display the MSE of the prediction against time in different predictive models.In this study, the existing climate and fire data in 1961 (provided by JULES-INFERNO) with different initial conditions is given to predict the subsequent 29 years of fire. Figure 6 shows the MSE per prediction for the different surrogate models (in red when without LA and in green when with LA) evaluated on  4 test set and we plotted the ROM MSE (in blue) for comparison.shown in Figure 6 (c, d).These results demonstrate the advantage of CAE compared to PCA in terms of generalizability when applied to unseen data.This effect has also been highlighted in the work of (Peyron et al., 2021).Comparing the four cases presented in Figure 6, the 100-dimensional CAE demonstrates the strength of both the original prediction and the assimilated one.As observed in Figure 7, CAE and LSTM without LA can effectively reconstruct most of the features in the original image.
Furthermore, comparing the LSTM predicted images (i, j, k, l) and the optimisation results after applying LA (m, n, o, p), it is found that the assimilation results are significantly closer to the JULE-INFERNO output.
Overall, in this research, we implemented four ROMs based on PCA and CAE, and the comparison reveals that the 100dimensional CAE has the best reconstruction performance.Then, after combining the ROMs with predictive models (LSTM), a gap between the prediction results and the ROM reconstruction results is noticed.Eventually, the predictions of the surrogate model are made more stable after regular applications of LA to adjust the prediction outputs.
Table 4 compares the prediction time costs of the surrogate models using PCA and CAE downscaled to a 100-dimensional latent space without and with LA, respectively.The prediction time of JULES-INFERNO on HPC is also shown as a reference.
As can be seen from Table 4, the difference of PCA and CAE on the running time of the models is not significant; the application of LA slows down the prediction, but still reduces the computing time cost significantly (around 500 times) compared to the original JULES-INFERNO fire predictive model.or GPU resources.Thus, a personal laptop can also run the model successfully.

Conclusion and Future work
This study implemented a deep learning-based digital twin for global wildfire prediction using the same input and output fields to replace the physical wildfire model JULES-INFERNO.The proposed model builds ROMs to encode the data into latent space to reduce subsequent processes' computational costs.Then it makes use of the compressed data to train a predictive model based on RNN.Finally, it applies LA to stabilise the prediction results when using the predictive model for iterative prediction to avoid error accumulation.According to the numerical results, the digital twin built using CAE is more accurate compared to the ones using PCA and can effectively capture the features of the original image for encoding and decoding.
Applying LA periodically to optimise the prediction results can address the issue of error accumulation and achieve stable

Declaration of competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper

Figure 1
Figure 1 The mechanism of JULES-INFERNO Figure 2 The CAE Model structure

Figure 3
Figure 3 The Surrogate Model structure The seq2seq prediction based on the LSTM network is implemented in this study with   timesteps as input and   timesteps as output.Here, each timestep represents a month of simulation time.The length of the input and output sequences is fixed as the final predicted sequence of the digital twin is the combination of the { ̇, } [0,29] and { , ′ } ∈[0,29] ( ≠ ), which could be denoted as   ′ .

Figure
Figure 4 The Predict Model with LA

Figure 5 PCA
Figure 5 PCA Explained Variances with different Dimensions In terms of performance evaluation, the scenarios  1 ∪  2 ∪  3 are used as the training set and  4 ∪  5 are considered as the test set.The number of training epochs for the CAE models is fixed as 1000.The MSE between reconstructed and original simulations for different models are shown in Table 2. To quantify the mismatch between the JULE-INFERNO output  and decoded prediction ′, we compute the MSE averaged by the number of pixels, i.e., ∈ MSE = √∑ || , − ′ , || 2 2 /(112 * 192)

Figure 6
Figure 6 Evolution of MSE against the years in different Surrogate modelsAccording to Figure6, using LA can effectively reduce the prediction error, as the vast majority of red points are significantly lower than the corresponding green points.Furthermore, CAE-based surrogate models can better adapt the LA, stabilizing the predictions afterwards and reducing the accumulation of errors.When applying the LA to the PCA-based surrogate models to reduce the error after the simulation, there may still be a sharply increasing forecast error in the subsequent year, as seen in

Figure 7
Figure 7 The nomadized output of 100-dimensional CAE-based digital twin Figure 7 displays the output of the100-dimensional CAE-based digital twin model of year 1983 and 1988 (one year after LA is implemented), the assimilated results (m, n, o, p) are compared with the original simulation (a, b, c, d), the CAE reconstruction (e, f, g, h), and the decoded image predicted by the LSTM model (i, j, k, l).In addition, the total burnt area shown in Figure 7 are normalized to the range 0 to 1.The horizontal and vertical coordinates in Figure 7 represent latitude and longitude, and the colour values in the figure represent the grid box which indicates estimated burnt area fraction.The brighter long-term predictions.Ultimately the research achieved a much more efficient wildfire-prediction digital twin based on JULES-INFERNO.The proposed model takes only 35 seconds on a laptop to predict 30 years of burnt area fractions, compared to ~5 hours using the original physical model on a 32-thread HPC node.The present research clearly brings additional insight to the computing of reduced order digital twins for high-dimensional dynamical systems.Future work can consider to further improve the generalizability of the proposed approach, by, for instance, training the model with scenarios of different time periods or fine tuning the model when the initial conditions are outside the range of the training set.In addition, it is reported in recent works(Teckentrup et al., 2019) that long-term predictions of JULES-INFERNO can introduce forecast bias.Further efforts can be considered to apply latent data assimilation framework developed in this paper with real-time satellite observations.

Table 1 JULES-INFERNO Experiments' initial internal states
̂= {  ̂} =1,2,⋯, .ROM's primary purpose is to minimize the expectation of the Mean Square (Masci et al., 2011)E and CNN, has been proposed to address the drawbacks of fully connected AEs.The CAE model inherits the self-supervision function of standard AE but replaces the matrix product operation between hidden neurons with convolution and pooling operations.Compared to AE, CAE can capture local spatial patterns of monitoring data(Masci et al., 2011).Furthermore, extracting features by convolution can reduce the number of parameters and increase the training speed.
This study includes climate and fire conditions from various regions in the form of two-dimensional images.Therefore, this research constructed a CAE model using AE reconstruction, dimensionality reduction features, and CNN local feature extraction capabilities.The CAE model aims to achieve self-supervised learning of environmental and wildfire features.
1,2 , ⋯ ,  1,  ]As for the iterative predictions in test dataset, the trained model uses the current predictive result as the input of next prediction.The first-year data of the test set is used as model input, and the prediction result set only contains data from the following 29 years.At each iteration, the model can predict the global wildfire risk for the next year, as shown in Eq. (11).

Table 2 PCA and CAE performances
According to Table2, for both PCA and CAE with 100-dimensional latent space, the reconstruction error is significantly lower compared to 20-dimensional.Furthermore, it can be evaluated from the experimental results that the CAE reconstruction MSE is lower compared to PCA.As previously described, CAE can capture feature information in 2-dimensional images during the encoding process, while PCA, which performs compression in 1-dimension space, may ignore these spatial patterns.In summary, CAE with 100-dimensional latent space has the best reconstruction performance in these experiments.As for the compression of climatic conditions , ,   , 83% data are used for PCA and CAE model training, and the rest of the data are used for model testing, where PCA and CAE have a similar performance.After completing the ROM, the original JULES-INFERNO data are compressed into specific latent spaces, considerably reducing the number of parameters in the predictive model.The concatenated wildfire burnt area field and the four climate variables act as inputs to the surrogate model.Thus, the surrogate model achieves the same inputs as JULES-INFERNO.The predictive model (LSTM) trains separately using encoded data from different ROMs.Similar to the ROM,  1 ,  2 and  3 are used as the training set while  4 ,  5 are used as the test set to evaluate the model performance.Each LSTM model has been trained 10000 epochs.As explained in section 3.2, iterative predictions are used for the proposed digital twin.In other words,

Table 3
shows that the predictions obtained using PCA down to 20 and 100 dimensions and CAE reduced to 20 dimensions are similar, with an MSE close to 2.7E-4.Same as ROM, the best performance in LSTM prediction is obtained by applying CAE with 100-dimensional latent space, with an MSE loss of approximately 2.5E-4.

Table 4 Time efficiency comparison
In this study, the training of CAE and LSTM is performed with one NVIDIA P100 GPU (16 Gb VRAM) of Google Colab environment.All the online computation, including decoding, prediction and assimilation, is performed with Google Colab Intel CPUs as shown in Table4.It is important to highlight that the application of such digital twin doesn't require any HPC