Multi-Tempo Forecasting of Soil Temperature Data; Application over Quebec, Canada

: The profound impact of soil temperature ( T S ) on crucial environmental processes, including water inﬁltration, subsurface movement, plant growth


Introduction
Soil temperature (T S ) is one of the essential environmental factors used in various disciplines such as engineering and hydraulic structures design, geotechnics, hydrological and meteorological studies, agriculture, forestry, and many other fields. The majority of the chemical and physical characteristics of soil are temperature-dependent. For instance, fluctuations in T S directly affect water infiltration; thus, variations in T S and infiltration rates follow similar patterns [1][2][3]. Soil temperature may affect other processes, such as groundwater discharge, land-atmosphere fluxes such as evapotranspiration, gas exchange, and surface runoff patterns [4][5][6]. Fluxes of greenhouse gases such as carbon dioxide (CO 2 ) and nitrous oxide (N 2 O) directly depend on T S [7,8], which relates to global warming on a large scale. Any alterations to atmospheric constituents mutually affect the soil temperature. Because the thermal energy balance between the soil and atmosphere is preserved, the soil can either act as a heat source or sink [9][10][11][12]. Similarly, the impact of T S on the agriculture section cannot be ignored, e.g., the dependency of crops' growth on T S , irrigation planning, and hydrological drought monitoring [13,14].
In a recent study by Zeynoddin et al. [37], a comprehensive exploration of techniques for predicting time series data was undertaken, encompassing variable-based models, AI modeling approaches, and linear methods. They proposed a linear methodology that preprocesses T S data using several methods and models the data dynamically via stochastic models. Comparative analysis with ML techniques [19] revealed that combining the SS method with the stochastic model yielded significantly more precise results. Building upon these findings, Zeynoddin and Bonakdari [38] introduced a structurally optimized deep learning method that integrated the SS method for long-term soil moisture forecasting. Remarkably, their study demonstrated that the SS method excelled at regenerating long-term time series features and substantially enhanced forecast accuracy compared to other methods. Moreover, their solution effectively mitigated the data size dependency problem often encountered in data-driven modeling approaches. While this proposed structure has limitations, such as its finite domain of model variables, exclusivity to seasonally structured data, and the complexity of the deep learning method, the authors hypothesized that the comprehensive and generalizable structure of the SS method for seasonal data modeling could be leveraged to fill the temporal gap in FLDAS datasets. These datasets, generated at a monthly resolution [39] with a two-month availability lag, as indicated on the GEE catalog website (Earth Engine Data Catalog: NASA_FLDAS_NOAH01_C_GL_M_V001#description), are influenced by the annual or monthly trends exhibited by the T S time series [10], which temporally and spatially vary due to atmospheric and subsoil processes affecting thermal equilibrium.
The objectives of this paper are (a) to introduce an extraction method for the collection of monthly mean T S data for two agricultural areas in Quebec, Canada and (b) to apply the powerful SS method for modeling. Following these are (c) a spatial validation analysis will be carried out to analyze the model's accuracy alternation and pattern evaluation in case a time series of different locations are used, and (d) a trend and long-term alternations analysis of the T S variable will be carried out based on the model results and forecasts. Finally, (e) the results of the proposed FLDAS-state-space method will be comprehensively discussed in the following sections. Furthermore, this paper will outline the practical application of these results, including their integration into future T S forecasts, thereby making a significant contribution to the field.

Time Series and Pre-Processing Review
Since this study uses a time series-based approach, a review of necessary methods for time series preparation is presented. Time series can have jumps, periods, and seasonal and non-seasonal trends deemed deterministic. They can additionally have a stochastic component, constituted of residuals of the deterministic characteristics subtracted from the time series' values. The deterministic component is the predictable part of the time series' pattern and can be detected and extracted. Detecting the mentioned patterns and analyzing T S data entails employing several tests. These tests not only help to detect meaningful patterns, however they can additionally help to extract statistics and interpret the time series. Following pre-processing methods, models are applied to prepare data for forecasting [40].
The first attribute of a time series to be checked is stationarity. It means the absence of meaningful changes in mean, variance, and other statistics over time. Therefore, the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test is applied to T S for assessing stationarity [41]. In the KPSS test, a regression equation is fitted to the data. The T S series is stationary if the variance of the independent variables of that equation is null. Alternatively, it is stationary in trend if the deterministic term of the trend is not null and stationary in level if the deterministic term of the trend is equal to zero. Seasonal and non-seasonal trends in the T S time series as one of the non-stationarity factors is checked by the Mann-Kendal and seasonal Mann-Kendal trend (MK and SMK) tests [42,43].
Jumps appear in time series as sudden increases or decreases and can be both induced by human and natural sources. To evaluate this factor, the non-parametric Mann-Whitney (MW) test can be performed [42]. More details about pre-processing steps can be found in Zeynoddin et al. [22]. The autocorrelation function (ACF) is a tool to evaluate a dataset's randomness or assess whether a non-linear or linear model would be appropriate. This way, it can be found whether an observation is related to adjacent observations or not. Additionally, specific patterns in time series, such as sinusoidal fluctuations interpreted as periodicity, can be detected. Therefore, we can judge the time series' condition and interpret it. An ACF plot is an intuitive method to assess the data more conveniently. The ACF demonstrates the autocorrelation of data separated by a lag distance. Both positive and negative ACF values are considered as the correlation between lags. Still, for several lags, if the positive values of an ACF consecutively exceed specific values, in this case, the 95% significance levels, they are interpreted as significantly correlated lags. These significant correlations should be reduced or removed from the time series to obtain independent variables and reduce the impacts of the deterministic components in the modeling procedure. As mentioned earlier, the deterministic components are the predictable part of the series and are responsible for dependency in the time series. This part of the time series' structure can be separated before modeling and restored to modeled data. The stochastic component is the mere important part for modeling purposes. The significance intervals are defined based on the cumulative distribution function of the standard normal distribution and the 95% significance level.

Famine Early Warning Network Land Data Assimilation System
The Famine Early Warning Network Land Data Assimilation System (FLDAS) program improves the quality of measured values by combining hydroclimatic data and observations to develop rainfall, soil moisture, evapotranspiration, and soil temperature products at a monthly resolution. These products are hosted by Google Earth Engine and can be merged with variables related to food security assessment in data-sparse countries. The program is developed by the U.S. Geological Survey (USGS), NASA GSFC, Goddard Space Flight Center (GSFC), and the University of California Santa Barbara (UCSB). It can provide helpful information to water resources and agricultural stakeholders. The output of the FLDAS can be used to compare current and past hydrological conditions and can be merged with other land information systems to produce an ensemble of hydroclimatic variables.
The FLDAS utilizes the Noah version 3.6.1 land surface model (LSM), the Climate Hazards Group Infrared Precipitation with Station (CHIRPS) 6-hourly rainfall, and the Modern-Era Retrospective Analysis for Research and Applications version 2 (MERRA-2). Moreover, the system uses the African Rainfall Estimation Algorithm Version 2 (RFE 2.0) and the CHIRPS for low latency. Using the Land Data Tool Kit (LD TK ), the FLDAS provides and preprocesses the land cover parameters needed for the Noah model. The LD TK contains the products of the NASA Soil Moisture Active Passive (SMAP), the Moderate Resolution Imaging Spectroradiometer (MODIS), the Advanced Very High-Resolution Radiometer (AVHRR) from NOAA 19 satellites, the CHIRPS quasi-global rainfall data set obtained from high-resolution satellite imagery, MERRA2, and the Global Data Assimilation System (GDAS), which uses multiple sources of information such as balloon, aircraft, radar, and satellite data. This tool kit allows the FLDAS to use rich sources of high-resolution datasets, with near real-time observations. In return, the data are fed to the Noah LSM to produce high-accuracy outputs.
All datasets are available from 1982 to the present at a 0.1 arc degree and a monthly temporal resolution. A flexibility in using different products represents the advantage of the FLDAS, products such as FEWS NET operational rainfall. Using multiple surface models, normal assessment, and a fast and appropriate data distribution represent the other advantages of this system. The FLDAS datasets can be obtained from the Google Earth Engine (GEE), which contains a vast catalog of satellite and geospatial datasets. In this application programming interface (API), both Python and JavaScript languages can be used to obtain and process the required data. Using the GEE API, various datasets for different fields of study can be accessed, such as global surface water changes, crop production, and drought monitoring [44][45][46]. Figure 1 shows the data collection and modeling procedural steps. Aside from the GEE API, many data transmission mechanisms are presented in this figure. In the GEE code editor, the following pseudocode and the corresponding commands in the Java language can be used to obtain the desired datasets (Appendix A Algorithm A1), or users can download the dataset from the developed application, SOILPARAM APP, by [47]. Engine (GEE), which contains a vast catalog of satellite and geospatial datasets. In this application programming interface (API), both Python and JavaScript languages can be used to obtain and process the required data. Using the GEE API, various datasets for different fields of study can be accessed, such as global surface water changes, crop production, and drought monitoring [44][45][46]. Figure 1 shows the data collection and modeling procedural steps. Aside from the GEE API, many data transmission mechanisms are presented in this figure. In the GEE code editor, the following pseudocode and the corresponding commands in the Java language can be used to obtain the desired datasets (Appendix A Algorithm A1), or users can download the dataset from the developed application, SOILPARAM APP, by [47].  According to the Köppen classification, Quebec City's climate is "humid continental". This means that summers are hot, and winters are cold and snowy. Fall and spring temperatures are, additionally, moderate. The atmospheric temperature in this region varies on average between −18 • C and 25 • C. Montreal has a "warm-summer humid continental climate", based on the same classification. This classification means summers are hot and humid, and winters are icy and snowy. The other two seasons are moderate. The air temperature ranges between −30 • C and 27 • C in some regions of the study areas [48]. Figure 2 shows the locations on a map.
To validate the results of the proposed methodology, the soil temperatures at two other locations were obtained, using the same method as the GEE API. These locations were selected based on the mapping of the land use of the St. Lawrence Lowlands report, published by Environment and Climate Change Canada in 2018. The Mauricie (MA) validation point is located between the first two study points (QC and MC), and the second validation point Lennoxville (LX) is located east of these points. These locations are two critical agricultural areas in Quebec, Canada. They are located between and southeast of the previous ones ( Figure 2). summers are hot and humid, and winters are icy and snowy. The other two seasons are moderate. The air temperature ranges between −30 °C and 27 °C in some regions of the study areas [48]. Figure 2 shows the locations on a map. To validate the results of the proposed methodology, the soil temperatures at two other locations were obtained, using the same method as the GEE API. These locations were selected based on the mapping of the land use of the St. Lawrence Lowlands report, published by Environment and Climate Change Canada in 2018. The Mauricie (MA) validation point is located between the first two study points (QC and MC), and the second validation point Lennoxville (LX) is located east of these points. These locations are two critical agricultural areas in Quebec, Canada. They are located between and southeast of the previous ones ( Figure 2). The time series plots for each location are displayed in Figures 3 and 4. The time series length was divided into training and testing intervals, with a ~75:25% ratio for modeling purposes. Hence, from a total of 445 months of data, 336 months are considered for model training and the rest, 109 months, for testing. In addition, two other sites were selected to obtain additional TS data for further validation of the models. These locations were chosen based on the 2018 land cover mapping of the St. Lawrence Lowlands, published by "Environnement et Changement Climatique Canada". The first site is in Mauricie (MA), with a latitude of 46.216° and longitude of −73.056°, and the second one is in Lennoxville (LX), with a latitude of 45.37° and longitude of −71.82°. The differences between the study sites and validation sites are illustrated in Figure 5. The size of the datasets of the validation sites is the same as that of the study sites.

State-Space Method
Smoothing methods are among the data processing approaches to stationarize time series and eliminate the deterministic components. Using specific algorithms, they can remove fluctuations and predictable patterns to enable predicting trends, seasonality, and future values. Among them are included moving average, simple exponential, linear exponential, seasonal exponential smoothing, and damped trend exponential smoothing [49][50][51].
The state-space (SS) method is a renowned, advanced, and robust triple exponential smoothing method used to remove trend, seasonality, and level in highly seasonal series and has additionally been used for forecasting data [52]. The SS method is divided into an additive form for cases where seasonality is roughly constant and a multiplicative form for series with seasonal trends. Each one has three smoothing parameters that need to be optimized using any optimization method, or trial and error. The multiplicative form of the method is as follows [53]: In the above equations, l t is level smoothing, b t is trend, and S t is seasonal components smoothing; A, β, and γ are additional smoothing parameters; Ω is seasonality, and m denotes forecasting time steps; SS t+m is the forecasted data by the state-space method for m future lags; and Tω + m assures the use of the latest historical data. Using real data to update forecasts, the model adapts itself to the new data, and the error is subsequently reduced. Therefore, the longer the training set is, the fewer errors will be generated. This feature can be considered a drawback too. In short time series, the model cannot adapt to data fluctuations, and therefore, accuracy may be reduced in multi-step forecasts. Compared to stochastic models, another advantage of this model is that it does not require normalization. The SS parameters can be obtained through either trial and error or optimization algorithms such as nonlinear generalized reduced gradient (GRG), which is applied to the three components of the SS method in this study.

Performance Metrics
The first 75% of the data is used in this study to calibrate the model parameters, and the last 25% is used for model validation by using the following indices: Coefficients of determination (R 2 ), root mean squared error (RMSE), and mean absolute error (MAE) are used to evaluate the model accuracy of the time series obtained from pre-processing of the original T S data. To assess the modeling power, the Nash-Sutcliffe model efficiency (E N-S ) is additionally used [54]. The closer E N-S is to a value of one, the more powerful the model is.
where T S obs,i , T S P,i and T S are the ith value of observed data, predicted soil temperature, and mean of data, respectively; n is the number of months.

T S Time Series
In order to retrieve the soil temperature data from the FLDAS, a set of commands was coded using the GEE API. The extracted time series were tested for stationarity, and the results are presented in Table 1 for Quebec City and Montreal, respectively. Since the focus was on the study points, and the validation points followed the same pattern, the results of these points are not presented. The SMK test results show significant seasonal trends in both T S time series. For each soil layer (there are 3 layers with the following upper-lower depths of 0-10, 10-40, and 40-100 cm), the probabilities corresponding to the MK and MW test results are higher than the significance level of 5%. Thus, the presence of non-seasonal trends and jumps in the datasets are not significant. Figure 6 shows the ACF plots of all the series. The seasonal correlations can be seen as sinusoidal patterns, with peaks at lags 12, 24, 36, 48, 60, 72, and 84, equivalent to 7 seasonal lags with a repetition of 12 months. In these lags, the ACF values are higher than the 95% confidence interval, and therefore, these correlations are considered significant. They are found at all soil depths. Considering the significant periodicity that could be observed and the presence of a seasonal trend, the KPSS test failed to detect the impact of these non-stationary factors properly. All probability values for this test are higher than a significance level of 5%. It additionally indicates the importance of utilizing several tests for stationarity detection. In other words, by using the MK and SMK tests, the significance of the gradual changes over time was assessed. According to these results, these incremental changes were only significant seasonally. As the MW test demonstrated, no sudden changes were detected in the datasets.
On the other hand, the ACF plots showed intensive periodicity. Consequently, as the seasonal trend and periodicity are both prerequisites of the state-space multiplicative method, this method is deemed suitable for forecasting. On the other hand, if the series were trendless, then the additive method would have been appropriate.

State-Space Modeling Results
In this study, as mentioned in the methodology section, the state-space (SS) m is employed to forecast the TS of Montreal and Quebec City. This method require smoothing parameters to eliminate deterministic terms in the time series. Moreover values are needed to obtain the SS formula's level, trend, and seasonal components. fore, the first period of the TS time series was used to initiate the modeling, which case, was the first 12 months or the first seasonal lag. In addition, optional initial were chosen for the smoothing parameters α, β, and γ, and they were optimized GRG solving method to achieve the most accurate results.
Three types of modeling were conducted to examine the SS method's model pability. The first was dynamic modeling, where the model was updated after eac cast. Next, a 6-step-ahead forecast (fs6) was performed, followed by a 12-step-ahea cast (fs12) that covered the entire period. The results of the 1-step-ahead forecast (fs1 els are provided in Table 2. As seen, the SS model could forecast the test period coefficient of determination higher than 0.90 and with low and relatively close erro cept for the 40-100 cm depth in Quebec City. Additionally, a decreasing trend in curacy by increasing the depths was observed. Tables 3 and 4 present the modeling results for the fs6 and fs12 modeling me As the soil depth increased, a general decline in accuracy could be seen, except for th soil layer (10-40 cm) for both fs6 and fs12. At this depth, the accuracy was lower t the other two depths. According to the results, there are some variations in the mag of the errors with a maximum value of ~3 °C. For all depths, the average differen tween forecasts and observations was ~2 °C except for Montreal (10-40 cm). For th bec City modeled time series, all sudden drops in accuracy were observed deeper

State-Space Modeling Results
In this study, as mentioned in the methodology section, the state-space (SS) method is employed to forecast the T S of Montreal and Quebec City. This method requires three smoothing parameters to eliminate deterministic terms in the time series. Moreover, initial values are needed to obtain the SS formula's level, trend, and seasonal components. Therefore, the first period of the T S time series was used to initiate the modeling, which in this case, was the first 12 months or the first seasonal lag. In addition, optional initial values were chosen for the smoothing parameters α, β, and γ, and they were optimized by the GRG solving method to achieve the most accurate results.
Three types of modeling were conducted to examine the SS method's modeling capability. The first was dynamic modeling, where the model was updated after each forecast. Next, a 6-step-ahead forecast (fs6) was performed, followed by a 12-step-ahead forecast (fs12) that covered the entire period. The results of the 1-step-ahead forecast (fs1) models are provided in Table 2. As seen, the SS model could forecast the test period with a coefficient of determination higher than 0.90 and with low and relatively close errors, except for the 40-100 cm depth in Quebec City. Additionally, a decreasing trend in the accuracy by increasing the depths was observed.  Tables 3 and 4 present the modeling results for the fs6 and fs12 modeling methods. As the soil depth increased, a general decline in accuracy could be seen, except for the first soil layer (10-40 cm) for both fs6 and fs12. At this depth, the accuracy was lower than in the other two depths. According to the results, there are some variations in the magnitude of the errors with a maximum value of~3 • C. For all depths, the average difference between forecasts and observations was~2 • C except for Montreal (10-40 cm). For the Quebec City modeled time series, all sudden drops in accuracy were observed deeper in the soil profile (40-100 cm). A decrease in model accuracy was additionally observed in the Montreal model results. However, this reduction in accuracy of the first soil layer (10-40 cm) is more significant than that of the second (40-100 cm), in contrast to the Quebec City model results, which had a uniform trend in accuracy reduction. The Nash-Sutcliffe coefficient additionally demonstrates the power and efficiency of the generated model in forecasting shallow depths. This sudden decline is due to extreme changes in the level component of the SS model, and the model could not adequately adapt over that period. Furthermore, the parameters are not optimized properly. The T S datasets are highly periodic, and the SS seasonal parameter was expected to have a greater weight in the equation. However, on the contrary, for the middle soil layer in Montreal (10-40 cm), the involvement of the seasonal parameter in the model equation is less than the other 2 parameters.
Alongside assessing the models using indices such as R 2 , RMSE, etc., the statistical features of the forecasts should be evaluated. Appropriate models should additionally reproduce the statistical characteristics of the original data [55]. To investigate the reason for drops in accuracy and evaluate the statistical aspects, box plots of forecasts vs. observed data were drawn (Figure 7). Some outliers were observed in QCT S s (40-100 cm) and T S Ts (10-40 cm). These outliers were the reason for poor results at these depths and were created because of some extreme values in the level component during the training period. These extreme values were multiplied by the SS relationship's seasonal factor and seasonal coefficients. The interquartile areas and means of the T S data in the Montreal time series were almost perfectly reproduced. However, these features for the Quebec City time series fluctuated by a few degrees centigrade.  These figures show a good correlation between the forecasts and observed data, and most of the estimates are located within the 10% intervals of the scatter plots. For each time step, the dispersion in the forecasts of QC datasets increases with soil depth; however, the forecasts were more correlated for the fs12 predictions than in the other time steps. The forecasts were more dispersed for the fs6 forecasts of the QC 10-40 cm and 40-100 cm soil layers, as the fs6 forecast of the QC 10-40 cm soil layer had the most dispersion. The same pattern was observed for the MC forecast datasets, albeit the fs6 forecasts of the MC 10-40 cm soil layer had the most dispersion. According to the time series plots of the QC datasets, the SS method generally overestimated the highs and lows for all time steps of the 10-40 cm and 40-100 cm soil layers and underestimated the peaks of the 0-10 cm soil layer data. On the other hand, the peaks of the MC datasets were generally overestimated by the SS method, whereas the lows were underestimated. The middle soil layer (10-40 cm) of both locations was poorly forecasted, compared to the other 2 layers.
The magnitude of the outliers seen in the forecasts' box plots are less than a few degrees, and as seen in the time series plots (Figures 8 and 9), they can be neglected. Moreover, these outliers were seen in the QC second soil layer (40-100 cm) fs1 and fs6 and MC first soil layer (10-40 cm) fs6 while they were removed for the fs12 forecasts. As mentioned earlier, the reasons for the formation of outliers were changes in level components and smoothing parameters. However, while the model was updated with the new real data in fs12 forecasts, and the weight of the seasonal parameters increased, the level component was able to correct itself and dampen the errors; therefore, no outliers were produced. The R 2 additionally increased from 0.63 to 0.84. The self-correction feature of the SS model through time represents a positive feature; however, as it requires long time series, it can additionally be viewed as its weakness.     The magnitude of the outliers seen in the forecasts' box plots are less than a few degrees, and as seen in the time series plots (Figures 8 and 9), they can be neglected. Moreover, these outliers were seen in the QC second soil layer (40-100 cm) fs1 and fs6 and MC first soil layer (10-40 cm) fs6 while they were removed for the fs12 forecasts. As mentioned earlier, the reasons for the formation of outliers were changes in level components and smoothing parameters. However, while the model was updated with the new real data in fs12 forecasts, and the weight of the seasonal parameters increased, the level component was able to correct itself and dampen the errors; therefore, no outliers were produced. The R 2 additionally increased from 0.63 to 0.84. The self-correction feature of the SS model through time represents a positive feature; however, as it requires long time series, it can additionally be viewed as its weakness.
The SS method could not produce equally precise results for the last soil layer (40-100 cm). This performance is additionally valid for fs6 periods, when compared to the different soil layer depths and forecasts' steps. The overall performance is acceptable for temporal lag compensation and future state forecasts. This method can produce reliable results, compared to black box AI methods and even other linear methods such as stochastic models. Since it has a simple structure, it is one of the most generalizable methods. The SS method was used to model data from October 1982 to October 2019. Given the fact that the FLDAS data were updated to October 2020, the estimated parameters were used to model data from November 2019 to October 2020 and perform an out-of-sample forecast to estimate T S values from November 2020 to October 2021. The results are introduced in Figure 10. The shaded areas are the out-of-sample forecasts. As seen in this figure, the~2-3 • C error remained in the forecasts of the 2020 data. Considering this error, an uncertainty bound equal to 3 • C can be expected for the 2021 values. that the FLDAS data were updated to October 2020, the estimated parameters were used to model data from November 2019 to October 2020 and perform an out-of-sample forecast to estimate TS values from November 2020 to October 2021. The results are introduced in Figure 10. The shaded areas are the out-of-sample forecasts. As seen in this figure, the ~2-3 °C error remained in the forecasts of the 2020 data. Considering this error, an uncertainty bound equal to 3 °C can be expected for the 2021 values.

Discussion
It was mentioned that a significant seasonal trend exists in these time series. The monthly values of different years were separated from the series to exhibit this seasonal trend and assess its significance. The MK test was then applied again. The test results indicated a trend in each month of the TS time series, some of which are significant. In Quebec City, the months of August, September, October, and November and in Montreal, September, October, and November had a significant positive trend for different soil layer depths. Each location had a positive trend for seven months of the year. This means that the soil is generally getting warmer. Figure 11 presents the time series of the first soil layer (0-10 cm) for August, September, October, and November in Quebec City, along their linear trend lines, as examples of the progressive warming in the TS time series.

Discussion
It was mentioned that a significant seasonal trend exists in these time series. The monthly values of different years were separated from the series to exhibit this seasonal trend and assess its significance. The MK test was then applied again. The test results indicated a trend in each month of the T S time series, some of which are significant. In Quebec City, the months of August, September, October, and November and in Montreal, September, October, and November had a significant positive trend for different soil layer depths. Each location had a positive trend for seven months of the year. This means that the soil is generally getting warmer. Figure 11  Qian et al. [56] performed extensive research on the TS trend and its association with the trend in climatical variable changes in Canada. They investigated the TS trend in 30 stations across Canada, from 1958 to 2008. Their research showed a positive trend in TS in 60% of stations, significantly in spring and summer means. They additionally reported a positive correlation between TS trend and changes trend in snow cover thickness, air temperature, rain and snowfall, and total precipitation. The correlation between these parameters was additionally surveyed in other studies [10,57]. Later, Assani et al. [58] performed Qian et al. [56] performed extensive research on the T S trend and its association with the trend in climatical variable changes in Canada. They investigated the T S trend in 30 stations across Canada, from 1958 to 2008. Their research showed a positive trend in T S in 60% of stations, significantly in spring and summer means. They additionally reported a positive correlation between T S trend and changes trend in snow cover thickness, air temperature, rain and snowfall, and total precipitation. The correlation between these parameters was additionally surveyed in other studies [10,57]. Later, Assani et al. [58] performed a similar investigation on the maximum and minimum air temperature and rainfall, exclusively to Quebec province. They reported significant trends in these variables during summer, indicating that summer warming in southern Quebec is more pervasive than winter warming. Considering these findings, the progressive warming in soil temperature can be justified by reduced rainfall during the warm season and decreased soil moisture, especially in the mentioned months. Consequently, water exploitation typically increases during this season. The increase in air temperature and, at a large scale, the impact of global warming have caused changes in the overall trend of soil temperature. Some researchers even reported a stark trend in soil temperature compared to air temperature [59,60], which means the trend in the former is more noticeable than that in the latter. Monitoring and controlling soil moisture can be a short-term solution to controlling the T S upward trend. However, unmanaged water exploitation can significantly reduce groundwater and decrease soil moisture. This factor is essential as it directly impacts the soil's thermal conductivity and can lower it to one-eighth [61]. Moreover, the heat-retaining feature of soil is negatively affected since the specific heat of pure water (4.2 J/g • C) is required compared to that of clay minerals (0.76 J/g • C), a dominant component of the soil [10]. To understand the magnitude of the impact, it should be mentioned that an increase of 1 • C of 1 gram of dry clay soil needs 1.1 J whereas 1 gram of wet clay soil needs 2.21 J (average) of energy under laboratory conditions with 25% of water content and 1300 kg/m 3 density. However, this value varies for different conditions and depends on soil texture, water content, and soil density [62].
Soil moisture and temperature are the two essential parameters that affect soil respiration. The relationship between these parameters and respiration can be estimated using several models, e.g., quadratic and exponential models. As mentioned earlier, these parameters have a strong correlation. Several studies have been carried out to distinguish them from one another. In this case, the estimation of soil moisture given the available soil temperature at multiple or limited depths has been debated by many scholars, and several methods have been proposed in this regard [63][64][65][66]. These parameters are essential in the long-term prediction of drought and food monitoring [39,67,68]. The other applications include irrigation planning, water resources management, flow forecasting, and hydrological land-atmospheric studies.
Higher temperatures are linked to climate change [69]. Crop water demand and evapotranspiration increase as the temperature rises. Consequently, soil moisture decreases, and soil droughts result from this phenomenon [70]. As the T S level increases over time at both Quebec City and Montreal, soil moisture reduction disrupts irrigation scheduling in terms of amount and time intervals. More water pumping is needed to compensate for the reduced soil moisture, and the additional supply of water leads to more energy consumption [71]. However, the interactions between crop water requirement, soil temperature and moisture, evapotranspiration, and other hydro-climatic factors are complex. Access to the parameter records can additionally be limited. Thus, additional studies about these factors, their interactions, and the use of new data collection methods are required. Methods such as the FLDAS, development of land surface models, integration of deep learning methods in the processing and simulating of different hydro-climatic factors, as well as access to satellite data (i.e., data-driven Earth system science) are needed [72][73][74].

Validation of the SS Method
Access to high-quality field data has always been problematic, due to financial and operational resource constraints. Therefore, remotely sensed data and model-based simulations have been used for the past couple of decades. On the other hand, evaluating and validating these methods have been equally crucial for scholars and users. Hence, different validation studies are always carried out. In our case, the Noah LSM, the central part of the FLDAS, has been the center of attention for scholarly work with field-data alternatives. For instance, Xia et al. [75] validated Noah soil temperature products by comparing them with field datasets from 137 stations in the United States over different time scales (e.g., daily, monthly, and annually) and different soil layer depths, from 0 to 200 cm. These authors declared that model products generally emulated observational field data. Although they found some bias for greater soil layer depths and a decreased accuracy of model results, they stated that Noah modeling "Kills" is acceptable. Other articles additionally validated the Noah products for different soil conditions and locations [76,77]. The FLDAS LD TK inputs are additionally monitored and evaluated by The National Aeronautics and Space Administration (NASA) and many other scholars. Therefore, in this paper, we concentrated on validating the results of our model at different locations. Following the assessments of the obtained time series characteristics for these two locations, the proposed methodology modeled the datasets. The results are provided in Tables 5 and 6 for the dynamic and 12-step-ahead forecasts. The same prediction pattern as QC and MC is seen in these tables. The reduction in accuracy with an increase in soil layer depth is obvious. An intercomparison of the models reveals that the smoothing parameters related to trend (β) and seasonal factor (γ) are decreased with an increase in the soil layer, as they are almost equal to 0 or very small for the 10-40 cm and 40-100 cm layers. For instance, in Tables 3 and 4, the seasonal parameter for soil layer 10-40 cm is equal to 0.008, and the trend factor is almost equal to 0.014 for the MC datasets. Nevertheless, after raising the seasonal parameter to 0.907 and increasing its involvement in the modeling equation, the correlation of the models was enhanced. In the QC results, this pattern can be seen for both β and γ parameters; however, it is not as obvious. After modeling the datasets of the validation points, the same problem is seen in the models. This means that these important factors are not adequately involved in the modeling procedure, which can be due to the nature of the data or the disability of the optimization method in finding the appropriate parameters. The reason for this problem can be investigated by using other datasets of a different nature (e.g., soil moisture) and/or changing the optimization method and using metaheuristic optimization methods to find the best-suited parameters (e.g., utilizing a genetic algorithm, particle swarm, or teacher-learner optimization methods) [38]. In the end, as with previous datasets, the future forecasts based on the trained model parameters were conducted, and the results are displayed in Figure 12. Shaded areas in this figure are the out-of-sample forecasts (12-step-ahead (fs12)) from 1 November 2020 to 1 October 2021. the best-suited parameters (e.g., utilizing a genetic algorithm, particle swarm, or te learner optimization methods) [38]. In the end, as with previous datasets, the futur casts based on the trained model parameters were conducted, and the results a played in Figure 12. Shaded areas in this figure are the out-of-sample forecasts (1 ahead (fs12)) from 1 November 2020 to 1 October 2021.

Conclusions
Soil temperature (TS) is crucial in environmental processes. However, obtaining quality field data is challenging due to resource constraints. Remotely sensed da

Conclusions
Soil temperature (T S ) is crucial in environmental processes. However, obtaining highquality field data is challenging due to resource constraints. Remotely sensed data and model-based simulations have been used as alternatives; however, they have limitations such as missing data and temporal lags. The FLDAS project addresses this issue by providing reliable data collection for regions without soil temperature measurements, although it still has a temporal lag. This study focused on T S at two locations (QC and MC) across different soil layers. An SS method was developed to model the time series, and the proposed techniques were validated through 1-, 6-, and 12-step-ahead forecasts of T S data. The results showed that the proposed SS model could fill in the gap with an average accuracy of R 2 of 0.88 and RMSE of 2.073 • C for the dynamic model, R 2 of 0.834 and RMSE of 2.979 • C for 6-steps-ahead, and R 2 of 0.921 and RMSE of 1.865 • C for different depths at the QC and MC locations, respectively. The SS method can adjust its parameters as the model progresses, resulting in increased accuracy with more forecasting steps. However, the model's level component is sensitive to extremes, and the SS seasonal parameter is not adequately incorporated, leading to over-and underestimations. Future studies should explore alternative optimization methods and different types of time series to address this issue. Additionally, a trend assessment revealed a significant warming trend in the datasets of both locations. It emphasizes the need for a close monitoring of T S in these areas and consideration of influencing factors such as groundwater level, surface soil moisture, air temperature, and evapotranspiration. The proposed methodology does not require data preprocessing, as the SS model performs multiple preprocessing steps simultaneously, and it can be applied to any study location using the FLDAS. However, a major drawback is its reliance on historical data, and the FLDAS only provides monthly datasets starting from 1982. It is recommended to additionally model and compare the FLDAS data with other linear and machine learning models.