Knowledge-guided machine learning for improving daily soil temperature prediction across the United States

Data-driven models used for predicting soil temperature usually have increasing errors with increasing depth. By exploring the integration of knowledge-based and machine learning approaches, this study used a novel transformation of meteorological variables to increase prediction accuracy of soil temperature with increasing soil depth. Using datasets for two soil textures (silty clay loam and loamy coarse sand) at two locations with different climates, predictive models were developed for five depths (5, 10, 20, 50, and 100 cm) as a function of meteorological features using an adaptive neuro-fuzzy inference system (ANFIS). For each depth, soil temperature was predicted with nontransformation (NT), autocorrelation (AC), moving average (MA), and a combination of transformations (NT-ACMA) of meteorological features. Across all depths, the predictive accuracy of NT-ACMA models was significantly higher than that of NT, AC, and MA models for both soil textures ( R 2 = .99, RMSE = 1 ˚C for silty clay loam; R 2 = .99, RMSE = 1.2 ˚C for loamy coarse sand), with increasing prediction accuracy as soil depth increases. Results for different soil textures and climates in 11 locations across the contiguous United States show that, except for 100-cm depth, there seems to be significant positive linear relationships between the best moving average and the solar inclination. This makes our NT-ACMA technique transferable to any location in the contiguous United States irrespective of the location, climate, and soil texture. We conclude that integrating lag times and moving averages of meteorological features can lead to better prediction of soil temperature at different soil depths.


INTRODUCTION
There is a growing need to accurately predict soil temperature as a function of depth, location, and time due to its vital role in agriculture (Hatfield & Prueger, 2015;Talaee, 2014), geology (Singh et al., 2017), geothermal applications (Yener et al., 2017;Zajch & Gough, 2021), as well as in geotechnical and geoenvironmental engineering (Attah & Etim, 2020). In agriculture, soil temperature is important for estimating belowground ecosystem processes such as plant germination, root growth and respiration, decomposition, and nutrient transformation and uptake by crop roots (Pathak, 2011;Stottlemyer, et al., 2001;Sun et al., 2018;Waring & Running, 2007;Yi et al., 2008). In geology, soil temperature affects the decomposition of soil organic carbon as well as the carbon protection capacity and mechanisms of various clay soils (Hao et al., 2016;Singh et al., 2017). In geothermal and geoenvironmental engineering, prediction of soil temperature is important for determining the engineering performance of soils and for power generation and heat exchanger applications (Attah & Etim, 2020;Chandrasekharam & Bundschuh, 2008;Ozgener et al., 2013).
Soil temperature is a function of exchange processes that occur primarily through the soil surface (Ghahreman et al., 2010). Heat exchange between the atmosphere and the ground surface is often a dominant driving mechanism for mesoscale circulations (Alvalá et al., 2002). Owing to the low thermal conductivity of ground, heat can be released to the atmosphere from soil throughout the cooling season. Heat can also be absorbed by soil from the atmosphere during the heating season (Bilgili, 2010). With climate change and the potential for more extreme temperature events, expected warmer temperatures will affect plant productivity (Hatfield & Prueger, 2015). A study by Kukal and Irmak (2018) showed that climate trends, regardless of their statistical significance, can have substantial implications on agricultural crop production by affecting soil temperature, planting dates, and yields. Their analyses showed that temperature alone was able to explain maize (Zea mays L.) yield variability in northwest and southeast Nebraska as well as southern counties in Iowa. It also explained sorghum [Sorghum bicolor (L.) Moench] yield variation in several counties in Texas, except a few counties in the southeastern part.
Measured soil temperature are important for understanding land-atmosphere interactions, regional eco-environmental conditions, and climate change (Wu et al., 2013). However, direct measurement of soil temperature (using thermometers, thermocouples, or thermistors) are usually not available at most weather stations. When available, they are often limited both spatially and temporally. Since soil temperature responds to the net effect of daily surface energy balance (Waring & Running, 2007), indirect spatial and temporal estimation of soil temperature with mathematical equations and models using related meteorological variables easily measured in situ or at weather stations as inputs (Tabari et al., 2011), as well as remotely sensed variables (Huang et al., 2020;Xu et al., 2020), have been the focus of most environmental studies as an alternative to direct measurement. Besides the ease of measurement of meteorological variables that are related to soil temperature (e.g., air temperature, solar radiation, and relative

Core Ideas
• Knowledge-based ANFIS models were developed to better predict daily soil temperature. • Models were developed for five soil depths using four feature transformation methods. • Prediction errors reduced with depth for both moving average (MA) and combination of transformations (NT-ACMA) models. • Models can be used in the United States based on latitude irrespective of the climate and soil texture.
humidity) and are usually available at most weather stations, a major advantage of indirect methods over direct methods is the spatial extent that remotely sensed variables give, thus making them more useful in agriculture and earth sciences. Spatial interpolation techniques such as kriging, inverse distance weighting, and smoothing spline have been used for predicting soil temperature at unsampled sites using a set of observed samples (Burrough, 1986;Webster & Oliver, 1992). Huang et al. (2020) estimated soil temperature at different depths (0-40 cm) using remotely sensed data such as Moderate Resolution Imaging Spectroradiometer (MODIS) Aqua/Terra land surface temperature and normalized difference vegetation index together with solar declination in univariate and multivariate linear regression models. Xu et al. (2020) also estimated surface soil temperature using combined satellite observations and in situ measurements with an enterprise algorithm. Although several analytical and numerical solutions to heat transfer problems are also available, in many practical cases, the materials, geometry, and the special boundary conditions are too complex (Kapjor et al., 2014). In addition, precise in situ measurements of soil properties such as thermal conductivity and bulk density are challenging because composition and density change with depth, and in cultivated soils, significant changes often occur near the soil surface. However, the complexity in determining soil properties is not necessary for data-driven models since the soil properties may not be accurate.

DATA-DRIVEN APPROACH
Several studies have developed data-driven models based on neural networks for predicting soil temperature at different depths using observed meteorological features (Bilgili, 2011;George, 2001;Li et al., 2020;Ozturk et al., 2011;Samadianfard, Asadi, et al., 2018;Samadianfard, Ghorbani, et al., 2018;Shamshirband et al., 2020;Sihag et al., 2020;Talaee, 2014). Li et al. (2020) predicted soil temperature at depths of Vadose Zone Journal 5, 10, and 15 cm over 6, 12, and 24 h by connecting multiple channels in a long short-term-memory model and an autoregressive integrated moving average model. Shamshirband et al. (2020) predicted daily soil temperature at different depths with support vector machines and multilayer perceptron (MLP) using air temperature, wind speed, relative humidity, and sunshine hours. Samadianfard, Ghorbani, et al. (2018) predicted monthly soil temperature at multiple depths with a hybrid of artificial neural network (ANN) and firefly optimizer algorithm using monthly mean air temperature, atmospheric pressure, and solar radiation as input variables, as well as month number. Talaee (2014) selected mean, minimum, and maximum air temperatures, relative humidity, sunshine hours, and solar radiation as features. Although Bilgili (2011) estimated monthly mean soil temperature at soil depths of 5, 10, 20, 50, and 100 cm based on previous monthly mean meteorological variables, Ozturk et al. (2011) used both monthly mean meteorological and geographical data. For weekly soil temperature predictions, George (2001) used air temperature, relative humidity, and wind speed.

ANNs and fuzzy logic
Data-driven techniques such as ANNs have been widely used for soil temperature estimation at multiple depths (George, 2001;Kisi et al., 2015;Mazou et al., 2012;Mihalakakou, 2002;Ozturk et al., 2011;Samadianfard, Asadi, et al., 2018;Sihag et al., 2020;Tabari et al., 2011). Artificial neural networks are mathematical models consisting of a network of computation nodes called neurons with established connections between them (Sattari et al., 2017). Compared with conventional deterministic approaches, ANNs do not require any a priori assumptions about the relationships between features and outputs as well as the functions to be used (Wu et al., 2013). Sihag et al. (2020) and Tabari et al. (2011) predicted daily soil temperature at different soil depths in Iran using a MLP model of the ANN and other data-driven methods. With mean daily meteorological parameters, including air temperature, solar radiation, relative humidity, and precipitation as features for ANN and regression models, Tabari et al. (2011) showed that the ANN method forecasts were superior to the corresponding values obtained by the regression models. Kisi et al. (2015) used different neural networks such as MLP, radial basis neural networks (RBNN), and generalized regression neural networks (GRNN) to predict soil temperature in Turkey using air temperature, wind speed, solar radiation, and relative humidity. Mazou et al. (2012) used dynamic neural networks, which are recurrent neural networks with feedback loops that include time-delay elements. The data used for the neural network's training, validation and testing were hourly values for the period 2002-2005 obtained from a weather station in Athens, Greece. Bilgili (2010) used linear and nonlin-F I G U R E 1 Adaptive neuro-fuzzy inference system (ANFIS) structure with two features, one output, and two rules (Azamathulla & Ab Ghani, 2011). x and y = two typical input values fed at the two input nodes; w i = output from a node; w i -bar = average of outputs from a node; M i and N i =fuzzy sets associated with nodes x and y; f i = firing strength; f = final network output ear regressions as well as ANN methods to model monthly soil temperature in Adana, Turkey. The results showed that monthly soil temperatures can be successfully estimated from atmospheric temperature, atmospheric pressure, and global solar radiation. In addition, ANN models were found to perform better than regression models. Wu et al. (2013) estimated monthly mean soil temperature at a depth of 10 cm using ANNs over a large complex terrain. The results indicated that ANN is an effective method for predicting soil temperature. Fuzzy logic is an alternative technique to ANN that is capable of generating models that integrate expert knowledge and available measurements for a system by using a set of easily comprehensible rules in the form of a fuzzy inference system (FIS) (Zadeh, 1965). A FIS is a nonlinear mapping of a given features vector to an output using fuzzy logic based on a set of membership functions and rules. Better performance of predictive models can be achieved by integrating fuzzy systems and the ANN approach to deal with large and imprecisely defined complex systems (Sattari et al., 2017).

ANFIS
An adaptive neuro-fuzzy inference system (ANFIS) is one of the most successful methods, which combines the benefits of fuzzy logic and ANN into a single model that will correctly associate features with outputs (Jang, 1993;Naderloo et al., 2012;Rudnick et al., 2015). Five separate layers are used to describe an ANFIS model structure ( Figure 1). The first layer is the fuzzification layer; the second layer is the rule base layer; the third layer is for the normalization of membership functions; the fourth and fifth layers are the defuzzification and summation layers, respectively (Jang, 1993). Construction of a data-driven ANFIS model generally requires division of the features and output data into rule patches (Guillaume, 2001). A number of methods such as fuzzy c-means (FCM) (Bezdek, 1981;Dunn, 1973), subtractive clustering (Chiu, 1994;Yager & Filev, 1994), and grid partitioning (Giotis & Giannakoglou, 1998;Simon, 1991) have been used to get membership functions in the creation of FIS. These methods allow the classification of the features into groups with each group having similar properties that help to discern the correlation between the data, thus simplifying the prediction process (Benmouiza & Cheknane, 2018). For each method, two different FIS models (Mamdanitype FIS and Sugeno-type FIS) have been developed (Nayak et al., 2013). A drawback of ANFIS is that the number of rules increases rapidly as the number of features increases, thus increasing both computer processing load and memory requirements. It should be noted that effective partitioning of the feature space can decrease the number of rules and thus increase the speed in both learning and application phases (Abdulshahed et al., 2015).
Few studies have used ANFIS in developing models for predicting soil temperature at different depths. Fathololoumi et al. (2020) compared ANFIS with other machine learning (ML) techniques at 5-and 10-cm depths in Iran. Behnia et al. (2017) and Talaee (2014) integrated a fuzzy system and ANN using a neuro-fuzzy inference system to estimate soil temperature at six depths of 5, 10, 20, 30, 50, and 100 cm using meteorological data from stations located in the arid and semiarid regions of Iran. Overall, the results showed a high performance of the ANFIS model in estimating soil temperature in arid and semiarid regions. Sattari et al. (2017) used a subtractive clustering approach to identify the structure of the ANFIS to estimate daily soil temperature in semiarid conditions in central Iran.

2.3
The quest for higher accuracy with depth Although ANFIS is more robust than ANN, results of the aforementioned ANFIS studies by Behnia et al. (2017), Sattari et al. (2017), and Talaee (2014) showed that daily estimation error (e.g., RMSE) tended to increase with depth, with average prediction errors ranging from 1.7˚C at 5 cm to 2.5˚C at 100 cm. In fact, most recent studies that developed models for predicting daily or subdaily soil temperature (from top soil to a depth of about 100 cm) based on advanced ML algorithms have generally reported increasing prediction errors as soil depth increased (Alizamir et al., 2020; Feng et al., 2019; Moazenzadeh & Mohammadi, 2019;Nahvi et al., 2016). This is often because using a purely data-driven approach to model complex physical processes of heat transfer in deeper soils can be problematic without incorporating scientific principles. Thus, to address this limitation requires shifting focus from developing advanced and more complex ML algorithms to developing simple knowledge-based feature engineering and/or transformation techniques. This is particularly important in soil temperature prediction because an ML model that is based on explainable heat transfer theory stands a better chance at achieving more generalizable performance irrespective of variations in space (climatic region, latitude, or soil depth), time (daily, monthly, or annual) and soil properties (soil texture or bulk density).
For any soil texture in any climate, observed soil temperature at different depths often shows that the amplitude of daily or seasonal harmonic variation of soil temperature decreases with depth, whereas the lag time with respect to the ambient air temperature increases with depth (Tsilingiridis & Papakostas, 2014). This is due to the thermal diffusivity of the soil, which is a parameter that controls the rate at which temperature change is transferred within the soil. Although most heat transfer solutions are based on the assumptions of soil thermal properties that are constant or varying with depth, these are mere approximations, since the theory of heat transfer in unsaturated soils is complex and has a lot of ambiguity due to the multicomponent nature of soil, comprising of soil grains, air, and water (Moradi et al., 2016). Moreover, more complexity can be added as a result of significant changes that often occur near the soil surface in cultivated soils due to tillage operations and changing canopy cover during growing seasons.
Since most process-based analytical heat transfer solutions treat the daily change of temperature as a smooth sinusoidal oscillation of air temperature at the surface about a mean value for the sake of simplicity, this made it possible for our study to examine the fusion of both lag times and moving averages (which are filters with smoothing effect) via feature engineering in a data-driven model. Thus, we hypothesized that a data-driven approach based on knowledge-guided ML, which incorporates both lag times and moving averages of important meteorological features (predictors), can help develop better generalizable models for predicting soil temperature, especially in deeper soil layers. The specific objectives of this study were (a) to develop neuro-fuzzy models to predict daily soil temperature at five depths (5, 10, 20, 50, and 100 cm) in two different soil textures and climates as a function of meteorological features, (b) to evaluate the performance of two clustering methods (FCM and subtractive) for predicting daily soil temperature at the five depths, and (c) to discuss the potential for using knowledge-guided ML approach for predicting soil temperature at similar depths across the contiguous United States irrespective of the soil textures and climates. The collected meteorological variables were maximum, minimum, and mean air temperatures (Max T, Min T, and Mean T), relative humidity (Rel Hum), solar radiation (Sol Rad), precipitation, and soil temperature (Soil T). At these two weather stations, five Stevens HydraProbe Analog 2.5V (Stevens Water Monitoring Systems) sensors measured Soil T at depths of 5, 10, 20, 50, and 100 cm. The sensors have accuracy and precision of ±0.3˚C (from −30 to 60˚C). Although the USDA-NRCS conducted data quality assessments of the meteorological records before delivering it to users, these records were checked for missing values and inconsistencies for this study. For single missing data (for any meteorological variable and Soil T), an average of the data for the previous day and the following day was used. Short-term missing Soil T data at any depth were replaced using a moving average of Mean T, provided other meteorological variables were available on those days. Short-term missing Soil T data at any depth were not replaced when other meteorological variables were also missing, and those missing days were removed completely from the dataset. Lastly, long-term missing data for any variable were not replaced and the days with missing data were removed from the dataset. For Lancaster station, 1.1% of the data was missing, and 0.05% of missing data was gapfilled. For Polk station, 11.0% of the data was missing, and 0.05% of the missing data was gap-filled.

Feature selection and transformation
A major characteristic of the daily time series of some meteorological variables (e.g., air temperature) and soil temperature is that measurements at time t (in days) are highly dependent on and correlated to the values from previous days. This is also true for the correlations between the daily time series of some meteorological variables and their moving averages (i.e., with increasing periods from short-term to long-term moving averages). For instance, daily air temperature (Max T, Min T, and Mean T) and Soil T time series usually have higher autocorrelations when compared with Sol Rad and Rel Hum. However, this autocorrelation is not observed in daily precipitation time series. This is because daily precipitation time series often looks like spikes (caused by rainfall events) separated by dry spells. Although there may exist a shortterm (1-d lag) autocorrelation (Serinaldi & Kilsby, 2014 ), daily precipitation values are similar to random noise and temporal autocorrelations are assumed negligible (Abbott et al., 2016;Buishand, 1977;Foufoula-Georgiou & Lettenmaier, 1987). Figure 2 shows the correlograms of some of the features (input variables) and Soil T (output variables) at Lancaster weather station. When there is no lag, a correlogram has a maximum value of unity r 0 = 1 (i.e., a time series is perfectly correlated with itself), and it decreases with an increase in lag k and eventually approaches zero. Precipitation was not selected as a feature for model development in this study because (a) it was similar to random noise, and (b) the autocorrelations were assumed negligible since most of the autocorrelation coefficients fell between the lower and upper confidence limits (blue horizontal lines above and below r 0 = 0) in the correlogram and 99% of the lag times for 100 d were <0.1 (Figure 2).
The existence of autocorrelation in most meteorological features allows for developing autoregressive ML models (Equation 1) for predicting a time series ( ) given n past values of ( ). It is also important to note that measurements of some meteorological variables at time t (in days) are often highly dependent on and correlated to the values of other meteorological variables from previous days. This allows for the development of regressive ML models (Equation 2) for predicting a time series ( ) given n past values of ( ) and n past values of another (or multiple) highly correlated time series ( ). Furthermore, regressive ML models can be developed to predict a time series ( ) given n past values of ( ) using Equation 3 if the time series ( ) is not available. (1) where (  Since Soil T data are usually not available at most weather stations, our study selected Equation 3 as the basis for ANFIS model development. Although the lag times for deeper soils can be several days, instead of using many time series (with different lag times) of multiple meteorological features as inputs for predicting Soil T at any depth, we used the following steps in this study.
Step 1: Determination of the correlation of Soil T with all meteorological features with high autocorrelation (features with low or no autocorrelation were removed at this stage).
Step 2: Selection of the feature which had the highest correlation [ ( ) highest ] with Soil T time series at every soil depth.
Step 3: Calculation of the best lag times between Soil T series (at every depth) and ( ) highest by increasing the lag time by daily increments until the maximum correlation is reached at lag max (i.e., "best lag time"), which is usually the global maximum.
Step 4: Shifting of feature ( ) highest and other autocorrelated features by lag max (values are different for different soil depths).
Based on these four steps, Soil T time series at any soil depth can be written as a function of the "best past series" of autocorrelated meteorological features as in Equation 4.
(4) where Soil ( ) is the Soil T time series at a particular depth, are the meteorological features, and lag max is the best lag time for that depth.
In addition to autocorrelation, simple moving average was also used as a technique for smoothing out Soil T time series by filtering out the "noise" from random daily fluctuations. Based on different time periods (from short-term to longterm moving averages), the best moving average periods for the meteorological features were also selected based on steps similar to the four steps mentioned above but using moving averages instead of lag times. This implies that Soil T time series at any soil depth can also be written as a function of the "best moving average periods" of autocorrelated meteorological features as in Equation 5.
(5) where 1 (MA max ) is the moving average series of meteorological feature 1 based on the moving average period of feature ( ) highest which has the highest correlation with Soil T time series at a particular depth.  5 were used, respectively. For NT-ACMA models, the features on the right hand side of Equation 4 and Equation 5 were combined with the features in NT models. That is, the best lag times and the best moving averages of the features were combined with the nontransformed (1-d lag) features. A major challenge in training ML models is knowing how long to train them. Using too little training would lead to models that underfit the training and the testing datasets, whereas too much training would produce models that overfit the training dataset and have poor performance on the testing dataset. To avoid overfitting, we tried 5, 10, 20, 50, and 500 epochs to determine the maximum epoch above which model performance did not increase significantly. To obtain an adequate number of fuzzy rules for the ANFIS models, both subtractive clustering (with range of influence = 0.4) and FCM clustering (number of clusters = 5) were applied using MATLAB (MathWorks, 2019) with Gaussian membership function type. The magnitudes of the other internal tuning parameters of each FCM model were partition matrix exponent = 2 and maximum iteration number = 100. The magnitudes of the other internal tuning parameters of each subtractive clustering model were scale factor = auto, squash factor = 1.25, acceptance ratio = 0.5, and rejection ratio = 0.15. For all models developed using both FCM and subtractive clustering models, training error goal = 0, initial step size = 0.01, and step-size increase rate = 0.001. To develop the ANFIS models, the datasets for the two weather stations (n = 2717 for Lancaster, n = 5255 for Polk) were randomly divided into training dataset (90% of the total data) and testing dataset (10% of the total data) for 10-fold cross-validation. In 10-fold cross-validation, nine folds were used for training and the last fold was used for validation. This process was repeated 10 times, leaving one different fold for validation each time.

Model performance evaluation
For the results to be valid, the performance of each feature transformation method (NT, AC, MA, or NT-ACMA) were averaged on five trials for each soil depth and clustering method. The coefficient of determination (R 2 ) and RMSE statistics were used for comparing the performance of the different modeling methods (Equations 6 and 7, respectively). The RMSE was chosen for evaluating model performance because it penalizes large errors more aggressively than small ones. More importantly, it was chosen since it is widely used in most ML literature on soil temperature prediction (Alizamir et al., 2020;Feng et al., 2019;Sihag et al., 2020), thus making it possible to compare our results with similar studies.
where is the observed daily soil temperature (˚C),̂is the predicted daily soil temperature (˚C),̄is the mean of the observed daily soil temperature (˚C), and n is the total number of daily soil temperature values considered.

Statistical significance testing
Since this study compared four feature transformation methods (NT, AC, MA, and NT-ACMA) on a single domain for two locations (Lancaster and Polk), paired t tests were conducted to determine if the RMSE were significantly different. This also helped in understanding the degree to which the RMSE results represented the general behavior of the clustering and feature transformation methods. A summary of model evaluation and the description of the paired t test can be found in Japkowicz and Shah (2011). As stated in Section 3.3, five trials were run for each soil depth, clustering method, and feature transformation method. The overall performance of any feature transformation method was based on the average across the five soil depths.

Feature transformation across the contiguous United States
This study also examined a simple approach for applying the overall best feature transformation method at 11 SCAN weather stations across the contiguous United States (Figure 3a). Figure 3b shows the soil textural triangle with circles indicating soil classes for all stations and depths. Although the circles are placed in their respective soil classes, they are not arranged based on the exact percentage compositions of clay, silt, and sand in order to avoid overlapping of circles.  The Köppen climate classification, which is a widely used vegetation-based empirical climate classification system, was used in selecting the extra nine stations so that different climate zones and soil textures were represented. Table 1 shows the elevation, climate class, and soil texture per depth.
In addition to climatic and soil textural variations, station selection was also based on the availability of long (in terms of period length) and relatively continuous (with short gaps of missing data) meteorological data as well as soil temperature data for 5, 10, 20, 50, and 100 cm. The aim of this was to see if any trends (linear or nonlinear) or patterns of lag times and/or moving averages existed across different climates and soil textures, which could be used in the future for developing better generalizable ANFIS models for predicting Soil T accurately in deeper soils. Variables such as elevation above sea level, latitude, and longitude were considered for examining trends and patterns.

Meteorological features and soil temperature
The descriptive statistics: minimum, maximum, mean, and standard deviation of the features and target variables are provided in Table 2. As expected, for each weather station, the

F I G U R E 4
Lancaster time series of observed soil temperature at five depths ranging from 5 to 100 cm mean values of Soil T at the five depths are relatively the same since temperature fluctuates about a mean value irrespective of the depth. The mean values are different across the two weather stations due to climatic and soil textural differences. For similar depths, the standard deviations of Soil T for both stations are slightly different (with relatively equal shift of about 0.5˚C) and tend to decrease with depth since the amplitude of daily or seasonal harmonic variation of Soil T decreases with depth as shown in Lancaster example in Figure 4.

Correlation analysis
The correlations between Soil T and features were examined ( Table 3). The highest correlation was found between Soil T and mean air temperature (Mean T), indicating that Mean T is the most influential variable on Soil T. This is similar to those of previous studies (Mihalakakou, 2002;Tabari et al., 2011;Talaee, 2014). The correlation between Soil T and Mean T ranged from 0.93 at 5-cm depth to 0.72 at 100-cm depth at Lancaster while ranging from 0.90 at 5-cm depth to 0.76 at 100-cm depth at Polk. The second and third highest correlations were Max T and Min T. Among the meteorological features, Rel Hum had the lowest correlation with Soil T. However, it was included as a feature in our study due to the existence of some autocorrelation as shown in Figure 2. Table 4 shows the correlation of Soil T at five depths with the best lag times and best moving averages of Mean T for Lancaster and Polk weather stations. The table shows that whereas the correlation of best lag time with Mean T decreased with depth for both stations, the correlation of best moving average with Mean T increased with depth.

Performance evaluation
For all feature transformation methods, it was observed that using 20 epochs was adequate because higher epochs did not increase model performance significantly. The performance evaluation of the models indicated that the predicted Soil T values at Lancaster station were highly correlated with the observed values with all R 2 values (training and testing) greater than .79 (Table 5). For FCM models, the testing R 2 ranged from .80 to .99, whereas the RMSE ranged from 0.9 to 3.6˚C. For subtractive models, the testing R 2 ranged from .81 to .99 whereas the RMSE ranged from 0.8 to 3.5˚C. A paired t test at a significance level of α = .01 showed that subtractive clustering was better than FCM clustering (p < .0001) over the five depths.
Similarly, predicted Soil T values at Polk station were also highly correlated with the observed values with all R 2 values (training and testing) greater than .83 (Table 6). For FCM models, the testing R 2 ranged from .83 to .99, whereas the RMSE ranged from 0.8 to 2.9˚C. For subtractive models, the testing R 2 ranged from .84 to .99, whereas the RMSE ranged from 0.7 to 2.8˚C. A paired t test at a significance level of α = .01 showed that subtractive clustering was better than FCM clustering (p < .0001) over the five depths.
In general, NT models had the highest errors for each depth when compared to other feature transformation methods, whereas NT-ACMA models had the lowest errors (Tables 5  and 6). These results were validated by conducting one-tailed paired t tests (α = .05) with a hypothesized mean difference of zero between NT-ACMA and MA (which was the second best feature transformation method). The resulting p values for each pairwise comparison showed that the NT-ACMA models were significantly better than MA models with p values of .05 and .03 for Lancaster and Polk, respectively. For both Lancaster and Polk stations, the scatter plots (testing data only) between actual and predicted daily Soil T at 5-cm and 100-cm depths using all feature transformation methods with subtractive clustering are shown in Figures 5-8. Training and testing results for all soil depths and feature transformation methods are shown in the figures in supplemental materials. Irrespective of the climate and soil texture, the performance of the NT models in our study were similar to recent data-driven model results of Alizamir et al. (2020), Sihag et al. (2020), Feng et al. (2019, Moazenzadeh and Mohammadi (2019), Xing et al. (2018), Sattari et al. (2017), Nahvi et al. (2016), and Talaee (2014 in terms of the increasing magnitudes of error from top soil to about 100 cm. As stated earlier, this is because these studies did not incorporate scientific principles (i.e., physical processes of heat transfer) such as lag times and moving averages when predicting Soil T in deeper soils. As a result, their results lacked generalization.
In contrast, the NT-ACMA models in our study showed increasing prediction accuracy with increasing soil depth because the NT-ACMA method is a combination of two feature transformation methods (with lag times and moving averages), as well as the NT method. Each transformation method contributed to the overall robustness of NT-ACMA. Results indicated that, on the average across five depths, our NT-ACMA method had prediction RMSE values of 1 and 1.2˚C at Lancaster and Polk, respectively.
A higher prediction accuracy with depth would be of great importance, especially in studies related to geology, geothermal engineering, agriculture, and climate change. For instance, a study by Meehl et al. (2007) showed that daily Min T will increase more rapidly than daily Max T over the next 30 yr. This would lead to increases in daily Mean T and could have a negative impact on crop yields. The need for more accurate models for predicting Soil T especially in the crop root zone cannot be overemphasized because these models are important for better understanding of the potential impacts on plant growth and development. Better predictions could help in the development of adaptation strategies to counterbalance the impacts of climate change.
T A B L E 6 Performance of the adaptive neuro-fuzzy inference system (ANFIS) models in predicting soil temperature at multiple depths at Polk, MN  Figure 9 shows the best lag times and best moving averages for the 11 weather stations across different climates and soil textures. Although initial results (not shown) indicated that there were no linear relationships between either best lag time or best moving average and both elevation and longitude, there seemed to be some weak linear relationships between best lag time and both Max T and latitude. These weak relationships could probably be attributed to the existence of fractional (<1 d lag) or non-integer lag times for 5-to-20-cm depths. Initial results for 5-cm depth showed that there appears to be fairly strong linear relationships between best moving average and both long-term Max T and latitude with R 2 values of .69 and .57, respectively. However, since using these two variables together in a multiple linear regression (MLR) model with small sample size (n = 11) would reduce the degrees of freedom of the residuals and lead to an increase in the p-values for the regression coefficients, we decided to combine Max T and latitude as a single ratio with latitude converted to solar inclination (90˚− latitude). Solar inclination was used instead of latitude since the amount of heat energy received at any location on the Earth's surface is a function of sun angle on that location. This implies that a location on the Earth's equator (latitude 0˚) would have a maximum solar inclination of 90˚(since the sine of 90˚is 1), and a location on any of Earth's geographical poles (latitude 90˚) would have a minimum solar inclination of 0˚(since the sine of 0˚is 0). Even though the daytime length at a location on the Equator is 12 h in all seasons, and the daytime length at all other latitudes varies with seasons, our study assumed that the effect of Earth's tilt was negligible since the average daylight length at any latitude is approximately 12 h for all seasons. This assumption was only made for the sake of simplicity.

Prediction approach for the contiguous United States
Using the ratio [(Max T)/(90˚− latitude)] as the only predictor variable increased the R 2 to .71 for the MLR equation at a 5-cm depth ( Figure 10). For all depths, MLR results also showed stronger linear relationships between the best F I G U R E 5 Lancaster actual temperature vs. testing predicted soil temperature at 5 cm using subtractive clustering. Top: nontransformation (NT), autocorrelation (AC). Bottom: moving average (MA), combination of NT, AC, and MA transformations (NT-ACMA). GENFIS, generalized network-based fuzzy inferencing systems; ANFIS, adaptive neuro-fuzzy inference system F I G U R E 6 Polk actual temperature vs. testing predicted soil temperature at 5 cm using subtractive clustering. Top: nontransformation (NT), autocorrelation (AC). Bottom: moving average (MA), combination of NT, AC, and MA transformations (NT-ACMA). GENFIS, generalized network-based fuzzy inferencing systems; ANFIS, adaptive neuro-fuzzy inference system F I G U R E 7 Lancaster actual temperature versus testing predicted soil temperature at 100 cm using subtractive clustering. Top: nontransformation (NT), autocorrelation (AC). Bottom: moving average (MA), combination of NT, AC, and MA transformations (NT-ACMA). GENFIS, generalized network-based fuzzy inferencing systems; ANFIS, adaptive neuro-fuzzy inference system F I G U R E 8 Polk actual temperature versus testing predicted soil temperature at 100 cm using subtractive clustering. Top: nontransformation (NT), autocorrelation (AC). Bottom: moving average (MA), combination of NT, AC, and MA transformations (NT-ACMA). GENFIS, generalized network-based fuzzy inferencing systems; ANFIS, adaptive neuro-fuzzy inference system F I G U R E 9 Best lag times and best moving averages for different locations in the United States F I G U R E 1 0 Best moving average vs. ratio of maximum temperature (Max T) to solar inclination at 5-cm depth for different locations in the United States moving averages and this ratio with highly significant p values (α = .05) of .001, .002, .004, .007, and .08 for the slopes of the regression equations at depths of 5, 10, 20, 50, and 100 cm, respectively. Using these MLR equations for different soil depths would make it possible to estimate the best moving averages at any latitude and soil depth (5-100 cm) in the contiguous United States provided meteorological data are available. This would also make our NT-ACMA (or MA) feature transformation method transferable to any location in the contiguous United States and other parts of the world irrespective of the climate and soil texture.
Moreover, the method could be used to help provide soil temperature data to spin up land surface models (Meng et al., 2017;Nair & Indu, 2019;Niraula et al., 2017), and also to do interpolation of soil temperature data from network of weather stations such as the Nebraska Mesonet stations which provide a soil temperature guide for optimal timing of planting (Shulski et al., 2018). Furthermore, the advancement in remote sensing technologies would help in the collection of global dataset of meteorological variables derived from satellite remote sensing. Coupling remote sensing data with data collected at more weather stations around the world would help in the development of generalizable global models.

CONCLUSIONS
In this study, knowledge-based ANFIS models were developed to improve the prediction accuracy of daily soil temperature at 5-, 10-, 20-, 50-, and 100-cm depths by using meteorological features at two weather stations with different climates and soil textures in the U.S. ANFIS models based on four feature transformation methods (NT, AC, MA, and NT-ACMA) were developed using FCM clustering and subtractive clustering at the five soil depths. Although the prediction errors of the models (subtractive clustering with testing datasets) for both Lancaster and Polk gradually increased from top soil to 100 cm (from 2.4 to 3.1˚C) for the NT models, the prediction errors decreased from top soil to 100 cm for both MA (1.8-0.8˚C) and NT-ACMA (1.2-0.8˚C) models. This is due to the use of smoother moving averages of the features. In general, the overall best model accuracy was achieved using subtractive clustering and NT-ACMA, which combined the strengths of three transformation methods. The findings of this study showed that ANFIS-based NT-ACMA models were able to reduce prediction RMSE with increasing soil depth for both Lancaster and Polk weather stations. Overall, these models seem to be adequate to reduce prediction errors at different depths to about 1.1˚C on the average. After transferring this approach to nine other weather stations across different climates and soil textures in the contiguous United States, results showed that lag times could be estimated for any location in the United States using Max T and latitude. The accuracy of the estimated time lags could be improved as more soil temperature data are available across different latitudes in the future.
Coupling lag times and moving averages of features could help improve soil temperature prediction with depth irrespective of the climate and soil texture of any location. Our knowledge-based models could be combined with weather forecasting models for predicting soil temperature for agricultural production, land surface models, building energy savings, construction industries, shallow geothermal applications, and climate studies. As more meteorological data become available globally with the advancement of remote sensing technologies, future studies could focus on using new ML algorithms with NT-ACMA feature transformation method for predicting soil temperature in the United States and other climates and soils around the world.

C O N F L I C T O F I N T E R E S T
The authors declare no conflict of interest.