Spatial modeling of long-term air temperatures for sustainability: evolutionary fuzzy approach and neuro-fuzzy methods

This paper investigates the capabilities of the evolutionary fuzzy genetic (FG) approach and compares it with three neuro-fuzzy methods, namely neuro-fuzzy with grid partitioning (ANFIS-GP), neuro-fuzzy with subtractive clustering (ANFIS-SC), and neuro-fuzzy with fuzzy c-means clustering (ANFIS-FCM), in modeling long-term air temperatures for sustainability based on geographical information. To estimate long-term air temperatures for a 40-year (1970–2011) period, the models were developed using the month of the year, latitude, longitude, and altitude as inputs, with data obtained from 71 stations in Turkey. The models were evaluated with respect to mean absolute error (MAE), root mean square error (RMSE), Nash–Sutcliffe efficiency (NSE), and the determination coefficient (R²). All data were divided into three parts, and every model was tested on each part. The FG approach outperformed the other models, improving the MAE, RMSE, NSE, and R² of the ANFIS-GP model, which was the most accurate of the neuro-fuzzy models, by 20%, 30%, 4%, and 4%, respectively. A geographical information system (GIS) was used to produce temperature maps from the estimates of the optimal models and to assess the model results.


INTRODUCTION
Regional climate is critical to crop efficiency because it strongly influences yield potential. Rainfall, solar irradiance, and air temperature all affect crop yield, development, and growth (Ali-Nezhad & Eskandari, 2012). Temperature is considered vital to the phenological stages of crop ripening (Rosenzweig & Liverman, 1992). Moreover, crops grow only within specific temperature ranges. Although plant growth relies heavily on temperature, the plant species is the most significant determinant of the temperature range ideal for its growth (Hasanuzzaman, Nahar & Fujita, 2013). Temperature determines crop survival and growth in each region (Cobaner et al., 2014). Brunetti et al. (2014) discussed the pros and cons of three spatial models, namely multi-linear regression with local improvements (MLRLI), regression kriging (RK), and local weighted linear regression (LWLR), using elevation as an input for temperature assessment in Italy. Gonzalez-Hidalgo et al. (2015) performed a spatial trend analysis of temperature data based on an interpolation method that uses Gaussian-shaped radial weights combined with angular weights. Serrano-Notivoli, Beguería & De Luis (2019) evaluated the spatial and temporal variability of temperature using Generalized Linear Mixed Models (GLMM) and Generalized Linear Models (GLM) with spatial predictors such as latitude, longitude, altitude, and distance.
Artificial intelligence (AI) techniques, such as the adaptive neuro-fuzzy inference system (ANFIS) and artificial neural networks (ANNs), have been applied in agro-hydrology, agro-meteorology, and water resources engineering. The relevant literature is reviewed below. Zhu et al. (2019) predicted water temperature using several machine learning models: multilayer perceptron neural networks (MLPNN), ANFIS with the fuzzy c-means clustering algorithm (ANFIS_FC), ANFIS with the grid partition method (ANFIS_GP), and ANFIS with the subtractive clustering method (ANFIS_SC). Khosravi et al. (2018) predicted solar radiation using a multilayer feed-forward neural network (MLFFNN), a radial basis function neural network (RBFNN), support vector regression (SVR), a fuzzy inference system (FIS), and ANFIS.
Abdel-Aal (2004) used the abductive neural network method to predict hourly air temperature. Smith, McClendon & Hoogenboom (2005) developed an ANN for temperature forecasting by incorporating seasonal data and adjusting the parameters of the ANN model. Shank, Hoogenboom & McClendon (2008) modeled dew-point temperature using neural networks, and Bilgili & Sahin (2009) forecast long-term temperatures in Turkey for each month. Chattopadhyay, Jhajharia & Chattopadhyay (2011) proposed a new method combining the Yule–Walker equations and ANNs to model the time series of monthly maximum temperatures in northeast India. Şahin (2012) modeled Turkey's monthly mean air temperature using ANNs and remote sensing. Kisi and Shiri compared the performance of ANNs and ANFIS in forecasting air temperature in Iran from geographical data. In another study, Cobaner et al. (2014) predicted Turkey's monthly mean air temperature using ANNs and ANFIS. These studies used ANFIS and ANNs, and none of them evaluated evolutionary fuzzy methods for long-term temperature modeling based on geographical information. This paper therefore aims to model long-term temperatures using geographical data and to compare an evolutionary fuzzy method with three ANFIS-based approaches.
Researchers in climatology, biogeography, hydrology, meteorology, agriculture, and ecology have modeled temperature using spatial/geographical information systems (GIS) (Ninyerola, Pons & Roure, 2000). Goodale, Aber & Ollinger (1998) interpolated temperature and precipitation in Ireland using spatial predictors (latitude, longitude, and altitude). Benavides et al. (2007) used the geostatistical kriging model to model temperature. Chuanyan, Zhongren & Guodong (2005) compared conventional predictive models, such as linear regression, with geostatistical interpolation techniques (e.g., ordinary kriging, splines, and inverse distance weighting) in terms of the spatial distribution of modeled surface air temperature. Ninyerola, Pons & Roure (2000) also used GIS-based techniques to accurately predict and map the nonlinear behavior of air temperature over space and time.
This research examines the applicability of the evolutionary fuzzy genetic (FG) model for predicting long-term monthly air temperature from spatial data in support of sustainability. The results of the proposed model were evaluated by comparing them with those of three ANFIS models: ANFIS with fuzzy c-means clustering (ANFIS-FCM), ANFIS with grid partitioning (ANFIS-GP), and ANFIS with subtractive clustering (ANFIS-SC). The FG and ANFIS models were trained and tested using data from 71 weather stations in Turkey. The results were also evaluated using GIS.

STUDY AREA
The 37th largest country in the world, Turkey is centered at approximately 38.9637°N latitude and 35.2433°E longitude and has an area of 780,000 km², of which 25 million hectares are suitable for agriculture. Turkey is a mountainous country with large plains. Its highest mountain, Ararat, is 5165 m high. The orientation and elevation of the mountains significantly influence the distribution of temperature and weather patterns in the country.
This study used monthly mean temperatures from 71 stations of the Turkish State Meteorological Service, covering a 40-year period (1970–2011) for each station. Table 1 lists the temperature (°C), longitude, latitude, and altitude above sea level of each station. Long-term monthly mean temperatures ranged from −11.5 °C in January (Ardahan) to 32 °C in July (Urfa). This high variation may be due to the seas surrounding these areas, the high mountains along the Black Sea and Mediterranean coasts, and the mountainous eastern Anatolia region (Cobaner et al., 2014). Figure 1 shows the distribution of the meteorological stations in Turkey.

Fuzzy approach
Fuzzy logic has been used in such fields as business, engineering, and the sciences (Zadeh, 1965). Figure 2 shows the units of a general fuzzy system: fuzzification, a fuzzy rule base, a fuzzy inference engine, and defuzzification. Fuzzy logic rests on the idea of partial membership: instead of belonging entirely to one set, an element can belong partly to one set and partly to another. The degree of membership in a set is a number between zero and one.
In inference mode, the fuzzy system takes input data and produces output data. Through an association map, the fuzzy associative memory, the fuzzy system learns to transform the inputs into the corresponding output sets (Kosko & Toms, 1993). "Black box" approaches such as ANNs can also perform tasks such as regression, but fuzzy systems are more transparent and flexible, so it is clear how a fuzzy system processes its inputs and how it can be modified (Russell & Campbell, 1996).
The fuzzy method used here works as follows. The input and output variables are each divided into several fuzzy subsets described by Gaussian membership functions. With c subsets per input and k inputs, c^k fuzzy rules are possible. Using more subsets tends to increase accuracy, but the rule base grows rapidly, which makes the model more complex to construct. For a single input x divided into c subsets, the rule base yields c rule outputs y_n (n = 1, 2, ..., c) (Şen, 1998).
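The growth of a full rule base can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function name `enumerate_rules` and the subset labels are hypothetical, and the four inputs simply mirror the month, latitude, longitude, and altitude inputs used later in this study.

```python
from itertools import product


def enumerate_rules(n_inputs, labels):
    """Return every antecedent combination of a full rule base.

    With c = len(labels) fuzzy subsets per input and k = n_inputs inputs,
    the rule base contains c**k rules.
    """
    return list(product(labels, repeat=n_inputs))


# Hypothetical labels; the four inputs stand for month, latitude, longitude, altitude.
labels = ("low", "medium", "high")
rules = enumerate_rules(4, labels)
print(len(rules))   # 81 rules (3**4)
print(rules[0])     # ('low', 'low', 'low', 'low')
```

Adding a fourth subset per input raises the count to 4**4 = 256 rules, which shows why construction becomes more complex as subsets are added.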
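For example, if a single input x has four subsets, "low," "medium," "high," and "very high," four rules are possible:
Rule 1: IF x is low THEN y = $y_1$
Rule 2: IF x is medium THEN y = $y_2$
Rule 3: IF x is high THEN y = $y_3$
Rule 4: IF x is very high THEN y = $y_4$
The single output y is the weighted average of the outputs of these four rules:
$$y = \frac{\sum_{n=1}^{4} w_n\, y_n}{\sum_{n=1}^{4} w_n} \qquad (1)$$
where $w_n$ is the degree of membership of x in the nth subset, which weights the corresponding output $y_n$ when rule n fires.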
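The weighted average in Eq. (1) can be evaluated with a short sketch. The Gaussian centers, widths, and rule outputs below are hypothetical values chosen for illustration; only the weighted-average structure follows the text.

```python
import math


def gaussian_mf(x, center, width):
    """Gaussian membership degree in (0, 1], equal to 1 at the center."""
    return math.exp(-((x - center) / width) ** 2)


def fuzzy_output(x, subsets, rule_outputs):
    """Eq. (1): y = sum(w_n * y_n) / sum(w_n) over all fired rules."""
    weights = [gaussian_mf(x, c, s) for c, s in subsets]
    return sum(w * y for w, y in zip(weights, rule_outputs)) / sum(weights)


# Hypothetical (center, width) pairs for "low", "medium", "high", "very high"
subsets = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
rule_outputs = [10.0, 20.0, 30.0, 40.0]   # hypothetical y_1 ... y_4

# x = 1.5 lies midway between "medium" and "high", so the weights are symmetric
print(round(fuzzy_output(1.5, subsets, rule_outputs), 6))  # 25.0
```

Every rule contributes to the output, but rules whose subsets are far from x receive weights close to zero.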
Once the rule base has been formed, Eq. (1) can be used to compute the output y for any combination of input values (Şen, 1998).
The fuzzy rule base can be built from the input and output datasets as follows:
1. use the smallest number of inputs;
2. assign a specific membership function to each input;
3. compute the membership value ($w_n$) of x in every fuzzy subset;
4. calculate the output $y_n$ together with its weight $w_n$;
5. repeat for all other data points; and
6. calculate the weighted average using Eq. (1) (Kiszka, Kochanskia & Sliwinska, 1985a; Kiszka, Kochanskia & Sliwinska, 1985b).
Because any change in the subsets directly affects the performance of the fuzzy model, forming the fuzzy subsets is among the most challenging problems in this area. It is therefore vital to define the membership functions optimally to maximize modeling efficiency (Kisi & Cengiz, 2013). In this research, the optimal membership functions are determined using a genetic algorithm.
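The six steps can be sketched for a single input. The text does not specify how y_n is obtained in step 4, so this sketch assumes one plausible reading: each rule output is the membership-weighted mean of the observed outputs. The training data, subset parameters, and function names are hypothetical.

```python
import math


def gaussian_mf(x, center, width):
    """Gaussian membership degree, equal to 1 at the center (step 2)."""
    return math.exp(-((x - center) / width) ** 2)


def build_rule_base(xs, ys, subsets):
    """Steps 3-5 under an assumed reading: rule output y_n is the
    membership-weighted mean of the observed outputs for subset n."""
    rule_outputs = []
    for center, width in subsets:
        w = [gaussian_mf(x, center, width) for x in xs]       # step 3, for every data point (step 5)
        y_n = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)  # step 4
        rule_outputs.append(y_n)
    return rule_outputs


def predict(x, subsets, rule_outputs):
    """Step 6: weighted average of the rule outputs, Eq. (1)."""
    w = [gaussian_mf(x, c, s) for c, s in subsets]
    return sum(wi * yn for wi, yn in zip(w, rule_outputs)) / sum(w)


# Hypothetical single-input data following y = 2x + 1 on [0, 3]
xs = [i / 10 for i in range(31)]
ys = [2 * x + 1 for x in xs]
subsets = [(0.0, 0.8), (1.0, 0.8), (2.0, 0.8), (3.0, 0.8)]

rule_outputs = build_rule_base(xs, ys, subsets)
print([round(y, 2) for y in rule_outputs])
print(round(predict(1.5, subsets, rule_outputs), 2))  # 4.0, matching 2 * 1.5 + 1
```

The rule outputs are pulled toward the data mean near the ends of the range, which is one reason the placement of the membership functions matters so much.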

Genetic algorithm
Genetic algorithms (GAs) have been studied in engineering since the 1960s, when they were introduced by Holland (Abraham & Jain, 2005; Ortiz Jr et al., 2004). In general, GAs mimic the Darwinian concept of natural selection. A GA first generates a population of candidate solutions and then selects the fittest of them to form a new population that approximates the true solution better than the previous one. Repeatedly creating better, fitter sets of solutions is the basic principle of GAs, which are used to find optimal or near-optimal solutions to difficult problems. This makes the approach a useful option for such problems (Aijun et al., 2004; Tsai, Liu & Chou, 2004). Whereas other methods are restricted by their underlying assumptions, GAs have fewer such limitations (Jean, Lin & Chou, 2007).
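A minimal real-coded GA illustrates the generate, select, and reproduce cycle described above. The operators (tournament selection, blend crossover, Gaussian mutation, elitism), the parameter values, and the test function are illustrative assumptions, not the configuration of the FG model in this study.

```python
import random


def genetic_algorithm(fitness, bounds, pop_size=40, generations=100,
                      mutation_rate=0.2, mutation_sigma=0.1, seed=0):
    """Minimize `fitness` with a simple real-coded GA (sketch).

    Tournament selection, blend crossover, Gaussian mutation, and elitism
    (the best individual of each generation always survives).
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]

    def tournament():
        # Pick two random individuals and keep the fitter one
        a, b = rng.sample(pop, 2)
        return a if fitness(a) < fitness(b) else b

    for _ in range(generations):
        best = min(pop, key=fitness)
        children = [best[:]]                            # elitism
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            alpha = rng.random()                        # blend crossover weight
            child = [alpha * g1 + (1 - alpha) * g2 for g1, g2 in zip(p1, p2)]
            for i, (lo, hi) in enumerate(bounds):       # Gaussian mutation, clipped to bounds
                if rng.random() < mutation_rate:
                    child[i] = min(hi, max(lo, child[i] + rng.gauss(0, mutation_sigma)))
            children.append(child)
        pop = children
    return min(pop, key=fitness)


# Hypothetical test problem with its minimum at (3, -1)
def sphere(v):
    return (v[0] - 3) ** 2 + (v[1] + 1) ** 2


best = genetic_algorithm(sphere, bounds=[(-5, 5), (-5, 5)])
print(best, sphere(best))
```

In an FG-type model, the genes could encode the membership function parameters and the fitness could be a training error such as RMSE; that pairing is an assumption for illustration.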
Adaptive neuro-fuzzy inference system
ANFIS, a general estimation tool capable of approximating continuous real functions on compact sets, was developed by Jang in 1993. Its structure consists of nodes connected by directed links, and each node is characterized by a node function with variable or constant parameters (Jang, Sun & Mizutani, 1997).
The following example shows a typical fuzzy inference system (FIS) with three inputs x, y, and z, one output f, and two if-then fuzzy rules of the Takagi–Sugeno type:
Rule 1: If x is $A_1$, y is $B_1$, and z is $C_1$, then $f_1 = p_1 x + q_1 y + r_1 z + s_1$
Rule 2: If x is $A_2$, y is $B_2$, and z is $C_2$, then $f_2 = p_2 x + q_2 y + r_2 z + s_2$
where $f_1$ and $f_2$ are the output functions of rules 1 and 2, respectively. Figure 3 shows the structure of ANFIS. The node functions of each layer are as follows.
Layer 1: Every node i in this layer is an adaptive (square) node with node function $O_{1,i} = \varphi_{A_i}(x)$, where x is the input to node i (i = 1, 2) and $A_i$ is a linguistic label (e.g., "small" or "big") associated with the node. $O_{1,i}$ is the membership grade of the fuzzy set $A_i$ (and likewise of $B_i$ and $C_i$ for inputs y and z), i.e., the degree to which x satisfies $A_i$. $\varphi_{A_i}(x)$ is usually a Gaussian function ranging from zero to one:
$$\varphi_{A_i}(x) = \exp\left[-\left(\frac{x - a_i}{b_i}\right)^2\right]$$
where $\{a_i, b_i\}$ is the parameter set. The parameters of this layer are called premise parameters.
Layer 2: Each circle node in this layer, labeled Π, multiplies the incoming signals and sends out the product, for instance
$$w_i = \varphi_{A_i}(x)\,\varphi_{B_i}(y)\,\varphi_{C_i}(z), \qquad i = 1, 2.$$
Each node output represents the firing strength of a rule.
Layer 3: Each circle node in this layer, labeled N, computes the ratio of the firing strength of the ith rule to the sum of the firing strengths of all rules:
$$\bar{w}_i = \frac{w_i}{w_1 + w_2}, \qquad i = 1, 2.$$
Layer 4: The node function of each square node in this layer is
$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i \left(p_i x + q_i y + r_i z + s_i\right)$$
where $\bar{w}_i$ is the output of layer 3 and $\{p_i, q_i, r_i, s_i\}$ is the parameter set. The parameters of this layer are called consequent parameters.
Layer 5: The single circle node in this layer, labeled Σ, sums all incoming signals and returns the overall output:
$$O_5 = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i}$$
The ANFIS network is thus functionally equivalent to a first-order Sugeno FIS (Kisi, 2015).
Linear or constant functions are used in ANFIS to generate the output. Detailed information on ANFIS is available in Jang (1993).
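The five layers can be traced in a minimal Python forward pass for the two-rule, three-input example. This sketch covers inference only (no hybrid learning), all parameter values are hypothetical, and it assumes the Gaussian form exp(-((x - a)/b)^2) for the layer 1 membership functions.

```python
import math


def gaussian_mf(x, a, b):
    """Layer 1: Gaussian membership exp(-((x - a) / b)**2), premise parameters {a, b}."""
    return math.exp(-((x - a) / b) ** 2)


def anfis_forward(x, y, z, premise, consequent):
    """Forward pass of a first-order Sugeno ANFIS with three inputs.

    premise[i]    = ((aA, bA), (aB, bB), (aC, bC)): MF parameters of rule i
    consequent[i] = (p, q, r, s) so that f_i = p*x + q*y + r*z + s
    """
    # Layer 2: firing strength w_i = muA_i(x) * muB_i(y) * muC_i(z)
    w = [gaussian_mf(x, *A) * gaussian_mf(y, *B) * gaussian_mf(z, *C)
         for A, B, C in premise]
    # Layer 3: normalized firing strengths
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: rule outputs f_i, weighted by w_bar_i
    f = [p * x + q * y + r * z + s for p, q, r, s in consequent]
    # Layer 5: overall output is the sum of the incoming signals
    return sum(wb * fi for wb, fi in zip(w_bar, f))


premise = [((0.0, 1.0), (0.0, 1.0), (0.0, 1.0)),   # rule 1: A1, B1, C1 (hypothetical)
           ((1.0, 1.0), (1.0, 1.0), (1.0, 1.0))]   # rule 2: A2, B2, C2 (hypothetical)
consequent = [(1.0, 1.0, 1.0, 0.0),                 # f1 = x + y + z
              (0.0, 0.0, 0.0, 5.0)]                 # f2 = 5

# Both rules fire equally at (0.5, 0.5, 0.5), so the output is (1.5 + 5) / 2
print(anfis_forward(0.5, 0.5, 0.5, premise, consequent))  # 3.25
```

During training, ANFIS adjusts the premise tuples and the consequent tuples; the forward pass itself stays the same.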

Grid partitioning
This approach partitions each antecedent (premise) variable independently (Jang, 1993). The membership functions of all antecedent variables can be determined from prior knowledge and experience, the goal being to express the meaning of the linguistic terms in a given context. In many systems, however, no specific information is available for designing these partitions. The ranges of the antecedent variables can then simply be divided into equally shaped and equally spaced membership functions. If input-output data of the system are available, the membership functions can be positioned more suitably. The rule base should cover all combinations of the antecedent fuzzy sets. Because the membership functions of each variable are built independently of those of the others, the method overlooks the relationships among variables, which is its main disadvantage (Vernieuwe et al., 2005).
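A sketch of the equally spaced, equally shaped case follows. Each input range is split into Gaussian membership functions without looking at the other inputs, which is exactly the independence noted above as the method's main weakness. The input ranges and the width rule (neighbouring functions cross at a membership of 0.5) are hypothetical choices.

```python
import math


def grid_partition(lo, hi, n_mfs):
    """Split [lo, hi] into n_mfs equally spaced, equally shaped Gaussian MFs.

    Each MF is returned as (center, width) for exp(-((x - center) / width)**2).
    The width is chosen (hypothetically) so that neighbouring MFs cross at a
    membership of 0.5, halfway between their centers.
    """
    step = (hi - lo) / (n_mfs - 1)
    width = step / (2 * math.sqrt(math.log(2)))
    return [(lo + j * step, width) for j in range(n_mfs)]


def membership(x, mf):
    center, width = mf
    return math.exp(-((x - center) / width) ** 2)


# Hypothetical input ranges; each variable is partitioned on its own
ranges = {"month": (1, 12), "latitude": (36.0, 42.0),
          "longitude": (26.0, 45.0), "altitude": (0.0, 2000.0)}
partitions = {name: grid_partition(lo, hi, 3) for name, (lo, hi) in ranges.items()}

print(partitions["latitude"])
print(round(membership(37.5, partitions["latitude"][0]), 3))  # 0.5, halfway between 36 and 39
```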

Subtractive clustering
The ANFIS subtractive clustering (ANFIS-SC) model combines ANFIS with subtractive clustering. In subtractive clustering, each data point, rather than each grid point, is considered a potential cluster center (Chiu, 1994). The method is thus an extension of the mountain clustering method of Yager & Filev (1994).
In this approach, the number of effective "grid points" to be evaluated equals the number of data points, regardless of the dimension of the problem. One advantage is that no grid resolution needs to be specified, so there is no need to trade off computational complexity against accuracy. Subtractive clustering also extends the criteria the mountain method uses to accept or reject cluster centers.
The radius of influence is vital in determining the number of clusters: a small radius produces many small clusters and hence more rules, whereas a large radius produces fewer, larger clusters and fewer rules. Selecting a suitable radius for clustering the data space is therefore crucial. The next stage is to determine the number of fuzzy rules and the premise membership functions from the cluster centers. Finally, the output (consequent) parameters are obtained by linear least squares, which yields a valid FIS (Cobaner, 2011; Sanikhani & Kisi, 2012).
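A simplified version of subtractive clustering (after Chiu, 1994) shows how the radius controls the number of clusters. Assumptions: the data are already normalized to the unit hypercube, the squash factor is 1.5, and a single rejection threshold replaces Chiu's full accept/reject criteria; the function and parameter names are hypothetical.

```python
import math


def subtractive_clustering(points, radius=0.5, squash=1.5, reject_ratio=0.15):
    """Simplified subtractive clustering on normalized points (tuples).

    Every data point is a candidate center. The point with the highest
    potential becomes a center, then potentials near it are reduced.
    Stops when the best remaining potential drops below
    reject_ratio times the potential of the first center.
    """
    alpha = 4.0 / radius ** 2
    beta = 4.0 / (squash * radius) ** 2

    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    potential = [sum(math.exp(-alpha * dist2(p, q)) for q in points) for p in points]
    centers, first = [], None
    while True:
        k = max(range(len(points)), key=potential.__getitem__)
        pk = potential[k]
        if first is None:
            first = pk
        elif pk < reject_ratio * first:
            break
        centers.append(points[k])
        # Subtract the new center's influence (its own potential drops to zero)
        potential = [p - pk * math.exp(-beta * dist2(x, points[k]))
                     for p, x in zip(potential, points)]
    return centers


# Hypothetical normalized data: two tight groups
data = [(0.1, 0.1), (0.12, 0.1), (0.1, 0.12), (0.08, 0.1), (0.1, 0.08),
        (0.9, 0.9), (0.92, 0.9), (0.9, 0.92), (0.88, 0.9), (0.9, 0.88)]
print(sorted(subtractive_clustering(data, radius=0.5)))  # [(0.1, 0.1), (0.9, 0.9)]
print(len(subtractive_clustering(data, radius=0.01)))    # 10: every point becomes a cluster
```

With a very small radius each point is isolated and becomes its own cluster (and rule), matching the trade-off described above.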

Fuzzy c-means clustering
Fuzzy c-means clustering is a soft clustering method in which each data point belongs to every cluster to a degree computed from its distances to the cluster centers in the feature space. In this model, the mountain clustering approach can be used to determine the number of clusters and the cluster centers (Chiu, 1994; Cobaner, 2011). The model is based on the k-means algorithm, which is unsuitable for big datasets. Fuzzy c-means clustering minimizes the intra-cluster variance (Ayvaz, Karahan & Aral, 2007) and groups the data with its clustering algorithm; specifically, it minimizes an objective function of membership-weighted squared distances between the data points and the cluster centers (Kisi, 2015).
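The alternating updates of fuzzy c-means can be sketched as follows, assuming a fuzziness exponent m = 2, a number of clusters fixed by the initial centers, and a fixed iteration count instead of a convergence test; these choices, the function name, and the sample data are hypothetical.

```python
import math


def fuzzy_c_means(points, init_centers, m=2.0, iterations=100, eps=1e-12):
    """Fuzzy c-means for points given as tuples.

    Alternates the membership update
        u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
    and the center update
        c_j = sum_i u_ij**m * x_i / sum_i u_ij**m,
    which decreases J = sum_i sum_j u_ij**m * d_ij**2.
    """
    centers = [list(c) for c in init_centers]
    dim = len(points[0])
    u = []
    for _ in range(iterations):
        # Membership update; eps avoids division by zero when a point sits on a center
        u = []
        for x in points:
            d = [max(math.dist(x, c), eps) for c in centers]
            u.append([1.0 / sum((dj / dk) ** (2 / (m - 1)) for dk in d) for dj in d])
        # Center update: weighted mean with weights u_ij**m
        for j in range(len(centers)):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            centers[j] = [sum(w * x[t] for w, x in zip(weights, points)) / total
                          for t in range(dim)]
    return centers, u


# Hypothetical one-dimensional data with two groups
pts = [(0.0,), (0.2,), (0.4,), (5.0,), (5.2,), (5.4,)]
centers, u = fuzzy_c_means(pts, init_centers=[(1.0,), (4.0,)])
print([round(c[0], 3) for c in centers])
```

Each row of `u` sums to one, so every point is shared between the clusters rather than assigned to exactly one, which is the difference from k-means.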

Spatio-temporal modeling
The use of GIS for the spatial modeling of real-world phenomena in support of sustainability (Childers et al., 2015; Lundgren & Kjellstrom, 2013) has attracted research attention. Spatial modeling is a critical tool for examining the nature and properties of real-world phenomena and whether they are being sustained. Many real-world phenomena are dynamic and change over both space and time. The use of GIS for spatial and temporal modeling therefore plays an important role in improving model visualization and in tracking how far phenomena are from sustainability indicators (Fig. 4). GIS serves various purposes, such as creating the spatial data (geographical locations) used as input to the prediction model. Spatial functions generate maps of the modeled phenomena so that their behavior can be assessed at different locations. Spatial statistical functions such as zonal statistics quantify the variation of a phenomenon, as well as its minimum, maximum, and mean values, at different locations. Geospatial functions such as interpolation help compensate for possible model defects (e.g., missing data in some areas). Geostatistical analysis, such as trend analysis functions, reveals trends and patterns in the phenomena.
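As an illustration of how an interpolation function can fill gaps, here is a minimal inverse distance weighting (IDW) sketch; IDW is one of the interpolation techniques cited in the Introduction. It treats longitude and latitude as planar coordinates and uses a power of 2; both are simplifying assumptions, and the station values are hypothetical.

```python
import math


def idw(samples, target, power=2.0):
    """Inverse distance weighting estimate at `target`.

    samples: iterable of (x, y, value). Coordinates are treated as planar,
    a simplification; a GIS would normally use projected coordinates.
    """
    num = den = 0.0
    for x, y, value in samples:
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return value            # target coincides with a sample point
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Hypothetical (longitude, latitude, temperature in deg C) samples
stations = [(30.0, 38.0, 12.0), (32.0, 38.0, 14.0), (31.0, 40.0, 10.0)]
print(round(idw(stations, (31.0, 38.0)), 2))  # 12.67
```

Closer stations dominate the estimate; a larger power makes the surface more local.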

Model evaluation statistics
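Four statistics were used to evaluate the models: mean absolute error (MAE), root mean square error (RMSE), Nash–Sutcliffe efficiency (NSE), and the determination coefficient (R²). MAE, RMSE, and NSE are expressed as
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|AT_{o,i} - AT_{M,i}\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(AT_{o,i} - AT_{M,i}\right)^2}$$
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(AT_{o,i} - AT_{M,i}\right)^2}{\sum_{i=1}^{n}\left(AT_{o,i} - \overline{AT}_o\right)^2}$$
where $AT_M$ and $AT_o$ are the modeled and observed air temperatures, respectively, $n$ is the number of data points, and $\overline{AT}_o$ is the mean observed air temperature.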
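The MAE, RMSE, and NSE formulas, plus R², can be computed in plain Python. The text gives no formula for R², so the sketch assumes the squared Pearson correlation between observed and modeled values; the sample numbers are hypothetical.

```python
import math


def mae(obs, sim):
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def rmse(obs, sim):
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1 - sse / sum((o - mean_obs) ** 2 for o in obs)


def r2(obs, sim):
    """Squared Pearson correlation (assumed definition of R^2)."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


obs = [1.0, 2.0, 3.0, 4.0]   # hypothetical observed temperatures
sim = [1.1, 1.9, 3.2, 3.8]   # hypothetical modeled temperatures
print(round(mae(obs, sim), 3), round(rmse(obs, sim), 3),
      round(nse(obs, sim), 3), round(r2(obs, sim), 3))  # 0.15 0.158 0.98 0.982
```

MAE and RMSE are in °C (lower is better), while NSE and R² are dimensionless (closer to 1 is better).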

Modeling long-term air temperatures using evolutionary fuzzy and neuro-fuzzy approaches
Four hybrid fuzzy methods (evolutionary FG, ANFIS-GP, ANFIS-SC, and ANFIS-FCM) were compared in terms of predicting long-term air temperatures. The model inputs were latitude, longitude, altitude, and month of the year. Data from 71 weather stations in Turkey were used to train and test the models. The entire dataset (70 stations × 12 months = 840 data items) was divided into three subsets: the first 23 stations formed the first part, the next 23 the second part, and the remaining 24 stations the last part. In the first application, called model 1 (M1), training used the first two parts and testing used the third part. In the second application, model 2 (M2), training used the second and third parts and testing used the first part. In the last application, model 3 (M3), training used the first and third parts and testing used the second part. For ANFIS-GP, two and three Gaussian membership functions (MFs) were tried to find the optimal number; more than three MFs led to worse results and memory-related problems. For each number of MFs, the number of iterations was varied from 10 to 100 in increments of 10. For the ANFIS-SC model, radius values from 0.1 to 1 in increments of 0.1 were tried, and for the ANFIS-FCM model, the number of clusters was varied from two to eight in increments of one. The numbers of iterations used were the same as those in the ANFIS-GP and ANFIS-SC models.
Table 2 Optimal parameters and structures of the ANFIS-GP, ANFIS-SC, ANFIS-FCM and FG models.
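The station-based split into M1, M2, and M3 can be expressed as a small helper. The station identifiers and the function name are hypothetical; the part sizes of 23, 23, and 24 follow the text, and keeping all 12 months of a station in the same part mirrors the station-level split described above.

```python
def station_folds(stations, sizes=(23, 23, 24)):
    """Split stations into three consecutive parts and build M1-M3.

    Returns {name: (training_stations, testing_stations)}.
    """
    parts, start = [], 0
    for size in sizes:
        parts.append(stations[start:start + size])
        start += size
    p1, p2, p3 = parts
    return {
        "M1": (p1 + p2, p3),  # train on parts 1 and 2, test on part 3
        "M2": (p2 + p3, p1),  # train on parts 2 and 3, test on part 1
        "M3": (p1 + p3, p2),  # train on parts 1 and 3, test on part 2
    }


station_ids = list(range(70))          # hypothetical station identifiers
folds = station_folds(station_ids)
for name, (train, test) in folds.items():
    # each station contributes 12 monthly records
    print(name, len(train) * 12, "training records,", len(test) * 12, "testing records")
```

Because every station appears in exactly one test part, each method is evaluated once on every station across the three applications.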
Figures 5–7 provide a graphical comparison of the four methods for M1, M2, and M3, respectively. Figure 5 clearly shows that the FG estimates were closer to the corresponding observed temperatures than those of the ANFIS-GP, ANFIS-SC, and ANFIS-FCM models. The ANFIS-FCM model showed considerable under- and overestimation. The scatterplots show that the fit line of the FG model, y = 0.9910x − 0.906, was closer to the ideal line (y = x) and had a higher R² than those of the other three models. ANFIS-GP performed better than the other two neuro-fuzzy models. Similar trends were observed for M2 and M3 (Figs. 6 and 7).

Spatial modeling of long-term air temperatures using GIS
Air temperature is a spatio-temporal phenomenon and thus varies with time and place. Owing to their capabilities, GIS spatial functions can be used to improve the model. The GIS can provide additional input for air temperature modeling in the form of spatial data (latitude and longitude). This spatial modeling makes the proposed model more flexible and more adaptable for practical use. Moreover, the GIS is used to create temperature maps of the station locations, which enable better visualization and interpretation of the outputs. Classified temperature maps allow the temperature ranges of all methods to be compared for M1 (Fig. 8), M2 (Fig. 9), and M3 (Fig. 10). According to the classified temperature map for M1, the lowest temperature ranges were ANFIS-SC: 4.52-8.74, ANFIS-FCM: 6.14-8. Furthermore, the temperature maps help experts evaluate the temperature at different locations and identify places with maximum or minimum temperatures. For example, according to the best method (FG), Osmaniye in M1, Adana in M2, and Hatay in M3 had the highest temperatures. Other GIS functions can also help address defects in the proposed model: if information is missing in some areas, interpolation functions can be used to estimate the temperature on the maps.
Table 3 Training and testing results of the FG, ANFIS-GP, ANFIS-SC and ANFIS-FCM models.

DISCUSSION
As mentioned in the previous section, ANFIS-GP runs into memory-related problems when it uses many membership functions (more than three in this study). This limit also depends on the number of inputs: with the four inputs used in this study, four membership functions per input caused memory problems. This is because ANFIS-GP considers all combinations of rules and therefore has many more parameters, namely premise parameters, which determine the shape and location of the membership functions, and consequent parameters, which form the equations of the rules. On the other hand, this large number of weights makes ANFIS-GP more flexible than the other ANFIS methods, ANFIS-SC and ANFIS-FCM. The main advantage of ANFIS-SC and ANFIS-FCM is that they use fewer parameters or weights, because their clustering algorithms allow the number of rules or membership functions to be optimized. However, this property may lead to less effective learning than with ANFIS-GP.
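The difference in model size can be made concrete by counting rules and parameters. The counts below are illustrative and not taken from Table 2; they assume two parameters per Gaussian MF, first-order consequents with k + 1 parameters per rule, and one rule (with one MF per input) per cluster for the clustering-based models. The function names are hypothetical.

```python
def grid_partition_size(n_inputs, n_mfs, params_per_mf=2):
    """Rules and total parameters of a grid-partitioned first-order Sugeno ANFIS."""
    rules = n_mfs ** n_inputs                       # every combination of MFs
    premise = n_inputs * n_mfs * params_per_mf      # MFs are shared across rules
    consequent = rules * (n_inputs + 1)             # linear coefficients plus constant per rule
    return rules, premise + consequent


def clustering_size(n_inputs, n_clusters, params_per_mf=2):
    """Rules and total parameters when each cluster yields exactly one rule."""
    rules = n_clusters
    premise = n_clusters * n_inputs * params_per_mf  # one MF per input per cluster
    consequent = rules * (n_inputs + 1)
    return rules, premise + consequent


for n_mfs in (2, 3, 4):
    print("ANFIS-GP,", n_mfs, "MFs:", grid_partition_size(4, n_mfs))
print("Clustering, 8 clusters:", clustering_size(4, 8))
# ANFIS-GP, 2 MFs: (16, 96)
# ANFIS-GP, 3 MFs: (81, 429)
# ANFIS-GP, 4 MFs: (256, 1312)
# Clustering, 8 clusters: (8, 104)
```

Under these assumptions, moving from three to four MFs per input roughly triples the parameter count of ANFIS-GP, while the clustering-based models grow only linearly with the number of clusters.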
The FG model was found to be the best method for modeling long-term temperatures, followed by ANFIS-GP. Like ANFIS-GP, FG uses all rule combinations and therefore has many premise parameters, which determine the shape and location of the membership functions, as well as consequent parameters. The main advantage of FG stems from its evolutionary training algorithm, the genetic algorithm (GA). In ANFIS-GP, a gradient descent (GD) algorithm is used to optimize the membership function parameters (premise parameters). GD can become trapped in local optima, whereas the population-based search of the GA is much less prone to this problem.

CONCLUSIONS
In this paper, the capabilities of the evolutionary fuzzy genetic (FG) approach were compared with those of three neuro-fuzzy techniques for estimating long-term air temperatures for sustainability. Data from 71 stations in Turkey were used and divided into three parts. Each model was tested on each part, so every method was tested three times and all the data were used for testing. Finally, a GIS was used to produce air temperature maps based on the results of the optimal models. Three main conclusions can be drawn:
i) The evolutionary FG model is superior to ANFIS-GP, ANFIS-SC, and ANFIS-FCM in modeling long-term air temperatures.
ii) Among the neuro-fuzzy methods, ANFIS-GP outperformed ANFIS-SC and ANFIS-FCM.
iii) Compared with the best neuro-fuzzy model (ANFIS-GP), the FG model improved the accuracy in terms of RMSE, MAE, NSE, and R² by 20%, 30%, 4%, and 4%, respectively.