Assessing Soil and Land Suitability of an Olive–Maize Agroforestry System Using Machine Learning Algorithms

: Exponential population increases are threatening food security, particularly in mountainous areas. One potential solution is dual-use intercropped agroforestry systems such as olive ( Olea europaea )–maize ( Zea mays ), which may mitigate risk by providing multiple market sources (oil and grain) for smallholder producers. Several studies have conducted integrated agroforestry land suitability analyses; however, few studies have used machine learning (ML) algorithms to evaluate multiple variables (i.e., soil physicochemical properties and climatic and topographic data) for the selection of suitable rainfed sites in mountainous terrain systems. The goal of this study is therefore to identify suitable land classes for an integrated olive–maize agroforestry system based on the Food and Agriculture Organization (FAO) land suitability assessment framework for 1757 km 2 in Khyber Pakhtunkhwa province, Pakistan. Information on soil physical and chemical properties was obtained from 701 soil samples, along with climatic and topographic data. After determination of land suitability classes for an integrated olive–maize-crop agroforestry system, the region was then mapped through ML algorithms using random forest (RF) and support vector machine (SVM), as well as using traditional techniques of weighted overlay (WOL). Land suitability classes predicted by ML techniques varied greatly. For example, the S1 area (highly suitable) classified through RF was 9% ↑ than that of SVM, and 8% ↓ than that through WOL. The area of S2 (moderately suitable) classified through RF was 18% ↑ than that of SWM and was 17% ↓ than the area classified through WOL; similarly, the S3 (marginally suitable) class area via RF was 27% ↓ than that of SVM, and 45% ↓ than the area classified through WOL. Conversely, the area of N2 (permanently not suitable class) classified through RF and SVM was 6% ↑ than the area classified through WOL. Model performance was assessed through overall accuracy and Kappa Index and indicated that RF performed better than SVM and WOL. Crop suitability limitations of the study area included high elevation, slope, pH, and large gravel content. Results can be used for sustainable intensification in mountainous rainfed regions by expanding intercrop agroforestry systems in developing nations to close yield gaps.


Introduction
Due to rapid population increase, demand for food has also risen nationally.Similarly, rising income in developed countries has increased demands for high-value food products [1].Additionally, demand for edible olive oil in the U.S. has increased three-fold over the last 20 years [2].The population of Pakistan has substantially increased over the past 20 years along with demands for edible oil, as Pakistan is the fourth largest edible oil importer [3].Agriculture worldwide is a major contributor to deforestation and land degradation [4]; however, agroforestry systems may protect and mitigate soil losses, improve soil health, provide ecosystem services, and improve global carbon sequestration [5].In addition, intercropping agroforestry systems have major potential to maximize food production per unit area, provide ecosystem services, and improve economic viability.Agroforestry is a collective name for land-use systems and practices in which woody perennials are deliberately integrated with crops and/or animals on the same land-management unit.Such practices may attain more sustainable land use and can enhance production per unit area, while improving the livelihood of rural economies [6].Maize is one of the prominent cereal crops and is utilized for food, fodder, feed, and other value-added products, with 46% of maize being produced under rainfed conditions [7].Therefore, to enhance the production of rainfed systems, combat population growth rises by 3% from the last two decades, and address concurrent food security issues in Pakistan, researchers set out to evaluate land suitability for an integrated olive-and maize-crop agroforestry system.
Every region has varying agroforestry systems which depend on crop suitability and climatic factors [8][9][10][11][12][13].In the District of Lower Dir Pakistan, 54.4% is rainfed and 50.5% of farmers follow agro-silvo-pastoral systems where tree species (Ailanthus altissima, Morus alba, Juglans regia, and Populous nigra) are integrated with grain and fodder crops to enhance ecosystem services [14].However, little work has investigated the suitability of maizeolive intercrop agroforestry systems in Pakistan for closing yield gaps and improving food security.Production of agricultural lands in developing countries is very low compared to developed countries [15], and to overcome this issue, proper land suitability analyses are needed to close yield gaps and improve soil health.Such indices have a multitude of functions, spanning land-use planning to matching crops and soil for optimization, and soil conservation.The land assessment framework's goals aim to identify negative impacts and gains of agricultural lands and there is now a strong focus on environmental impacts, as well as on benefits from ecological and environmental resources [16].Further, land suitability analysis is the first step towards agricultural land-use planning for testing suitability of a particular application quantitatively [17].The qualitative evolution comprises data on climatic, hydrological, topographic, vegetative, and edaphic characteristics of the study area, with the quantitative evolution of land suitability focusing on the productivity of soils [18].The FAO land evolution framework and Land Suitability, part III: crop requirements [19,20] are widely used for land suitability analysis of integrated annual crops and perennial tree species.
Land suitability mapping offers knowledge to land managers and policymakers to make important decisions for reducing soil erosion and assessing sustainable land use.There is a deficiency of land suitability maps in Pakistan because land suitability surveys and mapping have not adopted modern approaches [21][22][23][24].Moreover, increasing demand for high-resolution soil maps around the world is being met by digital soil mapping as a method for generating spatial soil information.In developing countries, digital soil mapping is becoming increasingly important as field surveys are less frequent, timeconsuming, and costly, and soil surveys of individual federation units can no longer be afforded, but can be integrated with agroforestry-crop suitability mapping [25].
Machine learning (ML) models can strategically combine data from multiple sources and identify relationships by learning from these massive datasets [26].These ML models can play a vital role in creating a relationship between soil data and ancillary data, which help to understand the spatial and temporal changes during land suitability classification based on soil, climatic, and auxiliary variables [18,27].Artificial neural networks, partial least squares regression, support vector machine (SVM), generalized additive models, genetic programming, regression tree models, k nearest neighbor regression, adaptive neuro-fuzzy inference scheme, and random forests (RF) are some machine learning models that are used for establishing such relations [18,[28][29][30][31][32].Among these machine learning models, RF and SVM are the most widely used techniques for digital soil mapping because of their high accuracy, ease of use, and robustness.Ancillary data for crop suitability mapping are most widely obtained from digital elevation models, geo-spatial data, and remote sensing data [18,[32][33][34].
ML models have been extensively used to build digital soil maps in the past few years [35], although few researchers use these ML models for mapping land suitability in agroforestry systems [14,36].As an example, Chen et al. [37] used classification and regression tree algorithm to predict a suitable tree species for afforestation in Jinping County, Guizhou Province of China.Similarly, Taghizadeh-Mehrjardi et al. [18] assessed optimum land suitability for a rainfed wheat and barley system in the Kurdistan province, western Iran using ML algorithms from data obtained from 100 soil samples along with climatic and topographic variables.Jiang et al. [38] used k-means clustering to more precisely establish and manage agroforestry systems based on digital soil mapping and tree data (e.g., absolute growth rate and diameter at breast height).Kharel et al. [39] used ML to tease apart factors that affect system-level output (i.e., plant production (tree and forage), soil factors, and animal response based on grazing preference) and found that RF models helped identify system drivers in integrated tree-livestock systems.However, no studies to date have tested and evaluated ML models [i.e., RF, SVM, and traditional techniques of weighted overlay (WOL)] for their ability to identify suitable land classes for an integrated olive and maize agroforestry system for 1757 km 2 in Khyber Pakhtunkhwa province, Pakistan.Therefore, the goal of this study is to identify suitable land classes for an integrated olive and maize agroforestry system based on the FAO land suitability assessment framework, in an effort to close yield gaps.

Description of Experimental Region
The study was conducted throughout the district Dir Lower (34.9161 • N, 71.8097 • E) of Malakand division in the Khyber Pakhtunkhwa province of Pakistan at an altitude of 567 m from sea level.It has a total area of about 1757 km 2 (Figure 1) and the geographical boundary of the district is connected to Dir upper in the northeast and north, with district Swat to the east, district Malakand to the south, district Bajaur to the southwest, and Afghanistan to the west through the Durand line.Climatic conditions of the study area are characterized by relatively medium to high annual rainfall (1000-1200 mm).In the winter, temperature is reported as low as −6 • C min and as high as 38 • C, and in summer it ranges from 15 to 40 • C. The land-use landcover of the study area consists of 30% agricultural land, 15% barren land, 29% forests, 14% rivers, and 12% urban area.
ML models have been extensively used to build digital soil maps in the past few years [35], although few researchers use these ML models for mapping land suitability in agroforestry systems [14,36].As an example, Chen et al. [37] used classification and regression tree algorithm to predict a suitable tree species for afforestation in Jinping County, Guizhou Province of China.Similarly, Taghizadeh-Mehrjardi et al. [18] assessed optimum land suitability for a rainfed wheat and barley system in the Kurdistan province, western Iran using ML algorithms from data obtained from 100 soil samples along with climatic and topographic variables.Jiang et al. [38] used k-means clustering to more precisely establish and manage agroforestry systems based on digital soil mapping and tree data (e.g., absolute growth rate and diameter at breast height).Kharel et al. [39] used ML to tease apart factors that affect system-level output (i.e., plant production (tree and forage), soil factors, and animal response based on grazing preference) and found that RF models helped identify system drivers in integrated tree-livestock systems.However, no studies to date have tested and evaluated ML models [i.e., RF, SVM, and traditional techniques of weighted overlay (WOL)] for their ability to identify suitable land classes for an integrated olive and maize agroforestry system for 1757 km 2 in Khyber Pakhtunkhwa province, Pakistan.Therefore, the goal of this study is to identify suitable land classes for an integrated olive and maize agroforestry system based on the FAO land suitability assessment framework, in an effort to close yield gaps.

Description of Experimental Region
The study was conducted throughout the district Dir Lower (34.9161°N, 71.8097° E) of Malakand division in the Khyber Pakhtunkhwa province of Pakistan at an altitude of 567 m from sea level.It has a total area of about 1757 km 2 (Figure 1) and the geographical boundary of the district is connected to Dir upper in the northeast and north, with district Swat to the east, district Malakand to the south, district Bajaur to the southwest, and Afghanistan to the west through the Durand line.Climatic conditions of the study area are characterized by relatively medium to high annual rainfall (1000-1200 mm).In the winter, temperature is reported as low as −6 °C min and as high as 38 °C, and in summer it ranges from 15 to 40 °C.The land-use landcover of the study area consists of 30% agricultural land, 15% barren land, 29% forests, 14% rivers, and 12% urban area.

Soil Sampling and Lab Analysis
A random soil sampling approach was used to ensure accurate representation of soil properties in the study area [40].The total number of soil samples collected throughout the study area was 701 (dependent variables).Soil samples were collected with a tube auger (Figure 1), and the spatial position of each soil sample was recorded through a handheld Global Positioning System (GPS) device.The climatic data primarily included annual temperature and rainfall.The topographic data were retrieved from DEM, and indices like NDVI, SAVI, and BI were generated from satellite imagery, as shown in Table 1.The analysis for this study included multiple steps, with a flowchart on methodology provided in Figure 2.

Soil Sampling and Lab Analysis
A random soil sampling approach was used to ensure accurate representation of soil properties in the study area [40].The total number of soil samples collected throughout the study area was 701 (dependent variables).Soil samples were collected with a tube auger (Figure 1), and the spatial position of each soil sample was recorded through a handheld Global Positioning System (GPS) device.The climatic data primarily included annual temperature and rainfall.The topographic data were retrieved from DEM, and indices like NDVI, SAVI, and BI were generated from satellite imagery, as shown in Table 1.The analysis for this study included multiple steps, with a flowchart on methodology provided in Figure 2.  Soil samples were placed in bags and sent to the laboratory for further examination.Soil samples were dried, crushed, and then sieved through a 2 mm sieve [41].Soil physical properties such as soil texture were identified through the hydrometric method [42].Chemical properties, including soil organic carbon, were determined through wet combustion in Crops 2024, 4 312 which the evolved CO 2 was passed through a carrier stream in which potassium iodide and silver sulfate (KI, Ag 2 SO 4 ) were present, which prevents Cl interference.To trap traces of water vapor H 2 SO 4 , granular Zn and anhydride were used after CO 2 was weighed as absorbed by Mikohbite in a Nesbitt bulb.In calcareous soil, CO 3 was removed by a pretreatment to determine the organic carbon.Then, soil EC and pH were measured potentiometrically in a 2:1 water-volume-to-soil mass [43][44][45][46].

Determining Land Suitability Index Using Parametric Methods
Following determination of land suitability classes for an olive-maize agroforestry system, classification of land suitability classes through the weighted overlay technique was done by stacking all layers of climatic, soil, and topographic data, and then weights were assigned.Thereafter, selection of important features with ancillary and collected data for the land suitability classes was performed through a Boruta RF algorithm [47].A map of land suitability classes was then created using ArcGIS PRO 2.5.0 to compare results of WOL, RF, and SVM classification.Soil pH, organic carbon, soil texture, and slope along with climatic data (temperature and rainfall) were used for developing land suitability indices and then scored [48].Soil EC and CaCO 3 had no growth limitation for olive and maize crops, so those data were excluded during land suitability classification.
The square root technique was used to calculate land suitability indices for a maizeolive agroforestry system after choosing variables.This approach first assesses the climate.As a result, the climatic index is computed based on a rating of climate features.This climatic index was transformed to a climate rating, which was then combined with the soil and topographic ratings to create the final land suitability index.Computing the land suitability index employs the square root approach in the following Equation (1).
I L is the land index, Rmin is the minimal rating, and the other rating values were A, B, and C. The parametric technique assigns a value from 0 to 100 to each soil property depending on the limitation imposed by each soil feature [49].

Construction of Models for the Classification of the Land Suitability Classes 2.4.1. Traditional Method of Weighted Overlay (WOL)
The land suitability classification for an olive-maize agroforestry system was mapped through a traditional method of weighted overlay and then analyzed with machine learning algorithms (RF and SVM) in ArcGIS PRO 2.5.0 [50].After determining land suitability classes, each data layer was overlayed through WOL analysis based on its relative influence [22,51].WOL is one of the most common techniques used for the determination of suitability classes.The weights of each feature were determined through the AHP model (analytic hierarchy process) [52].

Machine Learning Models: Random Forest (RF) and Support Vector Machine (SVM)
Random forest is a machine learning algorithm using multiple decision/classification trees.The main idea is to utilize the bootstrap sampling technique to choose k samples at random from the original dataset, which are then brought back in as a new training set.The classification trees are then built to build a random forest of k decision tree, and the affiliation of new samples is selected by majority vote based on the findings of each classification tree.The algorithm's aim is to strengthen decision trees by integrating numerous decision trees, with each tree being created based on an autonomously generated sample set.
To evaluate land suitability classes for olive-maize agroforestry systems, two ML algorithms were employed on a set of auxiliary variables including terrain characteristics and satellite imagery in conjunction with climatic data.The final ML model was built on the training data, and the testing data (point data) were used to control the construction process and select the optimal parameters.Because of their successful applicability in previous research [18,33,34,53], we chose two machine learning models: RF and SVM.Both RF and SVM function effectively when data are not available on a large scale.In this work, the extensive linkages between the predicted classes of land suitability and auxiliary variables that may be deduced by the exploration of relevant auxiliary variables are noteworthy.
The basic principle behind SVM is to map data to a high-dimensional space in order to explore the ideal classification hyperplane by reducing the upper limit of classification error.SVM focuses on determining the separating hyperplane ω T x + b for linearly separable data.Equation (2) describes its objective function.
where ω is the normal vector that specifies the hyperplane's direction, label is the category label, and label (ω T x + b) is the constraint.Equation ( 2) may be modified into Equation (3) by using Lagrange multipliers. maxα where α is the displacement that determines the distance of the hyperplane from the origin and C is the relaxation variable.The separation hyperplane can be obtained by solving α and ω.
The SVM approach handles nonlinear issues by defining an appropriate kernel function, which effectively computes the similarity between samples and landmark points to establish new features to train complicated nonlinear decision limits.There are four kernel functions in SVM: linear, polynomial, hyperbolic tangent (sigmoid), and Gaussian radial basis function (RBF).Because multiple kernel functions were employed to identify hyperplanes across different data dispersion, the kernel function parameters, as well as their accompanying penalty coefficients C and gamma parameters, impact model accuracy.

Selection of Important Features
All features obtained from auxiliary data, i.e., terrain attributes and satellite imagery of the sentinel 2A along with soil properties, were processed through a feature selection Boruta algorithm in the R statistical package.Boruta is a wrapper algorithm built around the RF.The Boruta algorithm formed at least five copies of the original auxiliary dataset along with soil and climatic data, also known as shadow attributes.To remove the correlation between auxiliary data and land suitability classes, the added feature data were shuffled.Then the algorithm was run with RF classifier over the added feature data and computed using the Z score for each variable and corresponding shadow variable.The Z score represents the importance of each variable for land suitability classes.Those variables which score better than the maximum Z score was assigned with a hit and marked "confirmed" and were considered important.Variables with a lower Z score were assigned 0 hits and marked "Rejected" and were considered unimportant [47].

Machine Learning Algorithms
RF Models in ArcGIS Pro 2.5.0 were built and predictions made using a modification of Leo Breiman's random forest technique, which is a supervised machine learning technique [50].Predictions were made for categorical variables, as well as continuous variables.Predictor variables can be added to the attribute table of training samples.Raster datasets then determined neighborhood values for usage as auxiliary variables.Predictions were made to either features or a prediction raster, in addition to evaluation of prediction accuracy based on training data.The SVM classifier tool in ArcGIS Pro 2.5.0 was also used as a supervised classification algorithm.For conventional image inputs, the tool in ArcGIS Pro 2.5.0 takes multiband imagery with depth and does SVM classification on a pixel basis, depending on input training feature file, which was utilized in the classification process.

Model Evaluation
In this study, classification accuracy was assessed for each classified image by using three different classification techniques (WOL, RF, and SVM), which were then computed from two common performance metrics in the confusion matrix (i.e., overall accuracy and kappa index) for each classification.The overall accuracy was evaluated as the ratio between the sum of all correctly classified land suitability classes and all other classes, as shown in Equation (4).

Overall Accuracy =
Σ TP + TN ΣTP + FP + TN + FN (4) The kappa index was used, which is a robust indicator that incorporates the probability that a class is categorized by chance.It is a simple statistic that quantifies the ratio of all potential occurrences of inclusion or exclusion that are predicted by a model after taking into consideration chance predictions.A higher kappa index, like a higher overall accuracy, represents a strong model performance and is given in Equation (5).

Summary of Statistics
Descriptive statistics of soil properties of the study area are shown in Table 1.The mean value of pH was 7.83 and ranged between 6.40 and 8.70, which shows that the average soil of the study area was sodic.The mean value of EC was 0.22 dsm −1 and ranged between 0.03 and 0.68 dsm −1 , illustrating a low electrical conductivity.The CaCO 3 ranged between 0.03 and 0.68 and had a mean of 0.22%, which represents calcareous soil in the study area.The OC ranged between 0.03% and 0.59% which shows that soil fertility was low.The predominant soil textures in the study area were sandy loam, loam, and silty clay loam.The coefficient of variation (CV) of the EC, CaCO 3, clay, and sand were high, i.e., greater than 35%, likely owing to variation in topography and parent material.However, the CV for the rest of the soil properties was moderate, i.e., less than 35%.Therefore, point data used to build a regression (soil physiochemical properties) had a wide range of values in the study area.
After the data modeling of all these features, they were classified by RF and SVM by dividing the data into 70% sample training dataset, 20% testing dataset, and 10% validation dataset.For the prediction of each suitable classes, auxiliary features like MRTF and MRVBF illustrated steep terrain, where high values represent flat areas [with high land suitability class (S1)], while lower values represent high steep areas [with marginal and non-suitable land class, i.e., (S3) and (N2)].The greater relative significance of auxiliary data retrieved from the DEM for land suitability class prediction illustrated the influence of topography as one of the major soil formation components of land suitability for rainfed olive and maize.Almost all the mechanisms involved in soil formation were influenced by topography.Many researchers have indicated that topography has a substantial influence on soil attributes.Rezaei et al. [60] observed that slope plays a meaningful part in soil property changes and soil development.Zadeh et al. [61], during estimation of soil organic carbon (SOC), showed that rainfall, valley depth, terrain surface texture, temperature, channel network base level, and terrain vector roughness were the most relevant ancillary factors.
Similarly, Taghizadeh et al. [18] used various auxiliary features for predicting land suitability classes in the Kurdistan province of Iran.The auxiliary features were MrRTF, MrVBF, NDVI, SAVI, BI, valley depth, cross-sectional curvature, plan curvature, and slope, which were used to assess land suitability classes.Other researchers have observed similar findings [38,39].Nabiollahi et al. [62] used a digital soil mapping method to analyze spatial variation in SOC stocks under land-use change in western Iran and found that the important auxiliary features were: TWI, MRRTF, MRVBF, NDVI, Band 3, and Band 4 of Landsat 8. Suleman et al. [63] conducted land suitability assessments for olive cultivation in Turkey and also used the auxiliary data of elevation to mitigate the effect of lower temperature and snow, which restricts growth period and production of olives.Soil with slope less than 25% is preferred for cultivation, and aspect data were used to identify flat and south-facing areas.
cultivation in Turkey and also used the auxiliary data of elevation to mitigate the effect of lower temperature and snow, which restricts growth period and production of olives.Soil with slope less than 25% is preferred for cultivation, and aspect data were used to identify flat and south-facing areas.

Comparison of the ML and Traditional Technique
The performance of two ML models (SVM and RF) and the traditional technique of WOL was employed for the land suitability classification of the agroforestry system of rainfed olive and maize based on overall accuracy and Kappa index (Table 2).The overall accuracy and Kappa index for prediction of land suitability classes for the rainfed olive through RF (0.94 and 0.90, respectively) were higher than SVM (0.92 and 0.88, respectively) and the classification done by WOL (0.91 and 0.87, respectively).On the other hand, the overall accuracy and Kappa index for the land suitability classification for rainfed maize crop performed by RF (0.94 and 0.91, respectively) was higher than SVM (0.93 and 0.90, respectively).

Comparison of the ML and Traditional Technique
The performance of two ML models (SVM and RF) and the traditional technique of WOL was employed for the land suitability classification of the agroforestry system of rainfed olive and maize based on overall accuracy and Kappa index (Table 2).The overall accuracy and Kappa index for prediction of land suitability classes for the rainfed olive through RF (0.94 and 0.90, respectively) were higher than SVM (0.92 and 0.88, respectively) and the classification done by WOL (0.91 and 0.87, respectively).On the other hand, the overall accuracy and Kappa index for the land suitability classification for rainfed maize crop performed by RF (0.94 and 0.91, respectively) was higher than SVM (0.93 and 0.90, respectively)   Based on overall accuracy and the Kappa Index [64,65], the two ML algorithms (RF and SVM) and the traditional technique of WOL had strong, moderate, and poor capabilities for predicting soil and land suitability classes for a rainfed olive-maize agroforestry system (Table 3).Statistically, the performance of RF was better than the other two methods in terms of predicting suitability classes.When compared to single classifiers, ensemble learning techniques (e.g., RF) indicated effectiveness and robustness to noise could handle a large amount of input data [66].The area classified as highly suitable through ML (RF and SVM) and traditional method (WOL) was 5%, 9%, and 25%, respectively, of the total area for olive and about 6%, 7%, and 21%, respectively, of the total area classified in the highly suitable class (S1).The area of S1 class had ideal conditions for an olive and maize agroforestry system.This S1 categorization was due to lower slope, optimal temperature and rainfall amount, highly suitable soil pH, and appropriate soil texture, while only the SOC was less than the required range but could be augmented by adding manure or fertilizers.Most of the S1 class mainly represents valley areas, which have almost no limitation for such a system and do not require any special treatment for soil improvement.A study conducted by Mukhtar Elaalem [67], through the comparison of parametric and fuzzy multicriteria methods for evaluating land suitability for olive in Libya, classified the area of S1 as 30% (92,819 ha), S2 as 50% (154,698 ha), S3 as 6% (18,564 ha), and N1 which was 9% (27,846 ha).

Moderately Suitable Class (S2)
The rating scale for the S2 class between ML algorithms ranged between 65 and 85 [48].The percentage of area which was classified as S2 through RF, SVM, and WOL for olive production was 21%, 21%, and 57%, respectively, while the area classified for the maize crop through RF, SVM, and WOL was 34%, 49%, and 63%, respectively, of the total area.As compared to the S1 class, the S2 class had slight limitations owing to slope, elevation, pH, SOC, temperature, and rainfall.Soil texture was optimal for olive but had limitation for maize and thus was classified into S2.The majority of these limitations can be overcome by management; for example, slope could be improved with the construction of terraces, SOC could be improved by adding manure or green manure, and pH could be reduced by adding lime in S2 areas [68].

Marginally Suitable Class (S3)
The rating scale of the S3 class ranged between 40 and 60 [48].The percentage of the area which was classified under this class through RF, SVM, and WOL for olive was 56%, 51%, and 16%, respectively, while the percentage of the area classified through RF, SVM, and WOL for maize was 40%, 27%, and 16%, respectively.The limitations of soil physicochemical attributes, topography features, and climatic factors were lower than in the S2 class, and thus the potential success of olive and maize crops was lower than in the S2 class.Since, in the S3 class, slopes are steep and the area is largely located on high elevation where widespread watering is not viable during times of drought, these areas would not be optimum for integrated maize-olive systems (Table 4).The rating scale for N2 ranged between 0 and 30 [48].The percentage of area which was classified as N2 through RF, SVM, and WOL for olive was 18%, 19%, and 2%, respectively, while the area classified through RF, SVM, and WOL for maize was 20%, 17%, and 0%, respectively.This area has the limitations of steepness, shallow soil depth, high pH, and low temperature.Such areas cannot be improved by any means and should be avoided for plantation cultivation.The overall land suitability classification for the olive and maize agroforestry system in rainfed areas is shown in Figure 5. Ostovari et al. [69] used GIS and multi-criteria decision-making for analyzing and creating a land suitability map of 51,831 ha in the northwest region of East Azerbaijan province, Iran for rapeseed (Brassica napus).Their findings revealed that 0.81% (420.8 ha), 42.3% (21,940 ha), and 11.7% (6104 ha) of the study region were highly suitable (S1), moderately suitable (S2), and marginally suitable (S3), respectively, and 39.72% (20,586) and 0.95% (492.1 ha) were currently and permanently unsuitable for rapeseed cultivation.In their studied region, soil characteristics and topographical variables (elevation and slope) were the most influential variables for identifying the N2 class.

Discussion
Land suitability classification for an integrated olive-maize agroforestry system predicted four classes in the study area: highly suitable area (S1), moderately suitable area (S2), marginally suitable area (S3), and permanently non-suitable area (N2).The classification was performed by ML methods (RF and SVM) and a traditional method (WOL).Statically, the ML RF method performed better than the SVM and weighted overlay WOL.

Discussion
Land suitability classification for an integrated olive-maize agroforestry system predicted classes in the study area: highly suitable area (S1), moderately suitable area (S2), marginally suitable area (S3), and permanently non-suitable area (N2).The classification was performed by ML methods (RF and SVM) and a traditional method (WOL).Statically, the ML RF method performed better than the SVM and weighted overlay WOL.The area of S1 was predicted through RF to be 9%↑ than that of SVM and was 8%↓ than the area predicted by WOL.The area of S2 predicted through RF was 18%↑ than that of SWM and was 17%↓ than the area predicted by WOL.Similarly, the area of S3 class predicted by RF was 27%↓ than that of SVM, and 45%↓ than the area predicted by WOL.On the other hand, the area of N2 predicted by RF and SVM was 6%↑ than area predicted through WOL.Thus, the WOL technique under-predicted the N2 class and over-predicted the S2 class.Similar results were produced by Mubashir et al. [6] during their land suitability analysis for sugar cane (Saccharum officinarum) in the Bijnor district in India.
From this research, it was confirmed that a sizeable area of the district had great potential for an integrated olive and maize agroforestry system, and that the unrealized area (currently uncultivated) could be used for cultivating this system.Therefore, by using the maps of this study, farmers, especially those who are involved in maize farming, can also incorporate olive trees within rows to enhance profitability and ecosystem services.Such a plantation of olive can also fulfill fodder needs and reduce the pressure on existing forest resources for fuelwood requirements.Thus, such dual-use cropping systems will enhance land productivity and help meet demands of the local population by ensuring food security in this mountainous region.
Maps produced by ML are recommended for land suitability evaluation.This is especially true in a country like Pakistan where soil data are scarce.As a result, ML and ancillary data may be an appealing strategy for large-scale land suitability analyses.From the results, maps produced through ML algorithms predicted more area of N2 and S3 classes compared with weighted overlay.However, the weighted overlay predicted more area of S1 and S2 classes than the ML technique.Future studies could evaluate the potential production of an integrated olive-maize agroforestry system.Finally, the same method presented in this paper could be conducted for land suitability assessments of different crops.
Climate change and shifting agricultural land use threaten agriculture production and food security.Agroforestry systems can enhance land productivity while reducing climate change effects, such as through improved soil moisture loss and reduced belowcanopy temperature [38,39].Production potential from rainfed agriculture is low, but this dual-use cropping system may increase productivity and improve the livelihood of farmers.Future land suitability assessments can be used to analyze suitable areas for various agroforestry systems in different regions of Pakistan according to the suitability of climate, soil, and topography.Future additional research may include implementing the establishment of maize-olive in highly suitable land and verifying responses, as well as conducting additional agroforestry intercropping suitability assessments for the region under investigation.

Conclusions
This research identified suitable land for an olive-maize intercropped agroforestry system in rainfed areas for diversifying systems and markets.The elevation, valley depth, aspect, SAVI, NDVI, BI, flow accumulation, MRTF, MVBF, cross-sectional curvature, land slope factor, EVI, plane curvature, and longitudinal curvature were the most important ancillary data.Considering the Kappa index and overall accuracy scores, the RF algorithm outperformed SVM and WOL in determining land suitability of an integrated olive-maize rainfed agroforestry system in the Khyber Pakhtunkhwa district.Overall, RF provided several advantages over other modeling techniques and was deemed optimum for determining land suitability classes.

Figure 1 .
Figure 1.Study area along with soil sample (n = 701) locations in Khyber Pakhtunkhwa province.Figure 1. Study area along with soil sample (n = 701) locations in Khyber Pakhtunkhwa province.

Figure 1 .
Figure 1.Study area along with soil sample (n = 701) locations in Khyber Pakhtunkhwa province.Figure 1. Study area along with soil sample (n = 701) locations in Khyber Pakhtunkhwa province.

Figure 2 .Figure 2 .
Figure 2. Flow chart methodology represents the land suitability classification through machine learning (RF, SVM) and traditional technique (WOL).

Figure 5 .
Figure 5.Land suitability maps for an integrated olive-maize agroforestry system in rainfed areas through ML algorithm (A) random forest (RF), (B) support vector machine (SVM), and through traditional method of (C) weighted overlay (WOL).

Table 1 .
Data types and sources used in ML and traditional techniques.

Table 1 .
Data types and sources used in ML and traditional techniques.

Table 2 .
Descriptive statistics for soil properties at 30 cm depth and 701 regions.

Table 2 .
Descriptive statistics for soil properties at 30 cm depth and 701 regions.

Table 3 .
Error criteria used for the land suitability classification for agroforestry system of olive and maize crop through ML (RF: random forest; SVM: support vector machine) and traditional technique (WOL: weighted overlay).

Table 4 .
Area of the land suitability classes in hectare for the agroforestry system of olive and maize through ML (RF, SVM) and traditional technique of weighted overlay.