Remote Sensing–Based Urban Green Space Detection Using Marine Predators Algorithm Optimized Machine Learning Approach

Information regarding the current status of urban green space is crucial for urban land-use planning and management. This study proposes a remote sensing and data-driven solution for urban green space detection at regional scale that employs state-of-the-art metaheuristic and machine learning approaches. Remotely sensed data obtained from the Sentinel 2 satellite for the study area of Da Nang city (Vietnam) are used to construct and verify an intelligent model that hybridizes the Marine Predators Algorithm (MPA) and support vector machines (SVM). SVM are employed to generalize a decision boundary that separates features characterizing statistical measurements of remote sensing data into two categories of “green space” and “nongreen space”. The MPA metaheuristic is used to optimize the SVM training phase by identifying an appropriate set of the SVM’s hyperparameters including the penalty coefficient and the kernel function parameter. Experimental results show that the proposed model, which processes information provided by all of the Sentinel 2 satellite’s spectral bands, delivers a better performance than the models based on vegetation indices. With a classification accuracy rate of roughly 93%, an F1 score = 0.93, and an area under the receiver operating characteristic curve = 0.98, the newly developed model is a promising tool to assist local authorities in obtaining up-to-date information on urban green space and developing plans for sustainable urban land use.


Research Background and Motivation
In many regions around the globe, the fast pace of urbanization leads to various problems including traffic congestion, poor air quality, and noise pollution. As pointed out by Xian et al. [1], urbanization significantly transforms the landscape from natural surface types to impervious surfaces such as housing, commercial buildings, and infrastructure. This transformation is happening over a large spatial extent and at an increasing speed due to burgeoning pressure for additional housing and commercial/industrial areas. Moreover, the development of urban land consumes green land and brings about negative impacts on the urban environment [2][3][4]. A worsening living environment caused by a lack of green space is a major issue for human health because roughly 54% of people in the world live in urban areas [5].
Urban green space is generally defined as green infrastructure that contains vegetated spaces including urban park, road, and workplace green space [6]. Previous works have recognized the crucial role of green space in reducing the adverse impacts of urbanization on both the urban ecosystem and socioeconomics [7][8][9][10][11][12][13][14][15]. The World Health Organization has identified green spaces as an innovative means of enhancing the quality of urban environments via improvements of local resilience and promotion of sustainable lifestyles [15]. Turaga et al. [16] state that urban green spaces have become a critical asset because they deliver various benefits including aesthetics enhancement, pollution reduction, positive effects on the physical and mental health of citizens, urban heat island reduction, and groundwater recharge.
Therefore, there is rising societal support for the protection and development of green space in urban areas. Accordingly, up-to-date spatial information regarding the current status of urban green space is crucial for urban land-use planning and management. This information has become increasingly difficult to obtain via conventional landscape surveying approaches since green spaces are constantly modified, fragmented, and dispersed due to the fast pace of urbanization. Moreover, surveying tasks at a regional scale are daunting because of the time and labor required for field data acquisition, processing, and reporting.
Thus, there is a pressing need for advanced methods to automate the green space surveying task.
Recently, medium-resolution imagery coupled with advanced machine learning methods has provided an effective solution for urban landscape surveys [17][18][19][20][21][22]. Remote sensing data combined with a geographic information system (GIS) can be used to generate thematic maps to assess green vegetation cover at a regional scale. Data extracted from such maps can be helpful for further analysis of the size, shape, and other landscape patterns of urban green space [23][24][25][26][27][28].
Rafiee et al. [13] rely on Landsat Thematic Mapper images to study the patterns of green areas; this study employs combined techniques of remote sensing image classification, landscape metrics assessment, and vegetation indices. El Garouani et al. [29] employ the maximum likelihood supervised classification to analyze data obtained from Landsat's bands with a spatial resolution of 30 m; the authors investigate the relationship between urbanization and land use changes as well as the effect of the increase in impervious surface areas. Urban green space distribution has been modeled in [7] with the use of remote sensing, GIS technology, and the normalized difference vegetation index. Do et al. [3] rely on Landsat 8 OLI (Operational Land Imager) image datasets provided by the United States Geological Survey to study green space patterns; the support vector machine (SVM) has been used by the authors for the task of image data classification; the overall accuracy of the proposed machine learning model is 82.70%.
Li et al. [30] construct land-use and land-cover maps including green spaces using Landsat Operational Land Imager (OLI) and Enhanced Thematic Mapper Plus (ETM+) imagery; convolutional neural network, random forest, and SVM are the machine learning models employed for image data classification; this study reports a classification accuracy of 84.40% on the testing dataset. Dinda et al. [31] construct an integrated model for studying urban growth and associated green space loss; the model relies on a maximum likelihood classifier, an artificial neural network, and SVM for the pattern recognition task; the SVM model attained the best classification accuracy and area under the receiver operating characteristic curve (0.906).
It is noted that besides SVM, deep neural networks (DNNs) have also been successfully applied in remote sensing-based land-use classification [32][33][34]. DNNs are highly appropriate for image categorization due to their autonomous feature extraction phase based on the convolution operator [35]. However, successful implementations of DNNs often require a large number of training image samples. The computational expense of DNNs is generally significant due to the time-consuming training process used to fine-tune the networks' weights. Moreover, because deep learning models have quite a large number of hyperparameters (e.g., the number of hidden layers, the number and the size of convolution operations, the size of the pooling operations, etc.), the process of identifying a suitable network architecture can be tedious. In addition, since the extracted features are represented as numerical data in this study, SVM is highly appropriate because it has been proven to be a capable tool for classifying extracted numerical datasets [19][36][37][38][39].
Based on the literature review, there is an increasing trend of applying machine learning in remote sensing-based urban green space studies. Since the problem of interest is challenging due to the involvement of multivariate and nonlinear data analysis, other advanced machine learning solutions need to be investigated to improve the urban green space detection accuracy. Moreover, the current literature shows that individual machine learning methods are the commonly employed approach. Hybrid machine learning models that harness the advantages of various computational intelligence techniques are rarely investigated for constructing urban green space detection models. Specifically, previous studies have mainly relied on the individual machine learning approach [3,13,29,31], and the employment of metaheuristic algorithms for optimizing machine learning based remote sensing data classification has rarely been proposed and investigated. Therefore, the original contribution of the current work is a hybridization of SVM machine learning and metaheuristic optimization for remote sensing-based urban green space detection.
SVM [40] is considered to be a capable pattern recognizer with excellent generalization capability. This is because the model structure of this machine learning method is learnt via the framework of structural risk minimization, which is resilient to overfitting and noisy data [41]. Nevertheless, the model construction phase of an SVM model requires a proper setting of its two hyperparameters: the penalty coefficient and the kernel function parameter. The former specifies the amount of penalty imposed on data samples having classification errors. The latter determines the locality of the employed kernel function, which directly influences the generalization of the constructed model. The task of determining hyperparameters of a machine learning model is known as model selection [41] and can be modeled as an optimization problem. For an SVM model, this is a challenging task for several reasons. First, the landscape of the objective function is unknown and not differentiable. Second, the hyperparameters must be searched in continuous space; therefore, there is an infinite number of feasible solutions, which means that an exhaustive search on the hyperparameters is infeasible. Therefore, various scholars have resorted to metaheuristic algorithms for dealing with model selection problems. The role of metaheuristic algorithms in hyperparameter setting is thus crucial: these algorithms are used to optimize the performance of a machine learning model to achieve a balance between model accuracy and model generalization.
Marine Predators Algorithm (MPA), first introduced in [60], is a recently proposed metaheuristic inspired by the foraging strategy of marine predators. This metaheuristic is characterized by a novel combination of Lévy and Brownian movements used for enhancing the optimization performance. The capability of MPA has been demonstrated on various optimization tasks [60]. Nevertheless, the performance of this metaheuristic in optimizing machine learning models has rarely been investigated. Hence, this study proposes to hybridize MPA and SVM to form an integrated intelligent model for remote sensing-based urban green space detection.
Remote sensing data obtained from the Sentinel 2 satellite for the study area of Da Nang city are used to train and verify the MPA-SVM hybrid model. In this work, the MPA optimized SVM model trained by remote sensing data with all of the Sentinel 2's spectral bands is compared with the models that use commonly employed vegetation indices including normalized difference vegetation index (NDVI) [61], normalized difference water index (NDWI) [62], soil-adjusted vegetation index (SAVI) [63], and MERIS terrestrial chlorophyll index (MTCI) [64]. The rest of the article is organized as follows: Section 2 reviews the research methodology and material. Section 3 presents the proposed MPA optimized SVM used for remote sensing-based urban green space detection. Experimental results are reported in Section 4. Concluding remarks of the current study are summarized in Section 5.

General Description of the Study Area and Remote Sensing Data.
As mentioned earlier, urban green spaces play a significant role in the urban living environment; they serve a variety of functions including climatic modification, aesthetics, recreation, and physical/mental health improvement. Nevertheless, due to the physical expansion of Da Nang city (Vietnam), certain areas of green spaces have been replaced by impervious surfaces such as buildings and roads. Therefore, the current status of urban green space in this city needs to be updated in a timely manner, and this city has been selected as the study area of this research work.
Da Nang is a crucial coastal city located in Central Vietnam. Da Nang's location is at 15°55' to 16°14' North and 107°18' to 108°20' East [3]. It is the third largest city within the nation with a population of about 1 million. Da Nang urban center (refer to Figure 1) is located in the eastern section of the area and consists of six districts: Hai Chau, Cam Le, Thanh Khe, Lien Chieu, Ngu Hanh Son, and Son Tra [65]. The rural districts of Hoa Vang and Hoang Sa also belong to Da Nang city but are not included in the Da Nang urban center; therefore, these two rural districts are excluded from the study area.
To survey the urban green space status of Da Nang city, remote sensing data in the form of spectral bands were collected from Sentinel 2 on July 16, 2020. These spectral bands (see Table 1) are provided openly by USGS [66]; they can be processed and analyzed by the Sentinel Application Platform (SNAP) software package [67] as well as the ENVI software package [68]. Using the open-access tools of SNAP, the original Sentinel 2 spectral bands are converted to TIF format via the geometric operation of resampling. Moreover, it is noted that the map projection of the obtained images is Universal Transverse Mercator (UTM) within Zone 48 N, Datum World Geodetic System (WGS) 84. Figure 2 demonstrates the 13 spectral bands obtained from Sentinel 2 on July 16, 2020. These bands are coastal aerosol, blue, green, red, red-edge 1, red-edge 2, red-edge 3, near infrared, near infrared narrow, water vapour, shortwave infrared/cirrus, shortwave infrared 1, and shortwave infrared 2. The wavelength range and resolution of each spectral band are provided in Table 1.

Remote Sensing-Based Vegetation Indices.
In the remote sensing field, vegetation indices have been widely used to extract vegetation biophysical information from satellite image data [69]. Previous works have demonstrated the effectiveness of vegetation indices in remote sensing-based green space mapping [7][13][70][71][72]. Therefore, this study relies on such conventional indices as a means of urban green space detection. The employed vegetation indices include normalized difference vegetation index (NDVI) [61], soil-adjusted vegetation index (SAVI) [63], normalized difference water index (NDWI) [62], and MERIS terrestrial chlorophyll index (MTCI) [64].
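As a rough illustration of how such indices are computed from band reflectance, the following Python sketch uses commonly cited textbook forms of the four indices. The function names, the soil factor of 0.5 in SAVI, the green/near-infrared form of NDWI, and the mapping of MTCI to the red-edge 2, red-edge 1, and red bands are assumptions made for this example; they should be checked against [61][62][63][64] before use.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index; soil_factor = 0.5 is an assumed default."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def ndwi(green, nir):
    """Normalized difference water index (green/NIR variant, assumed here)."""
    return (green - nir) / (green + nir)


def mtci(red_edge_2, red_edge_1, red):
    """MERIS terrestrial chlorophyll index (band mapping is an assumption)."""
    return (red_edge_2 - red_edge_1) / (red_edge_1 - red)


# Hypothetical reflectance values for one vegetated pixel.
pixel = {"green": 0.08, "red": 0.05, "re1": 0.15, "re2": 0.35, "nir": 0.45}
indices = {
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
    "SAVI": savi(pixel["nir"], pixel["red"]),
    "NDWI": ndwi(pixel["green"], pixel["nir"]),
    "MTCI": mtci(pixel["re2"], pixel["re1"], pixel["red"]),
}
```

In practice each index would be evaluated per pixel over the whole band image, yielding one index image per model.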

The Used Metaheuristic and Machine Learning Approaches
Marine Predators Algorithm. The MPA, proposed in [60], is a stochastic global optimization algorithm inspired by the widespread foraging strategy of marine species such as sharks and tunas. The foraging strategy of these marine species effectively utilizes Lévy and Brownian movements along with an optimal encounter rate policy in the biological interaction between predator and prey [76][77][78].
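The two movement types mentioned above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: Brownian steps are drawn from a standard normal distribution, and Lévy steps are generated with Mantegna's algorithm using an exponent of 1.5. Both the function names and the choice of Mantegna's algorithm are assumptions of this example, not details confirmed by the text.

```python
import math
import random


def brownian_step(dim, rng):
    """Vector of standard normal draws that mimics Brownian motion."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def levy_step(dim, rng, beta=1.5):
    """Vector of heavy-tailed draws via Mantegna's algorithm (assumed here)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    steps = []
    for _ in range(dim):
        u = rng.gauss(0.0, sigma_u)
        v = rng.gauss(0.0, 1.0)
        # Dividing by |v|^(1/beta) produces occasional very long jumps.
        steps.append(u / abs(v) ** (1 / beta))
    return steps


rng = random.Random(42)
r_b = brownian_step(2, rng)  # one draw per optimized hyperparameter
r_l = levy_step(2, rng)
```

Brownian steps cover the neighborhood evenly, whereas the heavy tail of the Lévy steps mixes many short moves with rare long ones.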
The searching process of MPA consists of three phases considering three scenarios: (i) high velocity ratio when a prey is moving faster than a predator, (ii) unit velocity ratio when the rates of movement of a prey and a predator are similar, and (iii) low velocity ratio when the rate of movement of a predator is higher than that of a prey. The searching operation of the MPA metaheuristic is demonstrated in Figure 3. Let $X_E$ be the position of predators and $X_P$ be the position of preys within a marine ecosystem. The 1st phase aims at search space exploration and is applied for the first one-third of the searching iterations; following the formulation in [60], the mathematical equation used to revise the prey position is given by

$$S_i = R_B \otimes (X_{E,i} - R_B \otimes X_{P,i}), \qquad X_{P,i} = X_{P,i} + P \cdot R \otimes S_i \tag{5}$$

where $\otimes$ is an entry-wise multiplication operator, $S_i$ is the step size of the ith population member, $P = 0.5$ is a constant, and $R$ is a vector of uniform random numbers in [0, 1]. $R_B$ is a vector including random numbers generated from a normal distribution which mimics the Brownian motion.

The 2nd phase is applied when MaxIter/3 < Iter < 2 × MaxIter/3. The positions of the first half of the population members are updated as follows:

$$S_i = R_L \otimes (X_{E,i} - R_L \otimes X_{P,i}), \qquad X_{P,i} = X_{P,i} + P \cdot R \otimes S_i \tag{6}$$

where $R_L$ denotes a vector of random numbers generated from the Lévy distribution which represents the Lévy movement. The positions of the second half of the population members are updated as follows:

$$S_i = R_B \otimes (R_B \otimes X_{E,i} - X_{P,i}), \qquad X_{P,i} = X_{E,i} + P \cdot CF \otimes S_i \tag{7}$$

where $CF = (1 - Iter/MaxIter)^{2 \cdot Iter/MaxIter}$; Iter and MaxIter are the current iteration count and the maximum number of iterations, respectively.

The last phase of the optimization process (Iter > 2 × MaxIter/3) aims at exploitation of the search space. The population members' positions are updated in the following equation:

$$S_i = R_L \otimes (R_L \otimes X_{E,i} - X_{P,i}), \qquad X_{P,i} = X_{E,i} + P \cdot CF \otimes S_i \tag{8}$$

In addition, to model the behavior shift in marine predators according to the eddy formation or Fish Aggregating Devices (FADs) effects [79], the MPA metaheuristic employs the following operations:

$$X_{P,i} = X_{P,i} + CF \, [X_{min} + R \otimes (X_{max} - X_{min})] \otimes U \quad \text{if } r \le FADs \tag{9}$$

$$X_{P,i} = X_{P,i} + [FADs \, (1 - r) + r] \, (X_{P,r1} - X_{P,r2}) \quad \text{if } r > FADs \tag{10}$$

where FADs = 0.2 denotes the probability of the FADs effect, $r$ is a uniform random number in [0, 1], $U$ is a binary vector, and $X_{min}$ and $X_{max}$ are the lower and upper boundaries of the searched variables. r1 and r2 denote two random indices.

[Figure 3: Searching operation of the MPA metaheuristic. If Iter < MaxIter/3, update the agents using equation (5). If MaxIter/3 < Iter < 2 × MaxIter/3, update the 1st half of the agents using equation (6) and the 2nd half of the agents using equation (7). If Iter > 2 × MaxIter/3, update the agents using equation (8). Then apply the FADs effect using equation (9), or equation (10) if r > FADs.]

Support Vector Machine. Introduced by Vapnik [40], support vector machines (SVM) have gained the attention of the academic community and have become a preeminent pattern recognition approach [55][80][81][82][83][84][85][86][87][88][89][90]. Given a data sample set $S$ drawn from a data universe $X_U$ and a hidden target function $f: X \to \{0, 1\}$, we first create a labeled training dataset $D$, where $D = \{(x, y) \mid x \in S \text{ and } y = f(x)\}$. SVM can be used to estimate the target function $f(x)$ by constructing an approximating function $\hat{f}(x)$.
Herein, for the task of urban green space detection, the data label can be modeled as "0" = nongreen space (the negative class) and "1" = green space (the positive class). The input data X are properties of the Sentinel 2's spectral bands.
To construct an SVM model, it is required to solve the following constrained optimization problem [91], written here in the standard soft-margin form with the class labels encoded as $y_k \in \{-1, +1\}$:

$$\min_{w, b, e} \; \frac{1}{2} w^T w + C \sum_{k=1}^{N} e_k \quad \text{subject to} \quad y_k \left( w^T \varphi(x_k) + b \right) \ge 1 - e_k, \; e_k \ge 0, \; k = 1, \ldots, N$$

where $w \in R^n$ and $b \in R$ are used to construct a classification hyperplane used for pattern classification, and $N$ is the number of training samples. $e$ is the vector of slack variables. $C$ denotes the penalty coefficient. $\varphi(x)$ denotes a nonlinear data mapping used for dealing with data that cannot be linearly separated. One advantage of the SVM method is that the explicit formula of $\varphi(x)$ is not required. To construct an SVM model, only the dot product of $\varphi(x)$ is necessary. The dot product of two data samples $x_k$ and $x_l$ is represented as a kernel function $K(x_k, x_l)$:

$$K(x_k, x_l) = \varphi(x_k)^T \varphi(x_l)$$

For multivariate and nonlinear data classification problems, the radial basis kernel function (RBKF) is commonly utilized:

$$K(x_k, x_l) = \exp\left( -\frac{\lVert x_k - x_l \rVert^2}{2 \sigma^2} \right)$$

where $\sigma$ denotes a hyperparameter of the RBKF. By solving a Lagrangian dual of the aforementioned constrained optimization problem and using a quadratic programming solver, the SVM model used for data classification can be expressed compactly as follows [91]:

$$f(x_l) = \operatorname{sign}\left( \sum_{k=1}^{SV} \alpha_k y_k K(x_k, x_l) + b \right)$$

where $\alpha_k$ represents the solution of the optimization problem; SV denotes the number of support vectors.
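To make the role of the kernel and of the support vectors concrete, the following Python sketch evaluates a radial basis kernel and the resulting decision rule for a trained model. It is a minimal sketch, assuming the kernel is parameterized as exp(-d²/(2σ²)) and that the support vectors, multipliers, and bias are already available from a solver; all names and the toy values are hypothetical.

```python
import math


def rbf_kernel(x_k, x_l, sigma):
    """Radial basis kernel with width sigma (assumed parameterization)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_k, x_l))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svm_decide(x, support_vectors, alphas, labels, bias, sigma):
    """Return 1 (green space) or 0 (nongreen space) for a feature vector x.

    labels holds the +1/-1 encoding of the support vectors.
    """
    score = sum(
        alpha * y * rbf_kernel(sv, x, sigma)
        for sv, alpha, y in zip(support_vectors, alphas, labels)
    )
    return 1 if score + bias >= 0 else 0


# Toy model with two hypothetical support vectors, one per class.
svs = [[0.0, 0.0], [1.0, 1.0]]
alphas = [1.0, 1.0]
labels = [-1, 1]
near_green = svm_decide([0.9, 0.9], svs, alphas, labels, bias=0.0, sigma=0.5)
near_nongreen = svm_decide([0.1, 0.1], svs, alphas, labels, bias=0.0, sigma=0.5)
```

A smaller sigma makes each support vector influence only its immediate neighborhood, which is why this parameter controls the locality of the decision boundary.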

The Proposed Marine Predators Algorithm Optimized Machine Learning Approach for Urban Green Space Detection
This section of the article is dedicated to describing the integrated model used for remote sensing-based urban green space detection. The core of the proposed model is a hybridization of the MPA metaheuristic and the SVM machine learning. These two methods work synergistically to analyze patterns hidden in a set of remotely sensed data collected for the study area of Da Nang urban center. In detail, SVM is used to construct a decision boundary that separates the input data space into two distinctive regions of "nongreen space" and "green space".
To further enhance the performance of the SVM model, MPA is utilized to autonomously fine-tune the SVM training process by identifying a set of appropriate model hyperparameters.
The optimized hyperparameters include the penalty coefficient and the RBKF parameter. In this study, the searching range of the penalty coefficient is [1, 100]; the searching range of the RBKF parameter is [0.1, 100]. These two hyperparameters strongly influence the learning and the predictive capability of the integrated urban green space detection model. An excessively large penalty coefficient or an excessively small RBKF parameter leads to overfitted models. On the other hand, an excessively small penalty coefficient and an excessively large RBKF parameter tend to produce underfitted models [92]. Therefore, the role of the MPA is to find a pair of the penalty coefficient and the RBKF parameter which features a balance between predictive accuracy and modeling generalization. Accordingly, it is expected that the constructed model will not suffer from either overfitting or underfitting.
The overall model structure is presented in Figure 4. The 13 spectral bands are obtained from the Sentinel 2 satellite and processed via the SNAP [67] and ENVI [68] software packages. To accelerate the computing process, the images of spectral bands are divided into blocks with the size of 5 × 5 pixels. For the purpose of data classification, the average ($\mu_c$) and the standard deviation ($\sigma_c$) of gray intensity of each image block are computed and used as numerical features by the integrated MPA-SVM model. The average and standard deviation of gray intensity are given by [93]

$$\mu_c = \sum_{i=0}^{NL-1} I_{i,c} \, P_c(I_{i,c}), \qquad \sigma_c = \sqrt{\sum_{i=0}^{NL-1} (I_{i,c} - \mu_c)^2 \, P_c(I_{i,c})}$$

where $I_{i,c} = 0, 1, 2, \ldots, 255$ is the gray level in band $c$. NL = 256 represents the number of discrete color values. $P(I)$ is the first-order histogram of an image block [94].
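The feature extraction described above can be sketched in Python as follows. The sketch computes the mean and standard deviation of one block through its normalized first-order histogram and then concatenates the two statistics over all bands. The function names and the toy input are hypothetical, and the blocks are assumed to hold integer gray levels from 0 to 255.

```python
import math
from collections import Counter


def block_features(block):
    """Mean and standard deviation of gray intensity of one image block.

    block is a 2D list of integer gray levels; the statistics are computed
    from the normalized first-order histogram P(I).
    """
    pixels = [p for row in block for p in row]
    n = len(pixels)
    hist = Counter(pixels)  # gray level -> number of occurrences
    mean = sum(level * count / n for level, count in hist.items())
    var = sum((level - mean) ** 2 * count / n for level, count in hist.items())
    return mean, math.sqrt(var)


def patch_features(bands):
    """Concatenate (mean, std) over all bands of one patch."""
    features = []
    for block in bands:
        features.extend(block_features(block))
    return features


# Toy patch: 13 bands of 5 x 5 pixels, band b filled with gray level b.
toy_bands = [[[b] * 5 for _ in range(5)] for b in range(13)]
toy_vector = patch_features(toy_bands)  # 13 bands x 2 statistics
```

With 13 bands, each patch is therefore summarized by a vector of 26 numbers, which is the input dimension seen by the classifier.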
To construct the integrated MPA-SVM model for urban green space detection, it is necessary to prepare a training dataset with assigned ground truth labels. This study has performed a sampling process to collect data in the nongreen space and green space areas within the study area (demonstrated in Figure 5). It is noted that the ground truth label of each image sample in this study has been verified by field trips and Google Earth Engine. Each block is sampled with the size of 25 × 25 pixels to generate nonoverlapped image patches with the size of 5 × 5 pixels. After the data sampling process, there are 1,000 image patches available for the feature extraction operator. Herein, each class label contains 500 image patches to guarantee a balanced classification. The collected dataset is illustrated in Table 2. Notably, each spectral band yields two statistical measurements (i.e., the mean and standard deviation) and there are 13 bands. Thus, the total number of features used for classification is 13 × 2 = 26.
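The tiling step can be illustrated with a short Python sketch that cuts one sampled block into nonoverlapping patches. The function name and the toy block are hypothetical; the sketch only assumes that a block is stored as a 2D list of pixel values.

```python
def tile_patches(block, patch=5):
    """Split a 2D block into nonoverlapping patch x patch tiles, row by row."""
    rows, cols = len(block), len(block[0])
    tiles = []
    for r in range(0, rows - patch + 1, patch):
        for c in range(0, cols - patch + 1, patch):
            tiles.append([row[c:c + patch] for row in block[r:r + patch]])
    return tiles


# Toy 25 x 25 block whose pixel value encodes its position.
toy_block = [[r * 25 + c for c in range(25)] for r in range(25)]
toy_tiles = tile_patches(toy_block)
```

A 25 × 25 block yields 25 such patches, so 40 sampled blocks would be enough to produce the 1,000 patches mentioned above if every block were used in full; the actual number of blocks is not stated in the text.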
It is worth noting that the extracted dataset, which includes the input features characterizing statistical properties of the spectral bands and the corresponding class labels, has been randomly separated into a training dataset (70%) and a testing dataset (30%) [95]. The first set is used for model training and the second set is used to inspect the model predictive capability. In addition, to standardize the input data range, the Z-score equation is used as follows [96]:

$$X_Z = \frac{X_D - M_X}{STD_X}$$

where $X_Z$ and $X_D$ denote the normalized and the original features, respectively. $M_X$ and $STD_X$ denote the mean value and the standard deviation of the features, respectively.
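A minimal Python sketch of the random 70/30 split and the Z-score normalization is given below. Two choices in it are assumptions of this example rather than details stated in the text: the mean and standard deviation are estimated on the training set only, and the population standard deviation is used. The function names and toy data are hypothetical.

```python
import random
import statistics


def split_dataset(rows, train_ratio=0.7, seed=0):
    """Randomly split rows into a training part and a testing part."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_ratio * len(rows)))
    return [rows[i] for i in idx[:cut]], [rows[i] for i in idx[cut:]]


def fit_zscore(rows):
    """Per-feature mean and standard deviation (1.0 replaces a zero std)."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, stds


def apply_zscore(rows, means, stds):
    """Normalize each feature as (value - mean) / std."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


# Toy dataset with two features per sample.
toy_rows = [[float(i), 2.0 * i] for i in range(10)]
train_rows, test_rows = split_dataset(toy_rows, 0.7, seed=1)
means, stds = fit_zscore(train_rows)
train_z = apply_zscore(train_rows, means, stds)
test_z = apply_zscore(test_rows, means, stds)
```

Reusing the training statistics for the testing set keeps information about the held-out samples from leaking into the model.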
To optimize the SVM model used for urban green space detection, the objective function of the MPA metaheuristic employs a 5-fold cross validation process and the indices of false negative rate (FNR) and false positive rate (FPR). This objective function (OF) is described as follows [96]:

$$OF = \frac{1}{5} \sum_{k=1}^{5} (FNR_k + FPR_k)$$

where $FNR_k$ and $FPR_k$ denote FNR and FPR computed in the kth run, respectively. The FNR and FPR indices are given by [96]

$$FNR = \frac{FN}{FN + TP}, \qquad FPR = \frac{FP}{FP + TN}$$

where FN, FP, TP, and TN are the numbers of false negative, false positive, true positive, and true negative data samples, respectively.
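The cost evaluated for one candidate pair of hyperparameters can be sketched in Python as below. The sketch assumes the objective is the average of FNR + FPR over the folds and that the per-fold predictions have already been produced by a trained classifier; the function names and the toy labels are hypothetical.

```python
def fnr_fpr(y_true, y_pred):
    """False negative rate and false positive rate for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fnr = fn / (fn + tp) if fn + tp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fnr, fpr


def objective(folds):
    """Average of FNR + FPR over the validation folds.

    folds is a list of (y_true, y_pred) pairs, one per fold.
    """
    total = 0.0
    for y_true, y_pred in folds:
        fnr, fpr = fnr_fpr(y_true, y_pred)
        total += fnr + fpr
    return total / len(folds)


# Two hypothetical folds: one perfect, one with one error in each class.
toy_folds = [
    ([1, 1, 0, 0], [1, 1, 0, 0]),
    ([1, 1, 0, 0], [1, 0, 0, 1]),
]
toy_cost = objective(toy_folds)
```

Because both error rates enter the sum, a model that favors one class at the expense of the other is penalized even when the dataset is balanced.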
In this study, the source code of the MPA metaheuristic is provided by Faramarzi et al. [60]. For the purpose of model optimization, the integrated MPA-SVM has been constructed in MATLAB. The SVM model with an optimized set of hyperparameters is then developed in Visual C# .NET framework 4.7.2 to process and analyze remote sensing data. The SVM model in Visual C# .NET has been built with functions provided by the Accord.NET Framework [97]. Moreover, the program has been run on an ASUS FX705GE-EW165T platform (Core i7 8750H and 8 GB RAM).

Experimental Results and Discussion
In this section, a set of performance measurement indices is used to express the model predictive accuracy. This set includes classification accuracy rate (CAR), precision, recall, negative predictive value (NPV), F1 score, and area under the receiver operating characteristic curve (AUC) [98,99]. The calculation of AUC is described in [99]. The formulas used to compute CAR, precision, recall, NPV, and F1 score are given by

$$CAR = \frac{N_C}{N_A} \times 100\%, \qquad Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}$$

$$NPV = \frac{TN}{TN + FN}, \qquad F1\ score = \frac{2\,TP}{2\,TP + FP + FN}$$

where $N_C$ and $N_A$ are the number of correctly predicted data and the total number of data, respectively.
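These indices follow directly from the four confusion-matrix counts, as the following Python sketch shows. The function name and the example counts are hypothetical and are not taken from the reported experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """CAR (in percent), precision, recall, NPV, and F1 score from counts."""
    total = tp + tn + fp + fn
    return {
        "CAR": 100.0 * (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "NPV": tn / (tn + fn),
        "F1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts for a testing set of 200 patches.
toy_metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
```

The F1 score written this way is algebraically identical to the harmonic mean of precision and recall.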
Besides the MPA-SVM model which utilizes information provided by the 13 spectral bands (denoted as MPA-SVM-13B), this study has employed MPA-SVM models using the aforementioned vegetation indices as benchmark models. The feature extraction phase of the benchmark models is similar to that of the MPA-SVM-13B.
This feature extraction phase also computes the two indices of mean and standard deviation of image patches. The model optimization processes of the constructed models are demonstrated in Figure 6. Herein, the maximum number of searching iterations (MaxIter) of the MPA metaheuristic has been set to 100; the number of population members (NP) is fixed at 20. The detailed optimization results are reported in Table 3, which shows the best penalty coefficient and RBKF parameter for each model used for urban green space detection. In addition, the best found cost function values of the MPA-SVM-13B, MPA-SVM-NDWI, MPA-SVM-NDVI, MPA-SVM-SAVI, and MPA-SVM-MTCI are provided in Figure 7. It can be observed from Figure 7 that the MPA-SVM using all of the 13 bands results in the lowest value of the cost function.
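The search configuration above (NP = 20, MaxIter = 100, and the hyperparameter ranges given earlier) can be sketched as a compact MPA loop in Python. This is a simplified sketch under stated assumptions, not the MATLAB implementation used in the study: the FADs step is applied per agent, the binary mask and the 0.05 scaling of the Lévy step are assumptions, only one cost evaluation per agent and iteration is performed, and a toy quadratic cost stands in for the cross-validated SVM objective. All names are hypothetical.

```python
import math
import random


def levy(rng, beta=1.5):
    """One Levy-distributed draw via Mantegna's algorithm (assumed)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    return rng.gauss(0, sigma_u) / abs(rng.gauss(0, 1)) ** (1 / beta)


def clip(x, lower, upper):
    """Keep a position inside the search boundaries."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(x, lower, upper)]


def mpa(cost, lower, upper, n_agents=20, max_iter=100, fads=0.2, p=0.5, seed=0):
    """Minimize cost over a box with a simplified Marine Predators loop."""
    rng = random.Random(seed)
    dim = len(lower)
    prey = [[lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
            for _ in range(n_agents)]
    fitness = [cost(x) for x in prey]
    best_i = min(range(n_agents), key=fitness.__getitem__)
    elite, elite_cost = prey[best_i][:], fitness[best_i]
    for it in range(max_iter):
        cf = (1 - it / max_iter) ** (2 * it / max_iter)
        old = [x[:] for x in prey]  # memory of the best positions so far
        for i in range(n_agents):
            for j in range(dim):
                r, rb, rl = rng.random(), rng.gauss(0, 1), 0.05 * levy(rng)
                if it < max_iter / 3:  # phase 1: Brownian exploration
                    prey[i][j] += p * r * rb * (elite[j] - rb * prey[i][j])
                elif it < 2 * max_iter / 3:  # phase 2: mixed strategy
                    if i < n_agents / 2:
                        prey[i][j] += p * r * rl * (elite[j] - rl * prey[i][j])
                    else:
                        step = rb * (rb * elite[j] - prey[i][j])
                        prey[i][j] = elite[j] + p * cf * step
                else:  # phase 3: Levy exploitation around the elite
                    step = rl * (rl * elite[j] - prey[i][j])
                    prey[i][j] = elite[j] + p * cf * step
        for i in range(n_agents):  # FADs effect, applied per agent here
            if rng.random() < fads:
                for j in range(dim):
                    if rng.random() < fads:  # entry of the binary mask U
                        span = upper[j] - lower[j]
                        prey[i][j] += cf * (lower[j] + rng.random() * span)
            else:
                r = rng.random()
                a, b = rng.randrange(n_agents), rng.randrange(n_agents)
                for j in range(dim):
                    prey[i][j] += (fads * (1 - r) + r) * (prey[a][j] - prey[b][j])
        for i in range(n_agents):  # keep a move only if it improves the agent
            prey[i] = clip(prey[i], lower, upper)
            c = cost(prey[i])
            if c < fitness[i]:
                fitness[i] = c
            else:
                prey[i] = old[i]
            if fitness[i] < elite_cost:
                elite, elite_cost = prey[i][:], fitness[i]
    return elite, elite_cost


def toy_cost(x):
    """Stand-in for the cross-validated objective (minimum at C=30, sigma=5)."""
    return (x[0] - 30.0) ** 2 + (x[1] - 5.0) ** 2


best, best_cost = mpa(toy_cost, lower=[1.0, 0.1], upper=[100.0, 100.0], seed=1)
```

In the actual model, each call to the cost function would train and validate an SVM with the candidate penalty coefficient and RBKF parameter, which is why the number of agents and iterations directly determines the optimization time.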
As stated earlier, the constructed dataset has been randomly divided into a training set (70%) and a testing set (30%). The first set is used for model training and the second set is reserved for model validation. Moreover, in order to reliably evaluate the model predictive performance, this study has repeated the model training and prediction processes 20 times. It is noted that the training and testing datasets are resampled in each run. The statistical measurements obtained from these multiple model construction and validation phases are used for model assessment. This repeated process aims at diminishing the variation caused by the randomness in data sampling. The model prediction outcomes are summarized in Table 4, which shows the mean and standard deviation (Std) of the performance measurement indices.
Figure 14: Urban green space detection map of the study area.
In addition, to confirm the superiority of the proposed MPA-SVM model that employs all of the Sentinel 2's spectral bands, the Wilcoxon signed-rank test [100] with the significance level (p value) = 0.05 is utilized in this study to express the statistical significance of the differences in model performance indices. The test outcomes of pair-wise model comparison with respect to CAR, F1 score, and AUC are shown in Tables 5-7, respectively. With p values < 0.05, the null hypotheses of insignificant differences in model performance can be rejected. Therefore, MPA-SVM-13B is confirmed to be the model which provides the best classification performance on the collected dataset. Accordingly, the MPA-SVM-13B model is employed to construct an urban green space map for the whole study area. The mapping outcome is demonstrated in Figure 14. Based on the constructed map, it can be found that green space occupies roughly 34.40% of the study area. Nevertheless, the green space is not evenly distributed in Da Nang. The majority of the green space is located in the Son Tra peninsula within the Son Tra district.

Concluding Remarks
Urban green space plays a crucial role in improving the living quality of the urban environment and has a positive effect on citizens' physical/mental health. Nevertheless, few studies have been dedicated to detecting, locating, and quantifying green space in the Da Nang urban center. This study is an attempt to fill this knowledge gap by developing a remote sensing and data-driven approach for urban green space detection applied in the study area. Remotely sensed data obtained from the Sentinel 2 satellite are used to train and validate a hybrid metaheuristic-machine learning approach of MPA-SVM.
This hybrid method is employed to construct a decision boundary that separates the input space into two distinctive regions of green space and nongreen space. The experimental results supported by the Wilcoxon signed-rank test show that the MPA-SVM model employing all of the spectral bands is superior to the models relying on individual vegetation indices. Good green space detection results with CAR = 93.100%, precision = 0.916, recall = 0.947, NPV = 0.947, F1 score = 0.931, and AUC = 0.979 demonstrate that the proposed method is highly suited for the task at hand. Moreover, the MPA metaheuristic is confirmed to be a capable method for optimizing machine learning models. Accordingly, the green space map of the entire study area can be constructed by the proposed hybrid approach. The information provided by the newly developed model can be helpful for local authorities to evaluate the status of green spaces in Da Nang city.
Although MPA-SVM has attained a good predictive performance in urban green space mapping in the study area, the proposed approach also has several limitations. The first limitation is that the MPA-SVM model has not been integrated with feature selection algorithms used for dimensionality reduction. In addition, although the RBKF is widely used for SVM-based pattern recognition, the effectiveness of other sophisticated kernel functions (e.g., hybrid kernel functions [101,102]) in urban green space detection should be investigated. Accordingly, future extensions of the current study may include the following:
(i) Investigating other state-of-the-art metaheuristic algorithms used for optimizing data-driven urban green space detection
(ii) Studying the effects of the maximum number of searching iterations and the number of population members on the performance of the SVM-based urban green space detection models
(iii) Employing other advanced texture descriptors to further improve the detection accuracy
(iv) Performing detection tasks at different time periods to inspect changes and trends in urban green space
(v) Performing urban green space detection using high-resolution satellite images
(vi) Incorporating advanced feature selection algorithms and kernel functions into the current model structure

Data Availability
The dataset used to support the findings of this study has been deposited in the repository of GitHub (https://github.com/NDHoangDTU/MPA-SVM-UGSD).

Conflicts of Interest
The authors confirm that there are no conflicts of interest.