Assessment of urban growth changes in Klang District using Support Vector Machine by different kernel

Urbanization in Klang District has been rapid and has raised concern among policymakers and town planners. This paper assesses changes in urban development in Klang District using Support Vector Machine (SVM) classification with different kernels, in order to study built-up area changes between 2017 and 2021. In the initial stage of image processing, Land Use Land Cover (LULC) was classified using SVM with four kernels (RBF, Polynomial, Linear, and Sigmoid). Once the most accurate kernel had been identified, the LULC map was reclassified into built-up and non built-up classes so that the study could focus on urban growth. The RBF Kernel gave the highest accuracy, with overall LULC classification accuracies of 88% in 2017 and 90% in 2021. The RBF Kernel was therefore used to classify the built-up area and to analyse urban growth. Every land use class changed over the period, with urban growth in particular amounting to 9.39% (5451.77 Ha). The resulting pattern of urban sprawl can assist planners and policymakers in planning and managing a better city.


Introduction
Urban change can also be described as the process of urbanization. Urbanization is the process of city growth based on the population residing in urban areas [1]. It is characterized by a higher population in the city, and the main factor driving this growth is migration from rural to urban areas. The process of urbanization involves the mixture of urban fringes. Agricultural land is continuously being converted to urban land in the course of urban development all around the world [2].
This research relies on remotely sensed data, which have been widely used to provide information about land use and land cover, for example the urbanization rate, the level of forest degradation and human-induced changes [3]. Previous studies have shown that Sentinel-2 satellite imagery is widely available and accurate enough to differentiate land cover classes [4], and that it outperforms other satellite images [5]. With a spatial resolution of 10 m, Sentinel-2 captures finer details of the land cover categories, making it a good and less expensive data source [6]. This research also needs advanced spatial techniques, which may consist of manipulation, analysis and modelling, to solve the problem [7].
GIS combined with remote sensing data has proved useful in many ways for identifying urban growth and land use land cover, allowing authorities to plan and manage city growth based on related factors such as environmental capacity, availability of infrastructure and socioeconomic considerations [8]. In remote sensing, the processing and analysis of satellite imagery always involves classification, and image classification is generally either supervised or unsupervised. SVM classification has been used to handle multiclass problems in satellite imagery, with reported accuracy reaching 90%. SVM is a binary classifier that creates an optimum hyperplane between one desired class and the other classes [9]. SVM classification has been shown to be accurate, with a reported accuracy of 93.42% [10], making it a good classifier to implement in this study. Furthermore, although nine mathematical kernel functions exist, only four types of kernel are typically employed for remote sensing applications involving agriculture, forestry, surface changes and water. These four kernels are sufficient for classifying built-up and non built-up areas. In this study, the built-up area was classified using the kernel that proved most accurate.
IOP Publishing doi: 10.1088/1755-1315/1051/1/012023

Literature review
Sentinel-2 satellite imagery is well known for its Multi Spectral Instrument (MSI), which has 13 spectral bands ranging from the visible to the shortwave infrared (SWIR). The Sentinel-2 mission comprises a constellation of two polar-orbiting satellites placed in the same sun-synchronous orbit and phased at 180 degrees to one another. Its purpose is to monitor changes in land surface conditions, with a swath width of 290 km. The revisit frequency is also high: 10 days at the equator with one satellite and 5 days with two satellites under cloud-free conditions, which results in 2 to 3 days at mid-latitudes. A previous study confirmed that Sentinel-2 imagery is well suited to classifying land cover classes compared with Landsat-8 imagery, making it a versatile data source [5]. There are many ways to classify land use land cover, and no single classification method is internationally accepted. In general, the two best-known approaches are supervised and unsupervised classification.
SVM is a non-parametric statistical method for supervised classification and regression problems. Four types of kernel are commonly used to build different SVMs: the Linear Kernel, Polynomial Kernel, Radial Basis Function (RBF) Kernel and Sigmoid Kernel [11]. The Linear Kernel is the most basic of these and is one dimensional in nature. It works well when many features are involved, and it is much faster to compute than the other functions. The Polynomial Kernel is a generalized version of the Linear Kernel. It is not recommended, as it is less efficient and less accurate. The most preferred function for SVM is RBF. It is commonly chosen for non-linear data because it can separate the data well even without prior knowledge of the data. The last function is the Sigmoid Kernel, which is preferably used for neural networks. Each kernel function therefore produces a different outcome for image classification.
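The four kernel functions named above can be written out directly. The sketch below is illustrative only and uses the common textbook forms; the parameter names `gamma`, `coef0` and `degree`, their default values and the sample pixel vectors are assumptions, not the settings of Table 1.

```python
import math


def dot(x, y):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # K(x, y) = x . y
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=2):
    # K(x, y) = (gamma * x . y + coef0) ** degree
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.03):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    # K(x, y) = tanh(gamma * x . y + coef0)
    return math.tanh(gamma * dot(x, y) + coef0)


if __name__ == "__main__":
    # Hypothetical 4-band pixel vectors (bands 2, 3, 4 and 8), made up for illustration.
    pixel_a = [0.08, 0.10, 0.12, 0.30]
    pixel_b = [0.15, 0.17, 0.20, 0.22]
    for kernel in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):
        print(kernel.__name__, kernel(pixel_a, pixel_b))
```

Each function returns a similarity score between two pixels; the SVM only ever sees the data through these scores, which is why the choice of kernel changes the classification outcome.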

Materials and methods
This study followed the methods described in Figure 1. It started with data collection, followed by image pre-processing, which involved geometric and radiometric correction. Image classification was then carried out using the different kernels. Next, the result of each kernel went through accuracy assessment, and the results were compared to identify the kernel with the highest overall accuracy. Once the most accurate kernel had been identified, the built-up area was classified using that kernel. Lastly, urban growth analysis was carried out using post classification comparison.

Area of study
The study was conducted in Selangor and focused on the district of Klang. Klang District is located in the central part of the west coast of Peninsular Malaysia and covers an area of 580.502 km². Figure 2 shows the enlarged area of Klang District. Klang District is considered a highly developed area and the fastest-growing region in Malaysia. The region has faced serious urbanisation and environmental problems. New urban growth outside the city is often subject to serious land use issues, for example scattered development, urban sprawl and land shortage. Sentinel-2 data were collected for two different years (2017 and 2021). Both datasets consist of 13 spectral bands, with a spatial resolution as fine as 10 m. Four bands were used for this research: band 2 (490 nm), band 3 (560 nm), band 4 (665 nm) and NIR band 8 (842 nm). These four bands are the only ones with 10 m spatial resolution, which makes them the most suitable for urban growth monitoring [12].

SVM classification by different kernel
Image classification is the process of interpreting the features found in satellite imagery, whether in grey scale or colour. These features represent the objects or land cover found on the earth's surface. It is an important process in digital image analysis and was therefore applied in this study. SVM classification was used to classify the LULC. At this stage, training samples were created using Regions of Interest (ROI); the training samples identify the training region of each class. The training samples for each land use class were created with the aid of a reference image. In this case, Google Earth Pro was used to define the training samples, as it provides 1 m spatial resolution [12], and it was used for both the 2017 and 2021 satellite images.
The land use classes were determined based on the most general land use classes found on the earth's surface. Thirty training samples were taken for each land use class. Processing then continued with four kernels: the Linear, Polynomial, RBF and Sigmoid Kernels. Several parameters (Table 1) needed to be set for each kernel, and these play an important role in increasing the accuracy of each kernel. After that, each SVM classification went through accuracy assessment based on a confusion matrix, also known as an error matrix. Test samples were generated for each of the classified images, and 50 test samples were generated for both classified images. The accuracy assessment was based on several measures, namely overall classification accuracy, user's accuracy and producer's accuracy. All of these measures are expressed as percentages and were calculated from the confusion matrix. The classification of the built-up area is based on the kernel with the highest accuracy.
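The three accuracy measures can be derived from a confusion matrix as in the minimal sketch below. The function name `accuracy_measures`, the row/column convention (rows are reference classes, columns are classified classes) and the 3-class matrix of 50 test samples are assumptions made for illustration; they are not the matrices behind Table 2 or Table 3.

```python
def _percent(numerator, denominator):
    """Percentage that is safe against an empty row or column."""
    return 100.0 * numerator / denominator if denominator else 0.0


def accuracy_measures(matrix):
    """Return (overall, producer's, user's) accuracy in percent.

    matrix[i][j] is the number of test samples whose reference class is i
    and whose classified class is j (assumed convention).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    overall = _percent(correct, total)
    # Producer's accuracy: correct samples divided by the reference (row) total.
    producers = [_percent(matrix[i][i], sum(matrix[i])) for i in range(n)]
    # User's accuracy: correct samples divided by the classified (column) total.
    users = [
        _percent(matrix[j][j], sum(matrix[i][j] for i in range(n)))
        for j in range(n)
    ]
    return overall, producers, users


if __name__ == "__main__":
    # Hypothetical confusion matrix for three classes and 50 test samples.
    example = [
        [17, 2, 1],
        [1, 15, 0],
        [2, 1, 11],
    ]
    overall, producers, users = accuracy_measures(example)
    print("overall accuracy (%):", overall)
    print("producer's accuracy (%):", producers)
    print("user's accuracy (%):", users)
```

Producer's accuracy reflects how much of each reference class was recovered, while user's accuracy reflects how reliable each mapped class is, so the two can differ for the same class.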

Urban growth analysis
This part used post classification comparison. In this method, the corrected images from the two years are classified separately, with suitable labels assigned to the different features on the earth's surface. The Post Classification Comparison was carried out using a tool known as Matrix Union, which creates a matrix from two thematic maps (2017 and 2021). Change detection was performed by inputting the thematic maps of 2017 and 2021. This process was applied to the most accurate classified images among those produced by the different kernels. The results were then presented as the area and percentage of change for each class.
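The idea behind post classification comparison can be sketched as a cross-tabulation of two classified maps. This is a rough analogue, not the Matrix Union tool itself; the function names, the tiny class maps and the assumption that every pixel covers 10 m x 10 m (0.01 ha) are illustrative.

```python
from collections import Counter

# Assumed pixel size: a 10 m x 10 m pixel covers 100 m^2 = 0.01 ha.
PIXEL_AREA_HA = 0.01


def change_matrix(map_t1, map_t2):
    """Count pixels for every (class at time 1, class at time 2) pair."""
    if len(map_t1) != len(map_t2):
        raise ValueError("both maps must cover the same pixels")
    return Counter(zip(map_t1, map_t2))


def summarise_changes(transitions, pixel_area_ha=PIXEL_AREA_HA):
    """Convert pixel counts to (area in ha, percent of all pixels)."""
    total = sum(transitions.values())
    return {
        pair: (count * pixel_area_ha, 100.0 * count / total)
        for pair, count in transitions.items()
    }


if __name__ == "__main__":
    # Hypothetical flattened class maps for the same eight pixels.
    map_2017 = ["Forest", "Agriculture", "Urban", "Agriculture",
                "Water Body", "Forest", "Urban", "Agriculture"]
    map_2021 = ["Forest", "Urban", "Urban", "Agriculture",
                "Water Body", "Agriculture", "Urban", "Urban"]
    transitions = change_matrix(map_2017, map_2021)
    for pair, (area_ha, percent) in summarise_changes(transitions).items():
        print(pair, area_ha, percent)
```

Pairs with identical classes are unchanged pixels; pairs such as Agriculture to Urban are the transitions that an urban growth analysis reports as area and percentage.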

Results and discussion
According to [15], private corporations replant the land with trees so that biodiversity in Malaysia can be restored. According to [16], the Gross Domestic Product (GDP) of Agriculture decreased from 2017 and continued to decline through 2021, which explains why the coverage area of Agriculture decreased in 2021. Moreover, the Agricultural sector decreases as Urban growth increases [17]. The increasing urban coverage area can be seen in Figure 3, which shows the LULC maps classified by SVM with the different kernels.

Comparison of accuracy by different kernel
RBF gave the highest overall accuracy of the classified maps, with 88% (2017) and 90% (2021), while Sigmoid gave the lowest, with 76% (2017) and 81.67% (2021). Overall Kappa (κ) ranges between 0.7 and 0.8. The increase or decrease in classification accuracy is possibly due to the influence of surface and atmospheric features [18]. In this case, the higher accuracy of RBF in 2021 may be caused by the lower cloud cover percentage, which was 2.8%. The error penalty (C) and Gamma (γ) used in this study were 100 and 0.03 respectively and, according to [19], large values of C and γ can help achieve a superior classification result. Both C and γ are therefore crucial in optimising classification performance. The Polynomial Kernel is more efficient than the Linear Kernel in dealing with non-linear data, because the polynomial kernel function represents the similarity of vectors (training samples) in a feature space over polynomials of the original variables, allowing non-linear models to be learned [20]. The Linear Kernel is less accurate because it tends to deal only with linear data [21], whereas LULC data usually comprise different sets of classes, since the LULC developed in different directions or, in other words, has different spectral profiles at the same time [22] [23]. These studies came to the conclusion that Sigmoid Kernel is ... Additionally, an Overall Kappa above 0.8 is considered very good and, referring to Table 2 (2017) and Table 3 (2021), the RBF, Polynomial and Linear Kernels fall in the very good range of Kappa. In terms of overall accuracy, 85% is the rule-of-thumb threshold for an accurate classification, and in this case RBF and Polynomial scored the highest overall accuracy for both years. However, for the Sigmoid Kernel, the Overall Kappa values were 0.7059 (2017) and 0.7800 (2021), which fall in the good range of Kappa.
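For readers unfamiliar with the Kappa statistic, the sketch below shows how it is commonly computed from a confusion matrix as Cohen's kappa: the observed agreement is corrected by the agreement expected by chance. The function name and the 2-class matrix are hypothetical and are not taken from Table 2 or Table 3.

```python
def cohen_kappa(matrix):
    """Cohen's kappa from a square confusion matrix.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from the row and column totals.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    row_totals = [sum(matrix[i]) for i in range(n)]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    p_o = sum(matrix[i][i] for i in range(n)) / total
    p_e = sum(row_totals[i] * col_totals[i] for i in range(n)) / (total * total)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical 2-class matrix (rows: reference, columns: classified).
    example = [
        [40, 10],
        [5, 45],
    ]
    print("overall kappa:", round(cohen_kappa(example), 4))
```

Because chance agreement is subtracted, Kappa is always lower than the overall accuracy expressed as a fraction unless the classification is perfect, which is why an overall accuracy of 85% in this example corresponds to a Kappa of about 0.7.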

Classification map of built-up and non built-up area
Since the study focuses on assessing urban growth, the LULC maps of 2017 and 2021 were reclassified. All LULC classes were reclassified into Built-Up and Non Built-Up classes (Figure 4). The Built-Up area was 14257.129 Ha in 2017, or 24.56%, while the Non Built-Up area was 43793.071 Ha. In 2021, the Built-Up area expanded by roughly 9 percentage points to 19382.962 Ha, or around 33.39%, causing the Non Built-Up area to shrink to 38661.433 Ha.
Because RBF produced the highest accuracy, the RBF Kernel was used to produce the map of built-up area. SVM is particularly powerful when used with the RBF Kernel [24]. According to a previous study, the RBF Kernel can be used to solve multiclass problems, making it a versatile kernel for extracting urban areas from other land uses [24]. In addition, the RBF Kernel was the best kernel for urban classification, as it provided the best result and can differentiate urban land from bare land [25]. Moreover, within 5 years the built-up area increased by roughly 9%, and the biggest contributors to urbanization are natural population increase and migration from rural to urban areas [26]. Klang, which forms part of Greater Kuala Lumpur (GKL), has urbanized rapidly over the years due to pull and push factors [27]. Pull factors include job opportunities, education, social attraction, and amenities and facilities. Push factors include unemployment, lack of services and poverty.
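The percentages quoted above follow directly from the reported areas, as the short check below shows. The variable and function names are illustrative; the four area values are the ones stated in the text, and each year's share is taken relative to that year's own Built-Up plus Non Built-Up total.

```python
def share_percent(part_ha, other_ha):
    """Share of `part_ha` in the combined area, in percent."""
    return 100.0 * part_ha / (part_ha + other_ha)


# Areas in hectares as reported in the text.
built_2017, non_built_2017 = 14257.129, 43793.071
built_2021, non_built_2021 = 19382.962, 38661.433

share_2017 = share_percent(built_2017, non_built_2017)
share_2021 = share_percent(built_2021, non_built_2021)
increase_ha = built_2021 - built_2017
increase_points = share_2021 - share_2017

print("Built-Up share 2017 (%):", round(share_2017, 2))
print("Built-Up share 2021 (%):", round(share_2021, 2))
print("Built-Up increase (Ha):", round(increase_ha, 3))
print("Built-Up increase (percentage points):", round(increase_points, 2))
```

Expressing the growth as a difference between the two shares makes it clear that the increase of roughly 9 is measured in percentage points of the district area, not as a relative growth rate of the built-up area.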

Assessment of urban growth in Klang District
The following results show the changes that occurred between 2017 and 2021. Five classes were created to represent the land use changes in the urban growth map (Figure 5). As Table 4 shows, changes occur regardless of the circumstances, since land use changes result from a complex interplay between humans and their surroundings [28]. Deforestation (1.15%) in Klang District has occurred since 2015, and forest area continued to decrease until 2021 [29]. According to NASA's website, the factors that lead to deforestation around the world are agriculture, logging and the expansion of urban areas. In line with these factors, deforestation in Klang is caused by oil palm cultivation and unsustainable logging [30]. The results also show that Afforestation (0.41%) was slower in Klang District over the 5 years. Nevertheless, the authorities have taken the initiative to plant trees, given that the pull and push factors of migration have put pressure on the land to urbanise in order to accommodate the expanding population of Klang District. The Free Tree Society (FTS) has been working hard to increase the tree population in Klang District, as well as throughout the country, in order to maintain or restore biodiversity in Malaysia [15]. This initiative aims to ensure that the ecosystem in Klang District stays in balance despite the expansion of the urban area driven by population growth.
In addition, Table 4 indicates that the rate of change in urbanisation is 9.39%. The increasing rate of urbanization has also caused the human population to increase. According to the United Nations (UN), Klang's population was around 287500 in 2017; however, the population of Klang District climbed to around 1 million in 2021 [31]. According to a previous study, the growing population of Klang District is caused by pull and push factors, sometimes referred to as pull and attractive factors, which lead people from less developed areas to move to the most developed areas [32]. This situation also arises from imbalanced development among regions in Malaysia. Factors that have led people to migrate to Klang District include good infrastructure, new residential areas, an acceptable cost of living in some areas, proximity to workplaces, a better environment and physical features, a good community, well-planned areas, jobs and good opportunities for education.
Therefore, based on the factors above, the most significant contributors to urbanization are natural population growth and migration from rural to urban areas. In Table 4, Others refers to unrelated class changes, for example Water Body to Urban, Urban to Water Body and Urban to Forest. However, some of these changes could genuinely occur, for instance where features on the land are demolished: a demolished building can change the land to bare land. Misclassification of agriculture also occurred, as some paddy fields were misclassified as water body because the paddy had only just been planted and started to grow.

Conclusion
The population of Klang increased drastically from 287500 in 2017 to around 1 million in 2021, due to pull and push factors. The trend in Malaysia's population distribution is expected to continue rising until 2050. GIS technology helped in masking the area of Klang District, and the Klang District shapefile helped to better analyse the LULC and the built-up area in Klang District. Meanwhile, Remote Sensing technology was used for deeper analysis of the area of every land use, the accuracy of every algorithm, and the changes that occurred between 2017 and 2021. The best kernel for classifying the LULC and built-up area in Klang District was the RBF Kernel. This kernel is reported to be able to handle non-linear data, and some studies also support RBF as the best kernel for separating built-up and non built-up areas, making it the most versatile kernel for land use separation and the best kernel for the urban growth analysis.