Improvement in the Accuracy of the Postclassification of Land Use and Land Cover Using Landsat 8 Data Based on the Majority of Segment-Based Filtering Approach

Improvement in the accuracy of the postclassification of land use and land cover (LULC) is important to fulfil the need for the rapid mapping of LULC that can describe the changing conditions of phenomena resulting from interactions between humans and the environment. This study proposes the majority of segment-based filtering (MaSegFil) as an approach that can be used as a spatial filter for supervised digital classification results. Three digital classification approaches, namely, maximum likelihood (ML), random forest (RF), and support vector machine (SVM), were applied to test the improvement in the accuracy of LULC postclassification using the MaSegFil approach, based on annual cloud-free Landsat 8 satellite imagery data from 2019. The accuracies of the ML, RF, and SVM classifications before implementing the MaSegFil approach were 73.6%, 77.7%, and 77.5%, respectively. After applying this approach, which reduced pixel noise in the results of the ML, RF, and SVM classifications, the accuracies increased to 81.7%, 85.2%, and 84.3%, respectively. Furthermore, the method with the best accuracy, the RF classifier, was applied to several national priority watershed locations in Indonesia. The results show that the MaSegFil approach implemented on these watersheds to classify LULC gave overall accuracies ranging from 83.28% to 89.76% and an accuracy improvement of 6.41% to 15.83%.


Introduction
Land use and land cover (LULC) is a key driver of environmental change and can describe the conditions of changing phenomena resulting from human interaction with the environment [1]. It is important information that has implications for the sustainable use of resources in watershed management activities, as it generally reflects irreparable degradation or loss of land and water resources [2]. The utilisation of land, space, and resources for settlement, agriculture, tourism, industry, and transportation will continue to increase for some time to come [3,4]. Remote sensing data can be used for the analysis, monitoring, mapping, and classification of LULC information. Its availability provides a choice of resolution variations (spectral, spatial, radiometric, and temporal) to detect land changes on the earth's surface, by comparing current multitemporal conditions with those of previous years [5][6][7][8].
The object classification of LULC on the earth's surface based on remote sensing data can be processed with two digital classification methods, namely, supervised and unsupervised classification [9]. The supervised version involves the classification of objects based on training sample input from object classes that appear on satellite images, which are then run using an algorithm to generate LULC information.
There are several limitations and problems with digital classification results; for example, pixel noise can affect the spatial accuracy and quality of LULC information. In the postclassification stage, LULC can be produced using a spatial filter to reduce this noise and to obtain better results. These include the mean, standard median, adaptive Wiener, Gaussian, and adaptive median filters [28][29][30], the majority filter [31,32], and the object filter based on the topology and feature approach [33]. Even with the use of spatial filters, some pixel noise is sometimes still left. To overcome such an obstacle, majority segment-based filtering (MaSegFil) is proposed in this study as a spatial filter stage in the postclassification, used to classify objects on the earth's surface based on the digital classification results. The purpose of using the MaSegFil approach is to reduce pixel noise from these results and to obtain better information on object classification results.
In this study, we first analyse whether incorporating the MaSegFil approach at the postclassification stage improves the accuracy of the digital classification and reduces the resulting pixel noise. Furthermore, accuracy assessment based on reference data is used to compare the postclassification results before and after the implementation of the MaSegFil approach. The principles of this approach are that (a) the area boundary of the class of objects on the earth's surface seen in the satellite image data will be separated by patterns based on the segment mean shift process (segmentation process); (b) the results of the digital classification that has been made are used as inputs to fill in the attribute class of objects in each area segment that has been separated, based on its segmentation pattern; (c) the extraction of the attribute value of the digital classification results in each area segment (object segmentation results) is made by taking the majority value, or the most dominant object, and using it as the spatial filter in the area based on the zonal statistics spatial analyst calculation with the type majority; and (d) the final results of the calculation are used as the output of object classification at the postclassification stage.

Study Area.
To understand and implement the proposed approach, we chose study areas in the Citarum, Ciliwung, and Cisadane watersheds, which are among the 15 national priority watersheds in Indonesia and are located in the provinces of Banten, Jakarta, and West Java (on Java island) (Figure 1(a)). The study area has a wide variety of LULC classes that can be used as samples, representing some of the objects that will be classified based on the remote sensing satellite imagery. Furthermore, the proposed method was also applied to the other 12 national priority watersheds in Indonesia, which represent the characteristic variations of LULC and are located on the islands of Sumatra (Asahan Toba, Siak, Musi, and Sekampung), Java (Serayu, Bengawan Solo, and Brantas), Kalimantan (Kapuas), Sulawesi (Saddang, Jeneberang, and Limboto), and West Nusa Tenggara (Moyo) (Figure 1(b)).

Methods
The proposed method used in this study is presented in Figure 2. It comprises several sections: data availability, LULC classification, postclassification, accuracy assessment, and comparison of classification performance results.

Data Availability.
Objects on the earth's surface covered by clouds are a major problem in the use of optical image data. This can be overcome by creating cloud-free satellite imagery data annually. In this study, data processing was performed using the Google Earth Engine (GEE) platform. The input data were obtained from the USGS Landsat 8 Surface Reflectance Tier 1 data collection.
This dataset comprises the atmospherically corrected surface reflectance from the Landsat 8 OLI/TIRS sensors, which is produced with the Land Surface Reflectance Code (LaSRC). The processing includes cloud, shadow, water, and snow masks, which are produced using CFMASK [34][35][36]. Information and detailed technical explanations can be accessed at https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C01_T1_SR. Filter dates define the date range used to build the annual Landsat 8 mosaic for 2019. In this case, the filter dates were limited to between 1 January and 31 December 2019. Furthermore, a high-resolution SPOT 6/7 imagery mosaic from 2019, which can be obtained from the Remote Sensing Technology and Data Center, LAPAN, was used as an input training sample and reference data for assessing the accuracy of the LULC classification results produced by the study.
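The annual cloud-free mosaic described above can be illustrated with a minimal NumPy sketch. It assumes the scenes of one year are already stacked into an array and that a boolean cloud/shadow mask is available for each scene; the function name `median_composite` and the array layout are illustrative assumptions, not part of the GEE workflow used in the study.

```python
import warnings

import numpy as np


def median_composite(stack, cloud_mask):
    """Per-pixel median over time, ignoring masked observations.

    stack:      array of shape (time, rows, cols) with reflectance values.
    cloud_mask: boolean array of the same shape, True where a pixel is
                flagged as cloud or shadow (assumed to be given).
    Pixels masked on every date come out as NaN.
    """
    data = np.where(cloud_mask, np.nan, np.asarray(stack, dtype=float))
    with warnings.catch_warnings():
        # nanmedian warns when a pixel has no valid observation at all.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmedian(data, axis=0)
```

A pixel observed on three dates with one cloudy date therefore takes the median of the two remaining clear observations.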

LULC Classification Approach.
In this study, we used digital classification to produce the LULC information. The classification approaches included maximum likelihood (ML), random forest (RF), and support vector machine (SVM) classifiers. Furthermore, the LULC classification resulting from the digital classifications was evaluated by assessing the accuracy based on the reference data. This was done to determine the optimal approach to classifying LULC in the study area. Eleven LULC classes were employed, which refer to the National Standardization Agency for Indonesia [37]; detailed related information is presented in Table 1.
A training sample and reference maps were produced referring to the annual mosaic image data of SPOT 6/7 2019 images obtained from the Remote Sensing Technology and Data Center, LAPAN. An arrangement of the Grid Feature Index (GIF) with a size of 2 km × 2 km was made to determine the training sample and reference data more systematically, based on visual interpretation. Furthermore, the centre point of each GIF block, which had been buffered with a distance of 200 m to obtain the polygon area, was used as a location for the training samples and also as a reference for LULC classes in the study area (Figure 3).
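The systematic sampling layout described above (2 km × 2 km blocks with a 200 m buffer around each block centre) can be sketched in plain Python. This is only an illustration under the assumption of projected coordinates in metres; the function names `grid_centres` and `in_buffer` are hypothetical, and the buffer is treated as a simple circle.

```python
def grid_centres(xmin, ymin, xmax, ymax, cell=2000.0):
    """Centre points of the full cell-sized blocks inside a bounding box."""
    nx = int((xmax - xmin) // cell)
    ny = int((ymax - ymin) // cell)
    return [
        (xmin + (i + 0.5) * cell, ymin + (j + 0.5) * cell)
        for j in range(ny)
        for i in range(nx)
    ]


def in_buffer(point, centre, radius=200.0):
    """True if the point lies within the circular buffer around a centre."""
    dx = point[0] - centre[0]
    dy = point[1] - centre[1]
    return dx * dx + dy * dy <= radius * radius
```

Pixels for which `in_buffer` is true for some block centre would then be the candidates for visual interpretation as training or reference samples.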

Maximum Likelihood (ML) Classifier.
The ML classifier is a digitally supervised classification approach that applies the Gaussian threshold in several class signatures to assign every pixel to a class. The approach assumes that the probability distribution of each class is multivariate normal. In detail, the maximum likelihood classifier formulation is presented in equations (1) and (2) [1,2,7,38-40]:

$$x \in \omega_i \quad \text{if } G_i(x) > G_j(x) \text{ for all } j \neq i \qquad (1)$$

$$G_i(x) = \ln p(\omega_i) - \frac{1}{2} \ln |\Sigma_i| - \frac{1}{2} (x - m_i)^{T} \Sigma_i^{-1} (x - m_i) \qquad (2)$$

where $G_i(x)$ is the discriminant function in the ML algorithm; $\omega_i$ is the class (where $i = 1, \ldots, M$); and $M$ is the total number of classes. $x$ is a pixel in the $n$-dimensional vector (where $n$ is the number of bands of the satellite image used). $p(\omega_i)$ is the prior probability that the true class at pixel position $x$ is $\omega_i$; $|\Sigma_i|$ is the determinant of the covariance matrix of the data in the $\omega_i$ class; $\Sigma_i^{-1}$ is the inverse covariance matrix of the data in the $\omega_i$ class; and $m_i$ is the mean vector.
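As an illustration of the Gaussian maximum likelihood rule, the following NumPy sketch estimates a mean vector, covariance matrix, and prior for each class from training pixels and assigns a pixel to the class with the largest log discriminant (log prior minus half the log determinant minus half the Mahalanobis distance). The function names are hypothetical, and priors are assumed to be proportional to the training sample sizes, which the text does not specify.

```python
import numpy as np


def fit_class_stats(X, y):
    """Per-class mean, covariance, and prior from training pixels.

    X: array of shape (samples, bands); y: class label per sample.
    """
    stats = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mean = Xc.mean(axis=0)
        cov = np.atleast_2d(np.cov(Xc, rowvar=False))
        prior = len(Xc) / len(X)  # assumption: prior from sample share
        stats[label] = (mean, cov, prior)
    return stats


def discriminant(x, mean, cov, prior):
    """Log discriminant of a multivariate normal class model."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    mahalanobis = diff @ np.linalg.solve(cov, diff)
    return np.log(prior) - 0.5 * logdet - 0.5 * mahalanobis


def classify(x, stats):
    """Assign x to the class with the largest discriminant value."""
    return max(stats, key=lambda c: discriminant(x, *stats[c]))
```

The sketch requires each class to have enough training pixels for a non-singular covariance matrix.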

Table 1: LULC classes used in the study.
No. | LULC class | Description
3 | Wetland forest | Forests that grow in wetland habitats, such as swamps (brackish, peat). Wetland areas have lowland characteristics, extend along the coast at low elevation, and are influenced by tides and other seawater.
4 | Fields | Areas used for agricultural activities with dryland crops.
5 | Rice fields | Agricultural areas that are waterlogged or supplied with water by irrigation technology, rain, valleys, or tides, characterised by a pattern of ridges and planted with short-lived food crops (rice).
6 | Settlements | Areas of land used as a residential environment and places for activities that support life.
7 | Open fields | Open land without cover that is natural, seminatural, or artificial.
8 | Plantations | Land used for agricultural activities without crop replacement for two years. Harvests usually take place after a year or more.
9 | Shrubs | Dryland areas that have been overgrown with a variety of heterogeneous and homogeneous natural vegetation with sparse to dense density. The area is naturally dominated by low vegetation.
10 | Fishponds | Land for fishing or salting activities that appears with a bund pattern around the coast.
11 | Water bodies | All types of water areas, such as seas, rivers, lakes, or reservoirs.

Random Forest (RF) Classifier.
The RF classifier is a digitally supervised classification approach consisting of a combination of tree classifiers. Each classifier is created using a random vector sampled independently of the input vector. Furthermore, each tree casts a unit vote for the most dominant class to classify the input vector. In detail, the RF classifier formulation is presented as follows [18-20, 36, 41-43]:

$$\{ h(x, \theta_k),\; k = 1, 2, \ldots \}$$

where $h$ is the result of the random forest classification; $x$ is the input sample; and $\theta_k$ is the random vector sampled for the $k$-th tree classifier in the random forest classification.
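The voting principle of the RF classifier, in which every tree classifier h(x, θ_k) casts one vote and the most frequent class wins, can be sketched with toy one-split trees. The parameter tuples below stand in for the random vectors θ_k and are hand-picked for illustration; a real random forest learns its trees from bootstrap samples of the training data, which this sketch does not do.

```python
from collections import Counter


def make_stump(theta):
    """Build a one-split tree from theta = (band, threshold, low, high)."""
    band, threshold, low_class, high_class = theta
    return lambda x: high_class if x[band] > threshold else low_class


def forest_predict(x, thetas):
    """Let every tree vote on x and return the most frequent class."""
    votes = [make_stump(theta)(x) for theta in thetas]
    return Counter(votes).most_common(1)[0][0]
```

With three stumps, a pixel that two of them label as forest is classified as forest even if the third disagrees.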

Support Vector Machine (SVM) Classifier.
The SVM classifier is also a digitally supervised classification approach that is based on the principle of structural risk minimisation and statistical learning to determine the location of boundaries in order to obtain an optimal class separation. It is usually used for pattern classification and nonlinear regression. For linearly separable classes, SVM selects a linear decision boundary that leaves the largest margin, defined as the sum of the distances to the hyperplane from the closest points of the two classes. If the two classes are not linearly separable, the SVM classifier approach tries to find a hyperplane that maximises the margin and minimises a quantity proportional to the number of misclassification errors. In detail, the SVM classifier formulation for linearly inseparable data to find the separating hyperplane is presented as follows [21,44,45]:

$$\phi : I \rightarrow H, \quad x \mapsto \phi(x)$$

where $x$ is the input data, which is mapped from the input space $I$ into a high-dimensional space $H$, and $\phi(x)$ is the mapping associated with the kernel function.
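The role of the mapping into a high-dimensional space can be illustrated with a small worked example. The sketch below uses a degree-2 polynomial kernel on two-band inputs, chosen purely for illustration (the text does not state which kernel was used), and shows that evaluating the kernel in the input space gives the same value as an inner product after an explicit mapping `phi`.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a two-component input."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def poly_kernel(x, z):
    """Degree-2 polynomial kernel evaluated in the input space."""
    return dot(x, z) ** 2
```

For x = (1, 2) and z = (3, 4), the kernel gives (1·3 + 2·4)² = 121, and the inner product of the mapped vectors is 9 + 48 + 64 = 121, so the classifier can work in the high-dimensional space without computing the mapping explicitly.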

The Majority Segment-Based Filtering (MaSegFil) Approach.
In this study, the MaSegFil approach is proposed as a spatial filter stage in the postclassification of the digital classification results used to classify objects on the earth's surface. In this case, ML, RF, and SVM classifiers are used for the LULC classification. The purpose of using MaSegFil is to reduce pixel noise or error from the digital classification results and to obtain better information on the object classification results. The principles of using the MaSegFil approach are that (a) the area boundary of the class of objects on the earth's surface seen in the satellite image data will be separated by pattern based on the segment mean shift process (segmentation process), (b) the results of the digital classifications that have been carried out are used as input to fill in the attribute class of objects in each area segment that has been separated based on its segmentation pattern, (c) the extraction of the attribute value of the digital classification results in each area segment (object segmentation result) is done by taking the majority value or the most dominant object, which is used as the spatial filter in the area based on the zonal statistics spatial analyst calculation with the type majority, and (d) the final result of the calculation is used as the output of classifying objects at the postclassification stage. The stages of the MaSegFil approach proposed in this study are illustrated in Figure 4.
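Steps (b) to (d) of the MaSegFil principle can be sketched with NumPy as a zonal majority over segments. The sketch assumes that the segmentation of step (a) has already been run and is supplied as an integer label raster; the function name `majority_segment_filter` is hypothetical, and ties are resolved in favour of the smallest class value, a choice the text does not specify.

```python
import numpy as np


def majority_segment_filter(classified, segments):
    """Replace every pixel by the majority class of its segment.

    classified: 2D integer array of per-pixel class labels.
    segments:   2D integer array of segment ids with the same shape
                (assumed to come from a prior segmentation step).
    """
    filtered = np.empty_like(classified)
    for seg_id in np.unique(segments):
        mask = segments == seg_id
        values, counts = np.unique(classified[mask], return_counts=True)
        # Most frequent class in the segment; ties go to the lowest value.
        filtered[mask] = values[np.argmax(counts)]
    return filtered
```

An isolated pixel of class 3 inside a segment dominated by class 1 is thus relabelled as class 1, which is the noise reduction the approach aims for.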

Accuracy Assessment and Comparison of the Classification Performance Results.
Accuracy assessment was made to evaluate the results of the digital classifications generated from the ML, RF, and SVM classifiers, together with the optimisation results from using the MaSegFil approach for the postclassification stage as a spatial filter, as proposed in this study. A confusion matrix was used in the accuracy assessment procedure, which took into account user accuracy, producer accuracy, kappa, and overall accuracy [36,42,45-47].
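The four measures derived from the confusion matrix can be computed as in the following NumPy sketch. It assumes the common convention that rows hold the classified (map) classes and columns hold the reference classes; the function name `accuracy_metrics` is hypothetical, and every class is assumed to have at least one classified and one reference sample.

```python
import numpy as np


def accuracy_metrics(confusion):
    """Overall accuracy, kappa, user and producer accuracy per class.

    confusion: square matrix, rows = classified, columns = reference
               (an assumed convention).
    """
    cm = np.asarray(confusion, dtype=float)
    total = cm.sum()
    correct = np.diag(cm)
    row_totals = cm.sum(axis=1)
    col_totals = cm.sum(axis=0)
    overall = correct.sum() / total
    users = correct / row_totals      # correct share of each map class
    producers = correct / col_totals  # correct share of each reference class
    expected = (row_totals * col_totals).sum() / total**2
    kappa = (overall - expected) / (1.0 - expected)
    return overall, kappa, users, producers
```

For the two-class matrix [[40, 10], [5, 45]], the overall accuracy is 85/100 = 0.85, the chance agreement is (50·45 + 50·55)/100² = 0.5, and kappa is (0.85 − 0.5)/(1 − 0.5) = 0.7.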

Results
ML, RF, and SVM classifiers were used to classify the 11 LULC classes used in the study (Table 1). The training sample and reference map were produced referring to the annual mosaic image data of SPOT 6/7 from 2019, with a GIF arrangement with a size of 2 km × 2 km. Furthermore, the centre point of each GIF block was buffered with a distance of 200 m to obtain the polygon area and was used as a location for the training samples and also as a reference for the LULC classes in the study area (Figure 3). Finally, the MaSegFil approach (Figure 4) was applied as a spatial filter stage in the postclassification of the LULC classification results from the ML, RF, and SVM classifiers.

LULC Classification Based on the ML Classifier.
The results of the LULC classification based on the ML classifier are presented in Figure 5, while a comparison before and after the MaSegFil approach stage is presented in Figures 5(a) and 5(b). Table 2 shows the results of the accuracy assessment of the LULC classification based on the ML classifier without the MaSegFil approach. In addition, the results of the accuracy assessment of the LULC classification based on the ML classifier with the MaSegFil approach are shown in Table 3.

LULC Classification Based on the RF Classifier.
The results of the LULC classification based on the RF classifier are presented in Figure 6, and a comparison of the LULC classification based on the RF classifier before and after the MaSegFil approach stage is presented in Figures 6(a) and 6(b). Table 4 shows the results of the accuracy assessment of the LULC classification based on the RF classifier without the MaSegFil approach. Moreover, the results of the accuracy assessment of the LULC classification based on the RF classifier with the MaSegFil approach can be seen in Table 5.

LULC Classification Based on the SVM Classifier.
The results of the LULC classification based on the SVM classifier are presented in Figure 7. In addition, a comparison of the LULC classification based on the SVM classifier before and after the MaSegFil approach stage is shown in Figures 7(a) and 7(b). Table 6 shows the results of the accuracy assessment of the LULC classification based on the SVM classifier without the MaSegFil approach, and the results with the MaSegFil approach are shown in Table 7.

Discussion
In this study, we have improved the accuracy of the LULC classification based on mosaic cloud-free Landsat 8 satellite imagery obtained from GEE, using its popular method for filling gaps in cloudy images with median metrics or temporal aggregation [36]. ML, RF, and SVM, which have been widely used for image classification [36,39,40,42,43], were employed as methods to classify LULC in the study area. We chose the Citarum, Ciliwung, and Cisadane watersheds as test case study areas to be able to understand and implement the proposed method; these areas are included in the 15 national priority watersheds in Indonesia. The accuracy assessment results based on reference data for the three methods show that the overall accuracy and kappa values for the ML classifier for the LULC classification in the study area were 73.60% and 0.684; for RF, they were 77.70% and 0.731; and for SVM, they were 77.5% and 0.730. Spatially, the classification results are shown in Figures 5(a)-5(c), 6(a)-6(c), and 7(a)-7(c). The accuracy assessment results for RF and SVM are similar; Rana and Venkata Suryanarayana [45] and Phan et al. [36] reported that RF and SVM are the latest developments in the computational aspect of image classification and can minimise errors in classification, making them superior to parametric classifiers such as ML.
The results of the LULC classification using the three classifiers still contain pixel noise, which affects the accuracy and quality of LULC information [28,31-33]. To overcome this obstacle, the MaSegFil approach was proposed as a spatial filter stage in the postclassification, which is used to classify objects on the earth's surface based on digital classification results. The results of the accuracy assessment based on reference data from the use of the MaSegFil approach show that the overall accuracy increased to 81.7%, 85.2%, and 84.3% for the ML, RF, and SVM classifiers, respectively (Tables 3, 5, and 7). The highest overall accuracy in relation to the Ciliwung, Citarum, and Cisadane watersheds without using the MaSegFil approach was obtained using the RF classifier, with an overall accuracy of 77.7% and kappa of 0.731, slightly different from the use of the SVM classifier, which had an overall accuracy of 77.5% and kappa of 0.730. The MaSegFil approach utilises segments formed in the segmentation process as the boundary of a class area. When several LULC classes appear in a segment, the most dominant class within it will become the class for the segment area, and the classes that are not dominant will be eliminated or replaced by the most dominant one.
This principle is useful because the classification results will usually contain small noise classes.

Confusion matrix (partial; only rows C_0 to C_2 are recoverable):
Class | C_0 | C_1 | C_2 | C_3 | C_4 | C_5 | C_6 | C_7 | C_8 | C_9 | C_10 | Total | User accuracy | Kappa
C_0 | 108 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 109 | 0.991 | 0.000
C_1 | 36 | 277 | 27 | 0 | 0 | 14 | 0 | 0 | 0 | 0 | 0 | 354 | 0.782 | 0.000
C_2 | 26 | 5 | 90 | 0 | 0 | 3 | 21 | 19 | 3 | 0 | 0 | | |

For more comprehensive applications, the method with the best accuracy, the RF classifier, was applied to several national priority watershed locations in Indonesia, with a comparison made of conditions before and after the spatial filter process was conducted using the MaSegFil approach (Figures 8-11). Based on Table 8 and Figure 12, the results show that the use of the MaSegFil approach in the priority watersheds to classify LULC gave overall accuracies ranging from 83.28% to 89.76% and an improvement in accuracy of 6.41% to 15.83%. In this study, mosaic cloud-free Landsat 8 satellite imagery data were used as the input data for the classification. The mosaic used the median pixel value over the filter date range as input. The quality and the accuracy will be different if the data used are from single-date Landsat 8 data. The use of a single Landsat 8 image is not always possible because several watersheds need more than one path to cover all of their areas, which cannot be obtained from a single-date image. Seasonal variations are not considered for the input data, as this would be more challenging because of the varying LULC classes in different conditions. Other optical sensors have yet to be tested, but hypothetically the method would also improve classification accuracy.

Conclusion
Improvement in the accuracy of the postclassification of LULC is important in order to meet the need for the rapid mapping of such information. This study has proposed the MaSegFil approach, which can be used as a spatial filter for supervised digital classification results. Three digital classification approaches (ML, RF, and SVM) were applied to test the improvement in the accuracy of LULC postclassification using the MaSegFil approach. The use of a single Landsat 8 image is not always possible because several watersheds need more than one path to cover all of their areas, which cannot be obtained from a single-date image. Mosaic cloud-free Landsat 8 satellite imagery data were therefore used as the input data for the classification. The mosaic used the median pixel value over the filter date range as input for the LULC classification in the study area. Assessment of the accuracy based on the reference data was made to compare the postclassification results before and after the addition of the MaSegFil approach. The results show that, before applying the MaSegFil approach, the accuracies of the ML, RF, and SVM classifications were 73.6%, 77.7%, and 77.5%, respectively. The MaSegFil approach reduced pixel noise in the ML, RF, and SVM classifications and increased the accuracies to 81.7%, 85.2%, and 84.3%, respectively. Furthermore, the method with the best accuracy, the RF classifier, was applied to several national priority watershed locations in Indonesia, with a comparison of conditions before and after the spatial filter process was applied using the MaSegFil approach. The results show that the use of the MaSegFil approach implemented on several national priority watersheds in Indonesia to classify LULC gave overall accuracies ranging from 83.28% to 89.76% and an improvement in accuracy of 6.41% to 15.83%.
The results of the study can be used to support the acceleration of medium-scale mapping at 1:50,000 to 1:100,000, which currently is often performed manually by on-screen digitising. Further development and application of the method to other optical image data with a higher spatial resolution, such as Sentinel-2, SPOT 6/7, or Pleiades, is a subject for future research.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Disclosure
This paper is part of the study activities entitled "Integration of Remote Sensing Data for Flood Impact Analysis, Environmental Management and Disaster Mitigation in the Citarum Watershed, West Java Province, Indonesia."