Fine Land Cover Classification in an Open Pit Mining Area Using Optimized Support Vector Machine and WorldView-3 Imagery

Fine land cover classification in an open pit mining area (LCCOM) is essential for analyzing the terrestrial environment. However, researchers have focused on obtaining coarse LCCOM using high spatial resolution remote sensing data and machine learning algorithms. Although support vector machines (SVM) have been successfully used in the remote sensing community, achieving high classification accuracy for fine LCCOM with SVM remains difficult because of two factors: the lack of significant features that efficiently describe the unique terrestrial characteristics of open pit mining areas, and the lack of an optimization strategy for obtaining suitable SVM parameters. This study attempted to address these two issues. First, a novel carbonate index based on WorldView-3 was proposed and introduced into the feature set. Second, three optimization methods, namely genetic algorithm (GA), k-fold cross validation (CV), and particle swarm optimization (PSO), were used to obtain optimized SVM parameters. The results show that the carbonate index was effective for distinguishing the dumping ground from other open pit mining lands. Furthermore, the three optimization methods significantly increased the overall classification accuracy (OA) of fine LCCOM by 8.40%. CV significantly outperformed GA and PSO, and GA performed slightly better than PSO. CV was more suitable for most of the fine land cover types of crop land, and PSO for road and open pit mining lands. On an independent test set, the optimized SVM models achieved significant improvements, with an average of 8.29%. Overall, the proposed strategy was effective for fine LCCOM.


Introduction
Land degradation has been increasingly recognized as one of the most destructive impacts on the terrestrial environment during the last century [1,2]. Several studies have revealed the important effect of open pit mining on local land degradation [1–5]. Accordingly, land cover data for complex open pit mining landscapes are increasingly used as key datasets for global and local land degradation and development studies [6–12].
Currently, high resolution satellite imagery and machine learning algorithms (MLAs) have been applied to land cover classification in open pit mining areas [6,9,12–14]. MLAs can generally accept various feature sets [10], which have proven valuable for classification in open pit mining areas. Several algorithms with excellent performance have been widely used, for example, support vector

Remote Sensing Data Resources
WV-3 provides data of higher spatial and spectral resolution, ranging from panchromatic and multispectral imagery to short-wave infrared (SWIR) imagery (as listed in Table 1). WV-3 imagery has been found to perform better than conventional sensors in mapping surface geology targets [32,33]. The Level 2A WV-3 data used in this study were acquired on 4 August 2015 with a total cloud cover of 1%.

Methods
The overall flowchart of the methodology, which consists of six phases, is presented in Figure 2. The high resolution features of WV-3 images and an optimized SVM algorithm were used in this study to improve the classification accuracy of fine LCCOM. In addition, an independent test set was used to examine the classification models for LCCOM.
Remote Sens. 2020, 12, 82 4 of 16

Data Processing
The band-to-band geometric registration of the WV-3 Level 2A products meets the requirements of image fusion. Following our previous study [34], the panchromatic and multispectral data were fused using the Gram-Schmidt method, because it provides higher fidelity of spatial and spectral characteristics, which suits the study of open pit mining areas.

Developing Features Based on WV-3
A total of 70 features divided into six types were used in this study (as listed in Table 2).
(1) Spectral bands: the eight VNIR spectral bands of the fused image and the eight SWIR spectral bands.
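(2) Vegetation indices: one is the normalized difference vegetation index (NDVI) [35], which was calculated using the first near-infrared band (NIR-1) of WV-3; the other is the soil-adjusted vegetation index (SAVI) [36], which was expected to improve the identification of bare soil and sparse vegetation in open pit areas. SAVI is calculated as follows:
$$\mathrm{SAVI} = \frac{(1 + L)\,(\rho_{\mathrm{NIR1}} - \rho_{\mathrm{Red}})}{\rho_{\mathrm{NIR1}} + \rho_{\mathrm{Red}} + L} \qquad (1)$$
where $\rho_{\mathrm{NIR1}}$ and $\rho_{\mathrm{Red}}$ are the reflectances of the NIR-1 and red bands and L is the soil adjustment factor. Based on statistical values, L was set to 0.5. As for NDVI, the NIR-1 band of WV-3 was used.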
(3) Carbonate index (CI): carbonates are characterized by a strong absorption feature at around 2.3 µm [37]. This feature is advantageous for distinguishing non-lithology cover types in mining areas, such as dumping grounds. Accordingly, a CI was developed in this study based on the characteristics of WV-3 (formula (2)), in which SWIR-5 and SWIR-3 are the fifth and third bands in the SWIR range, respectively.
(4) Principal component bands: principal component analysis [38] was carried out on the eight fused multispectral bands to eliminate redundancy. The first three principal components, with a cumulative contribution rate of 99.34%, were used.
(5) Filter images: applying Gaussian low-pass filters to optical images helps improve the classification accuracy of LCCOM [9]. Thus, Gaussian low-pass filters with a 3 × 3 kernel were applied to the eight fused multispectral bands.
(6) Texture measures: five gray level co-occurrence matrix (GLCM) texture measures [39] were calculated with a 9 × 9 processing window.
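As a minimal sketch of item (2), NDVI and SAVI can be computed per pixel as follows. It assumes that `nir1` and `red` are surface reflectances of the NIR-1 and red bands on a 0–1 scale (the text does not state the radiometric scaling), and the function names are illustrative; for whole images, the same arithmetic would be applied element-wise to the band arrays.

```python
def ndvi(nir1: float, red: float) -> float:
    """Normalized difference vegetation index: (NIR-1 - Red) / (NIR-1 + Red)."""
    total = nir1 + red
    if total == 0:
        raise ValueError("NIR-1 + Red must be non-zero")
    return (nir1 - red) / total


def savi(nir1: float, red: float, L: float = 0.5) -> float:
    """Soil-adjusted vegetation index, formula (1); L = 0.5 as chosen in the text.

    Assumes nir1 and red are reflectances on a 0-1 scale (illustrative inputs).
    """
    total = nir1 + red + L
    if total == 0:
        raise ValueError("NIR-1 + Red + L must be non-zero")
    return (1.0 + L) * (nir1 - red) / total
```

Setting L = 0 makes SAVI identical to NDVI, which is a quick sanity check on the implementation.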

Land Cover Classification Schemes
As previously reported [6,9], there are seven first-level classes in the study area: open pit mining land, crop land, forest land, water, road, urban and rural residential land, and bare land. Because of the large intra-class spectral and topographic differences within the first-level classes, fine classes (for details, see Table 3) should be considered in land cover classification schemes for open pit mining areas. Two new fine types were added in this study (Table 3). One is green dry land, i.e., dry land whose crops depend mainly on natural precipitation for water and which has high vegetation coverage. The other is dark roof, usually referring to residential land in industrial parks. All procedures, including training and test set construction, SVM parameter optimization, classification model development and prediction, and accuracy assessment, were carried out on the fine land cover classes of the open pit mining area in this study.
Table 3. Land cover classification schemes used in this study (revised from [9]).

Fine Land Cover Type | Description
Opencast pit | Has mine pit lakes and spiral roads.
Ore processing site | Characterized by linear mineral processing facilities and highly reflective rubble.
Dumping ground | Located around stopes; may appear gray in true color images.
Paddy field | Has an adequate water supply and is used to cultivate rice, lotus, and other aquatic crops.
Vegetable and fruit greenhouse | Has white plastic film sides and roofs, high surface albedo, and regular rectangular shapes.
Green dry land | Dry land whose crops depend mainly on natural precipitation for water; high coverage.
Gray dry land | Dry land whose crops depend mainly on natural precipitation for water; low coverage.
Fallow land | No crops growing at present.
Woodland | Includes timber stands, economic forests, and shelterbelts with high chlorophyll content; dark red in the false color image (R-NIR-1, G-Red, B-Green).
Shrub forest | Has multiple stems and a short height, generally less than 2 m; bright red in false color images.
Forest under stress | Located around surface-mined land and affected by surface mining development; covered with large amounts of deposited mineral dust and shows poor growth; grayish in true color images (R-Red, G-Green, B-Blue).
Nursery and orchard | Rectangular like cropland, dotted with vegetation cover and exposed soil; black in true color images.
Pond and stream | Includes many fish ponds with regular rectangular shapes.
Mine pit pond | Lakes created during and after mining, typically with irregular shapes.
Dark road | Usually asphalt highways.
Bright road | Usually cement roads.
Light gray road | Usually dirt roads.
Bright roof | Usually urban and town areas.
Red roof | Usually rural land.
Dark roof | Usually residential land in industrial parks.
Blue roof | Usually land used for industrial parks.
Bare surface | Exposed land with little vegetation.

Training Set and Test Sets
Based on our training data polygons [9], a revised set of polygons was obtained (Table 4) using the WV-3 images and the updated fine land cover classification scheme described above. Almost all of the open pit mining land was delineated in a similar manner, and the other polygons were randomly located across the study area. A stratified random sampling method was employed: each class included 1000 samples, of which 900 were used as the training set and the remaining 100 as the test set (Table 4). This study used spatially dependent training and test sets for fine LCCOM, with reference to [6,9]. In addition, an independent test set with the first-level land covers (i.e., crop, forest, water, road, residential, bare surface, and open pit mining) [9] was used. This set contained 700 samples, i.e., 100 samples for each first-level land class.
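The per-class sampling step can be sketched as follows. It assumes that candidate samples (for example, pixels inside the revised data polygons) are available as `(sample_id, class_label)` pairs; the function name, the input format, and the seeded `random.Random` generator are illustrative choices, not the authors' implementation. For each class it draws 1000 samples without replacement and assigns 900 to training and 100 to testing.

```python
import random
from collections import defaultdict


def stratified_split(samples, n_per_class=1000, n_train=900, seed=0):
    """Randomly draw n_per_class samples per class and split them into train/test.

    samples: iterable of (sample_id, class_label) pairs (hypothetical input format).
    Returns (train, test) lists of (sample_id, class_label) pairs.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample_id, label in samples:
        by_class[label].append(sample_id)

    train, test = [], []
    for label in sorted(by_class):
        ids = by_class[label]
        if len(ids) < n_per_class:
            raise ValueError(f"class {label!r} has only {len(ids)} candidates")
        drawn = rng.sample(ids, n_per_class)  # sampling without replacement
        train.extend((sid, label) for sid in drawn[:n_train])
        test.extend((sid, label) for sid in drawn[n_train:])
    return train, test
```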

Classification Algorithm and Corresponding Parameter Optimization Methods
The penalty parameter C of SVM is key to improving remote sensing classification accuracy. Polynomial kernels and the radial basis function (RBF) kernel are often used in the remote sensing community [40]. The kernel parameter G influences the complexity of the distribution of samples in the feature subspace [16]. The classification accuracy and generalization ability of SVM decline as G increases. When G is small, almost all training samples are support vectors; the training error is then small and the test error is close to 1, so the generalization ability of SVM is poor. As G increases, the number of support vectors gradually decreases and the training error increases, but the generalization ability of SVM gradually improves. When G increases further and reaches a certain threshold, the number of support vectors increases again, and both training and test errors increase; at this point, both the classification and the generalization abilities of SVM deteriorate. In addition, the penalty parameter C influences the generalization of SVM. Once the sample subspace is determined, a small C value yields a less complex SVM with a small penalty on the empirical error, and therefore a large empirical risk.
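For reference, the standard RBF kernel is $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$. The sketch below assumes that G in the text denotes this $\gamma$ and uses the default $\gamma = 1/n$ (n = number of features) reported for the e1071 package in the parameter optimization results; the function name is illustrative.

```python
import math


def rbf_kernel(x, z, gamma=None):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2).

    If gamma is None, use 1 / n_features (the default mentioned in the text).
    """
    if len(x) != len(z):
        raise ValueError("feature vectors must have equal length")
    if gamma is None:
        gamma = 1.0 / len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)
```

A larger gamma makes the similarity decay faster with distance, so each training sample influences a smaller neighborhood of the feature space.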
CV is time-consuming because each parameter set must often be evaluated at many grid points [26,41]. GAs have been used to obtain SVM kernel parameters and have been integrated into SVM algorithms to improve classification accuracy [26,42]. PSO has been used to determine the optimal kernel function and is particularly effective for the RBF kernel [43].
In this study, the GA, CV, and PSO algorithms were selected to optimize the G and C parameters of SVM. CV was implemented in R, and GA and PSO were implemented in MATLAB R2009.

k-fold CV Algorithm
The k-fold CV method has been widely used in the remote sensing community [15,44–46]. More details on CV can be found in [44]. In this study, a five-fold CV scheme in the function "best.tune" of the e1071 package [47] was used to obtain the "optimal" parameter combination: the combination with the highest mean accuracy across the folds was taken as optimal.
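The study ran this search with `best.tune` from the R package e1071. Purely as an illustration of the same five-fold grid search, the sketch below is written in standard-library Python; `score_fn` is a hypothetical callback standing in for "train an RBF-SVM with (gamma, cost) on the training folds and return its accuracy on the held-out fold", and the 8 × 6 power-of-two grid is an illustrative assumption, not the grid of Table 5.

```python
import random
from itertools import product
from statistics import mean


def kfold_indices(n, k=5, seed=0):
    """Shuffle sample indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grid_search_cv(n, score_fn, gammas, costs, k=5, seed=0):
    """Return (best_gamma, best_cost, best_mean_accuracy) over the grid.

    score_fn(gamma, cost, train_idx, val_idx) is a hypothetical callback that
    trains an SVM on train_idx and returns its accuracy on val_idx.
    """
    folds = kfold_indices(n, k, seed)
    best = None
    for gamma, cost in product(gammas, costs):
        accs = []
        for i, val_idx in enumerate(folds):
            # every fold except the i-th one forms the training part
            train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
            accs.append(score_fn(gamma, cost, train_idx, val_idx))
        mean_acc = mean(accs)
        if best is None or mean_acc > best[2]:
            best = (gamma, cost, mean_acc)
    return best


# Hypothetical 8 x 6 grid in powers of two (48 combinations)
GAMMAS = [2.0 ** e for e in range(-10, -2)]
COSTS = [2.0 ** e for e in range(-3, 9, 2)]
```

With a 48-point grid and five folds, 240 SVM fits are needed, which illustrates why CV is described as time-consuming.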

Genetic Algorithm
GA is an adaptive optimization method based on the genetic processes of biological organisms. Further details on GA can be found in [26,48]. GA can simultaneously identify the optimal SVM kernel parameters without reducing the SVM classification accuracy.
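A minimal real-coded GA for the two SVM parameters might look like the sketch below. Searching in log2 space, tournament selection, arithmetic crossover, Gaussian mutation, and elitism are illustrative design choices (the study's MATLAB settings are not given in the text), and `fitness_fn` is a hypothetical callback that could return, for example, the cross-validated accuracy of an SVM trained with gamma = 2**x and cost = 2**y.

```python
import random


def _tournament(pop, fits, rng, size=3):
    """Return the fittest of `size` randomly chosen individuals."""
    contenders = rng.sample(range(len(pop)), size)
    return pop[max(contenders, key=lambda i: fits[i])]


def ga_optimize(fitness_fn, bounds, pop_size=20, generations=60,
                crossover_rate=0.8, mutation_rate=0.2, seed=0):
    """Maximize fitness_fn(*genes) with a simple real-coded GA.

    bounds: one (low, high) pair per gene, e.g. for log2(gamma) and log2(cost).
    Returns (best_genes, best_fitness).
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness_fn(*ind) for ind in pop]
        elite = pop[max(range(pop_size), key=lambda i: fits[i])]
        children = [elite[:]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            a = _tournament(pop, fits, rng)
            b = _tournament(pop, fits, rng)
            if rng.random() < crossover_rate:
                w = rng.random()  # arithmetic (blend) crossover
                child = [w * x + (1 - w) * y for x, y in zip(a, b)]
            else:
                child = a[:]
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:  # Gaussian mutation
                    child[d] += rng.gauss(0.0, 0.1 * (hi - lo))
                child[d] = min(max(child[d], lo), hi)  # keep genes inside bounds
            children.append(child)
        pop = children
    fits = [fitness_fn(*ind) for ind in pop]
    best = max(range(pop_size), key=lambda i: fits[i])
    return pop[best], fits[best]
```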

PSO Algorithm
PSO was proposed based on the social behavior of bird flocking; its detailed principles can be found in [49,50]. For SVM parameter optimization, PSO offers high efficiency, simple implementation, and strong exploration abilities, both globally and locally [44,51].
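A basic global-best PSO can be sketched as below. The inertia weight `w = 0.7` and acceleration coefficients `c1 = c2 = 1.5` are common textbook values assumed here, not settings reported in the study, and `fitness_fn` is again a hypothetical callback scoring a position such as (log2 gamma, log2 cost).

```python
import random


def pso_optimize(fitness_fn, bounds, n_particles=20, iterations=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize fitness_fn(*position) with a basic global-best PSO.

    Returns (best_position, best_fitness).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_fit = [fitness_fn(*p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]     # swarm (global) best
    for _ in range(iterations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # cognitive pull
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # social pull
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = fitness_fn(*pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit
```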

Accuracy Assessment
The classification accuracies of fine LCCOM were evaluated on the test set. The overall accuracy (OA) was used to indicate the performance of the optimized classification models, and the F1-measure [51] was used to describe class-specific accuracy. Moreover, the percentage deviation [52] was calculated from the above-mentioned metrics to evaluate differences in overall performance and in the accuracy of each class among the classification models. In addition, the McNemar test was used to examine whether the parameter optimization methods significantly improved the classification accuracy and whether the three optimization methods differed significantly.
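Both metrics can be computed from reference and predicted labels, as in the plain-Python sketch below (function names are illustrative). OA is the fraction of correctly classified test samples, and the per-class F1-measure is the harmonic mean of precision (user's accuracy) and recall (producer's accuracy).

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_per_class(y_true, y_pred):
    """Return {class: F1} with F1 = 2PR / (P + R) for each class."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0  # user's accuracy
        recall = tp / (tp + fn) if tp + fn else 0.0     # producer's accuracy
        denom = precision + recall
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores
```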

Results of Parameter Optimization
A group of 48 parameter combinations (eight values for gamma and six for cost; for details, see Table 5) was used for the parameter optimization of the SVM-based classification models. The default values of gamma and cost were 1/n (where n is the data dimension) and 1, respectively. Using the CV, GA, and PSO algorithms, the optimal combinations of G and C were ($2^{-7}$, $2^{7}$), ($2^{-9}$, $2^{7}$), and ($2^{-5}$, $2^{3}$), respectively.

Assessment of Classification Results
Four classifications were performed in this study. Table 6 presents the F1-measure, OA, and percentage deviation of different SVM models based on default parameters and different optimization methods. Table 7 presents the results of the McNemar statistical test.

F1-Measure and Percentage Deviation of Each Land Cover
Regarding the accuracy of each class (i.e., the F1-measure), the three optimization methods had different effects. Only the following seven land covers followed the overall pattern (i.e., CV outperformed GA and PSO): greenhouse, green dry land, gray dry land, fallow land, shrub forest, bright road, and bright roof. In general, all three parameter optimization methods yielded F1-measures over 90% for greenhouse, pond and stream, mine pit pond, bright road, red roof, and blue roof; over 80% for paddy field, gray dry land, woodland, dark road, light gray road, bright roof, and bare surface; over 70% for fallow land, forest under stress, dark roof, and dumping ground; over 60% for green dry land and nursery and orchard; and over 50% for shrub forest, opencast pit, and ore processing site.

McNemar Test
The McNemar test was conducted for each pair of models among the SVM with default parameters and the SVMs with parameters optimized by the three methods. Table 7 shows the pairs of classification models, the numbers of samples that one model classified wrongly and the other classified correctly, and the corresponding chi-square and p values. The results indicate that: (1) SVM models based on the CV, GA, and PSO algorithms significantly outperformed the model with default parameters (chi-square values were larger than 3.84 and p values were smaller than 0.05), i.e., the three parameter optimization methods significantly improved the classification performance; (2) there were significant differences between CV and the other two methods, i.e., CV significantly outperformed GA and PSO; and (3) GA and PSO showed equivalent effects, with no statistically significant difference.
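The test statistic can be sketched as follows. With b and c denoting the numbers of test samples that only one of the two models classified correctly, the statistic $\chi^2 = (|b - c| - 1)^2 / (b + c)$ (with continuity correction) follows a chi-square distribution with one degree of freedom, whose 5% critical value is 3.84. Whether the study applied the continuity correction is not stated, so it is exposed as a flag; function and argument names are illustrative.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def mcnemar(n_a_wrong_b_right, n_a_right_b_wrong, continuity=True):
    """Return (chi_square, p_value) for McNemar's test on two classifiers."""
    b, c = n_a_wrong_b_right, n_a_right_b_wrong
    if b + c == 0:
        return 0.0, 1.0  # the two models never disagree
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # Edwards' continuity correction
    chi2 = diff ** 2 / (b + c)
    return chi2, chi2_sf_1df(chi2)
```

For example, if model A alone misclassifies 10 samples and model B alone misclassifies 30, the corrected statistic is 19² / 40 = 9.025, well above 3.84.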

Assessment of the Independent Test Set
A predicted map with the first-level land covers was obtained from the classification result of the study area derived from CV-SVM (Figure 3). The predicted map was poorer than that drawn in [9], with misclassifications among all of the land covers. In the western part of the study area, crop land was misclassified as forest land, and across the whole study area, some land covers were wrongly classified as open pit mining land and road.
The areas covered by the data polygons used to construct the dependent training and test sets were then erased from the map. Finally, a stratified random sampling method was applied to obtain the independent test set with 700 samples. The land classes of the independent test samples were determined by visual interpretation of the WV-3 imagery. Table 8 shows the OA values of all models for the independent test set, and Table 9 presents the results of the statistical test. The SVM model with default parameters achieved an OA of only 57.43%. The optimized SVM models significantly outperformed it, with improvements of 10.94%, 8.71%, and 5.22% (average 8.29%). Among the three optimized SVM models, only CV-SVM significantly outperformed PSO-SVM; no significant differences existed between the other pairs.
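The reported improvements appear to be relative percentage deviations rather than percentage-point differences: taking PD = (OA_model − OA_reference) / OA_reference × 100, the default-SVM OA of 57.43% and the highest optimized OA of 63.71% reproduce the 10.94% figure. The sketch below assumes that definition, which is inferred from these numbers rather than stated in the text; the function name is illustrative.

```python
def percentage_deviation(acc_model, acc_reference):
    """Relative change (in %) of acc_model with respect to acc_reference."""
    if acc_reference == 0:
        raise ValueError("reference accuracy must be non-zero")
    return (acc_model - acc_reference) / acc_reference * 100.0


# OA values (%) reported for the independent test set
default_svm_oa = 57.43
best_optimized_oa = 63.71
pd_best = percentage_deviation(best_optimized_oa, default_svm_oa)  # about 10.94
mean_pd = sum([10.94, 8.71, 5.22]) / 3                             # about 8.29
```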

Effectiveness of the Used Features
The features used in this study were similar to those of previous studies [6,9]. The importance of each feature was assessed and compared with that reported in other related studies [9]. Chen et al. [6] further confirmed the importance grades of each feature set and determined whether there were significant differences among them. Overall, the commonly used feature sets, such as spectral information, principal component bands, filter images, and texture measures, are effective for fine LCCOM.
Nevertheless, there are three significant differences between this study and the aforementioned two studies, which resulted in different effects on the classification accuracies of different land covers.
(1) This study focused on the fine classification of land covers, which is more difficult than the coarse classification of land covers (which neglects intra-class misclassification) [6,9].
(2) This study used more spectral information, the soil adjusted vegetation index, and the proposed CI, but not topographic variables or standard deviation filters, which were the two most important feature sets in the previous studies [6].
(3) The WV-3 data used in this study have a much higher spatial resolution (about seven times) than the Ziyuan-3 images used in the previous two studies, which further increased the classification difficulty at a fine scale.
For example, the low accuracies (about 50%) of opencast pit and ore processing site could be attributed to the inherent difficulty of their classification, misclassifications between the two, and the lack of topographic variables. Although Li et al. [9] achieved an OA of up to 87.34% for three open pit mining land covers, this could be partly attributed to the use of 10% of the data as the training set and the effect of spatial auto-correlation [53] between the training and test sets. The proposed CI could theoretically help distinguish different open pit mining land classes and provided relatively high accuracies for dumping ground; however, it could not differentiate opencast pit from ore processing site. The spectral bands of WV-3 should therefore be further explored to generate more effective lithology indices for fine LCCOM. In addition, integrating lithology indices, topographic variables, and other features might be effective, and this will be considered in future work.
Similarly, shrub forest and nursery and orchard exhibited low accuracies, owing to the very high resolution of WV-3 and misclassifications between the two. Adding more spectral information and the soil adjusted vegetation index was not enough to distinguish them. Higher-level features derived from spectral information and more effective spatial features, such as standard deviation filters, have the potential to improve their accuracies, and we will investigate them in the future.
The highest OA for the independent test set was only 63.71%, which was lower than those of the feature subset-based RF, SVM, and ANN models (77.57%, 72.00%, and 64.29%, respectively) and the all-feature-based RF and SVM models (74.86% and 68.00%) in the previous study [9]. This suggests that topographic features, which were not used in this study, are important.

Dependency of Test and Training Sets and Sampling Scheme
Compared with [6,9], this study used spatially dependent training and test sets for fine LCCOM, with which good OA and F1-measures were obtained for most fine land covers. In general, independent training and test sets are a prerequisite for reliable accuracy assessment. Although the method used to acquire training and test sets determines their spatial auto-correlation [53], the effect here was within the normal range, as reflected by the statistics in Table 4. Except for bright road (31.63%), the fractions of the number (or area) of samples in the training and test sets relative to the data polygons of each land cover class had maximum, minimum, and average values of 7.34%, 0.04%, and 1.53%, respectively. Because the samples used were only a small portion of the data polygons, the spatial auto-correlation between training and test samples was very small. Therefore, it should not have affected the reliability of the accuracy assessment.
The classification in this study differed from the subclassification of three open pit mining lands in [9], where the spatial auto-correlation was very large but the large test set ensured the reliability of the accuracy assessment.
The following conclusions can be drawn regarding the effects of spatial auto-correlation on each land cover. (1) Overall, spatial auto-correlation had little effect. For example, gray dry land, light gray road, dark roof, and bare surface exhibited relatively high spatial auto-correlation (i.e., the above-mentioned higher fractions of 1.60%, 1.42%, 2.91%, and 7.34%), yet their F1-measures were moderate (73.13%, 75.12%, 63.21%, and 75.13%, obtained from the SVM model with default parameters). Moreover, high F1-measures were obtained for some land covers with low spatial auto-correlation, i.e., woodland (0.12% and 80.75%) and pond and stream (0.19% and 95.57%). Clearly, the separability of these classes was the dominant factor, although some other land covers had both high spatial auto-correlation and high F1-measures, such as greenhouse (1.77% and 85.41%), mine pit pond (1.52% and 96.48%), bright road (31.63% and 94.23%), bright roof (1.07% and 81.25%), red roof (5.06% and 93.26%), and blue roof (1.99% and 95.88%). (2) The low accuracies of the three surface-mined lands could be partly attributed to the insufficient number of training samples, which, to some degree, also led to low spatial auto-correlation. It can further be concluded that more training samples are necessary and that spatial auto-correlation should be fully and reasonably exploited for open pit mining lands with relatively low separability.
The results on the independent test set in this study were worse than those in [9]. These results might reflect the true predictive ability of the SVM models. Owing to the complexity and difficulty of LCCOM and fine LCCOM, more training data would be necessary to produce a better predicted map of the study area; a small training set is only suitable for model comparison.

Influence of Parameter Optimization
For SVM-based models, parameter optimization is indispensable and can significantly affect the results [16,24]. Furthermore, many previous studies have reported positive effects of parameter optimization. Similarly, this study showed that the three parameter optimization methods significantly improved the classification accuracy of LCCOM. Moreover, CV significantly outperformed GA and PSO, and GA slightly outperformed PSO. A statistical test was also performed to determine whether parameter optimization yielded significant improvements and whether the optimization methods differed significantly. In contrast, most studies used only one of the common optimization algorithms or focused only on modifying specific algorithms, and few have compared different optimization algorithms. One exception is a comprehensive assessment of different optimization methods for SVM, such as continuous ant colony optimization, GA, imperialist competitive algorithm, and PSO [54,55], whose authors concluded that the suitability of an algorithm depends on the specific application. The limited parameter space [55] and the complexity of LCCOM in this study might have hindered the performance of more complex algorithms, such as GA and PSO.
Some studies have also reported that combining feature selection with parameter optimization in SVM might be more effective than optimizing only the SVM parameters [55,56]. In this study, multiple types of features with highly redundant and correlated information were used, so such combinatorial optimization might also be beneficial here.
Different land covers showed different sensitivities to the three parameter optimization methods. For example, CV was more suitable for most of the fine land covers of crop land (i.e., CV achieved the most substantial accuracy improvements), and PSO for road and open pit mining lands. Schuster et al. [52] and Li et al. [9] drew a similar conclusion that easily distinguishable land covers are less sensitive to additional operations, namely parameter optimization in this study and feature selection in those two studies.

Conclusions
In this study, SVM models with three parameter optimization methods were investigated to improve the accuracy of fine LCCOM based on WV-3 images. Overall, the accuracy was significantly improved. In particular, the fine land covers resulting from mining activities could be identified using the proposed strategy, unlike in our previous study, which focused only on coarse land cover classes in open pit mining areas. Several important conclusions can be drawn. First, the proposed CI based on WV-3 was useful for distinguishing the dumping ground from other open pit mining lands. Second, parameter optimization methods can significantly improve the classification accuracy of fine LCCOM; CV significantly outperformed GA and PSO, and GA slightly outperformed PSO. Third, CV was more suitable for most of the fine land covers of crop land, and PSO for road and open pit mining lands. The three optimized SVM models also achieved significant improvements on the independent test set. In general, the fine land covers in an open pit mining area could be classified with higher accuracy using WV-3 imagery and SVM algorithms with parameter optimization. In the future, combinatorial optimization should be investigated [57], and we will focus on the generalization ability of the proposed strategy in different mining regions.