Sparse Coding-Based Method Comparison for Land-Use Classification

Land-use classification relies on high-resolution remote sensing images. Such images can improve classification, but their complexity also makes the problem more challenging, so the images must be represented appropriately. One common approach is the Bag of Visual Words (BOVW) model, which needs a coding process to obtain the final data representation. Several coding methods exist, such as Hard Quantization (HQ), Sparse Coding (SC), and Locality-constrained Linear Coding (LLC). Because these methods rest on different assumptions and the coding method affects classification accuracy, their results have to be compared. The UC Merced dataset, which consists of 21 classes, is used in this research. The experimental results show that LLC achieves better accuracy than SC and HQ, reaching 86.48%. Furthermore, LLC also performs best across various numbers of training samples per class.


Introduction
Remote sensing has been used as an effective tool for land-use and land-cover classification and for observing dynamic changes of land [1][2][3]. Research on single-object classification and land classification is progressing thanks to the improved quality of remote sensing images [4][5][6][7].
Land-use classification uses remote sensing images, which are processed to extract land-use information. In remote sensing, efficient representation and identification remain open and challenging problems. Much previous research used an analytical approach that focused on pixel-based or object-based classification and extracted spectral, texture, and geometrical attributes [8][9][10][11][12]. Nevertheless, such attributes work only in certain environments, so they yield a limited data representation.
In recent years, the Bag of Visual Words (BOVW) model has been applied to land-use classification problems on remote sensing image data [13]-[15]. Research [13] uses unsupervised- [17]. Each method has a different complexity. The Bag of Visual Words model relies on a coding method to obtain the data representation.
Land-use classification research commonly uses the freely available UC Merced dataset. The dataset has 21 land-use classes and a high degree of difficulty. This research compares the performance of several coding methods, namely SC, LLC, and HQ, for land-use classification.
The rest of the paper is organized as follows. Section II presents the methods. Section III presents the results and analysis. Section IV concludes this research. The last section lists the references.

Literature Review
This part explains the SC, LLC, and HQ methods.

Sparse Coding (SC)
SC is developed from the vector quantization (VQ) method. It applies L1-norm regularization so that only a small number of coefficients are nonzero. Equation (1) shows the sparse coding method.
In Equation (1), k = 1, 2, ..., K, X is a SIFT descriptor, and V is a codebook of K clusters.

Locality-constrained Linear Coding (LLC)
LLC was proposed to address a weakness of the Local Coordinate Coding (LCC) method, namely its high computational complexity. LLC enforces a locality constraint, which is important for the resulting codes. The LLC encoding formula [20] is shown in Equation (2).
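For reference, a commonly cited form of the LLC objective is sketched below, written with the same illustrative symbols as the sparse coding sketch (x_m a descriptor, u_m its code, V the codebook). The locality adaptor d_m and the bandwidth \sigma are assumptions taken from the usual presentation of LLC, not from this paper, and the notation may differ from Equation (2).

```latex
\min_{U} \sum_{m=1}^{M} \lVert x_m - u_m V \rVert_2^2 + \lambda \lVert d_m \odot u_m \rVert_2^2
\quad \text{subject to} \quad \sum_{k=1}^{K} u_{mk} = 1, \quad \forall m,
\qquad d_m = \exp\!\left( \frac{\operatorname{dist}(x_m, V)}{\sigma} \right)
```

Here \odot is element-wise multiplication and dist(x_m, V) collects the distances from x_m to every visual word, so words far from the descriptor are penalized more heavily.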

Hard Quantization (HQ)
HQ represents each local feature by its nearest visual word, but it performs well only when a large vocabulary is used [21].
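The nearest-visual-word assignment can be sketched as follows. This is a minimal illustration in plain Python; the function names and the toy 2-D codebook are hypothetical and are not taken from the experiment.

```python
def nearest_word(descriptor, codebook):
    """Return the index of the visual word closest to the descriptor."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))


def hq_histogram(descriptors, codebook):
    """Count how many descriptors are assigned to each visual word."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1
    return hist


# Toy data (assumption): 3 visual words and 4 descriptors in 2-D.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 0.1], [0.8, -0.1], [0.1, 0.9]]
histogram = hq_histogram(descriptors, codebook)
print(histogram)  # [1, 2, 1]
```

Each descriptor activates exactly one word, which is why the quality of the representation depends so strongly on the vocabulary size.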
The methods differ not only in coding technique but also in computational complexity, as shown in Table 1. HQ has the highest complexity and SC has the lowest.

Methods
In this section, we describe the dataset and the method used in the experiment. We also present the experimental results and the analysis.

Dataset
To conduct the experiment, we chose the UC Merced dataset. This dataset is freely available and can be downloaded from http://vision.ucmerced.edu/datasets/landuse.html. The dataset has 21 classes. Figure 2 shows an example of each class in the UC Merced dataset. Each

Research Method
Our research method is shown in Figure 1.
There are five main processes. The first is local feature extraction. Next, the coding process produces the sparse representation of the local features. After that, spatial information is extracted from the sparse features. The fourth process is classifier training. The final process is testing the performance of our model. Each step is described below.

Local Feature Extraction
Local features are extracted from the raw data in the form of image patches. To extract the local features, we used the Scale-Invariant Feature Transform (SIFT) method. We used dense SIFT to capture all information in the data. We fixed three parameters: patch size, descriptor degree, and grid spacing. The patch size is 16 x 16 px, the descriptor has 8 degrees, and the grid spacing is 8 px. The codebook size is 1024. Based on this setting, we first extract all patches from the image. Then, each patch is processed by computing the gradient magnitude. The output of this process is a descriptor for each patch of the image.
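The dense sampling grid implied by these settings can be sketched as follows. This is a minimal illustration, not the extraction code used in the experiment: the function name dense_grid is hypothetical, and the 256 x 256 px image size is an assumption that is not stated above.

```python
def dense_grid(height, width, patch=16, step=8):
    """Top-left corners of all patches that fit fully inside the image."""
    return [(r, c)
            for r in range(0, height - patch + 1, step)
            for c in range(0, width - patch + 1, step)]


# Assumed image size: 256 x 256 px.
corners = dense_grid(256, 256)
print(len(corners))  # 961 patches (31 x 31)
```

Because the 8 px spacing is half the 16 px patch size, neighboring patches overlap by half, which is what lets dense SIFT cover the whole image.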

Coding Process
In this research, we compare Sparse Coding, Locality-constrained Linear Coding, and Hard Quantization in terms of coding quality for feature representation in a classification task, specifically land-use classification. The input is the set of descriptors produced by local feature extraction. Each local feature is mapped into a representation characterized by sparsity and locality. Sparsity means that most coefficients are driven to 0 or close to it, so that only a few features are active, whereas locality provides the feature representation in linear form. This locality makes the final features linearly separable.
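To make the locality idea concrete, the sketch below encodes one descriptor with the approximated LLC solution: only the nearest visual words receive nonzero weights, and the weights come from a small constrained least-squares problem. It assumes NumPy is available; the function name llc_encode, the neighbor count knn=5, the regularizer beta, and the random toy codebook are illustrative choices, not values reported in this paper.

```python
import numpy as np


def llc_encode(x, codebook, knn=5, beta=1e-4):
    """Approximate LLC code of descriptor x (D,) for a codebook (K, D)."""
    # 1. Locality: keep only the knn visual words nearest to x.
    dists = np.sum((codebook - x) ** 2, axis=1)
    idx = np.argsort(dists)[:knn]
    # 2. Shift the neighbors so that x is the origin, then build the
    #    local covariance and regularize it for numerical stability.
    z = codebook[idx] - x
    cov = z @ z.T
    cov += (beta * np.trace(cov) + 1e-12) * np.eye(knn)
    # 3. Solve cov w = 1 and rescale so the weights sum to one.
    w = np.linalg.solve(cov, np.ones(knn))
    w /= w.sum()
    # 4. Scatter the weights into a K-dimensional code (zeros elsewhere).
    code = np.zeros(codebook.shape[0])
    code[idx] = w
    return code


# Toy data (assumption): 16 visual words in 8 dimensions.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 8))
x = rng.normal(size=8)
code = llc_encode(x, codebook)
```

The resulting code is sparse as a side effect of locality: at most knn entries are nonzero, and they sum to one.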

Spatial Information Extraction
The coding process yields a code for the local feature of each patch. This result lacks spatial information. To address this problem, we used the Spatial Pyramid Matching (SPM) method [18]. We divided the image into three types of region: 1x1, 2x2, and 4x4. In the 1x1 type, spatial information is extracted from the whole image. In the 2x2 type the image is divided into 4 regions, and in the 4x4 type into 16 regions. The function of this division is to eliminate redundant coding features. The input to this step is the result of the coding process. The results of all partitions are then concatenated into one array, and the data is ready to be used to train a classifier.
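The pyramid pooling step can be sketched as follows. This is a minimal illustration under stated assumptions: the function name spm_pool is hypothetical, and max pooling is assumed because the pooling operator is not specified above. With the 1x1, 2x2, and 4x4 levels there are 1 + 4 + 16 = 21 regions, so a codebook of 1024 words would give a 21 x 1024 = 21504-dimensional image feature.

```python
def spm_pool(codes, positions, height, width, levels=(1, 2, 4)):
    """Max-pool patch codes inside each pyramid region and concatenate."""
    k = len(codes[0])
    pooled = []
    for n in levels:
        cells = [None] * (n * n)
        for code, (r, c) in zip(codes, positions):
            # Region index of the patch at pixel position (r, c).
            i = min(r * n // height, n - 1) * n + min(c * n // width, n - 1)
            if cells[i] is None:
                cells[i] = list(code)
            else:
                cells[i] = [max(a, b) for a, b in zip(cells[i], code)]
        for cell in cells:
            # Regions without any patch contribute zeros.
            pooled.extend(cell if cell is not None else [0.0] * k)
    return pooled


# Toy data (assumption): three patches with 2-dimensional codes.
codes = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
positions = [(10, 10), (10, 200), (200, 200)]
feature = spm_pool(codes, positions, height=256, width=256)
print(len(feature))  # 42 = 21 regions x 2 code dimensions
```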

Classifier Training
The classifier used is the Support Vector Machine (SVM). Previous research [13][14][15] also uses this classifier. In addition, it is chosen because it maximizes the margin when forming the decision boundary.
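The margin-maximizing behavior can be illustrated with a small linear SVM trained by subgradient descent on the regularized hinge loss. This is a toy sketch under stated assumptions: the function names, the hyperparameters (lam, lr, epochs), and the two-class 2-D data are hypothetical, and the experiment itself needs a multi-class SVM over the 21 land-use classes.

```python
def train_linear_svm(samples, labels, lam=0.01, epochs=200, lr=0.1):
    """Fit weights w and bias b on the regularized hinge loss."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):  # each label y is +1 or -1
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Shrink w (regularizer), then correct it if the margin is violated.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the decision boundary."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data (assumption).
samples = [[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
           [-2.0, -2.0], [-3.0, -1.0], [-1.5, -2.5]]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)
predictions = [predict(w, b, x) for x in samples]
```

Samples are only used to update the weights when their margin falls below 1, which is how the hinge loss encodes the preference for a wide margin.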

Results and Analysis
In the experiment, we measured the accuracy of the classifier. Then, we examined the influence of the number of training samples.

Classification Accuracy
We split the data into training and testing sets with a ratio of 4:1 for each class. The result is shown in Figure 3. LLC performed better than SC and HQ. Because we used a linear classifier, this result indicates that LLC is better at mapping the local features into a linear space.
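A per-class 4:1 split can be sketched as follows. This is a minimal illustration: the function name split_per_class, the fixed seed, the placeholder image names, and the figure of 100 images per class are assumptions that are not stated above.

```python
import random


def split_per_class(items_by_class, train_ratio=0.8, seed=0):
    """Shuffle each class separately and split it into train and test parts."""
    rng = random.Random(seed)
    train, test = [], []
    for label, items in sorted(items_by_class.items()):
        shuffled = items[:]
        rng.shuffle(shuffled)
        cut = int(round(len(shuffled) * train_ratio))
        train += [(item, label) for item in shuffled[:cut]]
        test += [(item, label) for item in shuffled[cut:]]
    return train, test


# Placeholder data (assumption): 21 classes with 100 images each.
data = {c: [f"class{c}_img{i}" for i in range(100)] for c in range(21)}
train, test = split_per_class(data)
print(len(train), len(test))  # 1680 420
```

Splitting inside each class keeps the 4:1 ratio identical for all 21 classes, so no class is under-represented in the test set.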

The Effect of Amount of Train Data on Accuracy
To assess how well each coding method represents the features, we conducted an experiment with different amounts of training data: 10, 20, 40, and 60 samples per class. Figure 4 shows the resulting accuracy. HQ is excluded from this comparison because it depends on large vocabularies. Figure 4 also shows that using LLC as the feature encoding gives better accuracy than using SC.

Analysis
The accuracy measurements of the three methods show that LLC performs better than SC and HQ. This indicates that the locality enforced by LLC is important for representing the data well. Land-use classification is also constrained by the amount of data available for training, and LLC outperforms SC when little training data is used. However, compared to [16], the accuracy of LLC is lower. That higher accuracy, however, comes with higher complexity in model development.
Figure 2. Example of each class from the UC Merced dataset. (a-v) agricultural, airplane, baseball diamond, beach, buildings, chaparral, dense residential, forest, freeway, golf course, harbor, intersection, medium residential, mobile homepark, overpass, parkinglot, river, runway, sparse residential, storage tanks, tennis court.

Conclusion
This study compared the HQ, SC, and LLC methods. The measurement results show that LLC performs better than HQ and SC. The number of training samples also affects accuracy: the more training data, the better the model's recognition capability. The highest accuracy, 86.476%, was obtained by LLC on the UC Merced dataset.
Future research could boost the coding method to improve recognition performance. It could also investigate whether factors other than sparsity and locality are important in the coding process.