Image Pattern Recognition in Spatial Data using Artificial Neural Network

This study predicts erosion from image patterns used as input data to an artificial neural network (ANN). Several simulations were carried out by trial and error to find the combination of ANN parameters that produces the best accuracy. The results show that the accuracy of ANN training is not influenced by the number of channels, namely the input dataset (erosion factors), or by the dimensions of the data, but is determined by changes in the network parameters. The best combination of parameters is 2 hidden layers, a learning rate of 0.001, a momentum of 0.9, and an RMS of 0.0001, with an accuracy of 98.55%.


Introduction
The development of remote sensing technology from aerial photography to satellite imagery in the 1980s made pixel-based spatial analysis widely used. Remote sensing has become a fast and efficient solution for acquiring spatial data, but it requires proper analytical techniques when it is integrated with non-spectral spatial data such as field measurements. At present, studies on spatial data analysis techniques are increasing along with technological advances that demand high accuracy and time efficiency in both analysis and data collection. Pixel-based spatial data analysis is strongly influenced by the classification method used because each method treats the input data differently [1], [2]. Input data that rely on spectral values can be classified using conventional methods based on statistical principles, such as maximum likelihood, where the data are normally distributed [3]. Spatial data that combine spectral and non-spectral characteristics require machine learning techniques, including decision trees [4], [5], [6], [7], fuzzy logic [8], [9], [10], and artificial neural networks [2], [11], [12], [13].
An artificial neural network (ANN) is an effective method with higher accuracy than a decision tree [14], while fuzzy logic is more appropriate for data with probabilistic membership of a vague class [9]. An ANN is an information processing system that mimics the neural network of the human brain and represents knowledge through its ability to learn [15]. The main problems of spatial data processing are large data dimensions and data uncertainty [16], [17]. An ANN can handle uncertain data through network training and the right combination of input parameters [12]. ANNs are widely used for image analysis, including filtering, interpretation, and prediction. In this study, an ANN was used to predict the level of erosion through pattern recognition on several combinations of input data. Samples representing each erosion class were taken from each input dataset as training data for the ANN. The input data are a combination of spectral images, namely SPOT imagery, and non-spectral images. SPOT imagery is the source of the vegetation cover data, while the non-spectral images are obtained by interpolating field measurements. For training, the ANN architecture used is the Multilayer Perceptron (MLP) with the backpropagation algorithm. Previous researchers have shown that an ANN with MLP can provide accurate predictions in cases with complex data [1], [18]. Backpropagation is a supervised learning algorithm that is usually used by the MLP to change the weights associated with neurons in the hidden layer [15].
The aim of using backpropagation is to obtain a balance between correct recognition of the training patterns and good responses to other similar patterns (testing data). The network can be trained continuously until all training patterns are correctly identified. In this study, testing was conducted through simulations of various combinations of ANN parameters to find the best accuracy in recognizing patterns. The ANN parameters include iteration, learning rate, momentum, hidden layer, and RMS error. Iteration (I) is the number of repetitions performed during data training. Reference [19] reports better accuracy (> 60%) with 10000 to 25000 iterations. Learning rate (LR) is a constant that sets the speed of learning; choosing the right value speeds up pattern learning, so that fewer repetitions are needed [15], [20]. Momentum (M) is a network parameter that serves to accelerate convergence and help avoid local optima, with values ranging from 0 to 1. The hidden layer (HL) receives weighted signals from the input layer and forwards them to the output layer. RMS error is the tolerance limit used as the stopping criterion: iteration stops when the RMS value falls below the tolerance limit set on the network. There are no fixed rules on the exact value of each network parameter needed to obtain high accuracy, because ANN tuning is based on trial and error. Therefore, many simulations are needed to test for the best combination of parameter values. The simulation results become a reference for analyzing the ability of the network to recognize erosion spatial data.
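The roles of the learning rate, momentum, iteration limit, and RMS tolerance can be illustrated with a minimal Python sketch. The function names are hypothetical and are not taken from the software used in this study; the default values are the best combination reported in this paper, and the update rule is the common textbook form of gradient descent with momentum, which is assumed here.

```python
import math


def rms_error(targets, outputs):
    """Root-mean-square error between paired target and output values."""
    if len(targets) != len(outputs) or not targets:
        raise ValueError("targets and outputs must be non-empty and equal in length")
    squared = [(t - o) ** 2 for t, o in zip(targets, outputs)]
    return math.sqrt(sum(squared) / len(squared))


def momentum_step(weight, gradient, previous_delta,
                  learning_rate=0.001, momentum=0.9):
    """One weight update with momentum (assumed textbook form).

    Returns the new weight and the applied change, which must be
    passed back in as previous_delta on the next call.
    """
    delta = -learning_rate * gradient + momentum * previous_delta
    return weight + delta, delta


def should_stop(iteration, rms, max_iterations=20000, rms_tolerance=0.0001):
    """Stop when the iteration limit is reached or the RMS error
    drops below the tolerance limit."""
    return iteration >= max_iterations or rms < rms_tolerance
```

A larger momentum carries more of the previous weight change into the current step, which is why it can speed up convergence.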

ANN-Multi Layer Perceptron
Information processing in an ANN is divided across three layers, namely the input layer, the hidden layer, and the output layer. Information is collected in layers called neuron layers. Information given to the ANN is propagated from layer to layer, starting from the input layer and passing through the hidden layer to the output layer.
The algorithm used in this study is backpropagation. Farrokhzad [21] built an ANN-based model with the backpropagation algorithm to analyze certain elements causing landslides in the study area and soil profiles. The modeling results were compared with data not used in the ANN training, in accordance with the analysis of field data; the accuracy of ANN training reached > 93%.
The backpropagation algorithm uses the output error to change its weight values in the backward direction. To get this error, the forward propagation stage must be done first. During forward propagation, neurons are activated by using an activation function that can be differentiated [15]. The activation function used in this study is the sigmoid function (2).
Backpropagation Algorithm
− Initialization of weights (weights are taken with a fairly small random value)
− Set: iteration, target error, and learning rate (α)
− Feedforward
a. Each input unit ($X_i$, $i = 1, 2, 3, \ldots, n$) receives the signal $x_i$ and forwards it to all units in the layer above it (hidden layer).
b. Each unit in the hidden layer ($Z_j$, $j = 1, 2, 3, \ldots, p$) sums the weighted input signals (3), uses the activation function to calculate its output signal, $z_j = f(z\_in_j)$ (4), and sends the signal to all the units in the upper layer (output units).
c. Each unit of output ($Y_k$, $k = 1, 2, 3, \ldots, m$) sums the weighted input signals.
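The feedforward stage described above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the implementation used in this study: the sigmoid is taken to be the logistic form 1 / (1 + e^(-x)), each unit is given a bias weight, the weight matrices `v` and `w` are hypothetical names, and the hidden layer size of 4 is arbitrary. The 9 inputs and 5 outputs follow the 9 input images and 5 erosion classes of this study.

```python
import math
import random


def sigmoid(x):
    """Logistic sigmoid, assumed here as the differentiable activation."""
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(n_in, n_out, rng, scale=0.5):
    """Small random weights; row 0 holds the bias of each receiving unit."""
    return [[rng.uniform(-scale, scale) for _ in range(n_out)]
            for _ in range(n_in + 1)]


def layer_forward(inputs, weights):
    """Each receiving unit sums its weighted inputs, then applies f."""
    outputs = []
    for j in range(len(weights[0])):
        net = weights[0][j] + sum(
            x * weights[i + 1][j] for i, x in enumerate(inputs))
        outputs.append(sigmoid(net))
    return outputs


def feedforward(x, v, w):
    """Input units -> hidden units (z) -> output units (y)."""
    z = layer_forward(x, v)
    y = layer_forward(z, w)
    return z, y


rng = random.Random(0)           # fixed seed so the sketch is repeatable
v = init_weights(9, 4, rng)      # input -> hidden (hidden size is arbitrary)
w = init_weights(4, 5, rng)      # hidden -> output (5 erosion classes)
z, y = feedforward([0.1] * 9, v, w)
```

The backward stage, which propagates the output error to adjust `v` and `w`, is not shown here.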

Input Dataset
The erosion control factors form the input layer, and the output layer is the class of erosion (very slight, slight, moderate, severe, and very severe). Artificial neural networks require two stages of analysis, namely data training and data testing. An overview of the inputs and outputs in this study is presented in Figure 1. Sample areas are taken from each erosion class, giving 30 samples with a total of 2216 pixels (Table 1). The sample areas are determined from the interpretation of SPOT 5 satellite imagery and the results of field analysis. There are 9 input images, consisting of the 4 channels of the SPOT 5 image, Image R representing climate factors, Image K representing soil factors, Image LS representing topographic factors, Image C representing vegetation factors, and Image P representing land management factors. Each input image is presented in Figure 2. The ANN calculation on these input images uses the backpropagation algorithm and follows the illustration in Figure 3, where the ANN parameters used refer to Table 1.
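One way to prepare such inputs for an MLP is to stack the co-registered images so that every pixel becomes one feature vector, and to encode the erosion class as a target vector. The sketch below assumes this arrangement; the helper names, the one-hot target encoding, and the toy 2 x 2 grids are illustrative and do not come from the study data.

```python
EROSION_CLASSES = ["very slight", "slight", "moderate", "severe", "very severe"]


def stack_pixels(layers):
    """Turn a list of equally sized 2-D grids into per-pixel feature vectors.

    Pixels are returned in row-major order, one value per input layer.
    """
    rows, cols = len(layers[0]), len(layers[0][0])
    for layer in layers:
        if len(layer) != rows or any(len(row) != cols for row in layer):
            raise ValueError("all input layers must share the same dimensions")
    return [[layer[r][c] for layer in layers]
            for r in range(rows) for c in range(cols)]


def one_hot(class_name):
    """Encode an erosion class as a 5-element target vector (assumed encoding)."""
    if class_name not in EROSION_CLASSES:
        raise ValueError("unknown erosion class: " + class_name)
    return [1 if name == class_name else 0 for name in EROSION_CLASSES]


# Nine toy 2 x 2 layers with made-up values standing in for the
# 4 SPOT 5 channels and the R, K, LS, C, and P images.
layers = [[[k, k + 0.5], [k + 1, k + 1.5]] for k in range(9)]
pixels = stack_pixels(layers)
```

Each entry of `pixels` then has 9 values, matching the 9 input images, and can be paired with the target vector of its erosion class during training.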

Results and discussion
The recognition of input data patterns with ANN was done using 23 trial-and-error simulations, with the different levels of accuracy presented in Table 2. Arif and Danoedoro [2] conducted 20 ANN simulations to get the best classification results. Table 2 shows that the lowest accuracy occurs in the ANN 4, ANN 5, and ANN 7 simulations. The ANN 1 simulation only produces an accuracy of 11.68%; in ANN 4 the same parameters are tried using 2 HL, and the network cannot execute them (error), as also happened in the ANN 5 and ANN 7 simulations. An example of the output produced under error conditions is that, among the 30 trained patterns, the ANN usually recognizes only two patterns. The parameters of ANN 4 were tried on ANN 5 and ANN 19 with a larger number of iterations; low results were still obtained. This means that adding iterations with a momentum of 0.5, 2 HL, LR 0.001, and RMS 0.0001 is less optimal. Most of the low accuracy occurs in the simulations with 2 HL. HL is used to analyze the impact of ANN training iterations [24]. However, when a simulation with 1 HL produces low accuracy, as in ANN 1, the number of iterations can be increased while keeping the other parameters (momentum, HL, LR, RMS) the same; increasing the number of iterations to 10000 in ANN 2 results in a higher accuracy of 62.54%. Of the 23 trial-and-error simulations, several obtain an accuracy of >80%, as presented in Figure 4. Figure 4(e) shows the classification result with the highest accuracy of all the simulations. The highest accuracy reaches 98.55% with the network parameters 2 HL, LR 0.001, M 0.9, RMS 0.0001, and 20000 iterations. This result is higher than that produced by [24], which reached 95% in land use classification. The accuracy of ANN training is influenced by the input dataset (erosion factors) and changes in the network parameters [1].
These findings provide an opportunity for future research to test other ANN algorithms and validation methods for pattern recognition results to determine the optimal method.
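For readers who want to reproduce this kind of comparison, the following Python sketch shows one way an accuracy percentage could be computed for a simulation. It assumes that accuracy is the share of samples whose predicted erosion class matches the reference class; the exact accuracy measure used by the software in this study is not stated here, and the function name and toy labels are illustrative only.

```python
def overall_accuracy(predicted, reference):
    """Percentage of samples whose predicted class equals the reference class."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("predicted and reference must be non-empty and equal in length")
    correct = sum(1 for p, r in zip(predicted, reference) if p == r)
    return 100.0 * correct / len(reference)


# Toy labels, not data from this study: 3 of 4 samples match.
reference = ["slight", "severe", "slight", "slight"]
predicted = ["slight", "severe", "moderate", "slight"]
accuracy = overall_accuracy(predicted, reference)
```

Computing the same measure for every parameter combination makes the simulations directly comparable, as in Table 2.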