Hybrid of Particle Swarm and Levenberg Marquardt Optimization in Neural Network Model for Rainfall Prediction

The neural network model has undergone many modifications of the original model through developments in network architecture and optimization methods. By default, a gradient-based optimization method is used to find the network weights, but the weaknesses and limitations of these methods have inspired many researchers to try other approaches. Non-gradient-based heuristic methods are a reasonable choice, since the learning algorithm in artificial neural networks is itself inspired by the characteristics of living creatures, so optimization methods that also mimic patterns of life in nature are a natural fit. One disadvantage of heuristic methods is the length of the iteration process. In this paper, a method that combines a heuristic optimization method, particle swarm optimization (PSO), with a gradient-based method, Levenberg Marquardt, is applied. The weights obtained from the PSO method become the initial weights for the Levenberg Marquardt method. The proposed procedure is applied to rainfall data from Cokrotulung, Klaten. The results show that this procedure provides better predictions than the Levenberg Marquardt method alone.


Introduction
Neural network modeling of time series has experienced rapid development and has been applied in various fields. Topics of interest to many researchers are methods to determine the optimal input, the number of units in the hidden layer, the activation function used in the hidden layer, and the optimization method used to obtain the optimal weights. Various modeling procedures, both theoretical and applied, have also been developed to obtain an optimal architecture. The optimization method used to obtain optimal weights is one of the main focuses of neural network modeling. By default, the standard methods used to estimate network weights are gradient-based; consequently, the activation function must be continuous and differentiable. Subsequently, various heuristic optimization methods for optimizing a function have also seen much development, supported by advances in computing that facilitate new optimization methods that do not use gradients. Likewise, statistical and mathematical modeling has progressed, so that various alternative models have been formed to obtain better predictions. These models are increasingly complex, and as a consequence appropriate optimization techniques are needed to obtain parameter estimates. The development of heuristic optimization methods for optimizing a function is a new chapter in the field of statistical modeling, and these methods are then used as a way to obtain parameter estimates for alternative models. Such methods include the genetic algorithm, ant colony optimization, simulated annealing and particle swarm optimization. One disadvantage of non-gradient methods is that they require longer iteration times than gradient-based methods. Therefore, it is worthwhile to experiment with combining the two approaches so that the result is more efficient.
This article discusses the use of particle swarm optimization as a method for determining the initial weights of a gradient-based optimization method, Levenberg Marquardt. The Levenberg Marquardt method has been widely used in neural network models; in fact, it is the default optimization method for neural networks in the Matlab program. Several previous studies on the use of particle swarm optimization for weight optimization have also been carried out [1][2][3]. In this paper, Levenberg Marquardt optimization of the neural network model is performed using the PSO algorithm as a tool for determining the initial weights. As an application, the proposed procedure is applied to rainfall data at ZOM 136 Cokrotulung, Klaten. Rainfall prediction using neural networks has also been done by Lin and Wu [4].

Methods
In this study, the network architecture has been determined in advance. PSO optimization is used first, and the weights obtained are then used as the initial values for the Levenberg Marquardt method. Each stage of the modeling is explained in the following sections.

Neural Network
The neural network architecture consists of three layers, namely the input layer, the hidden layer and the output layer. In neural network modeling for time series data, the input layer consists of past data up to lag p. In this study, the input is determined using the best ARIMA model. The neural network architecture for time series data is presented in Figure 1. The architecture can be written in the form of a mathematical model as follows:

\hat{y}_t = w_b + \sum_{j=1}^{q} w_j \, \psi\!\left( w_{bj} + \sum_{i=1}^{p} w_{ij} \, y_{t-i} \right)   (1)

where \psi is the activation function in the hidden layer, and the weight vector consists of the weights from the input layer to the hidden layer (w_ij), the weights from the bias to the hidden layer (w_bj), the weights from the hidden layer to the output layer (w_j) and the weight from the bias to the output layer (w_b). In the standard backpropagation algorithm, the weights are optimized using a gradient-based optimization method, and the initial weights are usually chosen randomly. In this study, PSO is used as the method for determining the initial weights, and the gradient-based LM method is then used to obtain the final weights.
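The forward pass of the model in Eq. (1) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the weight values and the dimension names (p lagged inputs, q hidden units) are chosen here only for demonstration.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation used in the hidden layer
    return 1.0 / (1.0 + np.exp(-z))

def nn_forward(x, W_in, b_hid, w_out, b_out):
    """One forward pass of the single-hidden-layer network in Eq. (1).

    x     : vector of lagged observations (y_{t-1}, ..., y_{t-p}), shape (p,)
    W_in  : input-to-hidden weights w_ij, shape (q, p)
    b_hid : bias-to-hidden weights w_bj, shape (q,)
    w_out : hidden-to-output weights w_j, shape (q,)
    b_out : bias-to-output weight w_b (scalar)
    """
    h = sigmoid(W_in @ x + b_hid)   # hidden-layer activations psi(.)
    return w_out @ h + b_out        # linear output unit

# Illustrative call with arbitrary weights (p = 3 lagged inputs, q = 3 hidden units)
rng = np.random.default_rng(0)
x = rng.normal(size=3)
y_hat = nn_forward(x, rng.normal(size=(3, 3)), rng.normal(size=3),
                   rng.normal(size=3), 0.0)
```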

Hybrid Particle Swarm Optimization-Levenberg Marquardt
PSO is inspired by the social behaviour of a flock of birds or school of fish whose members interact with each other: without colliding, they move together towards a place by forming certain formations. The algorithm was proposed by Kennedy and Eberhart [5] for optimizing nonlinear continuous functions and has been applied in machine learning [6]. In PSO, each candidate solution is called a particle, and the initial set of particles is generated randomly. The quality of each particle is evaluated by a fitness function, and the particles move towards the optimum particles, i.e. the particles with the best fitness. Each bird is described as a particle that represents a solution to the optimization problem; these particles have a position and a velocity. There are two important tasks in the PSO algorithm, namely the velocity and position updates, which form the iterative part of the algorithm. At every update, each particle accelerates towards its own best position from the previous iterations and towards the best particle overall. This process continues until a termination criterion is fulfilled; standard stopping criteria are a maximum number of iterations or a minimum error. At the beginning of the process, PSO initializes the positions and velocities; then, at each iteration, PSO updates the velocity and position using the following formulas [7]:

v_i(t+1) = \rho \, v_i(t) + c_1 r_1 \left[ pBest_i(t) - x_i(t) \right] + c_2 r_2 \left[ gBest(t) - x_i(t) \right],
x_i(t+1) = x_i(t) + v_i(t+1),   (2)

where
\rho : inertia weight
c_1, c_2 : acceleration coefficients
r_1, r_2 : random values from the continuous uniform distribution on (0, 1)
pBest_i(t) : best position of particle i up to iteration t
gBest(t) : global optimum over all particles at iteration t
x_i(t) : position of particle i at iteration t
At each iteration, in addition to the position and velocity, the inertia weight is also always updated.
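One iteration of the update in Eq. (2) can be sketched as below. This is a generic illustration of the standard update, not the paper's code; the default values c1 = c2 = 2.0 are a common textbook choice and are assumptions here.

```python
import numpy as np

def pso_update(x, v, p_best, g_best, rho, c1=2.0, c2=2.0, rng=None):
    """One PSO iteration (Eq. 2): update velocities, then positions.

    x, v   : current positions and velocities, shape (n_particles, dim)
    p_best : best position found so far by each particle, same shape as x
    g_best : best position found by the whole swarm, shape (dim,)
    rho    : inertia weight
    c1, c2 : acceleration coefficients
    """
    rng = rng or np.random.default_rng()
    r1 = rng.uniform(size=x.shape)   # random factors ~ U(0, 1)
    r2 = rng.uniform(size=x.shape)
    v_new = rho * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
    return x + v_new, v_new          # position update uses the new velocity
```

With c1 = c2 = 0 and rho = 1 the particles simply drift with constant velocity, which is a convenient sanity check of the update.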
The updating of the inertia weight is expressed by the following formula:

\rho(t) = \rho_{max} - \frac{\rho_{max} - \rho_{min}}{t_{max}} \, t

where
\rho_{max} : upper bound of \rho
\rho_{min} : lower bound of \rho
t_{max} : maximum number of iterations
t : current iteration
The initial positions of the particles are generated randomly, and after the iteration process the final positions of the particles are the weights resulting from the optimization. The particle velocities are set to zero at initialization because at the initial position the particles have not yet moved. The steps of the PSO algorithm can be described as follows:
1. Determine the initial position and initial velocity of each particle randomly
2. Calculate the fitness value of each particle
3. Determine pBest and gBest
4. Calculate the inertia weight
5. Update the positions and velocities
6. If the stopping criteria are not met, return to step 2
7. End
Determining the fitness of the initial population is very important because it is used to determine the best individual positions (pBest) and the best global position (gBest). The fitness of each particle is compared with the best fitness in the previous population; if the current result is better than the global best value gBest, the position of gBest is updated. These stages are repeated until the stopping criteria are met, after which the optimal weights are obtained. These weights then become the initial weights of the Levenberg Marquardt method. The Levenberg Marquardt algorithm was designed to approach second-order training speed without having to compute the Hessian matrix. The method improves the solution of problems that are much harder to solve by only adjusting the learning rate repeatedly [8], and represents a procedure for evaluating the inverse of the Hessian using a single pass through the data set [9].
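The hybrid procedure, PSO with a linearly decreasing inertia weight followed by a Levenberg Marquardt refinement, can be sketched as below. The paper's experiments used Matlab; here SciPy's `least_squares(method='lm')` stands in for the LM step, and a single logistic unit fitted to synthetic data stands in for the network residuals, so everything beyond the algorithm structure (data, model, parameter values) is an assumption for illustration only.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(w, X, y):
    # Toy residual function: one logistic unit; stands in for the
    # network residuals e_t = y_t - yhat_t in the paper
    return y - 1.0 / (1.0 + np.exp(-(X @ w[:-1] + w[-1])))

def pso(fit, dim, n_particles=20, t_max=30, rho_max=0.9, rho_min=0.4,
        c1=2.0, c2=2.0, seed=0):
    """Minimise fit(w) with PSO; inertia weight decreases linearly."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, size=(n_particles, dim))
    v = np.zeros_like(x)                         # particles start at rest
    p_best = x.copy()
    p_val = np.array([fit(p) for p in x])
    g_best = p_best[p_val.argmin()].copy()
    for t in range(t_max):
        rho = rho_max - (rho_max - rho_min) * t / t_max  # inertia update
        r1, r2 = rng.uniform(size=x.shape), rng.uniform(size=x.shape)
        v = rho * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        x = x + v
        val = np.array([fit(p) for p in x])
        improved = val < p_val                   # update pBest and gBest
        p_best[improved], p_val[improved] = x[improved], val[improved]
        g_best = p_best[p_val.argmin()].copy()
    return g_best

# Synthetic data (stand-in for the rainfall series)
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5, 0.3])
y = 1.0 / (1.0 + np.exp(-(X @ w_true[:-1] + w_true[-1])))

fit = lambda w: np.sum(residuals(w, X, y) ** 2)
w0 = pso(fit, dim=4, t_max=30)                   # few-generation PSO warm start
res = least_squares(residuals, w0, args=(X, y), method='lm')  # LM refinement
```

The design point mirrors the paper: PSO only needs a few generations to land near a good basin, after which the gradient-based LM step converges quickly from that warm start.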

Results and Discussion
The data used in this research are ten-daily rainfall data from Cokrotulung, Klaten, Central Java, from January 2010 to July 2018, i.e. 309 observations. The data are divided into two parts: the first 247 observations (80%) for training and the remaining 62 observations (20%) for testing. The activation function used in the hidden layer is the logistic sigmoid. Investigation with the ARIMA model shows that the inputs to the neural network model are the three variables at lags 1, 2 and 18, and the number of hidden units is set equal to the number of inputs. In the proposed procedure, four scenarios are built. In the first scenario, PSO optimization with few generations is used, in this case 10, and the resulting weights become the initial weights of the Levenberg Marquardt method. In the second scenario, 30 PSO generations are used; in the third, 50; and in the fourth, 100. In all four scenarios, optimization is continued with the Levenberg Marquardt method for 1000 epochs. The results were compared with Levenberg Marquardt alone for 1000 epochs and with PSO alone for 1000 generations, and are presented in Table 1. Table 1 shows that the hybrid of particle swarm with 30 generations and Levenberg Marquardt with 1000 epochs gave better results than the Levenberg Marquardt method alone. The time needed to complete the iteration process for the hybrid method with a few PSO generations is not much different from that of the Levenberg Marquardt method, so iteration time is not an obstacle for the hybrid method, and the basis for choosing the optimization method is purely the error achieved. In this respect the hybrid method is superior. A similar conclusion follows from the comparison with the PSO algorithm alone: although the difference is small, the hybrid method is slightly superior to PSO.
The drawback of PSO alone is that it requires much more time to complete 1000 generations. Taking effectiveness into account, the hybrid method is therefore preferred: it improves the error rate of the Levenberg Marquardt method and speeds up the required iteration time. As an illustration, Fig. 2 displays the in-sample and out-sample predictions.

Conclusion
The use of particle swarm optimization to obtain the initial weights for the Levenberg Marquardt method in a neural network model was developed. The proposed procedure was applied in various scenarios. Only a few generations of PSO were needed to obtain the optimal weights for predicting the rainfall data with Levenberg Marquardt; the result is better than using Levenberg Marquardt alone and also faster than using PSO alone. Developing the hybrid procedure further is an interesting direction for future work, for example by varying the gradient-based optimization method and comparing the results. Fig. 2 shows that the in-sample predictions successfully approach the actual data, and the seasonal pattern of the original data is followed by the predictions. Likewise, on the out-sample data the predictions are close to the actual values. Thus the proposed procedure produces good predictions both in-sample and out-sample; in other words, in addition to being used to build the model, it is also good for forecasting.