Firefly Algorithm with Artificial Neural Network for Time Series Problems

Time series classification is a supervised learning task that maps inputs to outputs using historical data. The primary objective is to discover interesting patterns hidden in the data. Multi-layered perceptron Artificial Neural Networks (ANN) are used here to solve time series classification problems. The weights of the ANN are adjusted so that the outputs of the net move closer to the desired outputs. For this reason, several algorithms have been proposed to train the parameters of the neural network for time series classification problems. This study hybridizes the Firefly Algorithm (FA) with the ANN in order to minimize the classification error rate (denoted FA-ANN). The FA is employed to optimize the weights of the ANN model. The proposed FA-ANN algorithm was tested on 6 benchmark UCR time series data sets. The experimental results reveal that the proposed FA-ANN can effectively solve time series classification problems.


INTRODUCTION
Time series data are produced continuously and are processed within a wide range of application domains in fields such as economics, engineering, science, medicine and sociology. There has been an explosion of interest within the data mining community in pattern discovery, segmenting, clustering and classifying time series (Fu, 2011; Keogh et al., 2001; Lin et al., 2003, 2004). In recent years, the popularity of time series data has been growing rapidly. This type of data is generated every second, minute, hour and day in many applications. It produces massive volumes of time-stamped data, and time is one of the essential aspects of many real-life data sets that people want to use to predict movements or trends. The areas that have occupied the attention of most time series data mining research are classification, clustering, detection and rule discovery. Time series data mining refers to data mining performed on temporal data. The primary objective is to discover interesting patterns hidden in the data (Fu, 2011).
Many research studies have been conducted on issues related to time series data, for instance, time series classification. In the time series classification problem, each time point carries a label. A classifier is trained using the values and labels from the time points of the training set. Given a new time series, the task is to label all of its time points. This is known as sequential supervised learning and is related to prediction (Fu, 2011).
However, studies on the use of time series classification in actual applications are lacking in the literature. The Artificial Neural Network (ANN) is a widely used tool for time series prediction and has been used in several different applications. ANN has also been applied to time series prediction and classification (Hüsken and Stagge, 2003).
In this study, we investigate the search capability of the Firefly Algorithm (FA) in finding optimal values for the weights of the ANN. To our knowledge, this is the first attempt to combine FA with ANN for time series classification problems. The ANN is first used to obtain the initial solution; the weights of the ANN are then adjusted by the FA in order to handle the time series data problem and minimize the error rate.

LITERATURE REVIEW
Firefly algorithm: The population-based FA was developed by Yang (2009) for solving optimization problems. The short, rhythmic flashing light produced by fireflies is the inspiration for the FA. The flashing lights enable fireflies to attract each other and help them to find mates, reach their prey and protect themselves from predators by creating a sense of fear in their enemies (Apostolopoulos and Vlachos, 2011). Notably, less bright fireflies are easily attracted by brighter ones. This behaviour can be formulated as an optimization algorithm, since the flashing light can be associated with the fitness function to be optimized. The firefly algorithm follows three rules:
• All fireflies are unisex
• A less bright firefly is attracted towards a brighter one, while the brightest firefly moves randomly
• The brightness of every firefly represents the quality of its solution
Diversification in the Firefly Algorithm (FA) is provided by the random movement component, whilst intensification is implicitly controlled by the attraction between fireflies and the strength of attractiveness. In contrast to other meta-heuristics, exploration and exploitation in FA are closely inter-connected; this might be a significant factor in its success in solving multi-objective and multi-modal optimization problems. Łukasik and Żak (2009) employed FA for continuous constrained optimization tasks and reported that it consistently outperformed the particle swarm optimizer. Yang (2009) compared the firefly algorithm with particle swarm optimization and the genetic algorithm on various test functions and obtained better results in terms of efficiency and success rate. The signal-transmitting ability of FA gives better and quicker convergence towards optimality.
A similar study was carried out by Yang and Deb (2010), whose experimental results revealed that the firefly algorithm relatively outperformed other approaches. Sayadi et al. (2010) proposed a firefly algorithm with local search for minimizing makespan in permutation flow shop scheduling problems. The initial results indicated that the proposed method outperformed the ant colony algorithm. Gandomi et al. (2011) applied the firefly algorithm to mixed variable structural optimization problems. Their investigations revealed that the firefly algorithm performed better than particle swarm optimization, the genetic algorithm, simulated annealing and harmony search. Finally, another study on the firefly algorithm can be found in Apostolopoulos and Vlachos (2011). The success of the firefly algorithm motivates the present study to further investigate its performance on time series problems.

METHODOLOGY
Firefly algorithm with Artificial Neural Networks (ANN): Rumelhart et al. (1986) proposed multi-layered perceptrons (Artificial Neural Networks), which are used here for solving time series classification problems. Zhang (1999) reviewed work on Artificial Neural Networks (ANNs). A multi-layer neural network comprises a large number of units (neurons), which are linked with each other in a pattern of connections. Generally, the units in a net are divided into three classes:
• Input units (which receive the information to be processed)
• Output units (which present the outcomes of the processing)
• Hidden units (which lie between the input and output units)
Feed-forward ANNs allow signals to travel one way only, i.e., from input to output. Initially, the network is trained on a collection of paired data to learn the input-output mapping. The weights of the connections between neurons are then fixed and the network is used to determine the classifications of a new set of data.
During time series classification, the signal at the input units is propagated through the net to determine the activation values at all the output units. Every input unit has an activation value that represents some feature external to the net. Each input unit then transmits its activation value to all the hidden units to which it is connected. Each hidden unit computes its own activation value and these signals are then passed on to the output units. The activation value of each receiving unit is computed according to a simple activation function. The function sums the contributions of all sending units, where the contribution of a unit is the weight of the connection between the sending and receiving units multiplied by the activation value of the sending unit. This sum is usually then further modified, for example, by scaling it to a value between 0 and 1 and/or by setting the activation value to zero unless a threshold level for the sum is reached. The ANN depends on three fundamental aspects: the input and activation functions of the units, the network architecture and the weight of each input connection. Given that the first two aspects are fixed, the behaviour of the ANN is determined by the current values of the weights. The weights of the net to be trained are initially set to random values and samples from the training set are then repeatedly presented to the net. The input values of a sample are fed to the input units and the output of the net is compared with the desired output for this sample. All the weights in the net are then adjusted slightly so that the output values of the net move closer to the desired output values. Many algorithms are available for training multi-layered perceptrons (Neocleous and Schizas, 2002).
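
The forward propagation described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this study: the sigmoid squashing function, the bias terms, the function names (`sigmoid`, `layer_forward`, `forward`) and the toy weights are all assumptions chosen for clarity.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into the interval (0, 1) (assumed activation)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Compute the activation value of every receiving unit in one layer.

    weights[j][i] is the weight of the connection from sending unit i to
    receiving unit j. biases[j] is an illustrative offset (an assumption).
    """
    activations = []
    for w_row, b in zip(weights, biases):
        # Contribution of each sending unit = connection weight * its activation.
        total = sum(w * x for w, x in zip(w_row, inputs)) + b
        activations.append(sigmoid(total))
    return activations


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """One-way pass: input units -> hidden units -> output units."""
    hidden = layer_forward(inputs, hidden_w, hidden_b)
    return layer_forward(hidden, output_w, output_b)


# Toy network (hypothetical values): 3 input units, 2 hidden units, 1 output unit.
x = [0.5, -1.0, 2.0]
hidden_w = [[0.1, 0.2, 0.3], [-0.3, 0.1, 0.2]]
hidden_b = [0.0, 0.0]
output_w = [[0.5, -0.5]]
output_b = [0.0]
y = forward(x, hidden_w, hidden_b, output_w, output_b)
print(y)
```

With the architecture and activation function fixed, the output `y` depends only on the weight values, which is why the weights are the natural target for an optimizer.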
In this study, the FA is employed to optimise the weights of the ANN model, denoted as FA-ANN, to obtain the optimal parameter settings for training the network of ANN and to minimize the error rate.
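
One way to let a firefly represent the weights of the ANN is to treat its position as a flat vector and cut it into per-layer weight matrices. The sketch below assumes a single hidden layer without bias terms; the encoding, the function names (`n_weights`, `decode_weights`) and the example values are hypothetical and are not taken from the original implementation.

```python
def n_weights(n_in, n_hidden, n_out):
    """Number of connection weights in an n_in -> n_hidden -> n_out net (no biases)."""
    return n_in * n_hidden + n_hidden * n_out


def decode_weights(position, n_in, n_hidden, n_out):
    """Split a flat firefly position into hidden-layer and output-layer weights.

    Returns (hidden_w, output_w) where hidden_w[j][i] connects input unit i to
    hidden unit j and output_w[k][j] connects hidden unit j to output unit k.
    """
    expected = n_weights(n_in, n_hidden, n_out)
    if len(position) != expected:
        raise ValueError(f"expected {expected} weights, got {len(position)}")
    split = n_in * n_hidden
    hidden_w = [position[j * n_in:(j + 1) * n_in] for j in range(n_hidden)]
    output_w = [position[split + k * n_hidden:split + (k + 1) * n_hidden]
                for k in range(n_out)]
    return hidden_w, output_w


# Hypothetical firefly position for a 3-2-1 network (3*2 + 2*1 = 8 weights).
position = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
hidden_w, output_w = decode_weights(position, n_in=3, n_hidden=2, n_out=1)
print(hidden_w)
print(output_w)
```

Under this encoding the dimensionality of the search space equals the number of connection weights, so larger networks give the FA a larger space to explore.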
The pseudo code of the FA is shown in Fig. 1. First, it generates the initial population of candidate solutions for the given problem (here, the weights of the ANN). It then calculates the light intensity of all fireflies and finds the most attractive firefly (best candidate) within the population. Next, the attractiveness and distance are calculated for each firefly in order to move all fireflies towards the most attractive firefly in the search space. Finally, the most attractive firefly moves randomly in the search space. This process is repeated until a termination criterion is met, i.e., the maximum number of generations is reached.
Normally, the quality of the time series classification is measured by the error rate, which can be calculated from the True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) counts as shown in Eq. (1) (Al-Obeidat et al., 2010; Yeh et al., 2007).
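
The loop described in this paragraph can be sketched as follows. This is a simplified reading of the prose, not a transcription of Fig. 1: every firefly moves towards the current best one, and the best one takes a random step. The function name `firefly_minimize`, the initial range [-1, 1], the value `alpha=0.2`, the population size, the best-so-far bookkeeping and the toy `sphere` objective are all assumptions. In FA-ANN the objective would be the classification error rate of the ANN with the weights given by the firefly's position.

```python
import math
import random


def firefly_minimize(objective, dim, n_fireflies=10, n_generations=50,
                     alpha=0.2, beta0=1.0, gamma=1.0, seed=0):
    """Minimize `objective` with a simplified firefly loop (lower cost = brighter)."""
    rng = random.Random(seed)
    # Step 1: random initial population of candidate solutions (assumed range).
    swarm = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
             for _ in range(n_fireflies)]
    # Step 2: light intensity of every firefly.
    cost = [objective(x) for x in swarm]
    best_i = min(range(n_fireflies), key=cost.__getitem__)
    best_x, best_cost = list(swarm[best_i]), cost[best_i]

    for _ in range(n_generations):
        best_i = min(range(n_fireflies), key=cost.__getitem__)
        leader = list(swarm[best_i])
        for i in range(n_fireflies):
            if i == best_i:
                # Step 4: the most attractive firefly moves randomly.
                swarm[i] = [xi + alpha * (rng.random() - 0.5) for xi in swarm[i]]
            else:
                # Step 3: attraction decays with squared distance to the leader.
                r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], leader))
                beta = beta0 * math.exp(-gamma * r2)
                swarm[i] = [xi + beta * (lj - xi) + alpha * (rng.random() - 0.5)
                            for xi, lj in zip(swarm[i], leader)]
            cost[i] = objective(swarm[i])
            # Remember the best solution seen so far (an added safeguard).
            if cost[i] < best_cost:
                best_x, best_cost = list(swarm[i]), cost[i]
    return best_x, best_cost


def sphere(x):
    """Toy objective standing in for the ANN error rate."""
    return sum(v * v for v in x)


best_x, best_cost = firefly_minimize(sphere, dim=3)
print(best_x, best_cost)
```

Because the random step of the best firefly can make it worse, the sketch keeps a separate copy of the best solution found so far; whether the original implementation does the same is not stated.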
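
As an illustration of this measure, the sketch below assumes the usual confusion-matrix definition, error rate = (FP + FN) / (TP + TN + FP + FN), and a two-class problem. The function names (`confusion_counts`, `error_rate`) and the toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a two-class problem."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def error_rate(tp, tn, fp, fn):
    """Fraction of misclassified samples (assumed standard definition)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion counts must not all be zero")
    return (fp + fn) / total


# Toy labels (hypothetical): 8 samples, 2 of them misclassified.
y_true = [1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 1, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
rate = error_rate(tp, tn, fp, fn)
print(tp, tn, fp, fn, rate)  # 3 3 1 1 0.25
```

Multiplying the returned fraction by 100 gives the percentage form in which error rates are reported in the results tables.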
The procedure starts from an initial population of randomly generated individuals. The quality of each individual is calculated using Eq. (1) and the best solution among them is selected.
In FA, the attractiveness function of a firefly takes the following form:
$$\beta(r) = \beta_0 e^{-\gamma r^{2}}$$
where,
$r$ = The distance between any two fireflies
$\beta_0$ = The initial attractiveness at $r = 0$, set to 1 in this study
$\gamma$ = An absorption coefficient which controls the decrease of the light intensity, also set to 1 in this study
The distance between any two fireflies $i$ and $j$, at positions $x_i$ and $x_j$, respectively, can be defined as a Cartesian or Euclidean distance as follows:
$$r_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{d} \left(x_{i,k} - x_{j,k}\right)^{2}}$$
where, $d$ is the dimensionality of the given problem.
The movement of a firefly $i$ which is attracted by a brighter firefly $j$ is represented by the following equation:
$$x_i = x_i + \beta_0 e^{-\gamma r_{ij}^{2}} \left(x_j - x_i\right) + \alpha \left(\mathrm{rand} - \tfrac{1}{2}\right) \qquad (3)$$
In Eq. (3), the first term is the current position of the firefly, the second term accounts for the attraction of the firefly towards the light intensity of a brighter neighbouring firefly and the third term is the random movement of the firefly (random part), which is all that remains when there is no brighter firefly. The coefficient $\alpha$ is a randomization parameter determined by the problem of interest, while rand is a random number generator uniformly distributed in the interval [0, 1] (Hung et al., 1997). In Eq. (4), the movement of the best candidate is done randomly.
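
A single attraction step can be worked through numerically. The sketch below assumes the standard firefly forms, attractiveness β0·exp(−γ·r²), Euclidean distance and a random term α·(rand − 1/2), with β0 = γ = 1 as stated in the text. The function names (`distance`, `attractiveness`, `move_towards`) and the example positions are hypothetical.

```python
import math
import random

BETA0 = 1.0   # initial attractiveness at r = 0 (value stated in the text)
GAMMA = 1.0   # absorption coefficient (value stated in the text)


def distance(x_i, x_j):
    """Euclidean distance between two firefly positions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def attractiveness(r, beta0=BETA0, gamma=GAMMA):
    """Attractiveness decays exponentially with the squared distance."""
    return beta0 * math.exp(-gamma * r ** 2)


def move_towards(x_i, x_j, alpha, rng):
    """Move firefly i towards brighter firefly j, plus a random perturbation."""
    beta = attractiveness(distance(x_i, x_j))
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5)
            for a, b in zip(x_i, x_j)]


rng = random.Random(42)
x_i = [0.0, 0.0]
x_j = [1.0, 0.0]
# With alpha = 0 the random part vanishes, so the step is purely attractive:
# r = 1, beta = exp(-1), and firefly i covers that fraction of the gap to j.
new_x = move_towards(x_i, x_j, alpha=0.0, rng=rng)
print(new_x)
```

With γ = 1 the attraction fades quickly: at distance 2 the attractiveness is exp(−4), roughly 0.018, so distant fireflies are driven mainly by the random term.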

EXPERIMENTAL RESULTS
The proposed algorithm is implemented in MATLAB and simulations are performed on an Intel Pentium 4, 2.8 GHz computer. We execute 10 independent runs for each dataset.

Benchmark datasets: This experiment is performed on 6 datasets that can be freely downloaded from the UCR Time Series Classification/Clustering Homepage: www.cs.ucr.edu/~eamonn/time_series_data. The 6 time series data sets come from different domains (Table 1). The UCR data sets are regarded, on the basis of several criteria, as having complexity similar to that of real-world data sets. All 6 benchmark UCR time series data sets have a moderate to high time series length, ranging from 96 to 637 points.
Parameter settings: Table 2 shows the parameters for the proposed algorithm, which were determined after some preliminary experiments.
Experimental results: Table 3 presents the comparison of the error rate (%) between the FA-ANN and ANN time series classification techniques on the 6 datasets. The results clearly indicate that the hybrid method (FA-ANN) outperforms the ANN algorithm on all datasets. For example, on the Gun-Point dataset the ANN achieved an error rate of 11.33%, while the proposed FA-ANN obtained an error rate of 0.08%. This is attributed to the capability of the FA, incorporated into the ANN, to find better weights for the ANN and consequently improve its performance. It is believed that the fireflies gather closely around the optimal solution. In other words, the FA has good exploitation capability and can find better solutions because many candidates (fireflies) gather near the optimal solution.
Comparison with state-of-the-art: Table 4 compares the results of FA-ANN with other available approaches in terms of classification error rate on the 6 datasets. The best results are presented in bold.
The experimental results indicate that the proposed hybrid method (FA-ANN) outperforms the other approaches on four out of six datasets (i.e., Gun-Point, Wafer, ECG and Coffee). FA-ANN is able to classify the Wafer dataset with an error rate of 0.004%. This capability is supported by the attractiveness feature, i.e., the light intensity that makes fireflies brighter (determined by the value of the objective function) and draws them to the location of near-optimal solutions.

CONCLUSION
In this study, a hybrid method based on the ANN and FA is proposed for solving time series classification problems. Initial solutions are generated at random using the ANN and the improvement is carried out by the FA, which tries to optimize the weights of the ANN. Experimental results on 6 benchmark UCR time series data sets show that the proposed FA-ANN outperforms the ANN on all datasets. Further comparison with other approaches in the literature shows that the hybrid method is able to minimize the error rate, with new best results on 4 out of 6 datasets. As an extension of this study, further investigation will be devoted to validating the hybridization of FA with a local search algorithm, in order to balance exploration and exploitation during the optimization process and to avoid premature convergence.