AN APPROACH TO IMPROVE FUNCTIONAL LINK NEURAL NETWORK TRAINING USING MODIFIED ARTIFICIAL BEE COLONY FOR CLASSIFICATION TASK

Classification is one of the most frequently studied tasks in the area of Artificial Neural Networks (ANNs). One of the best-known types of ANN is the Multilayer Perceptron (MLP). However, an MLP usually requires a rather large amount of available measures in order to achieve good classification accuracy. To overcome this, a Functional Link Neural Network (FLNN), which has a single layer of trainable connection weights, is used. The standard method for tuning the weights of an FLNN is the Backpropagation (BP) learning algorithm. However, BP-learning has difficulties such as getting trapped in local optima and slow convergence, especially when solving non-linearly separable classification problems. In this paper, a modified Artificial Bee Colony (mABC) algorithm is used to overcome these BP drawbacks. With modifications to the employed bees' exploitation phase, using the mABC as a learning scheme for the FLNN gives better accuracy on the classification tasks.


INTRODUCTION
Artificial Neural Networks (ANNs) have been successfully applied to a variety of real-world classification tasks, especially in industry, business and science (Widrow, Rumelhart and Lehr, 1994; Zhang, 2000). One of the best-known types of ANN is the Multilayer Perceptron (MLP), a model that maps sets of input data onto a set of appropriate outputs. An MLP network is usually trained by adjusting the connection weights between neurons. However, an MLP usually requires a rather large amount of available measures in order to achieve good classification accuracy. To overcome this, a Functional Link Neural Network (FLNN), which has a single layer of trainable connection weights, is used. The FLNN is a flat network without a hidden layer, which makes the network architecture less complicated (Misra and Dehuri, 2007).
The standard method for tuning the weights of an FLNN is the Backpropagation (BP) learning algorithm. BP-learning, as described in Widrow et al. (1994), is well known and widely used for training neural networks. It is a supervised learning method in which the neural network "learns" from a sample set of data called the training set. The training set provides the network with examples of inputs and desired outputs; the error (the difference between the actual and expected outputs) is propagated backward to compute the error gradient, which is then used to modify the weights. The idea of BP-learning is to reduce this error until the network has learned the training data. Training begins with random weights, and the goal is to adjust them until a minimal error is achieved. However, a crucial problem of the standard BP-learning algorithm is that its gradient search tends to get trapped in local minima, especially for non-linearly separable classification problems (Dehuri and Cho, 2010).
To overcome this drawback of BP-learning, the Artificial Bee Colony (ABC) optimization algorithm is used to optimize the FLNN weights. The ABC algorithm was originally proposed by Karaboga (2005) for solving numerical optimization problems by simulating the intelligent foraging behavior of a honey bee swarm. In this study, a modified ABC is used to train the FLNN. The modification is made to the employed bees' foraging behavior in order to improve the network's ability to search for the optimal weight set, and thereby its accuracy in classifying out-of-sample (unseen) data.

FUNCTIONAL LINK NEURAL NETWORK
Functional Link Neural Network (FLNN) is a class of Higher Order Neural Networks (HONNs) that utilize higher-order combinations of their inputs (Pao, 1989; Patra and Bornand, 2010). It was created by Klassen and Pao (1989) and has been successfully used in many applications such as system identification (Patra and Bornand, 2010; Patra and Kot, 2002; Abbas, 2009), channel equalization (Patra and Pal, 1995), classification (Raghu, Poongodi and Regnanarayana, 1995; Abu-Mahfouz, 2005; Liu et al., 1994; Dehuri and Cho, 2010), pattern recognition (Klassen and Pao, 1990; Park and Pao, 2000) and prediction (Majhi, Panda and Sahoo, 2009; Ghazali, Hussain and Liatsis, 2011). The FLNN is more modest than the MLP, as it has a single layer of trainable weights, whilst still being able to handle non-linearly separable classification tasks. The flat architecture of the FLNN also makes its learning algorithm less complicated (Misra and Dehuri, 2007). In order to capture a non-linear input-output mapping for a classification task, the input vector of the FLNN is extended with a suitable enhanced representation of the input nodes, which artificially increases the dimension of the input space (Pao, 1989; Pao and Takefuji, 1992).
The focus of this paper is on the FLNN with the generic basis architecture, which uses a tensor representation. Figure 1 depicts the FLNN structure up to the second order with 3 inputs. The first order of the network consists of the 3 inputs x1, x2 and x3, while the second order is the extended input based on the product units x1x2, x1x3 and x2x3. The learning part of this architecture consists of a standard BP-learning algorithm.

FLNN LEARNING SCHEME
Most previous learning algorithms used in the training of FLNN are based on BP-learning (Misra and Dehuri, 2007; Dehuri and Cho, 2010; Abu-Mahfouz, 2005; Ghazali, Hussain and Liatsis, 2011; Haring and Kok, 1995; 1997; Sierra, Macias and Corbacho, 2001; Dehuri, Mishra and Cho, 2008). As shown in Figure 1, the weight values between the enhanced input nodes and the output node are randomly initialized. The output $\hat{y}$ of the FLNN corresponds to the input pattern $x = (x_1, \ldots, x_n)$, which has $n$ input features. For the tensor representation, the number of enhanced inputs is $n + n(n-1)/2$, and the enhanced input vector of $x$ is

$$X = \langle x_1, x_2, \ldots, x_n, x_1x_2, x_1x_3, \ldots, x_{n-1}x_n \rangle$$

Let $f$ denote the output node's activation function, which in this work is the logistic sigmoid:

$$f(s) = \frac{1}{1 + e^{-s}} \qquad (1)$$

The output value of the FLNN is obtained by:

$$\hat{y} = f(\mathbf{w} \cdot X + b) \qquad (2)$$

where $\hat{y}$ is the output, $f$ is the output node activation function and $b$ is the bias. In Eq. (1), $s$ is the aggregate value, which in Eq. (2) is the inner product of $\mathbf{w}$ and $X$ plus the bias $b$. The squared error $E$ between the target output and the actual output is minimized as:

$$E = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 \qquad (3)$$

where $y_i$ is the target output and $\hat{y}_i$ is the actual output for the $i$th training pattern, and $N$ is the number of training patterns. During the training phase, the BP-learning algorithm continues to update $\mathbf{w}$ and $b$ until the maximum epoch or the convergence condition is reached.
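The forward pass of Eqs. (1) to (3) can be sketched in Python as below. This is a minimal illustration, not the authors' MATLAB implementation; the helper names `tensor_expand`, `flnn_output` and `squared_error` are hypothetical. Giving each output node its own weight vector and bias (needed for multiclass networks such as the 10-3 IRIS network) and summing the squared error over output nodes before averaging over the $N$ patterns are assumptions.

```python
import math
from itertools import combinations


def tensor_expand(x):
    """Second-order tensor expansion <x1..xn, x1x2, x1x3, ..., x(n-1)xn>.

    The result has n + n(n-1)/2 elements.
    """
    return list(x) + [a * b for a, b in combinations(x, 2)]


def sigmoid(s):
    """Logistic sigmoid activation, Eq. (1)."""
    return 1.0 / (1.0 + math.exp(-s))


def flnn_output(x, W, b):
    """Eq. (2) for each output node c: y_hat_c = f(w_c . X + b_c).

    Assumption: W holds one weight vector per output node, b one bias per node.
    """
    X = tensor_expand(x)
    return [sigmoid(sum(w * xj for w, xj in zip(w_c, X)) + b_c)
            for w_c, b_c in zip(W, b)]


def squared_error(targets, outputs):
    """Eq. (3): mean over the N patterns of the squared error
    (summed over output nodes, an assumption for the multi-output case)."""
    N = len(targets)
    return sum(sum((t - o) ** 2 for t, o in zip(t_vec, o_vec))
               for t_vec, o_vec in zip(targets, outputs)) / N
```

For the 3-input network of Figure 1, `tensor_expand([x1, x2, x3])` returns `[x1, x2, x3, x1*x2, x1*x3, x2*x3]`, and with all weights and biases at zero every output equals f(0) = 0.5.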
Although BP-learning is the most widely used algorithm for FLNN training, it has several limitations. It easily gets trapped in local minima, especially when dealing with non-linearly separable classification problems. The convergence of BP-learning can also become very slow, even when the learning goal (a given termination error) is eventually achieved. Moreover, the convergence behavior of the BP-learning algorithm is highly dependent on the initial values of the network connection weights and on algorithm parameters such as the learning rate and momentum (Dehuri and Cho, 2010).
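To make the role of the learning rate, momentum and initial weights concrete, the sketch below trains a single-layer sigmoid FLNN with online BP plus momentum. The defaults (learning rate 0.3, momentum 0.7, at most 1000 epochs, stopping error 0.001) are taken from the experimental setup described later; the per-pattern update, the initial weight range [-0.5, 0.5], the use of the conventional 1/2 (t - o)^2 gradient (which only rescales the learning rate) and the function name `train_flnn_bp` are assumptions, not the authors' code.

```python
import math
import random
from itertools import combinations


def tensor_expand(x):
    # Second-order enhanced input <x1..xn, x1x2, ..., x(n-1)xn>
    return list(x) + [a * b for a, b in combinations(x, 2)]


def train_flnn_bp(data, n_out, lr=0.3, momentum=0.7, max_epoch=1000,
                  min_error=0.001, seed=0):
    """Online BP with momentum for a single-layer sigmoid FLNN (a sketch).

    data: list of (x, t) pairs, where t holds one target per output node.
    Returns (W, b, history); history[e] is the mean squared error measured
    during epoch e (each pattern's error is taken before its update).
    """
    rng = random.Random(seed)
    m = len(tensor_expand(data[0][0]))
    # Random initial weights; the [-0.5, 0.5] range is an assumption
    W = [[rng.uniform(-0.5, 0.5) for _ in range(m)] for _ in range(n_out)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    dW = [[0.0] * m for _ in range(n_out)]  # previous weight changes (momentum)
    db = [0.0] * n_out
    history = []
    for _ in range(max_epoch):
        total = 0.0
        for x, t in data:
            X = tensor_expand(x)
            for c in range(n_out):
                s = sum(w * xj for w, xj in zip(W[c], X)) + b[c]
                o = 1.0 / (1.0 + math.exp(-s))
                err = t[c] - o
                total += err ** 2
                # Negative gradient of 1/2 (t - o)^2 through the sigmoid
                delta = err * o * (1.0 - o)
                for j, xj in enumerate(X):
                    dW[c][j] = lr * delta * xj + momentum * dW[c][j]
                    W[c][j] += dW[c][j]
                db[c] = lr * delta + momentum * db[c]
                b[c] += db[c]
        history.append(total / len(data))
        if history[-1] <= min_error:  # convergence condition reached
            break
    return W, b, history
```

For example, `train_flnn_bp([([0, 0], [0]), ([0, 1], [0]), ([1, 0], [0]), ([1, 1], [1])], n_out=1, max_epoch=500)` fits the logical AND, which the product unit x1x2 makes easy. Changing `seed`, `lr` or `momentum` changes the error trajectory, which is the sensitivity noted above.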

STANDARD ARTIFICIAL BEE COLONY OPTIMIZATION
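The Artificial Bee Colony (ABC) algorithm is an optimization tool that simulates the intelligent foraging behavior of a honey bee swarm for solving multidimensional and multimodal optimization problems (Karaboga, 2005). In this model, three groups of bees, namely employed, onlooker and scout bees, search for solutions to the problem by sharing information with one another. The employed bees perform a random multidirectional search in the food source (FS) area. They carry the profitability information (nectar quantity) of their food sources and share this information with the onlookers. Onlooker bees evaluate the nectar quantity obtained by the employed bees and choose a food source with a probability based on its fitness. If the nectar amount of a new food source is higher than that of the previous one in their memory, they memorize the new position and forget the previous one (Karaboga, 2005). An employed bee whose food source has been abandoned becomes a scout and starts to search for a new food source randomly. The standard ABC pseudo code is as follows:
1) Cycle = 0
2) Initialize the population of scout bees with random solutions $x_i$, $i = 1, 2, \ldots, FS$
3) Evaluate the fitness of the population
4) Cycle = 1:MCN
5) Form a new population $v_{i,j}$ for the employed bees using:
$$v_{i,j} = x_{i,j} + \Phi_{i,j} \left( x_{i,j} - x_{k,j} \right) \qquad (4)$$
where $k$ is a randomly selected solution in the neighbourhood of $i$, $\Phi$ is a random number in the range $[-1, 1]$ and $j$ is a randomly selected dimension of $i$, and evaluate them
6) Apply greedy selection between $x_{i,j}$ and $v_{i,j}$
7) Calculate the probability values $p_i$ for the solutions $x_i$ using:
$$p_i = \frac{fit_i}{\sum_{n=1}^{FS} fit_n} \qquad (5)$$
where $fit_i$ is the fitness of solution $i$
8) Produce the new solutions $v_i$ for the onlookers from the solutions $x_i$ selected depending on $p_i$ and evaluate them
9) Apply the greedy selection process for the onlookers
10) Determine the abandoned solution for the scout, if it exists, and replace it with a new randomly produced solution $x_i$ using:
$$x_i^j = x_{\min}^j + \mathrm{rand}(0,1) \left( x_{\max}^j - x_{\min}^j \right) \qquad (6)$$
11) Memorize the best solution
12) Cycle = Cycle + 1
13) Stop when Cycle = Maximum cycle number (MCN)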
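The standard ABC loop above can be sketched in Python as a minimizer of an error function such as E in Eq. (3). This is a minimal sketch, not Karaboga's reference implementation; the function name `abc_minimize`, the parameter names `fs`, `mcn`, `limit`, `lo`, `hi`, the fitness 1/(1 + E), roulette-wheel selection of onlookers and one scout per cycle are assumptions (common choices for ABC, not stated in the text).

```python
import random


def abc_minimize(obj, D, fs=10, mcn=200, limit=20, lo=-1.0, hi=1.0, seed=0):
    """Standard ABC minimising obj over [lo, hi]^D (a sketch of the pseudo code)."""
    rng = random.Random(seed)

    def random_source():
        # Eq. (6): x_j = x_min + rand(0, 1) * (x_max - x_min)
        return [lo + rng.random() * (hi - lo) for _ in range(D)]

    X = [random_source() for _ in range(fs)]  # initial food sources
    E = [obj(x) for x in X]
    trials = [0] * fs  # cycles without improvement, per food source

    def exploit(i):
        k = rng.choice([m for m in range(fs) if m != i])  # neighbour k != i
        j = int(rng.random() * D)  # Eq. (8), zero-based: ONE random dimension
        v = X[i][:]
        v[j] = X[i][j] + rng.uniform(-1.0, 1.0) * (X[i][j] - X[k][j])  # Eq. (4)
        e = obj(v)
        if e < E[i]:  # greedy selection: keep the better source
            X[i], E[i], trials[i] = v, e, 0
        else:
            trials[i] += 1

    ib = min(range(fs), key=lambda n: E[n])
    best_x, best_e = X[ib][:], E[ib]
    for _ in range(mcn):
        for i in range(fs):  # employed bee phase
            exploit(i)
        fit = [1.0 / (1.0 + e) for e in E]  # fitness of a non-negative error
        total = sum(fit)
        p = [f / total for f in fit]  # Eq. (5)
        for _ in range(fs):  # onlooker phase: roulette-wheel choice on p_i
            exploit(rng.choices(range(fs), weights=p)[0])
        w = max(range(fs), key=lambda n: trials[n])  # scout phase
        if trials[w] > limit:
            X[w] = random_source()
            E[w], trials[w] = obj(X[w]), 0
        ib = min(range(fs), key=lambda n: E[n])  # memorize the best solution
        if E[ib] < best_e:
            best_x, best_e = X[ib][:], E[ib]
    return best_x, best_e
```

For instance, `abc_minimize(lambda x: sum(v * v for v in x), D=3)` drives a 3-dimensional sphere function towards zero. Note that each employed or onlooker visit changes only one of the D coordinates.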

FLNN WITH MODIFIED ABC LEARNING SCHEME
In this study, a modified ABC (mABC) is used as a learning scheme for training the FLNN. The modification is made to the employed bees' foraging phase so that they exploit all weights and biases in the FLNN architecture, in order to improve the network's ability to search for the optimal weight set. In the standard ABC algorithm, the position of a food source represents a possible solution to the optimization problem, and the nectar amount of a food source corresponds to the profitability (fitness) of the associated solution. When training the FLNN with ABC, the weights $\mathbf{w}$ and biases $b$ of the network are treated as the parameters of the optimization problem (finding the minimum error $E$) given in Eq. (3). The FLNN optimization parameters are treated as a D-dimensional vector for the solution $x_{i,j}$, where $i = 1, 2, \ldots, FS$ and $j = 1, 2, \ldots, D$, and each vector $x_i$ is exploited by only one employed bee. To produce a candidate food source $v_{i,j}$ from the old one $x_{i,j}$ in memory, the ABC uses Eq. (4), where $k \in \{1, 2, \ldots, FS\}$, $k \neq i$, and both $k$ and $j$ are randomly chosen indexes. The food sources $x_{i,j}$ can be represented as an $FS \times D$ matrix:

$$x = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,D} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,D} \\ \vdots & \vdots & \ddots & \vdots \\ x_{FS,1} & x_{FS,2} & \cdots & x_{FS,D} \end{bmatrix} \qquad (7)$$
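A sketch of how the FLNN weights and biases might be packed into one row of the food source matrix, and how that row is scored by the error of Eq. (3), is given below. The ordering (all weights, output node by output node, followed by the biases), the initialization range [-1, 1] and the names `encode`, `decode`, `food_source_matrix` and `make_objective` are illustrative assumptions. With 10 enhanced inputs and 3 output nodes this gives D = 33, matching the IRIS network.

```python
import math
import random
from itertools import combinations


def encode(W, b):
    """Flatten FLNN weights (one row per output node) followed by biases into x_i."""
    return [w for row in W for w in row] + list(b)


def decode(x, n_out, m):
    """Inverse of encode for m enhanced inputs: D = n_out * m + n_out."""
    W = [list(x[c * m:(c + 1) * m]) for c in range(n_out)]
    b = list(x[n_out * m:n_out * m + n_out])
    return W, b


def food_source_matrix(fs, D, lo=-1.0, hi=1.0, seed=0):
    """FS x D matrix of Eq. (7): one row (food source) per candidate weight set."""
    rng = random.Random(seed)
    return [[rng.uniform(lo, hi) for _ in range(D)] for _ in range(fs)]


def make_objective(data, n_out):
    """Return E(x), the Eq. (3) error of the FLNN whose parameters are the vector x."""
    def expand(v):
        return list(v) + [p * q for p, q in combinations(v, 2)]

    m = len(expand(data[0][0]))

    def error(x):
        W, b = decode(x, n_out, m)
        total = 0.0
        for inp, t in data:
            X = expand(inp)
            for c in range(n_out):
                s = sum(w * xj for w, xj in zip(W[c], X)) + b[c]
                total += (t[c] - 1.0 / (1.0 + math.exp(-s))) ** 2
        return total / len(data)

    return error
```

The function returned by `make_objective` is what an ABC or mABC optimizer would minimize; each row of `food_source_matrix` is one candidate weight set.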
As can be seen from Eq. (4) and the matrix representation in Eq. (7), for each row of the food source matrix only one of the D elements is chosen randomly and exploited by the employed bee, using:

$$j = \mathrm{fix}(\mathrm{rand} \times D) + 1 \qquad (8)$$

where rand is a uniform random number in $[0, 1)$ and fix truncates towards zero. However, FLNN classification tasks usually involve a large number of optimization parameters (weights and biases), so exploiting only one element of each solution vector $x_i$ causes a longer foraging cycle before the optimal solution is found (Mohmad Hassim and Ghazali, 2013). The random selection of a single element of each vector $x_i$ during the employed bee phase also weakens the FLNN's ability to find the optimal weight set, which results in low classification accuracy on unseen data (Mohmad Hassim and Ghazali, 2012).
To overcome this, we eliminate the random selection of a single dimension by the employed bee given in Eq. (8). Instead, the employed bee is directed to visit and exploit all D elements of the vector before $x_i$ is evaluated. The modified ABC is shown in the pseudo code below, where step 6 contains the improvement made to the standard ABC:
1) Cycle = 0
2) Initialize the FLNN optimization parameters, D
3) Initialize the population of scout bees with random solutions $x_i$, $i = 1, 2, \ldots, FS$
4) Evaluate the fitness of the population
5) Cycle = 1:MCN
6) Form a new population $v_i$ for the employed bees:
   i. randomly select a solution $k$ in the neighbourhood of $i$
   ii. direct the employed bee to exploit the nectar value of every dimension $j$ of the population, $v_{i,j}$, using Eq. (4), where $j = 1, 2, \ldots, D$ is a dimension of $i$
   iii. exit the loop when $j = D$
7) Evaluate the new population $v_i$
8) Apply greedy selection between $x_i$ and $v_i$
9) Calculate the probability values $p_i$ for the solutions $x_i$ using Eq. (5)
10) Produce the new solutions $v_i$ for the onlookers from the solutions $x_i$ selected depending on $p_i$ and evaluate them
11) Apply the greedy selection process for the onlookers
12) Determine the abandoned solution for the scout, if it exists, and replace it with a new randomly produced solution $x_i$ using Eq. (6)
13) Memorize the best solution
14) Cycle = Cycle + 1
15) Stop when Cycle = Maximum cycle number (MCN).
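The modified employed-bee phase (step 6 to step 8) can be sketched as below: one neighbour $k$ is drawn per bee, Eq. (4) is applied to every dimension $j$, and the complete candidate is evaluated once before greedy selection. Drawing a fresh $\Phi$ for each dimension and the name `mabc_employed_phase` are assumptions; the onlooker and scout phases are left unchanged, so this function would simply replace the employed-bee loop of a standard ABC implementation.

```python
import random


def mabc_employed_phase(X, E, trials, obj, rng):
    """Employed-bee phase of the modified ABC; updates X, E and trials in place.

    X: FS x D food source matrix, E: error of each row, trials: no-improvement counters.
    """
    fs, D = len(X), len(X[0])
    for i in range(fs):
        k = rng.choice([m for m in range(fs) if m != i])  # random neighbour k != i
        # Exploit EVERY dimension j = 1..D with Eq. (4), instead of one j from Eq. (8);
        # a new Phi in [-1, 1] is drawn per dimension (an assumption)
        v = [X[i][j] + rng.uniform(-1.0, 1.0) * (X[i][j] - X[k][j]) for j in range(D)]
        e = obj(v)  # evaluate the new solution once
        if e < E[i]:  # greedy selection between x_i and v_i
            X[i], E[i], trials[i] = v, e, 0
        else:
            trials[i] += 1
```

The cost per bee is still one objective evaluation per cycle, as in the standard ABC, but each accepted move can adjust all weights and biases of the FLNN at once.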

SIMULATION RESULTS
In order to evaluate the performance of the FLNN model trained with modified ABC (FLNN-mABC) on classification problems, simulation experiments were carried out on a 2.30 GHz Intel Core i5-2410M CPU with 8.0 GB RAM running a 64-bit operating system. The mABC algorithm is compared with standard BP training and the standard ABC algorithm based on simulation results implemented in MATLAB R2010b. Three benchmark multiclass classification problems obtained from the UCI Machine Learning Repository (Frank and Asuncion, 2010) were considered: the IRIS dataset, the GLASS Identification dataset and the THYROID Disease dataset.
During the experiments, simulations were performed on the training of the second-order FLNN architecture with the Backpropagation algorithm (FLNN-BP), the second-order FLNN architecture with the ABC algorithm (FLNN-ABC) and the second-order FLNN architecture with the modified ABC algorithm (FLNN-mABC). The best accuracy for every benchmark problem was recorded from these simulations. The learning rate and momentum used for FLNN-BP were 0.3 and 0.7, with a maximum of 1000 epochs and a minimum error of 0.001 as the stopping criteria. The parameter setup for both FLNN-ABC and FLNN-mABC, however, only involved the stopping criteria of a maximum of 1000 cycles and a minimum error of 0.001. The activation function used for the network output of all FLNN models is the logistic sigmoid function. Table 1 summarizes the parameters considered in this simulation. Ten trials were performed for each simulation of FLNN-BP, FLNN-ABC and FLNN-mABC, and the best accuracy from these 10 trials was recorded. To generate the training and test sets, each dataset was randomly divided into two equal sets (1st-Fold and 2nd-Fold). Each of these two sets was alternately used either as the training set or as the test set. The average accuracy values for each dataset were then used for comparison.
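The two-fold protocol described above can be sketched as follows. This is an illustrative sketch under stated assumptions: `train_and_predict` is a hypothetical interface standing in for FLNN-BP, FLNN-ABC or FLNN-mABC, the shuffling seed is arbitrary, and how the best-of-10-trials selection is combined with the fold averaging is not specified in the text, so it is not modelled here.

```python
import random


def two_fold_splits(n_samples, seed=0):
    """Randomly split indices into two (almost) equal folds; each fold is used once as test set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    fold1, fold2 = idx[:n_samples // 2], idx[n_samples // 2:]
    return [(fold1, fold2), (fold2, fold1)]  # (train, test) index pairs


def accuracy(pred, true):
    """Percentage of correctly classified samples."""
    return 100.0 * sum(p == t for p, t in zip(pred, true)) / len(true)


def two_fold_accuracy(X, y, train_and_predict, seed=0):
    """Average test accuracy over the two folds.

    train_and_predict(X_train, y_train, X_test) -> predicted labels
    (hypothetical interface for any of the three FLNN models).
    """
    accs = []
    for train, test in two_fold_splits(len(y), seed):
        pred = train_and_predict([X[i] for i in train], [y[i] for i in train],
                                 [X[i] for i in test])
        accs.append(accuracy(pred, [y[i] for i in test]))
    return sum(accs) / len(accs)
```

Because every sample is tested exactly once, the reported value is the mean of the two fold accuracies, as in the comparison described above.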
IRIS CLASSIFICATION PROBLEM
This is a classical classification database made famous by Fisher (Fisher, 1936), who used it to illustrate the principles of discriminant analysis. The dataset consists of 4 features and 3 classes of 50 instances each, where each class refers to a type of iris plant: Iris Setosa, Iris Versicolour and Iris Virginica. The architecture of the FLNN up to the second order for the IRIS classification problem is 10-3. The total number of trainable parameters (weights and biases) involved in the training of the FLNN is 33. The classification accuracy on unseen data is presented in Figure 2. Figure 2 shows that the FLNN trained with modified ABC (FLNN-mABC) gives the best accuracy of 94.7% on unseen data, outperforming FLNN-BP and FLNN-ABC by 2.7% and 5.4%, respectively.
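The 10-3 architecture and the 33 trainable parameters follow from the enhanced-input count $n + n(n-1)/2$ given earlier. The small helper below reproduces that arithmetic; its name `flnn_size` is hypothetical, and the assumption of one bias per output node is what makes 10 x 3 + 3 = 33.

```python
def flnn_size(n_features, n_classes):
    """Second-order FLNN size: number of enhanced inputs n + n(n-1)/2 and
    number of trainable parameters (one weight per enhanced input per
    output node, plus one bias per output node)."""
    m = n_features + n_features * (n_features - 1) // 2
    return m, n_classes * (m + 1)
```

By the same formula, `flnn_size(4, 3)` gives (10, 33) for IRIS, `flnn_size(9, 6)` gives (45, 276) for GLASS and `flnn_size(5, 3)` gives (15, 48) for THYROID.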

GLASS IDENTIFICATION PROBLEM
The glass identification database was donated by Vina Spiehler and has a total of 214 instances. The dataset consists of 9 features and 6 classes, where each class refers to a type of glass: building windows float, building windows non-float, vehicle windows float, containers, tableware and headlamps. The architecture of the FLNN up to the second order for the Glass Identification problem is 45-6. The classification accuracy on unseen data for the Glass Identification problem is presented in Figure 3. Figure 3 shows that the FLNN trained with modified ABC (FLNN-mABC) gives 2.8% higher accuracy than FLNN-BP and 3.2% higher accuracy than FLNN-ABC in classifying unseen data. In general, these results indicate that the modified ABC algorithm (mABC) offers a better learning scheme for training the FLNN than the standard ABC and standard BP-learning. The experimental results on the three classification problems show that the proposed algorithm (mABC) achieved the highest accuracy compared with the standard ABC and BP-learning, and that it may be used as an alternative learning scheme for training the FLNN for classification tasks.

CONCLUSION
In this work, the FLNN-mABC model was evaluated on multiclass pattern classification problems. The experiments demonstrated that FLNN-mABC performs the classification task well. For the IRIS, GLASS and THYROID problems, the simulation results show that the proposed modified ABC algorithm trains the FLNN to a higher classification accuracy on unseen data than the standard ABC and BP-learning.

FIGURE 1. The 2nd order FLNN structure with 3 inputs

FIGURE 3. Glass classification accuracy by FLNN-BP, FLNN-ABC and FLNN-mABC

THYROID CLASSIFICATION PROBLEM
This dataset was created based on the Thyroid Disease problem dataset donated by Stefan Aberhard to the UCI repository of machine learning databases. It deals with diagnosing a patient's thyroid function. The dataset has 215 instances, consisting of 5 features and 3 classes of thyroid function: normal, hyper and hypo functioning. The architecture of the FLNN up to the second order for the Thyroid problem is 15-3. The classification accuracy on unseen data for the Thyroid problem is presented in Figure 4.