An entropy-LVQ system for forecasting S&P 500 downward shifts

Article history: Received July 20, 2011; Accepted October 7, 2011; Available online October 8, 2011

The purpose of this paper is to predict S&P 500 down moves with technical analysis indicators using learning vector quantization (LVQ) neural networks and probabilistic neural networks (PNN). In addition, an entropy-based input selection technique is employed to improve prediction accuracy. The out-of-sample simulations show that LVQ outperforms PNN. Moreover, the entropy-LVQ system achieved higher accuracy than results reported in the literature. © 2012 Growing Science Ltd. All rights reserved.


Introduction
The stock market is an attractive investment venue for making high profits. However, forecasting stock market movements has been a major challenge for investors and scholars, since the market is complex, noisy, and nonstationary. Moreover, the stock market follows a random walk and is time varying (Chang & Liu, 2008). In recent years, soft computing techniques have been widely used to model and predict stock market prices because of their ability to handle uncertainty and noise (Atsalakis & Valavanis, 2009; Yao & Herbert, 2009). In addition, many papers in the literature have employed technical analysis indicators to predict stock market movements because of their simplicity. This technique analyzes the market history in order to forecast its current and future behavior; in particular, price movement is predictable since stock market trends are observable and can be detected (Gorgulho et al., 2011). Indeed, technical indicators have been successfully employed to model international stock market movements with artificial neural networks (Chavarnakul & Enke, 2008; Chang et al., 2009; Majhi et al., 2009), fuzzy logic (Chang & Liu, 2008), genetic programming (Chen et al., 2009), support vector machines (Wen et al., 2010), genetic algorithms (Gorgulho et al., 2011), and probabilistic neural networks (Schierholt & Dagli, 1996; Prokhorov & Wunsch, 1995, 1998; Mehrara et al., 2010). The purpose of our work is to predict S&P 500 price index drops of 0.1%, 0.2%, 0.3%, and 0.4% using technical indicators as inputs to learning vector quantization (LVQ) neural networks. Indeed, investors fear losses below a certain limit (March & Shapira, 1982); therefore, we examine the effect of the loss limit on the prediction accuracy of the LVQ. The LVQ is a supervised competitive neural network introduced by Kohonen (1995).
It operates by directly defining the class borders according to the nearest neighbor rule: the classes that the competitive layer defines depend only on the distances between input vectors. The LVQ was chosen for its good performance on complex classification problems and its fast learning (Kohonen, 1995; El-Banna et al., 2008), and for its ability to process large input data with a small computational burden (Kohonen, 1995; Tosaka et al., 2001). These features have also made LVQ popular in engineering classification problems (Tosaka et al., 2001; Dieterle et al., 2003; Lee et al., 2004; Mala et al., 2006; El-Banna et al., 2008).
The LVQ has also been employed in business and financial modeling applications, including the prediction of financial distress (Brockett et al., 2006), exchange rates (Stahlbock, 2008), financial failure of banks (Boyacioglu et al., 2009), bankruptcy (Neves & Vieira, 2009), and credit rating (Chen et al., 2011). The performance of the LVQ will be compared to that of the probabilistic neural network (PNN) because of the latter's effectiveness in financial classification problems (Schierholt & Dagli, 1996; Prokhorov & Wunsch, 1995, 1998; Bensic et al., 2005; Mehrara et al., 2010). However, to the best of our knowledge, the LVQ neural network has not been used for stock market modeling and forecasting. Therefore, the main goal of this paper is to examine its effectiveness in the prediction of S&P 500 movements. In addition, unlike previous works, a filter-based technique, namely entropy, is employed to identify the relevant inputs (technical indicators).
The remainder of this paper is organized as follows. Section 2 introduces our methodology and data. In Section 3, the results are presented and discussed. Finally, Section 4 concludes the paper and outlines directions for future research.

Data and methodology
This section describes our data and the artificial neural network (ANN) classifiers used for forecasting stock market ups and downs, namely the learning vector quantization (LVQ) network and the probabilistic neural network (PNN). For the models examined, let L denote the loss limit, and let X_t = (x_1,t, ..., x_n,t) represent the technical indicators available at time t, which are the inputs to the predictive systems (LVQ and PNN) used to classify each observation into one of two groups m_1 and m_2 (above and below the predetermined limit, or floor, respectively) at the future time t + 1. In particular, the output (group) takes the value -1 for a downtrend below the floor (limit) and +1 for an uptrend above the floor (limit). In this paper, we use time series data from the S&P 500 price index for the period October 21, 2003 to January 29, 2008.
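As a minimal illustration of this labeling scheme (a sketch under our own naming assumptions, not the authors' code), one-day-ahead returns can be mapped to the two classes as follows:

```python
def label_moves(prices, limit=-0.001):
    """Label each day +1 (next-day return above the loss limit) or -1 (below).

    prices: sequence of closing prices; limit: the loss floor, e.g. -0.001
    for the -0.1% limit. Returns one label per one-day-ahead return.
    """
    labels = []
    for t in range(len(prices) - 1):
        ret = (prices[t + 1] - prices[t]) / prices[t]  # one-day return
        labels.append(1 if ret > limit else -1)
    return labels
```

For example, the price path 100 → 100.5 → 100.3 → 99.0 yields returns of +0.50%, -0.199%, and -1.30%, so with a -0.1% limit the labels are [1, -1, -1].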
Fig. 1 exhibits the S&P 500 price index, and Table 1 lists the technical indicators used in our study. In order to select relevant indicators to feed to the classifiers, the indicators are ranked by their entropy statistics: the smaller a technical indicator's entropy, the more discriminatory it is. Entropy is a measure of the uncertainty of a random variable; it is considered in our study because of its effectiveness in removing both irrelevant and redundant inputs (Sanchez-Marono et al., 2007) and because it overcomes the issue of data sparseness (Zhu et al., 2010). The entropy of a variable (technical indicator) X after observing the output Y (group or class) is defined as

H(X|Y) = - Σ_j P(y_j) Σ_i P(x_i|y_j) log2 P(x_i|y_j),

where P(x_i) is the prior probability of each value of X, and P(x_i|y_j) is the posterior probability of X given the value y_j of the output (class). We select the indicators having an entropy value less than 0.1; otherwise, the four indicators with the lowest entropy values are selected as inputs. In other words, the entropy values are sorted in ascending order and the technical indicators with the lowest values are retained.
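The conditional-entropy criterion above can be sketched as follows (a standalone illustration assuming the indicator has already been discretized into bins; the binning scheme itself is not specified in the paper):

```python
import math
from collections import Counter

def conditional_entropy(x_bins, y):
    """Entropy of a discretized indicator x after observing the class y.

    x_bins, y: equal-length sequences of discrete values. Lower values
    indicate a more discriminatory indicator, mirroring the ranking
    criterion used to build Table 2.
    """
    n = len(y)
    h = 0.0
    for y_val, ny in Counter(y).items():
        p_y = ny / n  # P(y_j)
        x_given_y = [x for x, yy in zip(x_bins, y) if yy == y_val]
        for nx in Counter(x_given_y).values():
            p_x_y = nx / ny  # P(x_i | y_j)
            h -= p_y * p_x_y * math.log2(p_x_y)
    return h
```

An indicator perfectly aligned with the class labels has entropy 0, while one carrying no class information about a binary split has entropy 1 bit.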
According to the entropy-based ranking shown in Table 2, STOD, X4, DIS, and EMV are selected for the -0.10% limit; STOD, DIS, X4, and EMV for the -0.20% limit; STOD, DIS, EMV, and BR for the -0.30% limit; and STOD, X2, EMV, and BR for the -0.40% limit. Finally, all input data are normalized to the interval [-1, 1] to obtain accurate classification results. The normalization is performed as follows:

x' = 2 (x - x_min) / (x_max - x_min) - 1,

where x_min and x_max are the minimum and maximum values of the indicator.
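The scaling step is standard min-max normalization onto [-1, 1], which can be sketched as:

```python
def normalize(values):
    """Min-max scale a sequence of indicator values into [-1, 1]."""
    lo, hi = min(values), max(values)
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]
```

For instance, the values [0, 5, 10] map to [-1.0, 0.0, 1.0].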

The learning vector quantization
The LVQ is a competitive neural classification network introduced by Kohonen (1995). It is composed of three layers. The first layer is an input layer with one neuron per input variable; these neurons receive the data. The second layer is a hidden layer whose neurons learn the patterns and perform the classification. The third layer is an output layer with one node for each class to be recognized. The output neurons are linked to the hidden neurons, whose weight vectors act as prototypes.
The learning is based on neurons representing prototype vectors and on the nearest-neighbour approach for classifying data. In particular, the vector of weights of the connections between all input neurons and one hidden neuron is called a codebook vector; it is a prototype representing a labelled class. During training, the prototypes are updated to optimize the class boundary: the codebook vector is moved towards the presented input vector for a correct classification, and in the opposite direction for an incorrect classification. The LVQ algorithm is briefly described as follows:
a) A map is initialized with a grid of prototypes and class labels.
b) An instance x is selected as input.
c) The Euclidean distance is calculated between x and each prototype m_i.
d) The winner m_c, the prototype with the smallest distance to x, is selected as the best matching pattern according to

||x - m_c|| = min_i ||x - m_i||.

e) The winner is then updated: m_c is moved towards x when its class label matches that of x, and away from x otherwise.
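A minimal LVQ1-style training loop illustrating the nearest-prototype rule and the attract/repel update (a sketch of Kohonen's basic rule under our own parameter choices, not the authors' exact configuration):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def train_lvq1(data, labels, prototypes, proto_labels, alpha=0.1, epochs=20):
    """Basic LVQ1 training.

    data: input vectors; labels: their classes (+1/-1);
    prototypes: initial codebook vectors; proto_labels: their classes.
    Returns the adapted codebook vectors.
    """
    for _ in range(epochs):
        for x, y in zip(data, labels):
            # step c/d: find the winning (nearest) codebook vector
            c = min(range(len(prototypes)),
                    key=lambda i: euclidean(x, prototypes[i]))
            # step e: attract on a correct match, repel on a mismatch
            sign = 1 if proto_labels[c] == y else -1
            prototypes[c] = [m + sign * alpha * (xi - m)
                             for m, xi in zip(prototypes[c], x)]
        alpha *= 0.95  # decay the learning rate
    return prototypes

def classify(x, prototypes, proto_labels):
    """Assign x the label of its nearest codebook vector."""
    c = min(range(len(prototypes)), key=lambda i: euclidean(x, prototypes[i]))
    return proto_labels[c]
```

On a toy separable data set, the prototypes converge towards the two clusters and new points are labeled by their nearest codebook vector.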

The probabilistic neural networks
The PNN was proposed by Specht (1990). It applies Bayesian decision theory based on an estimate of the probability density of the data, which allows it to identify nonlinear decision boundaries that approach the Bayes optimum. The PNN employs an exponential activation function, and its basic topology consists of four layers. The first layer is the input layer. In the second layer, the probability density function (PDF) of each group of patterns is estimated directly from the training samples. The third layer sums the PDFs, and the Bayesian decision is made in the fourth layer. The PDF is assumed to follow a Gaussian distribution; the PDF for a feature vector X to belong to a category A is then given by

f_A(X) = 1 / ((2π)^(p/2) σ^p m) Σ_{i=1}^{m} exp( -(X - X_{A,i})' (X - X_{A,i}) / (2σ²) ),

where p is the dimension of the pattern vector X, m is the number of training patterns of category A, X_{A,i} is the i-th training pattern of category A, and σ is the smoothing factor of the Gaussian kernels used to construct the PDF.
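The density estimate and Bayesian decision above can be sketched as follows (a Specht-style Parzen-window classifier; the smoothing factor and the equal-prior assumption are our own illustrative choices):

```python
import math

def pnn_classify(x, train_by_class, sigma=0.5):
    """Classify x with a basic Gaussian-kernel PNN.

    train_by_class: dict mapping class label -> list of training vectors.
    Returns the label whose estimated density f_A(x) is largest,
    i.e. the Bayes decision under equal priors.
    """
    p = len(x)
    norm = (2 * math.pi) ** (p / 2) * sigma ** p
    best_label, best_density = None, -1.0
    for label, patterns in train_by_class.items():
        # average Gaussian kernel over this class's training patterns
        density = sum(
            math.exp(-sum((xi - ti) ** 2 for xi, ti in zip(x, t))
                     / (2 * sigma ** 2))
            for t in patterns
        ) / (norm * len(patterns))
        if density > best_density:
            best_label, best_density = label, density
    return best_label
```

A point near the training patterns of one class receives a far higher density estimate for that class and is labeled accordingly.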
Finally, the performance of the neural network classifiers is measured by the correct classification rate (hit ratio), defined as

Hit ratio = (number of correctly classified patterns / total number of patterns) × 100%.

To conclude this section, the overall methodology is summarized in Fig. 2.
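The evaluation metric amounts to a simple percentage of correct predictions:

```python
def hit_ratio(predicted, actual):
    """Correct classification rate in percent."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * hits / len(actual)
```

For example, three correct calls out of four gives a hit ratio of 75.0%.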

Results
In this study, 50% of the data is used for learning and 50% for testing (out-of-sample prediction); the testing data assesses the classification/prediction accuracy of the LVQ and PNN after training. The accuracy of each classifier is given in Fig. 3. The simulation results show that the LVQ outperforms the PNN in the prediction of S&P 500 down moves for the 0.4%, 0.3% and 0.1% loss limits: the correct classification rate achieved by the LVQ is 81.40%, 78.60% and 67.91% respectively.
By contrast, the correct classification rate achieved by the PNN is 18.60%, 75.81% and 32.09% for the 0.4%, 0.3% and 0.1% loss limits respectively. However, the LVQ underperforms the PNN for the 0.2% limit, where the LVQ and PNN obtained 71.63% and 77.21% respectively. In sum, the simulation results show that the LVQ is in general more accurate than the PNN in the prediction of S&P 500 down moves.
The reason the LVQ outperforms the PNN in this study is that the PNN is sensitive to noisy data such as financial time series: it models the probability density function (PDF) of each group of patterns under the assumption that they follow a Gaussian distribution. The LVQ, instead of modeling the class densities, models the discrimination function defined by the codebook vectors and the nearest-neighbour search between the codebook vectors and the data. In addition, the LVQ employs a gradient-descent-style update to minimize the error of the vector quantization approximation. As a result, the LVQ is less sensitive to noise. In sum, the accuracy of the entropy-LVQ system was 67.91%, 71.63%, 78.60%, and 81.40%, depending on the loss limit. These results are interesting, since previous studies have reported that stock prices are approximately a random walk and, consequently, a prediction accuracy of 56% is already a satisfying result for stock forecasting (Walczak, 2001; Qian & Rasheed, 2007).

Fig. 3. Simulation results
Although previous works have employed different technical indicators, classifiers, and data sets, their results are given here for comparison. Chavarnakul and Enke (2008) obtained 64.68% accuracy in predicting the S&P 500 using generalized regression neural networks; Chang et al. (2009) achieved 65.2% in forecasting the Taiwan stock exchange using an intelligent system that hybridizes a back-propagation neural network with case-based reasoning; and Yao and Herbert (2009) obtained 62.91% in predicting the New Zealand stock exchange using rough sets. Our simulation results are therefore encouraging, and the entropy-LVQ system is promising.

Conclusion
The main purpose of this paper is to model and predict S&P 500 moves below predetermined loss limits using technical indicators and learning vector quantization (LVQ) neural networks. Entropy is employed to reduce the initial set of twelve technical indicators to four relevant and nonredundant predictive inputs. In addition, probabilistic neural networks (PNN) are employed for comparison, and accuracy is measured with the correct classification rate. The simulation results show evidence of the superiority of the LVQ over the PNN in the prediction of S&P 500 downward moves for the 0.4%, 0.3% and 0.1% loss limits. In addition, the entropy-LVQ system achieved higher accuracy than previous works in the literature. Future research will consider different loss limits, input selection techniques, and international stock markets.

References

... and Management, 13, 133-150.
Boyacioglu, M.A., Kara, Y., & Baykan, O.K. (2009). Predicting bank financial failures using neural networks, support vector machines and multivariate statistical methods: a comparative analysis in the sample of savings deposit insurance fund (SDIF) transferred banks in Turkey. Expert Systems with Applications, 36, 3355-3366.
Brockett, P.L., Golden, L.L., Jang, J., & Yang, C. (2006). A comparison of neural networks, statistical methods, and variable choice for life insurers' financial distress prediction. The Journal of Risk and Insurance, 73(3), 397-419.
Chang, P.C., & Liu, C.H. (2008). A TSK type fuzzy rule based system for stock price prediction.