Design and analysis of experiments in ANFIS modeling for stock price prediction

Article history: Received 10 September 2010; Received in revised form 4 January 2011; Accepted 6 January 2011; Available online 6 January 2011

From the computational point of view, a fuzzy system has a layered structure similar to an artificial neural network (ANN) of the radial basis function type. ANN learning algorithms can therefore be employed to optimize the parameters of a fuzzy system. This neuro-fuzzy modeling approach is preferable to completely black-box models, such as ANN, because it can explain its solutions. In this paper, we apply the design of experiments (DOE) technique to identify the significant parameters in the design of adaptive neuro-fuzzy inference systems (ANFIS) for stock price prediction. © 2011 Growing Science Ltd. All rights reserved.


Introduction
Fuzzy systems and neural networks (NN) are considered two of the most widely used techniques in intelligent systems. Their application areas include automatic control, pattern recognition, human-machine interaction, expert systems, modeling, medical diagnosis, and economics (Echanobe et al., 2008). Each technique has its own advantages and drawbacks. Fuzzy systems can represent comprehensive linguistic knowledge and perform reasoning through fuzzy rules; however, they provide no mechanism for tuning those rules. Neural networks, on the other hand, are adaptive systems that can be trained and tuned from a set of input-output data, but the knowledge they acquire is very difficult to understand and represent.
In general, hybrid systems combine the advantages of different paradigms in order to overcome their individual shortcomings. Among these, neuro-fuzzy systems combine neural networks with techniques from fuzzy sets and systems. Neuro-fuzzy systems exhibit the noise robustness and learning capabilities of neural networks together with the ability of fuzzy systems to explicitly model uncertainty, linguistic concepts, and the knowledge of human experts. These systems can combine the fuzzy and neural paradigms in two different ways (Tsoukalas & Uhrig, 1997): 1) by introducing fuzzification into the NN structure (i.e., fuzzy NN) and 2) by giving fuzzy systems learning ability by means of neural network algorithms (i.e., NN-driven fuzzy reasoning techniques). In the first case, fuzzification can be introduced into any aspect of the network: neuron inputs, outputs, weights, aggregation operations, transfer functions, etc. In the second case, NN methods are used both to identify rules and membership functions and to tune the system (Echanobe et al., 2008).
By combining information from different sources, such as empirical models, heuristics, and data, neuro-fuzzy modeling has been recognized as a powerful tool for the effective development of models (Babuška and Verbruggen, 2003). Neuro-fuzzy models describe systems by means of fuzzy if-then rules represented in a network structure, to which learning algorithms from the area of artificial neural networks (e.g., gradient descent and Levenberg-Marquardt) can be applied. Interestingly, neuro-fuzzy models tend to favor high accuracy at the substantial expense of transparency. This is somewhat inevitable given the underlying black-box processing paradigm and the variety of topologies in neuro-computing (Pedrycz & Reformat, 2003). Because many of the parameters are in fact fuzzy variables and these systems most often operate in real time, configuring the topology and parameters of neuro-fuzzy systems becomes even harder (Zanchettin et al., 2005). Determining ANFIS parameters such as the number and shape of the input membership functions (MFs), the initial rule-base construction approach, the number of data points, and the clustering algorithm is a difficult design task. Even in models that construct their rule base automatically, system performance still depends on the careful selection of the sensitivity threshold, error threshold, and learning rates (Zanchettin et al., 2005). In most cases, ANFIS is tuned and configured experimentally, and these decisions are usually based on the most popular and prevalent parameters, operators, and algorithms.
It is therefore worthwhile to determine which variables are most relevant and have the greatest influence on the behavior and performance of ANFIS, so that the system designer can pay more attention to selecting the parameters that are statistically significant. Zanchettin et al. (2005) applied the design of experiments (DOE) technique to verify the interactions and interrelations among parameters in the design of ANFIS and evolving fuzzy neural networks (EFuNN). They considered six factors for each of ANFIS and EFuNN and carried out their tests on predicting points of the time series obtained by integrating the Mackey-Glass equation. The aim of this paper is to extend their work by considering more factors in the experimental design and testing their influence on the behavior of ANFIS. Whereas they considered six factors, each at two levels, we use nine factors, mostly at three levels. Three factors are common with Zanchettin et al. (2005): input MF number, output MF shape, and training epochs. In addition, we consider six other important factors that are tuned in ANFIS modeling. Since stock price prediction is a popular problem to which soft computing methods are widely applied, we performed our experiments on input-output data of an automotive parts manufacturer listed on an Asian stock market.
The rest of the paper is organized as follows. Section 2 presents an overview of ANFIS and DOE. Section 3 systematically discusses the problem description and the experimental design. Section 4 presents the results of the statistical experiment. Finally, Section 5 presents conclusions and future directions.

Adaptive Neuro-Fuzzy Inference System
Adaptive neuro-fuzzy inference systems (ANFIS) represent a neural network approach to the design of fuzzy inference systems (Jang, 1993). Since their introduction, ANFIS networks have been widely studied in the technical literature and successfully applied to classification tasks, rule-based expert systems, time series prediction, and other problems. Several revisions and variants of ANFIS have also been proposed (Panella & Gallo, 2005; Buragohain & Mahanta, 2008; Echanobe et al., 2008; Riverol & Di Sanctis, 2009). ANFIS is a fuzzy inference system that can be trained to model a collection of input-output data. The network uses a supervised learning algorithm to determine a nonlinear relationship between the inputs and the output. According to Kosko (1994), an ANFIS network is particularly suited to solving function approximation problems in several engineering fields.
ANFIS combines two approaches: artificial neural networks and fuzzy modeling. Composing these two approaches can achieve suitable reasoning in both quality and quantity (Teshnehlab et al., 2008). In ANFIS, fuzzy logic is used to determine decision surfaces rather than to represent the uncertainty associated with particular linguistic terms. In addition, the rule-based representation of neuro-fuzzy systems offers transparency.

For pedagogical purposes, consider a fuzzy inference system with two inputs x and y and one output z. Fig. 1 shows the equivalent ANFIS architecture (Type-3 ANFIS). The node functions in the same layer belong to the same function family. The first layer performs fuzzification, the second layer computes the T-norm of the antecedent part of the fuzzy rules, the third layer normalizes the rule firing strengths, the fourth layer evaluates the consequent part of each rule using its consequent parameters, and the last layer computes the overall output as the sum of all incoming signals (Jang, 1993). For a first-order Sugeno rule base with the two rules

If x is $A_1$ and y is $B_1$, then $f_1 = p_1 x + q_1 y + r_1$,
If x is $A_2$ and y is $B_2$, then $f_2 = p_2 x + q_2 y + r_2$,

and the product as T-norm, the feed-forward equations of this ANFIS are as follows:

$$O_{1,i} = \mu_{A_i}(x), \quad i = 1, 2; \qquad O_{1,i} = \mu_{B_{i-2}}(y), \quad i = 3, 4$$
$$w_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2$$
$$\bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2$$
$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i \left(p_i x + q_i y + r_i\right), \quad i = 1, 2$$
$$z = \sum_{i=1}^{2} \bar{w}_i f_i = \frac{\sum_{i=1}^{2} w_i f_i}{\sum_{i=1}^{2} w_i}$$

where
• x (or y) is the input to node i and $\mu_{A_i}$ is the membership function of $A_i$,
• $A_i$ (or $B_{i-2}$) is the linguistic label associated with the node function,
• $w_i$ is the firing strength of the ith rule,
• $\bar{w}_i$ is the ratio of the ith rule's firing strength to the sum of all rules' firing strengths, and
• $\{p_i, q_i, r_i\}$ are the consequent parameters of the ith rule.

Note that the network's output z is nonlinear in $w_i$, so training this network is a nonlinear optimization problem (Babuška & Verbruggen, 2003). The neuro-fuzzy inference system is optimized by adapting the antecedent and consequent parameters so that a specified objective function (usually the difference between the model output and the actual output) is minimized. A number of learning methods have been proposed. Jang (1993) proposed several methods for updating the ANFIS parameters that combine gradient descent and least-squares estimation (LSE); a notable drawback of these methods is their high complexity.
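The five layers just described can be sketched as a single forward pass for the two-input, two-rule, first-order Sugeno (Type-3) case. This is a minimal illustration, not the implementation used in this study: it assumes Gaussian input membership functions and the product T-norm, and the function names and parameter values in the usage lines are hypothetical.

```python
import numpy as np


def gaussian_mf(v, center, sigma):
    """Gaussian membership function (an assumed MF shape for this sketch)."""
    return np.exp(-0.5 * ((v - center) / sigma) ** 2)


def anfis_forward(x, y, premise, consequent):
    """Forward pass of a two-input, two-rule first-order Sugeno (Type-3) ANFIS.

    premise: {"A": [(center, sigma), ...], "B": [(center, sigma), ...]}, one pair per rule.
    consequent: array of shape (2, 3) whose row i holds (p_i, q_i, r_i).
    """
    # Layer 1: membership grades of x in A_i and of y in B_i
    mu_a = np.array([gaussian_mf(x, c, s) for c, s in premise["A"]])
    mu_b = np.array([gaussian_mf(y, c, s) for c, s in premise["B"]])
    # Layer 2: rule firing strengths w_i, using the product as T-norm
    w = mu_a * mu_b
    # Layer 3: normalized firing strengths w_bar_i = w_i / sum(w)
    w_bar = w / w.sum()
    # Layer 4: first-order rule outputs f_i = p_i x + q_i y + r_i
    f = np.asarray(consequent, dtype=float) @ np.array([x, y, 1.0])
    # Layer 5: overall output z = sum_i w_bar_i f_i
    return float(np.sum(w_bar * f))


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only
    premise = {"A": [(0.0, 1.0), (1.0, 1.0)], "B": [(0.0, 1.0), (1.0, 1.0)]}
    consequent = np.array([[1.0, 1.0, 0.0], [2.0, -1.0, 0.5]])
    print(anfis_forward(0.2, 0.9, premise, consequent))
```

Because layer 3 normalizes the firing strengths, the output is a convex combination of the rule outputs $f_i$; swapping `gaussian_mf` for a triangular or bell-shaped function changes only layer 1.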
Mascioli et al. (1997) proposed merging min-max and ANFIS models to determine the optimal set of fuzzy rules. Jang and Mizutani (1996) presented an application of the Levenberg-Marquardt method, which is essentially a nonlinear least-squares technique, to learning in an ANFIS network. Chen (1999) compared several popular training algorithms for tuning the parameters of ANFIS membership functions. Tang et al. (2005) proposed a hybrid system combining a fuzzy inference system and genetic algorithms to tune the parameters of a TSK fuzzy ANN. Shoorehdeli et al. (2009) proposed a novel hybrid learning algorithm with stable learning laws for ANFIS as a system identifier and studied its stability. Their algorithm uses particle swarm optimization to train the antecedent part and recursive least squares with a forgetting factor to train the conclusion part.
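Least-squares methods fit naturally into these hybrid schemes because, once the normalized firing strengths $\bar{w}_i$ are fixed, a first-order Sugeno ANFIS output is linear in the consequent parameters $(p_i, q_i, r_i)$. The sketch below shows a generic recursive least squares (RLS) update with a forgetting factor applied to that linear part. It is an illustration under stated assumptions, not Shoorehdeli et al.'s exact algorithm: the two-rule regressor layout, the class and function names, and the default values `lam=0.99` and `delta=1000.0` are hypothetical choices.

```python
import numpy as np


def consequent_regressor(w_bar, x, y):
    """Hypothetical regressor [w_bar_1*x, w_bar_1*y, w_bar_1, w_bar_2*x, ...] for a two-input Sugeno ANFIS."""
    return np.concatenate([wb * np.array([x, y, 1.0]) for wb in w_bar])


class ForgettingFactorRLS:
    """Recursive least squares with exponential forgetting for a linear-in-parameters model."""

    def __init__(self, n_params, lam=0.99, delta=1000.0):
        self.theta = np.zeros(n_params)      # consequent parameters (p_i, q_i, r_i, ...)
        self.P = delta * np.eye(n_params)    # large initial covariance acts as a weak prior
        self.lam = lam                       # forgetting factor, 0 < lam <= 1

    def update(self, phi, target):
        """Process one sample (phi, target) and return the a priori prediction error."""
        phi = np.asarray(phi, dtype=float)
        p_phi = self.P @ phi
        gain = p_phi / (self.lam + phi @ p_phi)
        error = target - phi @ self.theta
        self.theta = self.theta + gain * error
        # Covariance update; dividing by lam discounts older samples
        self.P = (self.P - np.outer(gain, p_phi)) / self.lam
        return error
```

In a hybrid training loop, a forward pass supplies $\bar{w}_i$ for each sample, `update` refines the consequent parameters, and a separate method (gradient descent, PSO, or similar) adjusts the antecedent parameters. Setting `lam=1.0` recovers ordinary RLS without forgetting.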

Design of Experiments
Design of experiments (DOE), or experimental design, is a systematic method for determining how input parameters (factors) affect a response, and hence for tuning those parameters. DOE has very broad applications across the natural, social, and engineering sciences. Assuming that true experiments are our basic concern, a meaningful study comprises three important phases (Hicks, 1999): 1) the experimental or planning phase, 2) the design phase, and 3) the analysis phase. The main steps of each phase are presented in Table 1. The analysis of experiments usually relies on the well-known analysis of variance (ANOVA). ANOVA systematically decomposes the variability in the observed response values and assigns portions of it either to the effects of the independent variables or to experimental error. The analysis thus shows how much each factor and each factor interaction contributes to the total variance of the data (Zanchettin et al., 2005).
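The variance decomposition described above can be illustrated for a single factor: the total sum of squares splits into a between-level (factor) part and a within-level (error) part, the F-ratio compares their mean squares, and the factor's share of the total sum of squares gives a percentage contribution to variance. This is a minimal one-way sketch with hypothetical names; the multi-factor analysis in this study also separates interaction terms, which the sketch does not.

```python
import numpy as np


def one_way_anova(groups):
    """One-way ANOVA. groups: list of 1-D arrays, one per factor level (e.g. replicate RMSEs)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_obs = np.concatenate(groups)
    grand_mean = all_obs.mean()
    ss_total = np.sum((all_obs - grand_mean) ** 2)
    # Variation explained by differences between level means
    ss_factor = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Variation of replicates around their own level mean (experimental error)
    ss_error = sum(np.sum((g - g.mean()) ** 2) for g in groups)
    df_factor = len(groups) - 1
    df_error = len(all_obs) - len(groups)
    f_ratio = (ss_factor / df_factor) / (ss_error / df_error)
    return {
        "ss_factor": ss_factor, "ss_error": ss_error, "ss_total": ss_total,
        "df_factor": df_factor, "df_error": df_error, "F": f_ratio,
        "percent_contribution": 100.0 * ss_factor / ss_total,
    }
```

The p-value reported in an ANOVA table is the upper-tail probability of the F distribution with (`df_factor`, `df_error`) degrees of freedom at the computed F-ratio; a statistics library can supply it.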

Problem description
In the experimental study, ANFIS is used to predict points of a price time series consisting of historical stock prices of an automotive parts manufacturer listed on an Asian stock exchange. The candidate input and output variables of the system are shown in Table 2. A rational combination of 27 technical and fundamental variables is used for prediction, and the closing price is the output variable. Stock market prediction is an active research topic.
The modern school of thought views the stock market as a dynamic system in which stock prices behave chaotically. From this standpoint, stock price movements have very complex, nonlinear correlations with certain variables, which require advanced mathematical modeling. One of the challenges of modern capital market analysis is to develop theories capable of explaining these complex movements in asset prices and returns. The study of the stock market has led financial economists to apply statistical techniques from chaos theory to analyze stock market data, and recent empirical studies based on these techniques document nonlinearities in such data. Neural network models are also well suited to capturing nonlinear dynamic relationships in the stock market.
In the ANFIS study, we ran a full factorial experiment with different levels for each factor. We use the six important factors reported by Zanchettin et al. (2005). More specifically, for initial rule-base construction, the Sugeno and Yasukawa (1993) approach, the subtractive clustering approach, and the fuzzy modeling algorithm of Emami et al. (1999) are used. To test the effect of the cluster validity index (CVI), three CVIs, proposed by Fukuyama and Sugeno (1989), Kwon (1998), and Fazel Zarandi et al. (2009), are considered.

Experimental design
Three different numbers of data points are used to test the effects of the various factors. In each case, half of the data points are used for training the ANFIS and the rest for testing. To determine whether the initial clustering algorithm is significant, we consider random initial clustering and the agglomerative hierarchical clustering (AHC) algorithm. In addition, three clustering algorithms are tested: fuzzy c-means (FCM), Gustafson-Kessel (GK), and the fuzzy noise-rejection data partitioning algorithm proposed by Melek et al. (2005). Table 3 lists all controlled factors in the ANFIS experiment configuration. The models, algorithms, and techniques mentioned above are described as follows:
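Of the three clustering algorithms, fuzzy c-means is the simplest to state: it alternates between computing cluster centers as membership-weighted means and updating memberships from relative distances to the centers. The following is a minimal sketch of the standard FCM iteration, assuming Euclidean distance, a random initial partition, and hypothetical defaults for `m`, `max_iter`, and `tol`; the GK and noise-rejection variants, which change the distance measure and add a noise cluster respectively, are not shown.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means. X: (n, d) data. Returns memberships U (c, n) and centers V (c, d)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial fuzzy partition whose columns (one per datum) sum to one
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Cluster centers: membership-weighted means of the data
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Squared Euclidean distance from every center to every datum, shape (c, n)
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)  # guard against a datum lying exactly on a center
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return U, V
```

Here `m` is the degree of fuzziness, one of the controlled factors; larger values produce softer partitions.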

Initial rule-base construction approaches
To test the effect of the initial rule-base construction approach, the approach of Sugeno and Yasukawa (1993), subtractive clustering (Chiu, 1994), and the fuzzy modeling algorithm proposed by Emami et al. (1999) are used. Sugeno and Yasukawa (1993) proposed a four-step algorithm to extract fuzzy if-then rules from historical data. The first three steps form the structure identification stage, and the final step is the fuzzy reasoning stage. The first step is fuzzy clustering of the output variable. The second step determines the most relevant input variables with a myopic neighborhood search algorithm. The third step constructs the antecedent part of the fuzzy rules by projecting the output membership degrees onto the already selected significant input variables. Finally, the fourth step is fuzzy inference, as stated earlier. Emami et al. (1999) proposed a similar approach with some revisions: to cluster the output space, they used agglomerative hierarchical hard clustering to obtain initial prototypes, and to assign input memberships, they first performed fuzzy line clustering for the input membership functions and then eliminated ineffective input candidates.
When the appropriate number of clusters for a given data set is not known in advance, subtractive clustering (Chiu, 1994) is a fast, one-pass algorithm for estimating the number of clusters and the cluster centers. The cluster estimates can be used to initialize iterative optimization-based clustering methods and model identification methods such as ANFIS.
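Subtractive clustering can be sketched in a few lines: every data point receives a "potential" that grows with the density of its neighbors, the point of highest potential becomes a center, and potentials near that center are then reduced before the next center is chosen. The sketch below follows the commonly described form of Chiu's method with accept and reject ratios; the function name and the default radius `ra=0.5`, squash factor 1.5, and thresholds 0.5 and 0.15 are assumptions for illustration, and the data are assumed to be pre-scaled to the unit hypercube.

```python
import numpy as np


def subtractive_clustering(X, ra=0.5, squash=1.5, accept=0.5, reject=0.15):
    """Estimate cluster centers in X (n, d), assumed already scaled to [0, 1] per dimension."""
    X = np.asarray(X, dtype=float)
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (squash * ra) ** 2
    # Initial potential of each point: sum of Gaussian-like contributions of all points
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    P = np.exp(-alpha * d2).sum(axis=1)
    first_potential = P.max()
    centers = []
    while True:
        k = int(np.argmax(P))
        ratio = P[k] / first_potential
        if ratio < reject:
            break  # remaining potentials are too low: stop
        if ratio <= accept and centers:
            # Gray zone: accept only if the candidate is far enough from existing centers
            d_min = np.sqrt(min(((X[k] - ctr) ** 2).sum() for ctr in centers))
            if d_min / ra + ratio < 1.0:
                P[k] = 0.0  # reject this candidate and try the next-highest potential
                continue
        centers.append(X[k].copy())
        # Subtract the new center's influence from every point's potential
        P = P - P[k] * np.exp(-beta * ((X - X[k]) ** 2).sum(axis=1))
    return np.array(centers)
```

The number of returned centers is the estimated number of clusters, which can then seed FCM or define the initial rules of an ANFIS.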

Cluster validity indexes
Before a fuzzy clustering algorithm can be applied, the optimal number of clusters must be chosen. For this purpose, three different cluster validity indexes are tested. First, the following criterion proposed by Fukuyama and Sugeno (1989) is used; the number of clusters that minimizes it is selected:

$$V_{FS}(U, V; X) = \sum_{k=1}^{n} \sum_{i=1}^{c} \mu_{ik}^{m} \left( \lVert x_k - \nu_i \rVert^2 - \lVert \nu_i - \bar{x} \rVert^2 \right)$$

where
• n: number of data to be clustered,
• c: number of clusters,
• $x_k$: kth datum, usually a vector,
• $\bar{x}$: average of the data,
• $\nu_i$: vector expressing the center of the ith cluster,
• $\lVert \cdot \rVert$: norm,
• $\mu_{ik}$: membership grade of the kth datum in the ith cluster, and
• m: degree of fuzziness.
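The Fukuyama and Sugeno criterion, $V_{FS} = \sum_k \sum_i \mu_{ik}^m (\lVert x_k - \nu_i \rVert^2 - \lVert \nu_i - \bar{x} \rVert^2)$, rewards compact clusters (first term) whose centers lie far from the overall data mean (second term). A direct transcription is sketched below, assuming the squared Euclidean norm and memberships stored as a (c, n) array; the function name is hypothetical.

```python
import numpy as np


def fukuyama_sugeno_index(X, U, V, m=2.0):
    """Fukuyama-Sugeno validity index. X: (n, d), U: (c, n), V: (c, d). Smaller is better."""
    X, U, V = (np.asarray(a, dtype=float) for a in (X, U, V))
    x_bar = X.mean(axis=0)
    # ||x_k - v_i||^2 for every cluster i and datum k, shape (c, n)
    compactness = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    # ||v_i - x_bar||^2 per cluster, shape (c, 1) so it broadcasts over the data
    separation = ((V - x_bar) ** 2).sum(axis=1, keepdims=True)
    return float(np.sum(U ** m * (compactness - separation)))
```

To select the number of clusters, one would run a clustering algorithm for each c = 2, ..., c_max, evaluate this index on the resulting (U, V), and keep the c with the smallest value.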
Second, a validity index proposed by Kim et al. (2004) and modified by Fazel Zarandi et al. (2009) is used. The optimal number of clusters is obtained by minimizing this index, $V_{FNT}(U, V; X)$, over the range $c = 2, \ldots, c_{max}$. The index is built on the relative similarity between pairs of fuzzy sets (clusters) $A_p$ and $A_q$, which is in turn computed from their relative similarity at each datum $x_j$; this pointwise similarity involves $h(x_j)$, the entropy of datum $x_j$, and $u_{A_p}(x_j)$, the membership value of $x_j$ in cluster $A_p$.

Finally, the Kwon (1998) validity index is used as the third CVI in our experiments:

$$V_K(U, V; X) = \frac{\sum_{i=1}^{c} \sum_{k=1}^{n} \mu_{ik}^{m} \lVert x_k - \nu_i \rVert^2 + \frac{1}{c} \sum_{i=1}^{c} \lVert \nu_i - \bar{x} \rVert^2}{\min_{i \neq j} \lVert \nu_i - \nu_j \rVert^2}$$

with the same notation as above; the number of clusters that minimizes $V_K$ is selected.
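Kwon's index, in the form $V_K = [\sum_i \sum_k \mu_{ik}^m \lVert x_k - \nu_i \rVert^2 + \frac{1}{c} \sum_i \lVert \nu_i - \bar{x} \rVert^2] / \min_{i \neq j} \lVert \nu_i - \nu_j \rVert^2$, divides a compactness term plus a penalty by the smallest squared distance between two centers, so partitions with nearly coincident centers score poorly. The sketch below assumes that form with a general fuzzifier m (Kwon's original presentation is often quoted with m = 2) and squared Euclidean distances; the function name is hypothetical. The Kim et al. / Fazel Zarandi et al. index is not sketched because its formula is not reproduced in this section.

```python
import numpy as np


def kwon_index(X, U, V, m=2.0):
    """Kwon validity index. X: (n, d), U: (c, n), V: (c, d), c >= 2. Smaller is better."""
    X, U, V = (np.asarray(a, dtype=float) for a in (X, U, V))
    c = V.shape[0]
    x_bar = X.mean(axis=0)
    # Fuzzy within-cluster compactness, as in the Xie-Beni numerator
    compactness = np.sum(U ** m * ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2))
    # Penalty term that keeps the index from decreasing monotonically as c grows
    penalty = ((V - x_bar) ** 2).sum() / c
    # Minimum squared distance between two distinct cluster centers
    center_d2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    min_sep = center_d2[~np.eye(c, dtype=bool)].min()
    return float((compactness + penalty) / min_sep)
```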

Initial clustering algorithms
For the initial clustering, an agglomerative hierarchical clustering (AHC) algorithm and a random algorithm are tested. In the random algorithm, the initial membership functions are simply generated at random. The AHC algorithm starts by placing each of the n data vectors in its own cluster. Then, using a matrix of dissimilarities $D = [d_{ij}]$, it merges two or more of these clusters, moving to a higher level of data partition. The process is repeated to form a sequence of nested clusterings in which the number of clusters decreases gradually until the minimum required number of clusters c is reached (Melek et al., 2005). Specifically, we calculate the (c × N) matrix of dissimilarities $D = [d_{ij}]$ using a Euclidean-based distance between hard clusters, where $\nu_{hi}$ and $\nu_{hj}$ are the mean vectors of the hard clusters $X_i$ and $X_j$, respectively, and $n_i$ ($n_j$) is the number of data in the hard cluster $X_i$ ($X_j$).
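The merge loop of AHC can be sketched as follows. Because the exact dissimilarity formula is not reproduced in this section, the sketch assumes Ward's criterion, $d_{ij} = \frac{n_i n_j}{n_i + n_j} \lVert \nu_{hi} - \nu_{hj} \rVert^2$, a common Euclidean-based distance built from the same cluster means and sizes; Melek et al. (2005) may use a different form. The function name is hypothetical, and the naive pairwise search is meant for clarity rather than speed.

```python
import numpy as np


def agglomerative_init(X, c):
    """Merge singleton hard clusters until c remain. Returns labels (n,) and cluster means (c, d)."""
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]  # each datum starts in its own cluster
    while len(clusters) > c:
        means = [X[idx].mean(axis=0) for idx in clusters]
        best, best_pair = np.inf, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ni, nj = len(clusters[i]), len(clusters[j])
                # Assumed Ward-type dissimilarity between hard clusters X_i and X_j
                d = ni * nj / (ni + nj) * np.sum((means[i] - means[j]) ** 2)
                if d < best:
                    best, best_pair = d, (i, j)
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i remains valid
    labels = np.empty(len(X), dtype=int)
    for k, idx in enumerate(clusters):
        labels[idx] = k
    return labels, np.array([X[idx].mean(axis=0) for idx in clusters])
```

The returned means can serve as initial prototypes for a fuzzy clustering algorithm, which is the role AHC plays as an initial clustering factor in this experiment.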

Statistical results
Nine control factors (variables) were considered in the ANFIS experimental design; seven have three levels and the other two have two levels, resulting in 3^7 × 2^2 = 8748 level combinations. Each combination was replicated five times, for a total of 43,740 runs. The response variable is the root mean square error (RMSE). Table 4 gives the ANFIS analysis of variance. The ANOVA table contains the sources of variation, degrees of freedom, sums of squares, mean squares, F-ratio test statistics, and the corresponding significance levels. The input MF number, output MF shape, initial rule-base construction approach, and cluster validity index show the greatest statistical relevance, since a higher F-ratio (equivalently, a smaller p-value) indicates greater importance and relevance of the corresponding factor.
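The size of the design follows directly from the level counts: 3^7 × 2^2 = 8748 level combinations, and five replicates give 43,740 runs. The sketch below enumerates such a full factorial design and defines the RMSE response. The factor levels are generic placeholder indices (which factor has which levels is given in Table 3, not here), and `rmse` is a hypothetical helper name.

```python
import itertools

import numpy as np

# Placeholder level counts: seven three-level factors and two two-level factors
levels_per_factor = [3] * 7 + [2] * 2
replicates = 5

# Every combination of level indices, e.g. (0, 2, 1, 0, 0, 1, 2, 1, 0)
design = list(itertools.product(*(range(k) for k in levels_per_factor)))
n_runs = len(design) * replicates


def rmse(y_true, y_pred):
    """Root mean square error, the response variable of the experiment."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


if __name__ == "__main__":
    print(len(design), n_runs)  # 8748 43740
```

Each tuple in `design` would configure one ANFIS training run, repeated `replicates` times, with the test-set RMSE recorded as the observation for the ANOVA.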
Based on these results, these four factors have the most important effects on ANFIS performance when the proposed method is applied to stock price prediction. The initial rule-base construction approach accounts for approximately 50% of the system variance, the input MF number for approximately 39%, the cluster validity index for approximately 4%, and the output MF shape for approximately 3%.
The variance analysis of the factorial experiment showed six factors to be significant at the 5 percent level: input MF number, output MF shape, initial rule-base construction approach, degree of fuzziness, training epochs, and cluster validity index. In addition, four interactions are significant: input MF number × output MF shape, input MF number × initial rule-base construction approach, output MF shape × training epochs, and initial rule-base construction approach × initial clustering algorithm.
In Table 4, the p-value for the number of data points is 0.057. Hence, contrary to our expectation, the three controlled numbers of data points have no significant influence on ANFIS performance at the 5 percent level. This is an interesting result, because it implies that, for stock price prediction, the number of data points is not a significant source of variance in the prediction performance of ANFIS. Therefore, even a small number of data points may be fed to ANFIS with acceptable results.
Since the p-value for the initial clustering approach is greater than 5 percent, this factor is not statistically significant. The small contribution of the clustering algorithm to ANFIS performance in this problem is also reflected in its low F-ratio of 2.56 in Table 4. Because some interactions are significant, main effects must be interpreted with great caution. A significant interaction means, for example, that the effect of the initial rule-base construction approach on ANFIS performance (RMSE) at one degree of fuzziness differs from its effect at another degree of fuzziness. The same holds for the other significant interactions. These interactions can be seen graphically by plotting the corresponding treatment mean RMSEs, as shown in Fig. 2; the lines in these plots are not parallel, which is the visual signature of an interaction.
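What "non-parallel lines" means can be made concrete for two factors with two levels each: compute the mean RMSE in each of the four cells, and form the interaction contrast $(m_{11} - m_{12}) - (m_{21} - m_{22})$. A value of zero means the effect of the second factor is the same at both levels of the first (parallel lines); a nonzero value means the lines cross or diverge. The sketch below uses hypothetical function names, and the values in the check are made-up toy numbers, not results from Table 4 or Fig. 2.

```python
import numpy as np


def cell_means(rmse, a, b):
    """Mean RMSE per (level of A, level of B) cell; rows index A, columns index B."""
    rmse, a, b = (np.asarray(v) for v in (rmse, a, b))
    table = np.zeros((a.max() + 1, b.max() + 1))
    for i in range(table.shape[0]):
        for j in range(table.shape[1]):
            table[i, j] = rmse[(a == i) & (b == j)].mean()
    return table


def interaction_contrast(table):
    """(m11 - m12) - (m21 - m22) for a 2 x 2 table of cell means; 0 means parallel lines."""
    return float((table[0, 0] - table[0, 1]) - (table[1, 0] - table[1, 1]))
```

Plotting each row of the cell-mean table against the levels of B gives the interaction plot; the formal test of whether a nonzero contrast is significant is the interaction F-ratio in the ANOVA table.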

Conclusion
Motivated by the popularity and widespread use of neuro-fuzzy systems, especially ANFIS, we explained the inherent difficulty of the ANFIS design and parameter-setting process. One suitable approach to this problem is to apply the design of experiments technique to identify the factors with the most statistically significant effect on ANFIS performance. Since ANFIS is widely used for prediction, we applied it to stock price prediction, using the historical time series of an automotive parts manufacturer listed on the Tehran Stock Exchange. The experimental results showed that the most relevant parameters for ANFIS in stock price prediction are the input membership function number, output membership function shape, initial rule-base construction approach, and cluster validity index. Moreover, four significant interactions among factors were identified. The experiment provides valuable insight into the ANFIS design process and can ease the design of ANFIS by reducing the search space and the complexity of system tuning. In automatic optimization techniques, these results can be mapped into the cost function so that the parameters with the greatest influence on ANFIS behavior and performance are optimized first. For future work, this approach can be applied to other neuro-fuzzy systems and to other popular application problems.