Applying Different Independent Component Analysis Algorithms and Support Vector Regression for IT Chain Store Sales Forecasting

Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales, since an IT chain store has many branches. Integrating a feature extraction method with a prediction tool, such as support vector regression (SVR), is a useful way to construct an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique and has been widely applied to various forecasting problems. Up to now, however, only the basic ICA method (i.e., the temporal ICA model) has been applied to the sales forecasting problem. In this paper, we utilize three different ICA methods, namely spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA), to extract features from the sales data and compare their performance in sales forecasting for an IT chain store. Experimental results on real sales data show that the sales forecasting scheme integrating stICA and SVR outperforms the comparison models in terms of forecasting error. stICA is a promising tool for extracting effective features from branch sales data, and the extracted features can improve the prediction performance of SVR for sales forecasting.


Introduction
Independent component analysis (ICA) is one of the most widely applied blind source separation (BSS) techniques for separating the source from the received signals without any prior knowledge of the source signal [1]. The goal of ICA is to recover independent sources when given only sensor observations that are unknown mixtures of the unobserved independent source signals. It has been investigated extensively in image processing, time series forecasting, and statistical process control [1][2][3][4][5][6]. For example, Oja et al. [2] applied linear ICA to foreign exchange rate time series prediction. They first used linear ICA to estimate the independent components and mixing matrix from the observed time series dataset and then filtered the independent components (to reduce the effects of noise) through linear and nonlinear smoothing techniques.
Finally, autoregression (AR) modeling was employed to predict the smoothed independent components. Cao and Chong [3] employed ICA as a feature extraction tool in developing a support vector machine (SVM) forecaster. The independent components (ICs) were considered features of the forecasting data and used to build the SVM forecasting model.
Lu et al. [4] proposed a hybrid scheme which integrates ICA, engineering process control (EPC), and backpropagation neural network (BPN) to recognize shift and trend patterns in correlated processes. Lu [5] developed an ICA-based disturbance separation scheme to diagnose shift patterns with different levels of shift magnitudes in a correlated process. Lu et al. [6] proposed a two-stage forecasting model by integrating linear ICA and support vector regression (SVR) for financial time series. They first applied linear ICA to the forecasting variables to generate the independent components. After identifying and removing the ICs containing the noise, the rest of the ICs were then used to reconstruct the forecasting variables, which contained less noise and served as the input variables of the SVR forecasting model.
The Scientific World Journal
For time series forecasting problems, the first important step is usually to use feature extraction to reveal underlying information that cannot be found directly from the observed data. The performance of predictors can be improved by using the extracted features as inputs [6][7][8][9][10]. Therefore, the two-stage forecasting scheme that integrates a feature extraction method with a prediction tool is well known in the literature [7][8][9]. The basic ICA is usually used as a novel feature extraction technique to find independent sources (i.e., features) for time series forecasting [1][2][3][6][7][8]. The independent sources, called independent components (ICs), can be used to represent hidden information of the observable data. The basic ICA has been widely applied to different time series forecasting problems, such as stock price prediction and exchange rate forecasting [2,6,11,12]. However, very few articles have utilized ICA in sales forecasting. Lu and Wang [13] combined ICA, growing hierarchical self-organizing maps (GHSOM), and SVR to develop a clustering-based sales forecasting model for predicting the sales of a computer dealer.
The basic ICA was originally developed to deal with problems similar to the "cocktail party" problem, in which many people are speaking at once. It assumes that the extracted ICs are independent in time (independence of the voices) [14]. Thus, the basic ICA is also called temporal ICA (tICA). However, for some application data such as biological time series and functional magnetic resonance imaging (fMRI) data, it is more realistic to assume that the ICs are independent in space (independence of the images or voxels) [15,16]. This ICA model is called spatial ICA (sICA). In addition, spatiotemporal ICA (stICA), which is based on the assumption that there exist small dependences between different spatial source data and between different temporal source data, has also been proposed [15,16]. In other words, stICA maximizes the degree of independence over space as well as over time, without necessarily producing independence in either space or time [15,16]. In short, there are three different ICA algorithms. tICA seeks a set of ICs which are strictly independent in time. In contrast, sICA tries to find a set of ICs which are strictly independent in space. stICA seeks a set of ICs which are strictly independent neither in time nor in space.
Many studies have reported on using sICA and/or stICA algorithms to extract discriminative information from time series data. Calhoun et al. [14] used sICA and tICA algorithms to extract features from fMRI data. They found that sICA and tICA can give diverging results, depending upon the characteristics of the underlying signals to be estimated. They also indicated, however, that sICA and tICA algorithms can produce similar results when the signals are uncorrelated in both the spatial and the temporal dimensions. Stone et al. [16] applied stICA to event-related fMRI data. Their results showed that the performance of stICA was superior to that of principal component analysis (PCA), sICA, and tICA. Kim et al. [17] employed sICA, tICA, and stICA for clustering genes and finding biologically meaningful modes. The results showed that tICA was more useful than sICA and stICA in the task of gene clustering and that the modes found by stICA were better than those of sICA and tICA. Castells et al. [18] used sICA and stICA algorithms for analyzing simulated and real electrocardiogram (ECG) data and found that the stICA algorithm outperformed the sICA model.
Sales forecasting is one of the most important issues for information technology (IT) companies [13,19,20], as IT companies face a competitive environment with rapid changes to product specifications, intense competition, and rapidly eroding prices. By predicting consumer demand before selling, sales forecasting helps to determine the appropriate number of products to keep in inventory, thereby preventing over- or understocking. Moreover, since an IT chain store has many branches, constructing an effective sales forecasting model is a challenging task in managing IT chain store sales.
The sales of a branch of an IT chain store may be affected by other neighboring branches of the same IT chain store. Therefore, to forecast sales of a branch, the historical sales data of this branch and its neighboring branches will be good predictor variables. The historical sales data of the branches of an IT chain store are highly correlated in space or time or both. Thus, three different ICA algorithms are used in this study to extract features from the branch sales data of an IT chain store. The feature extraction performance of the three different ICA algorithms is compared by using the two-stage forecasting scheme.
In this study, we propose a sales forecasting model for the branches of an IT chain store by integrating ICA algorithms and SVR. SVR, which is based on statistical learning theory, is an effective machine learning algorithm and has been receiving increasing attention for solving nonlinear regression estimation problems. SVR is derived from the structural risk minimization principle to estimate a function by minimizing an upper bound of the generalization error [21]. Owing to its generalization capability and the fact that training yields a unique solution, SVR has great potential and can deliver superior performance in practical applications. It has been successfully applied to time series forecasting problems, such as sales data [13,19,20], traffic flow [22][23][24], electric load [25][26][27], wind speed [28], and financial time series data [6,7,10,29,30].
In the proposed sales forecasting scheme, we first apply three different ICA algorithms (i.e., tICA, sICA, and stICA) to the predictor variables to estimate ICs. The ICs can be used to represent underlying/hidden information of the predictor variables. The ICs are then used as the input variables of the SVR for building the prediction model. In order to evaluate the performance of the three different ICA algorithms, a real branch sales dataset of an IT chain store is used as the illustrative example.
The rest of this paper is organized as follows. Section 2 gives brief overviews of temporal ICA, spatial ICA, and spatiotemporal ICA and SVR. The sales forecasting scheme is described in Section 3. Section 4 presents the experimental results and this paper is concluded in Section 5.

Methodology
2.1. Temporal, Spatial, and Spatiotemporal ICA. In general, stICA finds a linear decomposition by maximizing the degree of independence over space as well as over time, without necessarily producing independence in either space or time. It permits a tradeoff between the independence of arrays and the independence of time courses. In contrast to stICA, tICA enforces independence constraints over time to seek a set of independent time courses, whereas sICA enforces independence constraints over space to find a set of independent arrays [17].
Temporal ICA (tICA) embodies the assumption that $\tilde{V}$ can be decomposed as $\tilde{V} = P A_T$, where $A_T$ is a $k \times k$ mixing matrix and $P$ is an $n \times k$ matrix of statistically independent temporal signals. tICA can be used to obtain the decomposition $p_T = \tilde{V} W_T$, where $W_T$ is a permuted version of $A_T^{-1}$. The vector $p_T$ is a set of extracted temporal signals and is a scaled version of exactly one column vector in the matrix $P$. This is achieved by maximizing the entropy $h_T = H(Y_T)$ of $Y_T = \tau(p_T)$, where $\tau$ approximates the cdf of the temporal source signals.
For spatial ICA (sICA), it is assumed that $\tilde{U}$ can be decomposed as $\tilde{U} = S A_S$, where $A_S$ is a $k \times k$ mixing matrix and $S$ is an $m \times k$ matrix of statistically independent spatial signals. sICA can be applied to generate the decomposition $y_S = \tilde{U} W_S$, where $W_S$ is a permuted version of $A_S^{-1}$. The vector $y_S$ is a scaled version of exactly one column vector in the matrix $S$ and is a set of extracted spatial signals. This is achieved by maximizing the entropy $h_S = H(Z_S)$ of $Z_S = \sigma(y_S)$, where $\sigma$ approximates the cdf of the spatial source signals.
Spatiotemporal ICA (stICA) tries to find the decomposition $\tilde{X} = S \Lambda P^T$, where $S$ is an $m \times k$ matrix of statistically independent spatial signals, $P$ is an $n \times k$ matrix of mutually independent temporal signals, and $\Lambda$ is a diagonal scaling matrix required to ensure that $S$ and $P$ have amplitudes appropriate to their respective cdfs $\sigma$ and $\tau$. Under the condition $\tilde{X} = \tilde{U} \tilde{V}^T$, there exist two $k \times k$ unmixing matrices $W_S$ and $W_T$ such that $P = \tilde{V} W_T$ and $S = \tilde{U} W_S$. Then, if $W_S \Lambda W_T^T = I$, the following relation holds: $\tilde{X} = S \Lambda P^T = \tilde{U} W_S \Lambda (\tilde{V} W_T)^T = \tilde{U} \tilde{V}^T$. We can estimate $W_S$ and $W_T$ by maximizing an objective function associated with the spatial and temporal entropies at the same time. That is, the objective function for stICA has the following form: $h_{ST} = \alpha H(Z_S) + (1 - \alpha) H(Y_T)$, where $\alpha$ (0.5 is used in this study) defines the relative weighting of the spatial entropy and the temporal entropy. More details on tICA, sICA, and stICA can be found in [14][15][16].
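The reconstruction identity for stICA can be checked numerically. The following is a minimal sketch rather than the authors' implementation. It assumes that the two reduced factors come from a rank-k singular value decomposition of the data, with the singular values split evenly between them (the text does not state how they are obtained). The matrix sizes, the random data, and the particular unmixing and scaling matrices are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 10, 70, 3  # hypothetical sizes: spatial points, time points, components

X = rng.standard_normal((m, n))  # stand-in data, not the paper's sales data

# Assumed construction: rank-k SVD with sqrt(singular values) on each side.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
U_tilde = U[:, :k] * np.sqrt(d[:k])      # m x k spatial factor
V_tilde = Vt[:k, :].T * np.sqrt(d[:k])   # n x k temporal factor
X_tilde = U_tilde @ V_tilde.T            # rank-k approximation of X

# A fixed invertible spatial unmixing matrix (determinant 9) and a diagonal scaling.
W_S = np.array([[2.0, 1.0, 0.0],
                [0.0, 2.0, 1.0],
                [1.0, 0.0, 2.0]])
Lam = np.diag([0.5, 1.0, 2.0])

# Pick W_T so that W_S @ Lam @ W_T.T equals the identity.
W_T = np.linalg.inv(W_S @ Lam).T

S = U_tilde @ W_S  # m x k spatial signals
P = V_tilde @ W_T  # n x k temporal signals

# S @ Lam @ P.T should reproduce X_tilde up to rounding error.
reconstruction_error = np.max(np.abs(S @ Lam @ P.T - X_tilde))
print(reconstruction_error)
```

The check only confirms the algebra of the constraint. An actual stICA fit would search for the unmixing matrices that maximize the weighted sum of spatial and temporal entropies while respecting it.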

Support Vector Regression. Support vector regression (SVR) is an artificial intelligence forecasting tool based on statistical learning theory and the structural risk minimization principle [21]. The SVR model can be expressed as the following equation [21]:
$$f(x) = z^T \phi(x) + b,$$
where $z$ is the weight vector, $b$ is the bias, and $\phi(x)$ is a nonlinear mapping, associated with a kernel function, that transforms the nonlinear input into a high-dimensional feature space in which the relation becomes linear. Traditional regression obtains the coefficients by minimizing the squared error, which can be regarded as the empirical risk based on a loss function. Vapnik [21] introduced the so-called $\varepsilon$-insensitive loss function to SVR. Considering empirical risk and structure risk simultaneously, the SVR model can be constructed by solving the following program:
$$\text{Min: } \frac{1}{2} z^T z + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)$$
$$\text{subject to } y_i - z^T \phi(x_i) - b \le \varepsilon + \xi_i, \quad z^T \phi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0,$$
where $i = 1, \ldots, n$ and $n$ is the number of training data; $(\xi_i + \xi_i^*)$ is the empirical risk; $\varepsilon$ defines the region of $\varepsilon$-insensitivity: when the predicted value falls into the band area, the loss is zero, whereas if the predicted value falls outside the band area, the loss is equal to the difference between the predicted value and the margin; $z^T z / 2$ is the structure risk, which prevents overlearning and poor generalization; and $C$ is the modifying coefficient representing the tradeoff between empirical risk and structure risk.
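The loss and the objective described above can be made concrete with a few lines of code. This is an illustrative sketch only: the function names `epsilon_insensitive_loss` and `regularized_risk` and the toy numbers are assumptions introduced here. It evaluates the objective for given predictions and does not solve the optimization problem.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon band, distance to the band edge outside it."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(residual - epsilon, 0.0)

def regularized_risk(z, y_true, y_pred, C, epsilon):
    """Structure risk (z'z / 2) plus C times the summed epsilon-insensitive loss."""
    z = np.asarray(z, dtype=float)
    structure_risk = 0.5 * float(z @ z)
    empirical_risk = float(epsilon_insensitive_loss(y_true, y_pred, epsilon).sum())
    return structure_risk + C * empirical_risk

# Toy values: the first prediction lies inside the band, the other two outside.
losses = epsilon_insensitive_loss([1.0, 2.0, 3.0], [1.05, 2.5, 2.0], epsilon=0.1)
risk = regularized_risk([3.0, 4.0], [1.0, 2.0, 3.0], [1.05, 2.5, 2.0], C=2.0, epsilon=0.1)
print(losses, risk)
```

A larger value of the modifying coefficient makes errors outside the band more expensive relative to the size of the weight vector, which is the tradeoff the text describes.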
After selecting a proper modifying coefficient ($C$), width of the band area ($\varepsilon$), and kernel function ($\phi(x)$), the optimum of each parameter can be resolved through the Lagrange function. Cherkassky and Ma [31] proposed that the radial basis function (RBF), defined as $\Phi(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)$, is suited for solving most forecasting problems. Therefore, this paper uses the RBF with parameter $\sigma = 0.2$ as the kernel function in SVR modeling. The performance of SVR is mainly affected by the setting of the parameters $C$ and $\varepsilon$ [21,31]. There are no general rules for the choice of $C$ and $\varepsilon$. This study uses exponentially growing sequences of $C$ and $\varepsilon$ to identify good parameters [32]. The parameter set of $C$ and $\varepsilon$ which generates the minimum forecasting mean square error (MSE) is considered the best parameter set.
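As an illustration of the RBF kernel with the width used in this paper, the sketch below computes a kernel matrix directly from the definition. The function name `rbf_kernel_matrix` and the three sample points are hypothetical. Rows are treated as samples here, which is a layout choice made for this example.

```python
import numpy as np

def rbf_kernel_matrix(A, B, sigma=0.2):
    """exp(-||a - b||^2 / (2 sigma^2)) for every row a of A and row b of B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative rounding errors
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

points = np.array([[0.0, 0.0],
                   [0.2, 0.0],
                   [1.0, 1.0]])
K = rbf_kernel_matrix(points, points)
print(np.round(K, 4))
```

LIBSVM parameterizes its RBF kernel as exp(-gamma * ||u - v||^2), so a width of 0.2 corresponds to gamma = 1 / (2 * 0.2^2) = 12.5 under that convention.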

Proposed Sales Forecasting Scheme
This study uses a two-stage sales forecasting scheme. In this scheme, different ICA algorithms serve as the feature extraction method and support vector regression serves as the prediction tool. The schematic representation of the proposed sales forecasting scheme is illustrated in Figure 1.
As shown in Figure 1, the first step of the proposed sales forecasting scheme is data scaling. In this step, the original datasets and prediction variables are scaled into a common range. Then, the three different ICA algorithms, namely tICA, sICA, and stICA, are applied to the scaled data to estimate ICs. In the third step, the ICs, which contain hidden information of the prediction variables, are used as input variables to construct the SVR sales forecasting model. Since this study uses three ICA algorithms to extract features, based on the two-stage scheme, four sales forecasting methods, namely tICA-SVR, sICA-SVR, t-stICA-SVR, and s-stICA-SVR, are presented in this study. For tICA-SVR, the tICA algorithm is used to generate temporal ICs (called t ICs). For sICA-SVR, the sICA algorithm is utilized to estimate spatial ICs (called s ICs). As the stICA algorithm generates two different sets of ICs, which represent the temporal ICs (called t-st ICs) and the spatial ICs (called s-st ICs), respectively, two models are developed: the t-stICA-SVR forecasting model, which uses the t-st ICs as inputs, and the s-stICA-SVR prediction scheme, which uses the s-st ICs as prediction variables.
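The scaling step can be sketched as a simple min-max transformation. Two things in this sketch are assumptions rather than details taken from the paper: the target interval of [-1, 1], and the choice to compute the minima and maxima on the training columns only and reuse them for the test columns. The helper names `fit_minmax` and `apply_minmax` are hypothetical, and variables are stored in rows.

```python
import numpy as np

def fit_minmax(X_train):
    """Per-variable (row-wise) minimum and maximum of the training data."""
    X_train = np.asarray(X_train, dtype=float)
    return X_train.min(axis=1, keepdims=True), X_train.max(axis=1, keepdims=True)

def apply_minmax(X, lo, hi, target_range=(-1.0, 1.0)):
    """Map each row linearly so that [lo, hi] lands on target_range."""
    a, b = target_range
    span = np.where(hi > lo, hi - lo, 1.0)  # constant rows map to the lower bound
    return a + (np.asarray(X, dtype=float) - lo) * (b - a) / span

X_train = np.array([[0.0, 5.0, 10.0],
                    [2.0, 4.0, 6.0]])
X_test = np.array([[15.0],
                   [3.0]])

lo, hi = fit_minmax(X_train)
scaled_train = apply_minmax(X_train, lo, hi)
scaled_test = apply_minmax(X_test, lo, hi)  # may fall outside the target range
print(scaled_train)
print(scaled_test)
```

Reusing the training statistics keeps information from the test period out of the preprocessing, at the cost of test values that can fall outside the target interval.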

Datasets and Performance Criteria.
To evaluate the performance of the three different ICA algorithms for IT chain store sales forecasting, a real weekly branch sales dataset of an IT chain store is used in this study. The dataset contains 10 neighboring branches, with a total of 96 data points for each branch. The first 70 data points (72.9% of the total sample points) are used as the training sample and the remaining 26 data points (27.1% of the total sample points) are used as the testing sample. Figures 2(a)-2(j) show the sales data of the 10 branches, respectively. From Figures 2(a)-2(j), it can be seen that the sales characteristics of the 10 branches differ. As these 10 branches are neighboring branches, if we want to forecast the sales of one of them (called the target branch), the sales data of the remaining 9 branches can be used as predictor variables. Therefore, the previous week's sales volumes (T-1) of the target branch and the 9 neighboring branches are used as the 10 predictor variables. The input matrices $X_{tr}$ of size 10 × 70 and $X_{te}$ of size 10 × 26 are then generated for the training stage and the testing stage, respectively.
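The chronological split described above can be written in a few lines. The sketch below uses placeholder numbers in place of the sales data, which are not reproduced here, and the variable names are assumptions. It only illustrates the matrix shapes and the stated percentages.

```python
import numpy as np

n_branches, n_weeks, n_train = 10, 96, 70

# Placeholder values standing in for the predictor matrix (one row per branch).
X = np.arange(n_branches * n_weeks, dtype=float).reshape(n_branches, n_weeks)

# Keep time order: the first 70 columns train the model, the last 26 test it.
X_tr = X[:, :n_train]
X_te = X[:, n_train:]

train_share = round(100.0 * n_train / n_weeks, 1)
test_share = round(100.0 * (n_weeks - n_train) / n_weeks, 1)
print(X_tr.shape, X_te.shape, train_share, test_share)
```

Splitting by position rather than at random preserves the temporal ordering, so the model is always evaluated on weeks that come after the ones it was trained on.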
The prediction results of the four two-stage sales forecasting schemes, namely the tICA-SVR, sICA-SVR, t-stICA-SVR, and s-stICA-SVR methods, are compared to those of the SVR model without ICA-based feature extraction (called the single SVR model). All five forecasting schemes are used for one-step ahead forecasting of monthly sales data (i.e., one-month ahead forecasting). In building the SVR forecasting model, the LIBSVM package developed by Chang and Lin [33] is adopted in this study.
The prediction performance is evaluated using the following statistical metrics, namely, the root mean square error (RMSE), mean absolute difference (MAD), and mean absolute percentage error (MAPE). RMSE, MAD, and MAPE are measures of the deviation between actual and predicted values. The smaller the values of RMSE, MAD, and MAPE, the closer the predicted time series values are to the actual values. The definitions of these criteria are as below:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (T_i - P_i)^2}, \quad \mathrm{MAD} = \frac{1}{n} \sum_{i=1}^{n} |T_i - P_i|, \quad \mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{T_i - P_i}{T_i} \right| \times 100\%,$$
where $T_i$ and $P_i$ represent the actual and predicted value at week $i$, respectively, and $n$ is the total number of data points.
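The three criteria can be computed as follows. This sketch uses the standard definitions of RMSE, MAD, and MAPE. The function names and the toy actual and predicted values are illustrative and are not taken from the paper's data.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((a - p) ** 2)))

def mad(actual, predicted):
    """Mean absolute difference."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(a - p)))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((a - p) / a)) * 100.0)

actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(rmse(actual, predicted), mad(actual, predicted), mape(actual, predicted))
```

RMSE and MAD are expressed in the units of the sales data, whereas MAPE is scale free, which makes it convenient for comparing branches with different sales levels.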

Forecasting Results.
In this study, 10 branches' sales data are used to assess the performance of the five forecasting methods. In this section, first, we use the sales data of Branch 1 as evaluation sample. That is, Branch 1 is the first target branch.
In modeling the single SVR model for Branch 1, the scaled values of the 10 predictor variables are directly used as inputs. In selecting the parameters for SVR modeling, the parameter set ($C = 2^{11}$, $\varepsilon = 2^{-7}$) is used as the starting point of the grid search for the best parameters. The testing results of the SVR model with combinations of different parameter sets are summarized in Table 1. From Table 1, it can be found that the parameter set ($C = 2^{11}$, $\varepsilon = 2^{-7}$) gives the best forecasting result (minimum testing MSE) and is the best parameter set for the single SVR model. For the tICA-SVR model, the original predictor variables are first scaled and then passed to the tICA algorithm to estimate ICs, that is, features. The ICs are then used for building the SVR forecasting model. Ten ICs are estimated by the tICA algorithm since 10 predictors are used. Following the same process as for the single SVR model, the parameter set ($C = 2^{9}$, $\varepsilon = 2^{-7}$) is used as the starting point of the grid search. Table 2 summarizes the testing results of the tICA-SVR model with combinations of different parameter sets. As Table 2 shows, the parameter set ($C = 2^{11}$, $\varepsilon = 2^{-5}$) gives the best forecasting result and is the best parameter set for the tICA-SVR model.
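The parameter selection procedure amounts to a grid search over exponentially spaced values of the two SVR parameters. The sketch below shows the search loop only. The exponent ranges, the helper names, and the `toy_mse` scoring function are hypothetical. In practice the scoring function would train an SVR model with the given parameters and return its testing MSE.

```python
import math
from itertools import product

def exponential_grid(start_exp, stop_exp, step=2):
    """Powers of two from 2**start_exp to 2**stop_exp inclusive."""
    return [2.0 ** e for e in range(start_exp, stop_exp + 1, step)]

def grid_search(c_values, eps_values, score):
    """Return (best_mse, best_C, best_epsilon) over all parameter pairs."""
    best = None
    for C, eps in product(c_values, eps_values):
        mse = score(C, eps)
        if best is None or mse < best[0]:
            best = (mse, C, eps)
    return best

def toy_mse(C, eps):
    # Stand-in for "train SVR, return testing MSE"; minimized at C=2^11, eps=2^-7.
    return (math.log2(C) - 11) ** 2 + (math.log2(eps) + 7) ** 2

c_values = exponential_grid(7, 15)       # 2^7, 2^9, ..., 2^15
eps_values = exponential_grid(-11, -3)   # 2^-11, 2^-9, ..., 2^-3
best_mse, best_C, best_eps = grid_search(c_values, eps_values, toy_mse)
print(best_mse, best_C, best_eps)
```

Searching on a logarithmic scale covers several orders of magnitude with few evaluations, which matters because each evaluation requires training a model.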
The forecasting results of Branch 1 using the tICA-SVR, sICA-SVR, t-stICA-SVR, s-stICA-SVR, and single SVR models are computed and listed in Table 6. Table 6 shows that the RMSE, MAD, and MAPE of the t-stICA-SVR model are, respectively, 20.369, 8.140, and 6.64%. It can be observed that these values are smaller than those of the tICA-SVR, sICA-SVR, s-stICA-SVR, and single SVR models. Therefore, the t-stICA-SVR model can generate the best prediction result for forecasting sales of Branch 1.
Using a modeling process similar to the one described above, the five forecasting models are applied to forecast the sales of Branch 2 to Branch 10. Table 7 summarizes the forecasting results of Branch 2 to Branch 10 using the tICA-SVR, sICA-SVR, t-stICA-SVR, s-stICA-SVR, and single SVR models, respectively. Note that the values in Table 7 are MAPE values. It can be observed from Table 7 that the t-stICA-SVR forecasting scheme has the smallest MAPE in comparison with the four comparison models for every branch. Thus, the t-stICA-SVR model can provide better forecasting precision and outperforms the four comparison methods in forecasting sales of the IT chain store.
Moreover, Tables 6 and 7 also show that s-stICA-SVR outperforms the tICA-SVR, sICA-SVR, and single SVR models. Since the t-stICA-SVR and s-stICA-SVR models provide better forecasting results than the tICA-SVR and sICA-SVR models, the stICA algorithm can estimate more effective ICs and improve sales forecasting performance for the IT chain store. In addition, Tables 6 and 7 indicate that temporal ICs are more suitable for forecasting branch sales, since the forecasting performance of the t-stICA-SVR and tICA-SVR models is better than that of the s-stICA-SVR and sICA-SVR models, respectively.
In order to further evaluate and compare the performance of the five forecasting schemes (i.e., the tICA-SVR, sICA-SVR, t-stICA-SVR, s-stICA-SVR, and single SVR models), 3-month ahead and 6-month ahead forecasts are also considered in this study. The forecasting errors of the five forecasting schemes under the three forecast horizons are computed and listed in Table 8. From Table 8, it can be found that the MAPE values of the t-stICA-SVR scheme are smaller than those of the tICA-SVR, sICA-SVR, s-stICA-SVR, and single SVR models for all three forecast horizons. This indicates that the t-stICA-SVR scheme consistently yields a smaller deviation between the actual and predicted values and thus provides good forecasting accuracy across different forecast horizons. In addition, it can also be observed from Table 8 that the t-stICA-SVR and s-stICA-SVR schemes outperform the tICA-SVR, sICA-SVR, and single SVR models at all three forecast horizons. The findings in Table 8 reveal that stICA is a promising tool for extracting effective features from branch sales data and that the extracted features can improve the prediction performance of SVR for sales forecasting. The experimental results are consistent with the conclusions of Stone et al. [16] and Kim et al. [17]. The stICA solution recovers the original source signals to a much greater extent than the tICA and sICA solutions if the latent variables contain both spatial and temporal information.

Conclusion
Forecasting branch sales is a crucial aspect of marketing and inventory management in an IT chain store. In this paper, we used three different ICA algorithms, namely tICA, sICA, and stICA, for sales forecasting and compared their feature extraction performance. Four sales forecasting methods, namely tICA-SVR, sICA-SVR, t-stICA-SVR, and s-stICA-SVR, were presented in this study. In the proposed sales forecasting methods, we first applied the three ICA algorithms to the predictor variables to estimate ICs, which represent underlying/hidden information of the predictor variables. The ICs were then used as the input variables of the SVR for building the prediction model. A real weekly sales dataset covering 10 branches of an IT chain store was used for evaluating the performance of the sales forecasting methods. Experimental results showed that the t-stICA-SVR and s-stICA-SVR models produced the lowest prediction errors in forecasting the sales of the 10 branches and outperformed the comparison methods used in this study. Thus, compared to the tICA and sICA algorithms, the stICA algorithm can estimate more effective ICs and improve sales forecasting performance for the IT chain store. Moreover, we also found that, compared to spatial ICs, temporal ICs are more suitable features for forecasting branch sales.