Developing an approach to evaluate stocks by forecasting effective features with data mining methods
Introduction
One of the most important concerns of market practitioners is information about the future of the companies that offer stocks. A reliable prediction of a company’s financial status lets investors invest with more confidence and earn higher profits (Huang, 2012a, Huang, 2012b). Many studies address gain and return prediction, for example, a time series stock price prediction model (Araújo & Ferreira, 2013), buy–hold–sell prediction models (Wu et al., 2014, Zhang et al., 2014), index prediction models with ANFIS (Svalina, Galzina, Lujić, & Šimunović, 2013) or MARS and SVR (Kao, Chiu, Lu, & Chang, 2013), and profit gaining (Ng, Liang, Li, Yeung, & Chan, 2014). However, unlike return, risk has rarely been considered for prediction, even though investors usually balance return against an acceptable level of risk; both risk and return are therefore important factors in financial decision making (Barak et al., 2013, Tsai et al., 2011). Without risk evaluation, the portfolio efficient frontier does not make sense. Thus, this paper forecasts both the risk and the return of stocks, which have a tremendous effect on price setting. Moreover, up-down prediction of stock movement, as in (Patel et al., 2014, Yu et al., 2014, Zhang et al., 2014), cannot give a precise view of a stock’s future or of investors’ gains, whereas classifying the amount of risk and return into several categories, as our method does, gives more specific and clearer knowledge.
Therefore, in this study, the simultaneous prediction of risk and return classes with different classification algorithms is investigated.
To predict risk and return variables accurately, the effective factors need to be identified. In fact, one of the key issues of stock prediction design lies in how to select representative features for prediction (Zhang, Hu, Xie, Wang, et al., 2014).
Most studies in this area focus on technical features, financial ratios or macroeconomic indicators. For example, Tsai and Hsiao (2010) studied 8 financial ratios and 16 macroeconomic indicators as the main features to predict stock return by back propagation in the Taiwan stock market. Cheng, Chen, and Lin (2010) conducted a comprehensive study on macroeconomic and technical features and studied 8 financial ratios and 10 macroeconomic indicators to investigate their effect on return variation in the Taiwan stock market. By applying a probabilistic back propagation algorithm, rough set and C4.5 Tree, they achieved 76% accuracy. de Oliveira, Nobre, and Zárate (2013) used 15 technical indicators and 11 fundamental indexes to predict the stock movement of Petrobras with artificial neural networks and obtained 87.50% on direction prediction. Tsai et al. (2011) considered 19 financial ratios and 11 macroeconomic indicators in the Taiwan stock market, combining a logistic regression algorithm, MLP back propagation and CART Tree to investigate their effect (negative or positive) on the stock return, and achieved 66.67% accuracy based on bagging and voting algorithms. As these examples show, the majority of studies predict returns from financial ratios, macroeconomic indicators, and technical indicators chosen on the basis of experts’ ideas. In contrast, this paper presents a systematic and efficient methodology for comprehensively searching for potential representative features of the stock market in 3 categories (financial ratios, profit and loss reports, and stock pricing models) rather than arbitrarily choosing features that are likely to be effective.
Furthermore, many studies have claimed and verified that feature selection (FS) is the key process in stock prediction modeling (Tsai & Hsiao, 2010). Zhang, Hu, Xie, Wang, et al. (2014) use a causal feature selection (CFS) algorithm to find effective features in the Shanghai stock exchanges. Their model is built on a causality-based feature selection algorithm. They assert that CFS represents direct influences between various stock features, while correlation-based algorithms cannot distinguish direct influences from indirect ones. Wu et al. (2014) use textual and technical features to improve the prediction accuracy of the stock market. They use an SVR algorithm and a trend segmentation method to forecast trends and generate trading signals, respectively. Their feature selection algorithm is stepwise regression analysis. Although there are a variety of studies in the area of feature selection, almost all of them use a single feature selection model.
In this research, a novel hybrid feature selection algorithm based on filter methods and a function-based clustering method is applied to select the important features. What makes our proposed approach different from previous ones is that we combine 9 different feature selection algorithms with a function-based clustering algorithm. The hybrid model in this paper draws on the strengths of correlation-based algorithms such as Chi-square and One-R, in addition to those of classification-error-based, interval-based, and information-based algorithms such as SVM, Relief-f, and Gini index/gain ratio, respectively. The effectiveness of our model is illustrated by predicting both the risk and the return of stocks and then analyzing the results with and without our hybrid feature selection algorithm.
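The idea of combining several filter-based feature selection algorithms can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper: the combination rule used here (mean-rank aggregation) is a swapped-in, generic technique, the function-based clustering step is omitted because this excerpt does not describe it, and the function names, feature names and scores are all hypothetical.

```python
from statistics import mean


def rank_features(scores: dict[str, float]) -> dict[str, int]:
    """Rank features by one filter's scores (1 = most relevant)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def aggregate_rankings(filter_scores: dict[str, dict[str, float]], top_k: int) -> list[str]:
    """Combine several filter rankings by mean rank and keep the top_k features.

    Assumption: every filter scores the same set of features, and a higher
    score always means a more relevant feature.
    """
    rankings = [rank_features(scores) for scores in filter_scores.values()]
    features = list(rankings[0])
    mean_rank = {f: mean(r[f] for r in rankings) for f in features}
    # Sort by mean rank, breaking ties by feature name for determinism.
    return sorted(mean_rank, key=lambda f: (mean_rank[f], f))[:top_k]


# Toy scores invented purely for illustration (not results from the paper).
toy_scores = {
    "chi_square": {"f1": 12.0, "f2": 3.5, "f3": 8.1, "f4": 0.4},
    "gain_ratio": {"f1": 0.30, "f2": 0.05, "f3": 0.42, "f4": 0.01},
    "relief_f": {"f1": 0.21, "f2": 0.02, "f3": 0.10, "f4": 0.15},
}
selected = aggregate_rankings(toy_scores, top_k=2)
print(selected)
```

Working on ranks rather than raw scores avoids having to put filters with different scales (for example a chi-square statistic and a gain ratio) on a common scale before combining them.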
To sum up, in the first stage of the paper, a complete list of features that are likely to affect stock risk and return is identified. After an appropriate database is developed, different classification algorithms are used in the second stage to predict the risk and return. We also examine how their results relate to our database from a feature-oriented viewpoint. Finally, in the third stage, a novel hybrid feature selection algorithm based on filter methods and a function-based clustering method is applied to select the important features that affect the prediction of risk and return.
The contributions of the paper are summarized as follows:
- A comprehensive and systematic study to identify the likely effective features in risk and return prediction.
- Prediction of stock risk as well as return with different classification methods.
- Design of a hybrid feature selection algorithm on the basis of filter and function-based clustering.
- Analysis of each algorithm from a feature-oriented viewpoint. The results indicate the factors that cause the strengths and weaknesses of each algorithm. As a result, the nature of each feature is described according to how much it interferes with the prediction.
The rest of the article is organized as follows. In Section 2, the proposed model, which has three stages, is presented. In Section 3, to illustrate the approach, we implement it with real data from the Tehran Stock Exchange (TSE). The results are analyzed, and the predictions with and without the important effective features are compared. Then, in Section 4, real return and risk prediction with the important features is discussed. Finally, conclusions and future research directions are provided in Section 5.
Section snippets
Proposed model
Our proposed algorithm, which consists of three stages, is shown in Fig. 1. In the first stage, a database is developed and the data are pre-processed. In the second stage, non-systematic risk and real return are predicted with classification algorithms. In the third stage, a hybrid feature selection algorithm is presented, and risk and return are re-predicted based on the selected features.
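The comparison between predicting with all features and re-predicting with a selected subset can be sketched as follows. This is only an illustration under stated assumptions: a leave-one-out 1-nearest-neighbour classifier stands in for the classification algorithms used in the paper, the helper name `loo_accuracy` is hypothetical, and the toy data are constructed so that the second column is pure noise.

```python
def loo_accuracy(rows, labels, columns):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier.

    Only the feature indices listed in `columns` are used to measure
    squared Euclidean distance between records.
    """
    hits = 0
    for i, row in enumerate(rows):
        # Find the closest other record using the chosen columns only.
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((row[c] - rows[j][c]) ** 2 for c in columns),
        )
        hits += labels[nearest] == labels[i]
    return hits / len(rows)


# Toy records invented for illustration: column 0 separates the classes,
# column 1 is large-scale noise unrelated to the class.
rows = [(0.0, 5.0), (0.1, -4.0), (0.2, 9.0), (1.0, 5.1), (1.1, -4.1), (1.2, 9.1)]
labels = ["low", "low", "low", "high", "high", "high"]

all_features = loo_accuracy(rows, labels, columns=[0, 1])
selected_only = loo_accuracy(rows, labels, columns=[0])
print(all_features, selected_only)
```

In this contrived case the noisy column dominates the distance, so dropping it raises accuracy; on real data the effect of feature selection can go either way and has to be measured per classifier.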
Experimental results and analysis
In this study, a database including 44 input features and 2 goal features was gathered from TSE data from 2003 to 2012. The resulting database has 1963 records for 400 companies.
According to a group of experts, 5 intervals were introduced for the real return: very high (above 9.3), high (4–9.3), average (1.14–4), low (−1.3 to 1.14), and very low (below −1.3). Risk is also classified in 3 intervals: high in range of higher than
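The five real-return classes can be expressed as a small mapping function. This is a minimal sketch: the function name `return_class` is hypothetical, and the handling of the interior boundaries (1.14 and 4) is an assumption, since the text does not say which class a value exactly on a boundary belongs to. The risk classes are not sketched because their thresholds are not available in this excerpt.

```python
def return_class(real_return: float) -> str:
    """Map a real-return value to one of the five expert-defined classes.

    "very low" is strictly below -1.3 and "very high" is strictly above 9.3,
    as stated in the text. Assumption: a value exactly equal to 1.14 or 4
    is assigned to the lower of the two adjacent classes.
    """
    if real_return < -1.3:
        return "very low"
    if real_return <= 1.14:
        return "low"
    if real_return <= 4:
        return "average"
    if real_return <= 9.3:
        return "high"
    return "very high"


# Example values chosen only to exercise each class.
for value in (-2.0, 0.0, 2.0, 5.0, 10.0):
    print(value, return_class(value))
```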
The real return results in prediction with selected features
If, for trees with a denser structure, all features that were effective in the first prediction are selected by the proposed hybrid model, accuracy improves, as with “BF Tree”, “LAD Tree”, and “FT Tree”. Otherwise, accuracy may drop, as with the “CART” and “Rep” trees. The selected features affect forecasting accuracy differently. Some trees with a large structure, such as J48 Graph and J48 Tree, get lower accuracy, while others, such as ID3 Numerical, get higher accuracy. Higher accuracy
Conclusions
In this study, an approach for the simultaneous prediction of risk and real return was developed by applying data mining techniques and a fundamental data set. To do this, the features that can potentially affect risk and return were first investigated through a comprehensive study. Then, after an appropriate database was developed, the database was preprocessed. To predict the real return and risk, 20 and 15 different prediction algorithms were applied, respectively.
References (64)
- et al.
A morphological-rank-linear evolutionary method for stock market prediction
Information Sciences
(2013)
- et al.
Fuzzy turnover rate chance constraints portfolio model
European Journal of Operational Research
(2013)
- et al.
Estimation of expected return: CAPM vs. Fama and French
International Review of Financial Analysis
(2005)
- et al.
A comparison between Fama and French’s model and artificial neural networks in predicting the Chinese stock market
Computers & Operations Research
(2005)
A comparative study of artificial neural networks, and decision trees for digital game content stocks price prediction
Expert Systems with Applications
(2011)
- et al.
A soft-computing based rough sets classifier for classifying IPO returns in the financial markets
Applied Soft Computing
(2012)
- et al.
A hybrid forecast marketing timing model based on probabilistic neural network, rough set and C4.5
Expert Systems with Applications
(2010)
- et al.
Applying artificial neural networks to prediction of stock price and improvement of the directional prediction index – case study of PETR4, Petrobras, Brazil
Expert Systems with Applications
(2013)
- et al.
The use of data mining and neural networks for forecasting stock market returns
Expert Systems with Applications
(2005)
- et al.
Common risk factors in the returns on stocks and bonds
Journal of Financial Economics
(1993)
Size, value, and momentum in international stock returns
Journal of Financial Economics
Mining rules from an incomplete dataset with a high missing rate
Expert Systems with Applications
A hybrid stock selection model using genetic algorithms and support vector regression
Applied Soft Computing
A hybrid stock selection model using genetic algorithms and support vector regression
Applied Soft Computing
Application of wrapper approach and composite classifier to the stock trend prediction
Expert Systems with Applications
A hybrid approach by integrating wavelet-based feature extraction with MARS and SVR for stock index forecasting
Decision Support Systems
Evolving and clustering fuzzy decision tree for financial time series data forecasting
Expert Systems with Applications
Combined MCDM techniques for exploring stock selection based on Gordon model
Expert Systems with Applications
Predictive modeling using segmentation
Journal of Interactive Marketing
Predicting returns with financial ratios
Journal of Financial Economics
Sign eigenanalysis and its applications to optimization problems and robust statistics
Computational Statistics & Data Analysis
Feature selection based on cluster and variability analyses for ordinal multi-class classification problems
Knowledge-Based Systems
Predictability and the earnings–returns relation
Journal of Financial Economics
An adaptive network-based fuzzy inference system (ANFIS) for the forecasting: The case of close price indices
Expert Systems with Applications
Combining multiple feature selection methods for stock prediction: Union, intersection, and multi-intersection approaches
Decision Support Systems
Predicting stock returns by classifier ensembles
Applied Soft Computing
Determinants of intangible assets value: The data mining approach
Knowledge-Based Systems
Stock market trading rule discovery using two-layer bias decision tree
Expert Systems with Applications
An intelligent stock trading system using comprehensive features
Applied Soft Computing
A SVM stock selection model within PCA
Procedia Computer Science
Empirical evidence on corporate governance in Europe: The effect on stock returns, firm value and performance
Journal of Asset Management
Analysis of financial statements