StockProF: a stock profiling framework using data mining approaches

  • Original Article
  • Information Systems and e-Business Management

Abstract

Analysing stock financial data and drawing insight from it is not easy for many stock investors, particularly individual investors. Building a good stock portfolio from a pool of stocks therefore often requires Herculean effort. This paper proposes a stock profiling framework, StockProF, for building stock portfolios rapidly. StockProF uses two data mining approaches: (1) Local Outlier Factor (LOF) and (2) Expectation Maximization (EM). LOF first detects outlier stocks with superior or poor financial performance. After the outliers are removed, EM clusters the remaining stocks. Investors can then profile the resulting clusters using the mean and the five-number summary. This study used the financial data of the plantation stocks listed on Bursa Malaysia. The authors used 1-year stock price movements to evaluate the performance of the outliers as well as the clusters. The results showed that StockProF is effective, as the profiling corresponded to the average capital gain or loss of the plantation stocks.
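
The three steps named in the abstract (LOF outlier detection, EM clustering of the remaining stocks, and profiling by mean and five-number summary) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses scikit-learn's `LocalOutlierFactor` and `GaussianMixture` (a Gaussian mixture fitted by EM) as stand-ins. The function names, the standardisation step, the LOF threshold of 2.0, the neighbourhood size, the number of clusters, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler


def detect_and_cluster(X, n_neighbors=5, lof_threshold=2.0, n_clusters=2, seed=0):
    """Flag outliers with LOF, then cluster the remaining rows with EM.

    Returns (is_outlier, labels); labels is -1 for every flagged row.
    All parameter defaults are illustrative assumptions.
    """
    X = np.asarray(X, dtype=float)
    # Assumption: put the ratios on a common scale before measuring distances.
    Z = StandardScaler().fit_transform(X)

    lof = LocalOutlierFactor(n_neighbors=n_neighbors).fit(Z)
    scores = -lof.negative_outlier_factor_  # about 1 for inliers, larger for outliers
    is_outlier = scores > lof_threshold

    # GaussianMixture estimates a Gaussian mixture model by EM.
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed)
    labels = np.full(len(X), -1)
    labels[~is_outlier] = gmm.fit_predict(Z[~is_outlier])
    return is_outlier, labels


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    return np.percentile(values, [0, 25, 50, 75, 100])


def profile_clusters(X, labels):
    """Per-cluster mean and five-number summary of each column (outliers skipped)."""
    X = np.asarray(X, dtype=float)
    profiles = {}
    for k in sorted(set(labels.tolist()) - {-1}):
        rows = X[labels == k]
        profiles[k] = {
            "mean": rows.mean(axis=0),
            "summary": np.array([five_number_summary(col) for col in rows.T]),
        }
    return profiles


# Synthetic stand-in for a table of financial ratios (not data from the paper).
rng = np.random.default_rng(0)
group_a = rng.normal([5.0, 10.0], 0.5, size=(15, 2))
group_b = rng.normal([15.0, 20.0], 0.5, size=(15, 2))
planted = np.array([[60.0, 2.0], [-30.0, 50.0]])  # two deliberately extreme rows
ratios = np.vstack([group_a, group_b, planted])

is_outlier, labels = detect_and_cluster(ratios)
for k, p in profile_clusters(ratios, labels).items():
    print("cluster", k, "mean:", p["mean"].round(2))
```

Each entry of `profile_clusters` holds one mean per column and one five-number summary per column, which is the kind of compact description an investor could compare across clusters. The threshold and cluster count would need tuning on real data.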

Author information

Corresponding author

Correspondence to Keng-Hoong Ng.

About this article

Cite this article

Ng, KH., Khor, KC. StockProF: a stock profiling framework using data mining approaches. Inf Syst E-Bus Manage 15, 139–158 (2017). https://doi.org/10.1007/s10257-016-0313-z

  • DOI: https://doi.org/10.1007/s10257-016-0313-z
