Natural language based financial forecasting: a survey

Artificial Intelligence Review

Abstract

Natural language processing (NLP), or the pragmatic research perspective of computational linguistics, has become increasingly powerful due to data availability and various techniques developed in the past decade. This increasing capability makes it possible to capture sentiments more accurately and semantics in a more nuanced way. Naturally, many applications are starting to seek improvements by adopting cutting-edge NLP techniques. Financial forecasting is no exception. As a result, articles that leverage NLP techniques to predict financial markets are accumulating rapidly, gradually establishing the research field of natural language based financial forecasting (NLFF), or, from the application perspective, stock market prediction. This review article clarifies the scope of NLFF research by ordering and structuring techniques and applications from related work. The survey also aims to increase the understanding of progress and hotspots in NLFF and to stimulate discussion across many different disciplines.

Author information

Corresponding author

Correspondence to Erik Cambria.

About this article

Cite this article

Xing, F.Z., Cambria, E. & Welsch, R.E. Natural language based financial forecasting: a survey. Artif Intell Rev 50, 49–73 (2018). https://doi.org/10.1007/s10462-017-9588-9

  • DOI: https://doi.org/10.1007/s10462-017-9588-9
