The Power of Words: Predicting Stock Market Returns with Fine-Grained Sentiment Analysis and XGBoost

  • Conference paper
Intelligent Systems and Applications (IntelliSys 2023)

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 822))

Abstract

This study investigates the relationship between news sentiment and stock market returns. Sentiment was analyzed automatically using four methods, spanning lexicon-based and deep learning-based approaches, at three levels of granularity: sentence, paragraph, and full text. The sentiment scores were combined with features derived from the calendar year, lagged returns, and news publishers, and the combined feature set was fed into an XGBoost classifier trained to predict the direction of the market return on the following business day. Performance was maximized using Bayesian hyperparameter optimization and evaluated using nested cross-validation. As a proof of concept, the approach was applied to ten companies in the Dow Jones Index, grouped into five sectors. The findings indicate that the predictive power of sentiment measures is asymmetric across sectors, with the petroleum industry being the most responsive to the sentiment expressed in the news. The study highlights the value of targeted sentiment measures for making informed decisions about market direction, particularly in the petroleum industry.
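The nested cross-validation mentioned in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: a toy threshold rule stands in for XGBoost, a plain grid search stands in for Bayesian hyperparameter optimization, the folds are assumed to be contiguous and unshuffled, and the sentiment scores and direction labels are made-up illustrative values. All function names (`k_folds`, `fit`, `score`, `nested_cv`) are hypothetical. The point is the structure: the hyperparameter is chosen in an inner loop that only sees the outer training data, and the outer held-out fold is used once, for evaluation.

```python
from statistics import mean


def k_folds(n, k):
    """Split indices 0..n-1 into k contiguous folds (assumption: no shuffling)."""
    bounds = [i * n // k for i in range(k + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]


def take(seq, idx):
    """Select the elements of seq at the given indices."""
    return [seq[i] for i in idx]


def fit(xs, offset):
    """Toy 'training': threshold = mean of the training feature + offset.

    The offset plays the role of a hyperparameter (stand-in for XGBoost settings).
    """
    return mean(xs) + offset


def score(threshold, xs, ys):
    """Accuracy of predicting 'up' (1) whenever the feature exceeds the threshold."""
    return mean(int((x > threshold) == bool(y)) for x, y in zip(xs, ys))


def nested_cv(xs, ys, grid, outer_k=3, inner_k=2):
    """Return one held-out score per outer fold."""
    n = len(xs)
    outer_scores = []
    for test_idx in k_folds(n, outer_k):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]

        def inner_score(offset):
            # Inner loop: evaluate one hyperparameter value on the outer
            # training data only, never touching the outer test fold.
            scores = []
            for val_pos in k_folds(len(train_idx), inner_k):
                val_set = set(val_pos)
                val_idx = [train_idx[p] for p in val_pos]
                fit_idx = [train_idx[p] for p in range(len(train_idx))
                           if p not in val_set]
                threshold = fit(take(xs, fit_idx), offset)
                scores.append(score(threshold, take(xs, val_idx), take(ys, val_idx)))
            return mean(scores)

        # Grid search here; the paper uses Bayesian optimization instead.
        best_offset = max(grid, key=inner_score)
        # Refit on all outer training data, then score once on the held-out fold.
        threshold = fit(take(xs, train_idx), best_offset)
        outer_scores.append(score(threshold, take(xs, test_idx), take(ys, test_idx)))
    return outer_scores


# Made-up daily sentiment scores and next-day direction labels (1 = up, 0 = down).
sentiment = [0.6, -0.4, 0.2, -0.7, 0.5, -0.1, 0.3, -0.5, 0.8, -0.2, 0.1, -0.6]
direction = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
scores = nested_cv(sentiment, direction, grid=[-0.2, 0.0, 0.2])
print(scores)
```

Averaging the outer scores gives a performance estimate that is not biased by the hyperparameter search, which is the reason for nesting the two loops.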

Notes

  1. https://eventregistry.org/

  2. For more details on the available data catalog, refer to https://www.historicaloptiondata.com.

  3. https://huggingface.co/datasets/financial_phrasebank

  4. https://github.com/cjhutto/vaderSentiment

  5. https://mitpress.mit.edu/books/wordnet

  6. F1 is the harmonic mean of precision and recall. Precision is the ratio of correctly predicted positive items to all items predicted to be positive, and recall is the ratio of correctly predicted positive items to all items that are actually positive.

  7. https://huggingface.co/Farshid/roberta-large-financial-phrasebank-allagree, https://huggingface.co/Farshid/distilbert-base-uncased_allagree3

  8. https://github.com/fhamborg/NewsMTSC

  9. The ticker symbol represents crude oil futures contracts traded on the New York Mercantile Exchange (NYMEX) and the Chicago Mercantile Exchange (CME).
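
The definitions of precision, recall, and F1 given in note 6 can be written out as a short sketch. The function name `precision_recall_f1` and the example labels are illustrative assumptions and do not come from the paper; returning 0.0 when a denominator is zero is one common convention, chosen here for simplicity.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    # Assumption: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Made-up labels: 1 = market went up, 0 = market went down.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)  # 0.75 0.75 0.75
```

In this example there are three true positives, one false positive, and one false negative, so precision and recall are both 3/4 and their harmonic mean is also 3/4.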

Author information

Corresponding author

Correspondence to Farshid Balaneji.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Balaneji, F., Maringer, D., Spasić, I. (2024). The Power of Words: Predicting Stock Market Returns with Fine-Grained Sentiment Analysis and XGBoost. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2023. Lecture Notes in Networks and Systems, vol 822. Springer, Cham. https://doi.org/10.1007/978-3-031-47721-8_39
