The Power of Words: Predicting Stock Market Returns with Fine-Grained Sentiment Analysis and XGBoost

Balaneji, Farshid; Maringer, Dietmar; Spasić, Irena

doi:10.1007/978-3-031-47721-8_39

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 822)

Included in the following conference series:

Proceedings of SAI Intelligent Systems Conference

Abstract

This study investigates the relationship between news sentiment and the stock market’s return. The sentiment was automatically analyzed using four methods, including lexicon-based and deep learning-based approaches, at three levels of granularity, i.e., sentence, paragraph, and full text. The sentiment was combined with features from the calendar year, lagged returns, and news publishers, which were fed into the XGBoost algorithm trained to classify the direction of market return for the following business day. The performance was maximized using Bayesian hyperparameter optimization and evaluated using nested cross-validation. The proof of concept was demonstrated using ten companies in the Dow Jones Index, which were grouped into five sectors. The findings indicate an asymmetric power of sentiment measures in different sectors, with the petroleum industry being the most responsive to the sentiment expressed in the news. The study highlights the significance of targeted sentiment measures in making informed decisions about the market direction, particularly for the petroleum industry.
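The three levels of granularity named in the abstract can be illustrated with a minimal sketch. It assumes a tiny hypothetical lexicon (`TOY_LEXICON`), regex-based sentence and paragraph splitting, and aggregation by a simple mean; none of these choices come from the paper, which uses established lexicon-based and deep learning-based sentiment methods.

```python
import re
from statistics import mean

# Hypothetical toy lexicon for illustration only (not from the paper).
TOY_LEXICON = {
    "gain": 1.0, "growth": 1.0, "strong": 0.5,
    "loss": -1.0, "weak": -0.5, "decline": -1.0,
}


def score_text(text):
    """Mean lexicon score of the matched words; 0.0 if no word matches."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return mean(hits) if hits else 0.0


def split_paragraphs(article):
    """Split on blank lines (assumed paragraph delimiter)."""
    return [p.strip() for p in re.split(r"\n\s*\n", article) if p.strip()]


def split_sentences(text):
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentiment_by_granularity(article):
    """Score one article at sentence, paragraph, and full-text level."""
    paragraphs = split_paragraphs(article)
    sentences = [s for p in paragraphs for s in split_sentences(p)]
    sentence_scores = [score_text(s) for s in sentences]
    paragraph_scores = [score_text(p) for p in paragraphs]
    return {
        # Averaging per-unit scores is an assumed aggregation rule.
        "sentence": mean(sentence_scores) if sentence_scores else 0.0,
        "paragraph": mean(paragraph_scores) if paragraph_scores else 0.0,
        "full_text": score_text(article),
    }


article = (
    "Strong growth this quarter. Shares saw a loss.\n\n"
    "Analysts expect a weak decline."
)
print(sentiment_by_granularity(article))
```

On this toy article the three levels give different scores, because a word that is averaged within a short sentence carries more weight than the same word averaged over the full text. This is one plausible reason why granularity could matter as a feature design choice.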

Notes

1. https://eventregistry.org/
2. For more details on the available data catalog, refer to https://www.historicaloptiondata.com
3. https://huggingface.co/datasets/financial_phrasebank
4. https://github.com/cjhutto/vaderSentiment
5. https://mitpress.mit.edu/books/wordnet
6. The F1 score is the harmonic mean of precision and recall. Precision is the ratio of correctly predicted positive items to all items predicted to be positive. Recall is the ratio of correctly predicted positive items to all items that are actually positive.
7. https://huggingface.co/Farshid/roberta-large-financial-phrasebank-allagree, https://huggingface.co/Farshid/distilbert-base-uncased_allagree3
8. https://github.com/fhamborg/NewsMTSC
9. The ticker symbol represents crude oil futures contracts traded on the New York Mercantile Exchange (NYMEX) and the Chicago Mercantile Exchange (CME).
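
The definitions in note 6 can be written out as a short sketch. The function name `precision_recall_f1` and the label encoding (1 for an upward move, 0 for a downward move) are assumptions made for illustration; the notes do not specify how the classes are encoded.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    # False negatives: predicted negative but actually positive.
    fn = sum(1 for t, p in pairs if p != positive and t == positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall; defined as 0.0 when both are 0.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Assumed encoding: 1 = market up next business day, 0 = market down.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1]
print(precision_recall_f1(y_true, y_pred))
```

In this example there are two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3.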

Author information

Authors and Affiliations

University of Basel, Faculty of Business and Economics, Peter Merian-Weg 6, 4002, Basel, Switzerland
Farshid Balaneji & Dietmar Maringer
Cardiff University, School of Computer Science and Informatics, Abacws, Senghennydd Road, Cardiff, UK
Irena Spasić

Corresponding author

Correspondence to Farshid Balaneji.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai

About this paper

Cite this paper

Balaneji, F., Maringer, D., Spasić, I. (2024). The Power of Words: Predicting Stock Market Returns with Fine-Grained Sentiment Analysis and XGBoost. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2023. Lecture Notes in Networks and Systems, vol 822. Springer, Cham. https://doi.org/10.1007/978-3-031-47721-8_39

DOI: https://doi.org/10.1007/978-3-031-47721-8_39
Published: 10 January 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47720-1
Online ISBN: 978-3-031-47721-8
eBook Packages: Intelligent Technologies and Robotics (R0)
