Abstract
This study investigates the relationship between news sentiment and stock market returns. Sentiment was analyzed automatically using four methods, spanning lexicon-based and deep learning-based approaches, at three levels of granularity: sentence, paragraph, and full text. The sentiment scores were combined with calendar-derived features, lagged returns, and news-publisher features, and the resulting feature set was fed into an XGBoost classifier trained to predict the direction of the market return on the following business day. Performance was maximized using Bayesian hyperparameter optimization and evaluated using nested cross-validation. The proof of concept was demonstrated on ten companies from the Dow Jones Index, grouped into five sectors. The findings indicate that the predictive power of sentiment measures is asymmetric across sectors, with the petroleum industry being the most responsive to sentiment expressed in the news. The study highlights the value of targeted sentiment measures for making informed decisions about market direction, particularly in the petroleum industry.
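The abstract describes joining daily news sentiment with calendar features and lagged returns to predict the direction of the next business day's return. The sketch below shows one plausible way to align these pieces: features for day t use prices up to day t and news dated day t, and the label is the sign of the return on day t+1. The dictionary keys, the mean pooling of sentence scores, the weekday feature, and the two-lag window are illustrative assumptions rather than the authors' exact definitions; publisher features and the four sentiment models themselves are omitted, and sentence-level scores are taken as given. The resulting rows would then be passed to a classifier such as XGBoost.

```python
from datetime import date
from statistics import mean


def build_rows(days, n_lags=2):
    """Align daily news sentiment and lagged returns with a next-day direction label.

    `days` is a chronologically sorted list of trading days; each entry is a dict
    with the hypothetical keys "date" (datetime.date), "close" (closing price) and
    "sentiments" (sentence-level scores for news dated that day).
    """
    closes = [d["close"] for d in days]
    # returns[i] is the simple return realised on day i; day 0 has none.
    returns = [None] + [closes[i] / closes[i - 1] - 1 for i in range(1, len(closes))]

    rows = []
    # Start late enough to have n_lags past returns; stop one day early so that
    # every row has a next-day return to label.
    for i in range(n_lags, len(days) - 1):
        scores = days[i]["sentiments"]
        features = {
            # Calendar feature (assumed): weekday index, 0 = Monday.
            "weekday": days[i]["date"].weekday(),
            # Lagged returns: ret_lag0 = r_t, ret_lag1 = r_{t-1}, ...
            **{f"ret_lag{k}": returns[i - k] for k in range(n_lags)},
            # Document-level sentiment via mean pooling of sentence scores (assumed);
            # 0.0 stands in for "no news" on that day.
            "sentiment_mean": mean(scores) if scores else 0.0,
            "n_sentences": len(scores),
        }
        # Target: 1 if the following business day's return is positive, else 0.
        label = 1 if returns[i + 1] > 0 else 0
        rows.append((features, label))
    return rows


# Toy data: five consecutive trading days (Mon 9 to Fri 13 January 2023).
days = [
    {"date": date(2023, 1, 9), "close": 100.0, "sentiments": []},
    {"date": date(2023, 1, 10), "close": 101.0, "sentiments": [0.5, 0.1]},
    {"date": date(2023, 1, 11), "close": 100.0, "sentiments": [-0.4]},
    {"date": date(2023, 1, 12), "close": 102.0, "sentiments": [0.2, 0.6]},
    {"date": date(2023, 1, 13), "close": 101.0, "sentiments": [-0.2]},
]
rows = build_rows(days, n_lags=2)
print([label for _, label in rows])  # -> [1, 0]
```

Stopping the loop one day before the end is the key design choice: the last day has no observed next-day return, so including it would leak an undefined label into training.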
Notes
- 2. For more details on the available data catalog, refer to https://www.historicaloptiondata.com.
- 6. The F1 score is the harmonic mean of precision and recall. Precision is the ratio of correctly predicted positive instances to all instances predicted as positive, and recall is the ratio of correctly predicted positive instances to all instances that are actually positive.
- 9. The ticker symbol represents crude oil futures contracts traded on the New York Mercantile Exchange (NYMEX) and the Chicago Mercantile Exchange (CME).
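The F1 definition in note 6 can be made concrete with a short sketch that computes precision, recall, and F1 for the positive ("up") class from two lists of binary labels. Treating 1 as the positive class and reporting 0.0 when a denominator is zero are assumptions for illustration; library routines such as scikit-learn's `f1_score` compute the same quantity.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one class of a binary up/down classifier."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    predicted_pos = sum(1 for _, p in pairs if p == positive)
    actual_pos = sum(1 for t, _ in pairs if t == positive)
    # Convention (assumed): a metric with an empty denominator is reported as 0.0.
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    # F1 is the harmonic mean of precision and recall: 2PR / (P + R).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# 1 = market up the next business day, 0 = down.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 1, 1]
p, r, f1 = precision_recall_f1(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f1, 3))  # -> 0.6 0.75 0.667
```

Here the classifier calls five "up" days, three of which are correct (precision 3/5), and catches three of the four actual "up" days (recall 3/4); F1 sits between the two, closer to the lower value, as a harmonic mean does.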
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Balaneji, F., Maringer, D., Spasić, . (2024). The Power of Words: Predicting Stock Market Returns with Fine-Grained Sentiment Analysis and XGBoost. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2023. Lecture Notes in Networks and Systems, vol 822. Springer, Cham. https://doi.org/10.1007/978-3-031-47721-8_39
DOI: https://doi.org/10.1007/978-3-031-47721-8_39
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47720-1
Online ISBN: 978-3-031-47721-8
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)