Machine learning to establish proxies for investor attention: evidence of improved stock-return prediction

  • Original Research
Annals of Operations Research

Abstract

It is widely recognized that the limited attention capacity of individual investors affects stock performance. We construct five aggregate investor attention indices for each stock by extracting common information components related to stock returns from various attention proxies, using equal-weighted (EW), principal component analysis (PCA), partial least squares (PLS), gradient boosting decision tree (GBDT), and random forest (RF) methods. In a sample of all Shanghai Stock Exchange 50 constituent stocks, we find that the two attention indices constructed by machine learning algorithms, RF and GBDT, provide economically meaningful improvements in stock-return prediction in both in-sample and out-of-sample periods. Moreover, these indices are negatively related to return volatility. The results suggest the utility of using machine learning to form proxies of investor attention and reveal the excellent forecasting power of these proxies in asset pricing.
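To make the index construction concrete, the following is a minimal sketch of how the three linear aggregates (EW, PCA, and PLS) could be formed from a panel of attention proxies. It is not the authors' exact procedure: the data are synthetic, the names `proxies`, `next_return`, `ew_index`, `pca_index`, and `pls_index` are hypothetical, and the PLS index is simplified to its first component. The tree-based GBDT and RF indices are omitted because they require a fitted ensemble model.

```python
import numpy as np

def standardize(X):
    """Scale each proxy (column) to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def ew_index(X):
    """Equal-weighted index: simple average of the standardized proxies."""
    return standardize(X).mean(axis=1)

def pca_index(X):
    """PCA index: first principal component of the standardized proxies."""
    Z = standardize(X)
    # Rows of Vt are principal directions, ordered by explained variance.
    # The sign of a principal component is arbitrary.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[0]

def pls_index(X, y):
    """First PLS component: weights proportional to each proxy's covariance with y."""
    Z = standardize(X)
    w = Z.T @ (y - y.mean())
    w /= np.linalg.norm(w)
    return Z @ w

# Synthetic illustration (assumed data, not from the paper):
# six noisy proxies driven by one latent attention series.
rng = np.random.default_rng(0)
T, N = 200, 6
attention = rng.normal(size=T)
loadings = rng.uniform(0.5, 1.5, size=N)
proxies = attention[:, None] * loadings + rng.normal(size=(T, N))
next_return = 0.3 * attention + rng.normal(scale=0.5, size=T)

indices = {
    "EW": ew_index(proxies),
    "PCA": pca_index(proxies),
    "PLS": pls_index(proxies, next_return),
}
for name, idx in indices.items():
    corr = np.corrcoef(idx, next_return)[0, 1]
    print(f"{name}: correlation with return = {corr:.3f}")
```

The contrast between the aggregates is the point of the sketch: EW and PCA ignore the return series when forming weights, whereas PLS uses the return as a target, so its weights emphasize the proxies that co-move with returns.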

Notes

  1. Where \(U\) and \(V\) are orthogonal matrices whose columns are orthonormal eigenvectors of \(AA^{T}\) and \(A^{T}A\), respectively, and \(S'\) is a diagonal matrix with \(r\) elements equal to the square roots of the positive eigenvalues of \(AA^{T}\) and \(A^{T}A\).
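The properties stated in this note can be checked numerically. The sketch below assumes the note refers to the singular value decomposition \(A = U S' V^{T}\), which is not shown in this excerpt, and uses an arbitrary random matrix in place of the paper's data.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))  # arbitrary example matrix (assumption)

# Thin SVD: A = U @ diag(s) @ Vt, with singular values s in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Eigenvalues of the symmetric matrices A^T A and A A^T, in descending order.
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]

# The singular values are the square roots of the eigenvalues of A^T A.
print(np.allclose(s, np.sqrt(eig_AtA)))
# A A^T shares the same positive eigenvalues; its remaining ones are zero.
print(np.allclose(eig_AAt[:3], eig_AtA))
# The factors reproduce A, and the columns of U and V are orthonormal.
print(np.allclose(U @ np.diag(s) @ Vt, A))
print(np.allclose(U.T @ U, np.eye(3)), np.allclose(Vt @ Vt.T, np.eye(3)))
```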

Acknowledgements

This work is supported by the National Natural Science Foundation of China (72071141).

Author information

Corresponding author

Correspondence to Dehua Shen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chu, G., Goodell, J.W., Shen, D. et al. Machine learning to establish proxies for investor attention: evidence of improved stock-return prediction. Ann Oper Res 318, 103–128 (2022). https://doi.org/10.1007/s10479-022-04892-0

  • DOI: https://doi.org/10.1007/s10479-022-04892-0
