Evaluation of session-based recommendation algorithms

Ludewig, Malte; Jannach, Dietmar

doi:10.1007/s11257-018-9209-6

Published: 01 October 2018

Volume 28, pages 331–390 (2018)

User Modeling and User-Adapted Interaction

Abstract

Recommender systems help users find relevant items of interest, for example on e-commerce or media streaming sites. Most academic research is concerned with approaches that personalize the recommendations according to long-term user profiles. In many real-world applications, however, such long-term profiles do not exist, and recommendations therefore have to be made solely based on the observed behavior of a user during an ongoing session. Given the high practical relevance of the problem, interest in it has increased in recent years, leading to a number of proposals for session-based recommendation algorithms that typically aim to predict the user’s immediate next actions. In this work, we present the results of an in-depth performance comparison of a number of such algorithms, using a variety of datasets and evaluation measures. Our comparison includes the most recent approaches based on recurrent neural networks like gru4rec, factorized Markov model approaches such as fism or fossil, as well as simpler methods based, e.g., on nearest neighbor schemes. Our experiments reveal that algorithms of this latter class, despite their sometimes almost trivial nature, often perform equally well or significantly better than today’s more complex approaches based on deep neural networks. Our results therefore suggest that there is substantial room for improvement regarding the development of more sophisticated session-based recommendation algorithms.

Notes

https://www.dropbox.com/sh/dbzmtq4zhzbj5o9/AACldzQWbw-igKjcPTBI6ZPAa?dl=0.
Other weighting functions, e.g., with a logarithmic decay, are possible as well. Using the linear function, however, led to the best results on average in our experiments.
The method was proposed by Hidasi et al. in the context of the gru4rec method.
We use the implementation published at https://github.com/hidasib/GRU4Rec.
We ran additional experiments using other ways of encoding sequential information, e.g., by using embeddings of sessions and items with the popular Word2Vec and Doc2Vec approaches. However, none of these variations led to better accuracy results than the sknn method in our experiments. We therefore omit these results from our later discussions.
Note that the weighting function is designed to work independently from the similarity function. We rely on the binary session representation for the similarity calculation without considering the order of the items to ensure computational efficiency.
https://www.dropbox.com/sh/dbzmtq4zhzbj5o9/AACldzQWbw-igKjcPTBI6ZPAa?dl=0.
To ensure that the smaller size of those splits does not negatively affect the performance of the model-based approaches, we also tested the single-split configurations on all datasets. The results are mostly in line with those obtained with the sliding-window protocol and are shown in “Appendix D”.
http://www.clef-newsreel.org/.
https://www.sport1.de/.
We provide additional results that were obtained for measurements taken at multiple list lengths in “Appendix B”.
In the dataset, timestamps are only available at the granularity of days.
We applied the Wilcoxon signed-rank test (\(\alpha =0.05\)) to determine the significance of differences between the two best performing approaches for each dataset.
The other media datasets did not exhibit any notable particularities.
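
The notes above on the weighting function and on the binary session representation can be illustrated with a short sketch. This is not the authors' implementation: the function names, the exact form of the linear weight (position divided by session length), the form of the logarithmic decay, and the toy sessions are all assumptions made for this example.

```python
import math


def binary_cosine(session_a, session_b):
    """Cosine similarity of two sessions encoded as binary item vectors.

    With binary vectors the dot product equals the number of shared items
    and each norm is the square root of the number of distinct items, so
    the order of the items plays no role.
    """
    items_a, items_b = set(session_a), set(session_b)
    if not items_a or not items_b:
        return 0.0
    return len(items_a & items_b) / math.sqrt(len(items_a) * len(items_b))


def linear_weights(session):
    """Assumed linear decay: the i-th of n items (1-based) gets weight i / n."""
    n = len(session)
    return [(i + 1) / n for i in range(n)]


def log_weights(session):
    """Assumed logarithmic decay: 1 / log2(1 + d), d = distance to session end.

    The most recent item has d = 1 and therefore weight 1.0.
    """
    n = len(session)
    return [1.0 / math.log2(1 + (n - i)) for i in range(n)]


if __name__ == "__main__":
    current = ["a", "b", "c", "d"]      # hypothetical ongoing session
    neighbor = ["d", "c", "x"]          # hypothetical past session
    print(binary_cosine(current, neighbor))
    print(linear_weights(current))      # [0.25, 0.5, 0.75, 1.0]
    print(log_weights(current))
```

Keeping the position weights separate from the similarity function mirrors the independence described in the note: the similarity only needs set operations on the item IDs, while the weights depend only on item positions within the current session.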

References

Adomavicius, G., Kwon, Y.O.: Improving aggregate recommendation diversity using ranking-based techniques. IEEE Trans. Knowl. Data Eng. 24(5), 896–911 (2012)
Adomavicius, G., Zhang, J.: Impact of data characteristics on recommender systems performance. ACM Trans. Manag. Inf. Syst. 3(1), 3:1–3:17 (2012)
Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large databases. In: SIGMOD ’93, pp. 207–216 (1993)
Baeza-Yates, R., Jiang, D., Silvestri, F., Harrison, B.: Predicting the next app that you are going to use. In: WSDM ’15, pp. 285–294 (2015)
Billsus, D., Pazzani, M.J., Chen, J.: A learning agent for wireless news access. In: IUI ’00, pp. 33–36 (2000)
Bonnin, G., Jannach, D.: Automated generation of music playlists: survey and experiments. ACM Comput. Surv. 47(2), 26:1–26:35 (2014)
Chen, S., Moore, J.L., Turnbull, D., Joachims, T.: Playlist prediction via metric embedding. In: KDD ’12, pp. 714–722 (2012)
Chen, S., Xu, J., Joachims, T.: Multi-space probabilistic sequence modeling. In: KDD ’13, pp. 865–873 (2013)
Cheng, C., Yang, H., Lyu, M.R., King, I.: Where you like to go next: Successive point-of-interest recommendation. In: IJCAI ’13, pp. 2605–2611 (2013)
Cho, K., van Merriënboer, B., Gülçehre, Ç, Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder–decoder for statistical machine translation. In: EMNLP ’14, pp. 1724–1734 (2014)
Cooley, R., Mobasher, B., Srivastava, J.: Data preparation for mining world wide web browsing patterns. Knowl. Inf. Syst. 1(1), 5–32 (1999)
Das, A.S., Datar, M., Garg, A., Rajaram, S.: Google news personalization: Scalable online collaborative filtering. In: Proceedings of the 16th International Conference on World Wide Web, WWW ’07, pp. 271–280 (2007)
Davidson, J., Liebald, B., Liu, J., Nandy, P., Van Vleet, T., Gargi, U., Gupta, S., He, Y., Lambert, M., Livingston, B., Sampath, D.: The YouTube video recommendation system. In: RecSys ’10, pp. 293–296 (2010)
Devooght, R., Bersini, H.: Long and short-term recommendations with recurrent neural networks. In: UMAP ’17, pp. 13–21 (2017)
Djuric, N., Radosavljevic, V., Grbovic, M., Bhamidipati, N.: Hidden conditional random fields with deep user embeddings for ad targeting. In: ICDM ’14, pp. 779–784 (2014)
Du, N., Dai, H., Trivedi, R., Upadhyay, U., Gomez-Rodriguez, M., Song, L.: Recurrent marked temporal point processes: embedding event history to vector. In: KDD ’16, pp. 1555–1564 (2016)
Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
Feng, S., Li, X., Zeng, Y., Cong, G., Chee, Y.M., Yuan, Q.: Personalized ranking metric embedding for next new POI recommendation. In: IJCAI ’15, pp. 2069–2075 (2015)
Garcin, F., Dimitrakakis, C., Faltings, B.: Personalized news recommendation with context trees. In: RecSys ’13, pp. 105–112 (2013)
Grbovic, M., Radosavljevic, V., Djuric, N., Bhamidipati, N., Savla, J., Bhagwan, V., Sharp, D.: E-commerce in your inbox: product recommendations at scale. In: KDD ’15, pp. 1809–1818 (2015)
Hariri, N., Mobasher, B., Burke, R.: Context-aware music recommendation based on latent topic sequential patterns. In: RecSys ’12, pp. 131–138 (2012)
He, R., McAuley, J.: Fusing similarity models with Markov Chains for sparse sequential recommendation. CoRR (2016). arxiv:1609.09152
He, Q., Jiang, D., Liao, Z., Hoi, S.C.H., Chang, K., Lim, E.-P., Li, H.: Web query recommendation via sequential query prediction. In: ICDE ’09, pp. 1443–1454 (2009)
He, J., Li, X., Liao, L., Song, D., Cheung, W.: Inferring a personalized next point-of-interest recommendation model with latent behavior patterns. In: AAAI ’16 (2016)
Hidasi, B., Karatzoglou, A.: Recurrent neural networks with top-k gains for session-based recommendations. CoRR (2017). arxiv:1706.03847
Hidasi, B., Karatzoglou, A., Baltrunas, L., Tikk, D.: Session-based recommendations with recurrent neural networks. In: ICLR ’16 (2016a)
Hidasi, B., Quadrana, M., Karatzoglou, A., Tikk, D.: Parallel recurrent neural network architectures for feature-rich session-based recommendations. In: RecSys ’16, pp. 241–248 (2016b)
Hosseinzadeh Aghdam, M., Hariri, N., Mobasher, B., Burke, R.: Adapting recommendations to contextual changes using hierarchical hidden Markov models. In: RecSys ’15, pp. 241–244 (2015)
Jannach, D., Hegelich, K.: A case study on the effectiveness of recommendations in the mobile internet. In: RecSys ’09, pp. 205–208 (2009)
Jannach, D., Adomavicius, G.: Recommendations with a purpose. In: RecSys ’16, pp. 7–10 (2016)
Jannach, D., Ludewig, M.: When recurrent neural networks meet the neighborhood for session-based recommendation. In: RecSys ’17, pp. 306–310 (2017)
Jannach, D., Lerche, L., Jugovac, M.: Adaptation and evaluation of recommendations for short-term shopping goals. In: RecSys ’15, pp. 211–218 (2015a)
Jannach, D., Lerche, L., Kamehkhosh, I., Jugovac, M.: What recommenders recommend: an analysis of recommendation biases and possible countermeasures. User Model. User Adapt. Interact. 25(5), 427–491 (2015b)
Jannach, D., Kamehkhosh, I., Lerche, L.: Leveraging multi-dimensional user models for personalized next-track music recommendation. In: ACM SAC 2017 (2017a)
Jannach, D., Ludewig, M., Lerche, L.: Session-based item recommendation in e-commerce: on short-term intents, reminders, trends, and discounts. User Model. User Adapt. Interact. 27(3–5), 351–392 (2017b)
Jugovac, M., Jannach, D., Karimi, M.: StreamingRec: a framework for benchmarking stream-based news recommenders. In: RecSys 2018 (2018)
Kabbur, S., Ning, X., Karypis, G.: FISM: factored item similarity models for top-n recommender systems. In: KDD ’13, pp. 659–667 (2013)
Kamehkhosh, I., Jannach, D., Ludewig, M.: A comparison of frequent pattern techniques and a deep learning method for session-based recommendation. In: TempRec Workshop at ACM RecSys ’17, Como, Italy (2017)
Karimi, M., Jannach, D., Jugovac, M.: News recommender systems—survey and roads ahead. Inf. Process. Manag. 54(6), 1203–1227 (2018)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. CoRR (2014). arxiv:1412.6980
Lee, D., Hosanagar, K.: Impact of recommender systems on sales volume and diversity. In: ICIS 2014 (2014)
Lerche, L., Jannach, D., Ludewig, M.: On the value of reminders within e-commerce recommendations. In: UMAP ’16, pp. 27–35 (2016)
Li, Z., Zhao, H., Liu, Q., Huang, Z., Mei, T., Chen, E.: Learning from history and present: next-item recommendation via discriminatively exploiting user behaviors. In: KDD 2018 (2018)
Lian, D., Zheng, V.W., Xie, X.: Collaborative filtering meets next check-in location prediction. In: WWW ’13, pp. 231–232 (2013)
Linden, G., Smith, B., York, J.: Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Comput. 7(1), 76–80 (2003)
Liu, J., Dolan, P., Pedersen, E.R.: Personalized news recommendation based on click behavior. In: IUI ’10, pp. 31–40 (2010)
Liu, Y., Liu, C., Liu, B., Qu, M., Xiong, H.: Unified point-of-interest recommendation with temporal interval assessment. In: KDD ’16, pp. 1015–1024 (2016)
Liu, Q., Zeng, Y., Mokhosi, R., Zhang, H.: STAMP: short-term attention/memory priority model for session-based recommendation. In: KDD 2018 (2018)
Ludmann, C.A.: Recommending news articles in the CLEF news recommendation evaluation lab with the data stream management system odysseus. In: Working Notes of CLEF 2017—Conference and Labs of the Evaluation (2017)
McFee, B., Lanckriet, G.: The natural language of playlists. In: ISMIR ’11, pp. 537–541 (2011)
McFee, B., Lanckriet, G.R.G.: Hypergraph models of playlist dialects. In: ISMIR ’12, pp. 343–348 (2012)
Mobasher, B., Dai, H., Luo, T., Nakagawa, M.: Using sequential and non-sequential patterns in predictive web usage mining tasks. In: ICDM ’02, pp. 669–672 (2002)
Moling, O., Baltrunas, L., Ricci, F.: Optimal radio channel recommendations with explicit and implicit feedback. In: RecSys ’12, pp. 75–82 (2012)
Natarajan, N., Shin, D., Dhillon, I.S.: Which app will you use next? Collaborative filtering with interactional context. In: RecSys ’13, pp. 201–208 (2013)
Norris, J.R.: Markov Chains. Cambridge University Press, Cambridge (1997)
Quadrana, M., Karatzoglou, A., Hidasi, B., Cremonesi, P.: Personalizing session-based recommendations with hierarchical recurrent neural networks. In: RecSys ’17 (2017)
Quadrana, M., Cremonesi, P., Jannach, D.: Sequence-aware recommender systems. ACM Comput. Surv. 54, 1–36 (2018)
Reddy, S., Labutov, I., Joachims, T.: Learning student and content embeddings for personalized lesson sequence recommendation. In: ACM Learning @ Scale ’16, pp. 93–96 (2016)
Rendle, S., Freudenthaler, C., Gantner, Z., Schmidt-Thieme, L.: BPR: Bayesian personalized ranking from implicit feedback. In: UAI ’09, pp. 452–461 (2009)
Rendle, S., Freudenthaler, C., Schmidt-Thieme, L.: Factorizing personalized Markov Chains for next-basket recommendation. In: WWW ’10, pp. 811–820 (2010)
Shani, G., Heckerman, D., Brafman, R.I.: An MDP-based recommender system. J. Mach. Learn. Res. 6, 1265–1295 (2005)
Soh, H., Sanner, S., White, M., Jamieson, G.: Deep sequential recommendation for personalized adaptive user interfaces. In: IUI ’17, pp. 589–593 (2017)
Song, Q., Cheng, J., Yuan, T., Lu, H.: Personalized recommendation meets your next favorite. In: CIKM ’15, pp. 1775–1778 (2015)
Song, Y., Elkahky, A.M., He, X.: Multi-rate deep learning for temporal recommendation. In: SIGIR ’16, pp. 909–912 (2016)
Sordoni, A., Bengio, Y., Vahabi, H., Lioma, C., Grue Simonsen, J., Nie, J.-Y.: A hierarchical recurrent encoder-decoder for generative context-aware query suggestion. In: CIKM ’15, pp. 553–562 (2015)
Tagami, Y., Kobayashi, H., Ono, S., Tajima, A.: Modeling user activities on the web using paragraph vector. In: WWW ’15, pp. 125–126 (2015)
Tan, Y.K., Xu, X., Liu, Y.: Improved recurrent neural networks for session-based recommendations. In: DLRS ’16 Workshop at ACM RecSys, pp. 17–22 (2016)
Tavakol, M., Brefeld, U.: Factored MDPs for detecting topics of user sessions. In: RecSys ’14, pp. 33–40 (2014)
Turrin, R., Quadrana, M., Condorelli, A., Pagano, R., Cremonesi, P.: 30music listening and playlists dataset. In: Poster Proceedings of RecSys ’15 (2015)
Twardowski, B.: Modelling contextual information in session-aware recommender systems with neural networks. In: RecSys ’16, pp. 273–276 (2016)
Vasile, F., Smirnova, E., Conneau, A.: Meta-prod2vec: product embeddings using side-information for recommendation. In: RecSys ’16, pp. 225–232 (2016)
Verstrepen, K., Goethals, B.: Unifying nearest neighbors collaborative filtering. In: RecSys ’14, pp. 177–184 (2014)
Wu, X., Liu, Q., Chen, E., He, L., Lv, J., Cao, C., Hu, G.: Personalized next-song recommendation in online karaokes. In: RecSys ’13, pp. 137–140 (2013)
Yap, G.-E., Li, X.-L., Yu, P.S.: Effective next-items recommendation via personalized sequential pattern mining. In: DASFAA ’12, Volume Part II, pp. 48–64 (2012)
Yu, F., Liu, Q., Wu, S., Wang, L., Tan, T.: A dynamic recurrent model for next basket recommendation. In: SIGIR ’16, pp. 729–732 (2016)
Zangerle, E., Pichl, M., Gassler, W., Specht, G.: #nowplaying music dataset: extracting listening behavior from Twitter. In: WISMM ’14 Workshop at MM ’14, pp. 21–26 (2014)
Zeiler, M.D.: ADADELTA: an adaptive learning rate method. CoRR (2012). arxiv:1212.5701
Zhang, Y., Dai, H., Xu, C., Feng, J., Wang, T., Bian, J., Wang, B., Liu, T.-Y.: Sequential click prediction for sponsored search with recurrent neural networks. In: AAAI ’14, pp. 1369–1375 (2014)
Zheleva, E., Guiver, J., Mendes Rodrigues, E., Milić-Frayling, N.: Statistical models of music-listening sessions in social media. In: WWW ’10, pp. 1019–1028 (2010)

Author information

Authors and Affiliations

TU Dortmund, Dortmund, Germany
Malte Ludewig
AAU Klagenfurt, Klagenfurt, Austria
Dietmar Jannach

Corresponding author

Correspondence to Malte Ludewig.

Additional information

Preliminary comparisons of sequential recommendation algorithms were presented in our own previous work (Jannach and Ludewig 2017; Kamehkhosh et al. 2017). A pre-print version of this work is available at https://arxiv.org/abs/1803.09587.

Appendices

Parameter configurations

See Tables 9, 10, 11 and 12.

Table 9 Parameters for algorithm gru4rec for all datasets
