Abstract
Recommender systems help users find items of interest, for example on e-commerce or media streaming sites. Most academic research is concerned with approaches that personalize recommendations according to long-term user profiles. In many real-world applications, however, such long-term profiles do not exist, and recommendations have to be made solely based on the observed behavior of a user during an ongoing session. Given its high practical relevance, this problem has attracted increased interest in recent years, leading to a number of proposals for session-based recommendation algorithms that typically aim to predict the user’s immediate next actions. In this work, we present the results of an in-depth performance comparison of a number of such algorithms, using a variety of datasets and evaluation measures. Our comparison includes the most recent approaches based on recurrent neural networks like gru4rec, factorized Markov model approaches such as fism or fossil, as well as simpler methods based, e.g., on nearest neighbor schemes. Our experiments reveal that algorithms of this latter class, despite their sometimes almost trivial nature, often perform equally well or significantly better than today’s more complex approaches based on deep neural networks. Our results therefore suggest that there is substantial room for improvement regarding the development of more sophisticated session-based recommendation algorithms.
Notes
Other weighting functions, e.g., with a logarithmic decay, are possible as well. Using the linear function, however, led to the best results on average in our experiments.
The method was proposed by Hidasi et al. in the context of the gru4rec method.
We use the implementation published at https://github.com/hidasib/GRU4Rec.
We ran additional experiments using other ways of encoding sequential information, e.g., by using embeddings of sessions and items with the popular Word2Vec and Doc2Vec approaches. However, none of these variations led to better accuracy results than the sknn method in our experiments. We therefore omit these results from our later discussions.
Note that the weighting function is designed to work independently from the similarity function. We rely on the binary session representation for the similarity calculation without considering the order of the items to ensure computational efficiency.
To ensure that the smaller size of those splits does not negatively affect the performance of the model-based approaches, we also tested the single-split configurations on all datasets. The results are mostly in line with those obtained with the sliding-window protocol and are shown in “Appendix D”.
We provide additional results that were obtained for measurements taken at multiple list lengths in “Appendix B”.
In the dataset, timestamps are only available at the granularity of days.
We applied the Wilcoxon signed-rank test (\(\alpha =0.05\)) to determine the significance of differences between the two best performing approaches for each dataset.
The other media datasets did not exhibit any notable particularities.
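The notes above mention a similarity computed on a binary session representation that ignores item order, and a separate linear position-based weighting. The following is a minimal sketch of how these two pieces could look. The function names and the exact formulas (cosine similarity on item sets, weight equal to position divided by session length) are assumptions made for illustration; they are not taken from the article's implementation.

```python
import math


def binary_cosine(session_a, session_b):
    """Cosine similarity of two sessions treated as binary item vectors.

    Item order and repeated items are ignored (assumed formulation).
    """
    a, b = set(session_a), set(session_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def linear_decay_weights(session):
    """Hypothetical linear weighting: later items in the session weigh more.

    If an item occurs several times, its most recent position wins,
    because later dictionary assignments overwrite earlier ones.
    """
    n = len(session)
    return {item: (pos + 1) / n for pos, item in enumerate(session)}


current = ["a", "b", "c"]
neighbor = ["b", "c", "d"]
similarity = binary_cosine(current, neighbor)
weights = linear_decay_weights(current)
```

Keeping the two functions separate mirrors the statement that the weighting is designed to work independently of the similarity function: a logarithmic decay could replace `linear_decay_weights` without touching `binary_cosine`.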
References
Adomavicius, G., Kwon, Y.O.: Improving aggregate recommendation diversity using ranking-based techniques. IEEE Trans. Knowl. Data Eng. 24(5), 896–911 (2012)
Adomavicius, G., Zhang, J.: Impact of data characteristics on recommender systems performance. ACM Trans. Manag. Inf. Syst. 3(1), 3:1–3:17 (2012)
Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large databases. In: SIGMOD ’93, pp. 207–216 (1993)
Baeza-Yates, R., Jiang, D., Silvestri, F., Harrison, B.: Predicting the next app that you are going to use. In: WSDM ’15, pp. 285–294 (2015)
Billsus, D., Pazzani, M.J., Chen, J.: A learning agent for wireless news access. In: IUI ’00, pp. 33–36 (2000)
Bonnin, G., Jannach, D.: Automated generation of music playlists: survey and experiments. ACM Comput. Surv. 47(2), 26:1–26:35 (2014)
Chen, S., Moore, J.L., Turnbull, D., Joachims, T.: Playlist prediction via metric embedding. In: KDD ’12, pp. 714–722 (2012)
Chen, S., Xu, J., Joachims, T.: Multi-space probabilistic sequence modeling. In: KDD ’13, pp. 865–873 (2013)
Cheng, C., Yang, H., Lyu, M.R., King, I.: Where you like to go next: Successive point-of-interest recommendation. In: IJCAI ’13, pp. 2605–2611 (2013)
Cho, K., van Merriënboer, B., Gülçehre, Ç, Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder–decoder for statistical machine translation. In: EMNLP ’14, pp. 1724–1734 (2014)
Cooley, R., Mobasher, B., Srivastava, J.: Data preparation for mining world wide web browsing patterns. Knowl. Inf. Syst. 1(1), 5–32 (1999)
Das, A.S., Datar, M., Garg, A., Rajaram, S.: Google news personalization: Scalable online collaborative filtering. In: Proceedings of the 16th International Conference on World Wide Web, WWW ’07, pp. 271–280 (2007)
Davidson, J., Liebald, B., Liu, J., Nandy, P., Van Vleet, T., Gargi, U., Gupta, S., He, Y., Lambert, M., Livingston, B., Sampath, D.: The YouTube video recommendation system. In: RecSys ’10, pp. 293–296 (2010)
Devooght, R., Bersini, H.: Long and short-term recommendations with recurrent neural networks. In: UMAP ’17, pp. 13–21 (2017)
Djuric, N., Radosavljevic, V., Grbovic, M., Bhamidipati, N.: Hidden conditional random fields with deep user embeddings for ad targeting. In: ICDM ’14, pp. 779–784 (2014)
Du, N., Dai, H., Trivedi, R., Upadhyay, U., Gomez-Rodriguez, M., Song, L.: Recurrent marked temporal point processes: embedding event history to vector. In: KDD ’16, pp. 1555–1564 (2016)
Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
Feng, S., Li, X., Zeng, Y., Cong, G., Chee, Y.M., Yuan, Q.: Personalized ranking metric embedding for next new POI recommendation. In: IJCAI ’15, pp. 2069–2075 (2015)
Garcin, F., Dimitrakakis, C., Faltings, B.: Personalized news recommendation with context trees. In: RecSys ’13, pp. 105–112 (2013)
Grbovic, M., Radosavljevic, V., Djuric, N., Bhamidipati, N., Savla, J., Bhagwan, V., Sharp, D.: E-commerce in your inbox: product recommendations at scale. In: KDD ’15, pp. 1809–1818 (2015)
Hariri, N., Mobasher, B., Burke, R.: Context-aware music recommendation based on latent topic sequential patterns. In: RecSys ’12, pp. 131–131 (2012)
He, R., McAuley, J.: Fusing similarity models with Markov Chains for sparse sequential recommendation. CoRR (2016). arxiv:1609.09152
He, Q., Jiang, D., Liao, Z., Hoi, S.C.H., Chang, K., Lim, E.-P., Li, H.: Web query recommendation via sequential query prediction. In: ICDE ’09, pp. 1443–1454 (2009)
He, J., Li, X., Liao, L., Song, D., Cheung, W.: Inferring a personalized next point-of-interest recommendation model with latent behavior patterns. In: AAAI ’16 (2016)
Hidasi, B., Karatzoglou, A.: Recurrent neural networks with top-k gains for session-based recommendations. CoRR (2017). arxiv:1706.03847
Hidasi, B., Karatzoglou, A., Baltrunas, L., Tikk, D.: Session-based recommendations with recurrent neural networks. In: ICLR ’16 (2016a)
Hidasi, B., Quadrana, M., Karatzoglou, A., Tikk, D.: Parallel recurrent neural network architectures for feature-rich session-based recommendations. In: RecSys ’16, pp. 241–248 (2016b)
Hosseinzadeh Aghdam, M., Hariri, N., Mobasher, B., Burke, R.: Adapting recommendations to contextual changes using hierarchical hidden Markov models. In: RecSys ’15, pp. 241–244 (2015)
Jannach, D., Hegelich, K.: A case study on the effectiveness of recommendations in the mobile internet. In: RecSys ’09, pp. 205–208 (2009)
Jannach, D., Adomavicius, G.: Recommendations with a purpose. In: RecSys ’16, pp. 7–10 (2016)
Jannach, D., Ludewig, M.: When recurrent neural networks meet the neighborhood for session-based recommendation. In: RecSys ’17, pp. 306–310 (2017)
Jannach, D., Lerche, L., Jugovac, M.: Adaptation and evaluation of recommendations for short-term shopping goals. In: RecSys ’15, pp. 211–218 (2015a)
Jannach, D., Lerche, L., Kamehkhosh, I., Jugovac, M.: What recommenders recommend: an analysis of recommendation biases and possible countermeasures. User Model. User Adapt. Interact. 25(5), 427–491 (2015b)
Jannach, D., Kamehkhosh, I., Lerche, L.: Leveraging multi-dimensional user models for personalized next-track music recommendation. In: ACM SAC 2017 (2017a)
Jannach, D., Ludewig, M., Lerche, L.: Session-based item recommendation in e-commerce: on short-term intents, reminders, trends, and discounts. User Model. User Adapt. Interact. 27(3–5), 351–392 (2017b)
Jugovac, M., Jannach, D., Karimi, M.: StreamingRec: a framework for benchmarking stream-based news recommenders. In: RecSys 2018 (2018)
Kabbur, S., Ning, X., Karypis, G.: FISM: factored item similarity models for top-n recommender systems. In: KDD ’13, pp. 659–667 (2013)
Kamehkhosh, I., Jannach, D., Ludewig, M.: A comparison of frequent pattern techniques and a deep learning method for session-based recommendation. In: TempRec Workshop at ACM RecSys ’17, Como, Italy (2017)
Karimi, M., Jannach, D., Jugovac, M.: News recommender systems—survey and roads ahead. Inf. Process. Manag. 54(6), 1203–1227 (2018)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. CoRR (2014). arxiv:1412.6980
Lee, D., Hosanagar, K.: Impact of recommender systems on sales volume and diversity. In: ICIS 2014 (2014)
Lerche, L., Jannach, D., Ludewig, M.: On the value of reminders within e-commerce recommendations. In: UMAP ’16, pp. 27–35 (2016)
Li, Z., Zhao, H., Liu, Q., Huang, Z., Mei, T., Chen, E.: Learning from history and present: next-item recommendation via discriminatively exploiting user behaviors. In: KDD 2018 (2018)
Lian, D., Zheng, V.W., Xie, X.: Collaborative filtering meets next check-in location prediction. In: WWW ’13, pp. 231–232 (2013)
Linden, G., Smith, B., York, J.: Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Comput. 7(1), 76–80 (2003)
Liu, J., Dolan, P., Pedersen, E.R.: Personalized news recommendation based on click behavior. In: IUI ’10, pp. 31–40 (2010)
Liu, Y., Liu, C., Liu, B., Qu, M., Xiong, H.: Unified point-of-interest recommendation with temporal interval assessment. In: KDD ’16, pp. 1015–1024 (2016)
Liu, Q., Zeng, Y., Mokhosi, R., Zhang, H.: STAMP: short-term attention/memory priority model for session-based recommendation. In: KDD 2018 (2018)
Ludmann, C.A.: Recommending news articles in the CLEF news recommendation evaluation lab with the data stream management system odysseus. In: Working Notes of CLEF 2017—Conference and Labs of the Evaluation (2017)
McFee, B., Lanckriet, G.: The natural language of playlists. In: ISMIR ’11, pp. 537–541 (2011)
McFee, B., Lanckriet, G.R.G.: Hypergraph models of playlist dialects. In: ISMIR ’12, pp. 343–348 (2012)
Mobasher, B., Dai, H., Luo, T., Nakagawa, M.: Using sequential and non-sequential patterns in predictive web usage mining tasks. In: ICDM ’02, pp. 669–672 (2002)
Moling, O., Baltrunas, L., Ricci, F.: Optimal radio channel recommendations with explicit and implicit feedback. In: RecSys ’12, pp. 75–82 (2012)
Natarajan, N., Shin, D., Dhillon, I.S.: Which app will you use next? Collaborative filtering with interactional context. In: RecSys ’13, pp. 201–208 (2013)
Norris, J.R.: Markov Chains. Cambridge University Press, Cambridge (1997)
Quadrana, M., Karatzoglou, A., Hidasi, B., Cremonesi, P.: Personalizing session-based recommendations with hierarchical recurrent neural networks. In: RecSys ’17 (2017)
Quadrana, M., Cremonesi, P., Jannach, D.: Sequence-aware recommender systems. ACM Comput. Surv. 54, 1–36 (2018)
Reddy, S., Labutov, I., Joachims, T.: Learning student and content embeddings for personalized lesson sequence recommendation. In: ACM Learning @ Scale ’16, pp. 93–96 (2016)
Rendle, S., Freudenthaler, C., Gantner, Z., Schmidt-Thieme, L.: BPR: Bayesian personalized ranking from implicit feedback. In: UAI ’09, pp. 452–461 (2009)
Rendle, S., Freudenthaler, C., Schmidt-Thieme, L.: Factorizing personalized Markov Chains for next-basket recommendation. In: WWW ’10, pp. 811–820 (2010)
Shani, G., Heckerman, D., Brafman, R.I.: An MDP-based recommender system. J. Mach. Learn. Res. 6, 1265–1295 (2005)
Soh, H., Sanner, S., White, M., Jamieson, G.: Deep sequential recommendation for personalized adaptive user interfaces. In: IUI ’17, pp. 589–593 (2017)
Song, Q., Cheng, J., Yuan, T., Lu, H.: Personalized recommendation meets your next favorite. In: CIKM ’15, pp. 1775–1778 (2015)
Song, Y., Elkahky, A.M., He, X.: Multi-rate deep learning for temporal recommendation. In: SIGIR ’16, pp. 909–912 (2016)
Sordoni, A., Bengio, Y., Vahabi, H., Lioma, C., Grue Simonsen, J., Nie, J.-Y.: A hierarchical recurrent encoder-decoder for generative context-aware query suggestion. In: CIKM ’15, pp. 553–562 (2015)
Tagami, Y., Kobayashi, H., Ono, S., Tajima, A.: Modeling user activities on the web using paragraph vector. In: WWW ’15, pp. 125–126 (2015)
Tan, Y.K., Xu, X., Liu, Y.: Improved recurrent neural networks for session-based recommendations. In: DLRS ’16 Workshop at ACM RecSys, pp. 17–22 (2016)
Tavakol, M., Brefeld, U.: Factored MDPs for detecting topics of user sessions. In: RecSys ’14, pp. 33–40 (2014)
Turrin, R., Quadrana, M., Condorelli, A., Pagano, R., Cremonesi, P.: 30music listening and playlists dataset. In: Poster Proceedings of RecSys ’15 (2015)
Twardowski, B.: Modelling contextual information in session-aware recommender systems with neural networks. In: RecSys ’16, pp. 273–276 (2016)
Vasile, F., Smirnova, E., Conneau, A.: Meta-prod2vec: product embeddings using side-information for recommendation. In: RecSys ’16, pp. 225–232 (2016)
Verstrepen, K., Goethals, B.: Unifying nearest neighbors collaborative filtering. In: RecSys ’14, pp. 177–184 (2014)
Wu, X., Liu, Q., Chen, E., He, L., Lv, J., Cao, C., Hu, G.: Personalized next-song recommendation in online karaokes. In: RecSys ’13, pp. 137–140 (2013)
Yap, G.-E., Li, X.-L., Yu, P.S.: Effective next-items recommendation via personalized sequential pattern mining. In: DASFAA ’12, Volume Part II, pp. 48–64 (2012)
Yu, F., Liu, Q., Wu, S., Wang, L., Tan, T.: A dynamic recurrent model for next basket recommendation. In: SIGIR ’16, pp. 729–732 (2016)
Zangerle, E., Pichl, M., Gassler, W., Specht, G.: #nowplaying music dataset: extracting listening behavior from Twitter. In: WISMM ’14 Workshop at MM ’14, pp. 21–26 (2014)
Zeiler, M.D.: ADADELTA: an adaptive learning rate method. CoRR (2012). arxiv:1212.5701
Zhang, Y., Dai, H., Xu, C., Feng, J., Wang, T., Bian, J., Wang, B., Liu, T.-Y.: Sequential click prediction for sponsored search with recurrent neural networks. In: AAAI ’14, pp. 1369–1375 (2014)
Zheleva, E., Guiver, J., Mendes Rodrigues, E., Milić-Frayling, N.: Statistical models of music-listening sessions in social media. In: WWW ’10, pp. 1019–1028 (2010)
Additional information
A preliminary comparison of sequential recommendation algorithms was presented in our own previous work (Jannach and Ludewig 2017; Kamehkhosh et al. 2017). A pre-print version of this work is available at https://arxiv.org/abs/1803.09587.
Cite this article
Ludewig, M., Jannach, D. Evaluation of session-based recommendation algorithms. User Model User-Adap Inter 28, 331–390 (2018). https://doi.org/10.1007/s11257-018-9209-6
DOI: https://doi.org/10.1007/s11257-018-9209-6