Collaboration and Transition: Distilling Item Transitions into Multi-Query Self-Attention for Sequential Recommendation

Authors:
Tianyu Zhu, Beihang University, Beijing, China (ORCID: 0000-0001-7716-938X)
Yansong Shi, Tsinghua University, Beijing, China (ORCID: 0000-0002-4040-8831)
Yuan Zhang, Kuaishou Technology, Beijing, China (ORCID: 0000-0002-7849-208X)
Yihong Wu, Université de Montréal, Montreal, Canada (ORCID: 0009-0009-2680-4107)
Fengran Mo, Université de Montréal, Montreal, Canada (ORCID: 0000-0002-0838-6994)
Jian-Yun Nie, Université de Montréal, Montreal, Canada (ORCID: 0000-0003-1556-3335)

WSDM '24: Proceedings of the 17th ACM International Conference on Web Search and Data Mining, March 2024, Pages 1003–1011
https://doi.org/10.1145/3616855.3635787

Published: 04 March 2024

ABSTRACT

Modern recommender systems employ various sequential modules such as self-attention to learn dynamic user interests. However, these methods are less effective in capturing collaborative and transitional signals within user interaction sequences. First, the self-attention architecture uses the embedding of a single item as the attention query, making it challenging to capture collaborative signals. Second, these methods typically follow an auto-regressive framework, which is unable to learn global item transition patterns. To overcome these limitations, we propose a new method called Multi-Query Self-Attention with Transition-Aware Embedding Distillation (MQSA-TED). First, we propose an L-query self-attention module that employs flexible window sizes for attention queries to capture collaborative signals. In addition, we introduce a multi-query self-attention method that balances the bias-variance trade-off in modeling user preferences by combining long and short-query self-attentions. Second, we develop a transition-aware embedding distillation module that distills global item-to-item transition patterns into item embeddings, which enables the model to memorize and leverage transitional signals and serves as a calibrator for collaborative signals. Experimental results on four real-world datasets demonstrate the effectiveness of the proposed modules.
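
The abstract names three ideas: a query built from a window of recent items rather than a single item, a combination of long- and short-query attention, and global item-to-item transition statistics. The sketch below illustrates each one in plain Python so the terms are concrete. It is not the authors' implementation. The mean pooling in `window_query`, the mixing weight `alpha`, the first-order counting in `transition_probs`, and all function names are assumptions made for illustration. The abstract does not specify how the paper builds its queries, combines them, or distills transitions into embeddings.

```python
import math
from collections import Counter, defaultdict


def transition_probs(sequences):
    """Row-normalized first-order item-to-item transition frequencies,
    pooled over all user sequences (an assumed form of 'global' transitions)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {item: c / sum(nxts.values()) for item, c in nxts.items()}
        for prev, nxts in counts.items()
    }


def window_query(embeddings, window):
    """Average the last `window` (>= 1) item embeddings into one query vector.
    Mean pooling is a hypothetical choice, not taken from the paper."""
    recent = embeddings[-window:]
    return [sum(col) / len(recent) for col in zip(*recent)]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def multi_query(embeddings, long_window, short_window, alpha):
    """Blend a long-window and a short-window attention output.
    `alpha` is a hypothetical mixing weight in [0, 1]."""
    long_out = attend(window_query(embeddings, long_window), embeddings, embeddings)
    short_out = attend(window_query(embeddings, short_window), embeddings, embeddings)
    return [alpha * l + (1 - alpha) * s for l, s in zip(long_out, short_out)]


if __name__ == "__main__":
    sequences = [[1, 2, 3], [1, 2, 4], [1, 3]]
    print(transition_probs(sequences))
    item_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(multi_query(item_embeddings, long_window=3, short_window=1, alpha=0.5))
```

With `short_window=1` the short query reduces to the usual single-item query, while a larger `long_window` averages over more of the history. In this sketch, that is the sense in which the two queries trade variance against bias.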

Index Terms

Collaboration and Transition: Distilling Item Transitions into Multi-Query Self-Attention for Sequential Recommendation
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems

Published in
WSDM '24: Proceedings of the 17th ACM International Conference on Web Search and Data Mining
March 2024
1246 pages
ISBN:9798400703713
DOI:10.1145/3616855
General Chairs: Luz Angélica Caudillo Mata (MDA Geointelligence), Silvio Lattanzi (Google Research), Andrés Muñoz Medina (Google Research)
Program Chairs: Leman Akoglu (CMU), Aristides Gionis (KTH), Sergei Vassilvitskii (Google Research)
Copyright © 2024 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 4 March 2024
Author Tags
knowledge distillation
self-attention
sequential recommendation
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 498 of 2,863 submissions, 17%