ABSTRACT
In this work, we introduce Ducho 2.0, the latest stable version of our framework. Unlike its predecessor, Ducho 2.0 offers a more personalized user experience, allowing users to define and import custom extraction models fine-tuned on specific tasks and datasets. Moreover, the new version can extract and process features through multimodal-by-design large models. Notably, all these new capabilities are backed by optimized data loading and storing to local memory. To showcase the capabilities of Ducho 2.0, we demonstrate a complete multimodal recommendation pipeline, from feature extraction/processing to the final recommendation. The idea is to provide practitioners and experienced scholars with a ready-to-use tool that, plugged on top of any multimodal recommendation framework, lets them run extensive benchmarking analyses. All materials are accessible at: https://github.com/sisinflab/Ducho/
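The pipeline the abstract describes (per-modality feature extraction, fusion, and similarity-based recommendation) can be illustrated with a minimal, self-contained sketch. This is not Ducho's actual API: the toy feature dictionaries stand in for the outputs of pretrained visual/textual encoders, and concatenation plus cosine similarity are just one simple fusion and scoring choice among many.

```python
import math

# Toy per-item features standing in for pretrained encoder outputs
# (illustrative values only, not produced by any real extractor).
visual = {"i1": [0.9, 0.1], "i2": [0.2, 0.8], "i3": [0.7, 0.3]}
textual = {"i1": [0.1, 0.9], "i2": [0.8, 0.2], "i3": [0.6, 0.4]}

def fuse(item):
    # Concatenation: one simple multimodal fusion strategy.
    return visual[item] + textual[item]

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def recommend(profile, items, k=2):
    # Rank items by similarity of their fused features to the user profile.
    ranked = sorted(items, key=lambda i: cosine(fuse(i), profile), reverse=True)
    return ranked[:k]

# A user profile could be built, e.g., from fused features of consumed items.
profile = fuse("i1")
print(recommend(profile, ["i1", "i2", "i3"]))  # → ['i1', 'i3']
```

In a real multimodal recommender, the extraction step (here faked with fixed dictionaries) is exactly the part a framework like Ducho 2.0 standardizes, so that downstream recommendation models can be benchmarked on identical features.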