ABSTRACT
We present LFM-2b, a dataset containing the listening records of over 120,000 users of the music platform Last.fm. Together, these users account for more than two billion individual listening events spanning over 15 years, from February 2005 to March 2020. The listening events refer to 50 million distinct tracks by 5 million distinct artists. Besides the common metadata (i.e., artist and track name), LFM-2b contains additional information about both users and items: user demographics (country, gender, and age) and, for items, fine-grained genre and style annotations together with vector embeddings of their lyrics.
LFM-2b is a rich dataset that enables research on a variety of recommender system algorithms, including collaborative filtering approaches (e.g., leveraging the user–item interactions in the form of listening events), content-based approaches (e.g., exploiting genres and lyrics), and hybrid combinations thereof. Users’ demographic information furthermore enables experimentation on identifying and mitigating data and algorithmic biases of recommender systems, and on investigating fairness aspects of such systems, e.g., with respect to gender.
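As a rough illustration of the collaborative filtering use case mentioned above, the following minimal sketch aggregates listening events into per-track play-count vectors and ranks unheard tracks by item-to-item cosine similarity. The event tuples, the identifiers (`u1`, `t1`, ...), and the function names are hypothetical and chosen for illustration only; they do not reflect the actual file layout or schema of LFM-2b, and this is one simple baseline rather than a method used by the dataset's authors.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy listening events as (user_id, track_id) pairs.
# Hypothetical data for illustration; not taken from LFM-2b.
EVENTS = [
    ("u1", "t1"), ("u1", "t1"), ("u1", "t2"),
    ("u2", "t1"), ("u2", "t2"), ("u2", "t3"),
    ("u3", "t3"), ("u3", "t3"),
]


def play_counts(events):
    """Aggregate listening events into per-track {user: play count} vectors."""
    counts = defaultdict(Counter)
    for user, track in events:
        counts[track][user] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(events, user, k=1):
    """Rank tracks the user has not heard by their play-count-weighted
    similarity to the tracks the user has heard."""
    item_vecs = play_counts(events)
    heard = Counter(track for u, track in events if u == user)
    scores = {}
    for candidate, vec in item_vecs.items():
        if candidate in heard:
            continue
        scores[candidate] = sum(
            n * cosine(vec, item_vecs[track]) for track, n in heard.items()
        )
    # Sort by descending score; break ties by track id for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    print(recommend(EVENTS, "u3", k=2))
```

At the scale described in the abstract, dictionaries of this kind would not be practical; a sparse matrix representation and a dedicated recommendation library would be the more realistic choice.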
Index Terms
- LFM-2b: A Dataset of Enriched Music Listening Events for Recommender Systems Research and Fairness Analysis