DOI: 10.1145/3498366.3505791
demonstration

LFM-2b: A Dataset of Enriched Music Listening Events for Recommender Systems Research and Fairness Analysis

Published: 14 March 2022

ABSTRACT

We present the LFM-2b dataset, which contains the listening records of more than 120,000 users of the music platform Last.fm. These users contribute a total of more than two billion individual listening events spanning over 15 years, from February 2005 to March 2020. The listening events refer to a total of 50 million distinct tracks by 5 million distinct artists. Besides common metadata (i.e., artist and track name), LFM-2b contains additional information about both users and items: the users' demographic information (country, gender, and age), and the items' fine-grained genres and styles together with vector embeddings of their lyrics.
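To illustrate how individual listening events become the user–item interactions used in collaborative filtering, here is a minimal sketch that aggregates events into (user, track) play counts. The record layout (`ListeningEvent` with `user_id`, `track_id`, `timestamp`) and the toy records are hypothetical and are not meant to reflect the actual LFM-2b file format.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class ListeningEvent(NamedTuple):
    # Hypothetical record layout; the real LFM-2b files may be organized differently.
    user_id: int
    track_id: int
    timestamp: int  # Unix time of the listening event


def play_counts(events: Iterable[ListeningEvent]) -> Counter:
    """Aggregate individual listening events into (user, track) play counts."""
    return Counter((e.user_id, e.track_id) for e in events)


# Hypothetical toy data, not taken from LFM-2b.
events = [
    ListeningEvent(1, 10, 1_109_635_200),
    ListeningEvent(1, 10, 1_109_638_800),
    ListeningEvent(1, 11, 1_109_642_400),
    ListeningEvent(2, 10, 1_109_646_000),
]
counts = play_counts(events)
print(counts[(1, 10)])  # → 2
```

Play counts like these are a common implicit-feedback signal; a real pipeline over two billion events would typically stream or chunk the input rather than hold it in memory.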

LFM-2b is a rich dataset that supports research on a variety of recommender system algorithms: approaches based on collaborative filtering (e.g., leveraging user–item interactions in the form of listening events), content-based approaches (e.g., exploiting genres and lyrics), and hybrid combinations thereof. Users’ demographic information furthermore enables experiments on identifying and mitigating various data and algorithmic biases of recommender systems, and on investigating fairness aspects of such systems, e.g., with respect to gender.
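One simple way to probe gender fairness is to compare a recommender's average accuracy across user groups. The sketch below computes the mean of a per-user score within each group and the absolute gap between two groups. The scores (imagined as NDCG values), the group labels `"f"` and `"m"`, and the helper names are hypothetical placeholders; this is only one of many possible fairness measures and is not necessarily the one used with LFM-2b.

```python
from collections import defaultdict
from statistics import mean


def group_means(scores: dict[int, float], groups: dict[int, str]) -> dict[str, float]:
    """Average a per-user score within each demographic group."""
    by_group: dict[str, list[float]] = defaultdict(list)
    for user, score in scores.items():
        by_group[groups[user]].append(score)
    return {group: mean(values) for group, values in by_group.items()}


def performance_gap(means: dict[str, float], a: str, b: str) -> float:
    """Absolute difference in mean score between groups a and b."""
    return abs(means[a] - means[b])


# Hypothetical per-user accuracy scores and group labels.
scores = {1: 0.40, 2: 0.20, 3: 0.30, 4: 0.10}
groups = {1: "f", 2: "f", 3: "m", 4: "m"}
means = group_means(scores, groups)
gap = performance_gap(means, "f", "m")
```

A gap near zero suggests the groups are served about equally well on this metric; a large gap flags a disparity worth investigating further.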


          Published in

            ACM Conferences
            CHIIR '22: Proceedings of the 2022 Conference on Human Information Interaction and Retrieval
            March 2022
            399 pages
            ISBN:9781450391863
            DOI:10.1145/3498366

            Copyright © 2022 Owner/Author

            Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 14 March 2022


            Qualifiers

            • demonstration
            • Research
            • Refereed limited

            Acceptance Rates

            Overall Acceptance Rate: 55 of 163 submissions, 34%
