A Sparse Probabilistic Model of User Preference Data

Smith, Matthew; Charlin, Laurent; Pineau, Joelle

doi:10.1007/978-3-319-57351-9_36

Conference paper
First Online: 11 April 2017

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10233)

Abstract

Modern recommender systems rely on user preference data to understand, analyze and provide items of interest to users. However, for some domains, collecting and sharing such data can be problematic: it may be expensive to gather data from several users, or it may be undesirable to share real user data for privacy reasons. We therefore propose a new model for generating realistic preference data. Our Sparse Probabilistic User Preference (SPUP) model produces synthetic data by sparsifying an initially dense user preference matrix generated by a standard matrix factorization model. The model incorporates aggregate statistics of the original data, such as user activity level and item popularity, as well as their interaction, to produce realistic data. We show empirically that our model can reproduce real-world datasets from different domains to a high degree of fidelity according to several measures. Our model can be used by both researchers and practitioners to generate new datasets or to extend existing ones, enabling sound testing of new models and providing an improved form of bootstrapping when only limited data is available.
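The abstract describes SPUP only at a high level: sample a dense preference matrix from a standard matrix factorization model, then sparsify it using user activity and item popularity. The Python sketch below illustrates that general idea. Everything specific in it is an assumption for illustration, not the authors' procedure: the function names (`dense_mf_matrix`, `sparsify`) are hypothetical, latent factors are standard normal, each user's activity is treated as a fixed budget of observed items, and items are drawn without replacement in proportion to their popularity using Efraimidis-Spirakis sampling keys. It does not model the interaction between activity and popularity that SPUP incorporates.

```python
import heapq
import random


def dense_mf_matrix(n_users, n_items, k, rng):
    """Dense preference matrix R = U V^T.
    Assumption: standard-normal latent factors of rank k (hypothetical setup)."""
    U = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n_items)]
    return [[sum(a * b for a, b in zip(u, v)) for v in V] for u in U]


def sparsify(dense, user_budgets, item_popularity, rng):
    """Hypothetical sparsifier: keep exactly user_budgets[u] distinct entries of row u.
    Items are sampled without replacement, each draw proportional to popularity,
    via Efraimidis-Spirakis keys (take the top-b items by rand() ** (1 / weight)).
    Popularity weights must be strictly positive."""
    n_items = len(item_popularity)
    observed = {}
    for u, budget in enumerate(user_budgets):
        if budget > n_items:
            raise ValueError("a user's budget cannot exceed the number of items")
        keys = {i: rng.random() ** (1.0 / item_popularity[i]) for i in range(n_items)}
        for i in heapq.nlargest(budget, keys, key=keys.get):
            observed[(u, i)] = dense[u][i]
    return observed


# Usage with hypothetical sizes and heavy-tailed activity/popularity profiles.
rng = random.Random(0)
n_users, n_items = 50, 40
dense = dense_mf_matrix(n_users, n_items, k=5, rng=rng)
budgets = [min(n_items, int(rng.paretovariate(1.5))) for _ in range(n_users)]
popularity = [rng.paretovariate(1.2) for _ in range(n_items)]
observed = sparsify(dense, budgets, popularity, rng)
```

Drawing exactly `budget` items per row preserves each user's activity level, while the popularity weights concentrate observations on popular items; an independent Bernoulli mask with keep probability proportional to activity times popularity would match these statistics only in expectation.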

Notes

1. The combination of user budgets and item popularity has also been exploited to sample preference matrices in the context of stochastic variational inference [17].
2. Recent work has proposed Poisson-observation matrix factorization models [18]. Using such models would remove the need for this discretization step, but that choice is largely independent of our proposed approach.
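Note 2 contrasts two ways of obtaining discrete preference data. The Python sketch below shows both under stated assumptions: `discretize` rounds a real-valued factorization score and clips it to a 1-5 rating scale (an assumed rule; the paper's own discretization may differ), while `poisson_mf_counts` is a simplified, non-hierarchical Gamma-Poisson factorization in the spirit of [18], whose integer outputs need no discretization. The function names, the Gamma hyperparameters, and the matrix sizes are hypothetical.

```python
import math
import random


def discretize(score, low=1, high=5):
    """Assumed rule: round a real MF score to the nearest integer
    (Python rounds halves to even), then clip it to [low, high]."""
    return max(low, min(high, round(score)))


def poisson(rate, rng):
    """Draw from Poisson(rate) with Knuth's multiplication method;
    adequate for the small rates used here, slow for large ones."""
    threshold = math.exp(-rate)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def poisson_mf_counts(n_users, n_items, k, rng, shape=0.3, scale=1.0):
    """Simplified (non-hierarchical) Poisson factorization: Gamma-distributed
    factors theta_u, beta_i and counts y_ui ~ Poisson(theta_u . beta_i).
    Outputs are integers by construction, so no discretization is needed.
    shape/scale are hypothetical hyperparameters."""
    theta = [[rng.gammavariate(shape, scale) for _ in range(k)] for _ in range(n_users)]
    beta = [[rng.gammavariate(shape, scale) for _ in range(k)] for _ in range(n_items)]
    return [[poisson(sum(t * b for t, b in zip(th, be)), rng) for be in beta]
            for th in theta]


rng = random.Random(2)
ratings = [discretize(s) for s in (-0.7, 2.4, 3.6, 9.1)]  # ratings == [1, 2, 4, 5]
counts = poisson_mf_counts(10, 8, k=3, rng=rng)
```

The discretization route keeps a Gaussian matrix factorization model and adds a post-processing step, whereas the Poisson route builds the discreteness into the observation model itself.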

References

[1] Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30–37 (2009)
[2] Harper, F.M., Konstan, J.A.: The MovieLens datasets: history and context. ACM Trans. Interact. Intell. Syst. 5(4) (2015). Article no. 19
[3] ICAPS: International Planning Competition (IPC). http://www.icaps-conference.org/index.php/Main/Competitions
[4] Cassandra, T.: POMDP file repository. http://www.pomdp.org/examples/
[5] RL-Glue: Reinforcement learning glue. http://glue.rl-community.org/
[6] Cointet, J.P., Roth, C.: How realistic should knowledge diffusion models be? J. Artif. Soc. Soc. Simul. 10(3), 1–11 (2007)
[7] Leskovec, J.: Dynamics of large networks. Ph.D. thesis, Carnegie Mellon University (2008)
[8] Rubin, D.B.: Discussion: statistical disclosure limitation. JOS 9(2), 461–468 (1993)
[9] Salakhutdinov, R., Mnih, A.: Probabilistic matrix factorization. In: NIPS, pp. 1257–1264 (2008)
[10] Pasinato, M., Mello, C.E., Aufaure, M.A., Zimbrão, G.: Generating synthetic data for context-aware recommender systems. In: BRICS-CCI CBIC (2013)
[11] Tso, K.H.L., Schmidt-Thieme, L.: Empirical analysis of attribute-aware recommender system algorithms using synthetic data. J. Comput. 1(4), 18–29 (2006)
[12] Caron, F., Fox, E.B.: Sparse graphs using exchangeable random measures. ArXiv e-prints, January 2014
[13] Newman, M.E.J., Strogatz, S.H., Watts, D.J.: Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E 64(2), 026118 (2001)
[14] Hu, Y., Koren, Y., Volinsky, C.: Collaborative filtering for implicit feedback datasets. In: Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), pp. 263–272. IEEE (2008)
[15] Aldous, D.J.: Representations for partially exchangeable arrays of random variables. J. Multivar. Anal. 11(4), 581–598 (1981)
[16] Hoover, D.N.: Relations on probability spaces and arrays of random variables. Technical report, Institute for Advanced Study, Princeton, NJ (1979)
[17] Hernández-Lobato, J.M., Houlsby, N., Ghahramani, Z.: Stochastic inference for scalable probabilistic modeling of binary matrices. In: ICML (2014)
[18] Gopalan, P., Hofman, J.M., Blei, D.M.: Scalable recommendation with hierarchical Poisson factorization. In: UAI (2015)
[19] Bertin-Mahieux, T., Ellis, D.P.W., Whitman, B., Lamere, P.: The million song dataset. In: Proceedings of the 12th ISMIR (2011)
[20] Tang, J., Gao, H., Liu, H.: mTrust: discerning multi-faceted trust in a connected world. In: ACM International Conference on Web Search and Data Mining (WSDM) (2012)
[21] Tang, J., Gao, H., Liu, H., Das Sarma, A.: eTrust: understanding trust evolution in an online world. In: Proceedings of the 18th ACM SIGKDD, pp. 253–261. ACM (2012)
[22] Ziegler, C.-N., McNee, S.M., Konstan, J.A., Lausen, G.: Improving recommendation lists through topic diversification. In: Proceedings of WWW (2005)

Author information

Authors and Affiliations

School of Computer Science, McGill University, Montréal, Québec, Canada
Matthew Smith & Joelle Pineau
HEC Montréal, Montréal, Québec, Canada
Laurent Charlin

Corresponding author

Correspondence to Matthew Smith.

Editor information

Editors and Affiliations

University of Regina, Regina, Saskatchewan, Canada
Malek Mouhoub
University of Montreal, Montreal, Québec, Canada
Philippe Langlais

About this paper

Cite this paper

Smith, M., Charlin, L., Pineau, J. (2017). A Sparse Probabilistic Model of User Preference Data. In: Mouhoub, M., Langlais, P. (eds) Advances in Artificial Intelligence. Canadian AI 2017. Lecture Notes in Computer Science, vol 10233. Springer, Cham. https://doi.org/10.1007/978-3-319-57351-9_36

DOI: https://doi.org/10.1007/978-3-319-57351-9_36
Published: 11 April 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57350-2
Online ISBN: 978-3-319-57351-9
eBook Packages: Computer Science (R0)