
A Sparse Probabilistic Model of User Preference Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10233)

Abstract

Modern recommender systems rely on user preference data to understand, analyze and provide items of interest to users. However, for some domains, collecting and sharing such data can be problematic: it may be expensive to gather data from several users, or it may be undesirable to share real user data for privacy reasons. We therefore propose a new model for generating realistic preference data. Our Sparse Probabilistic User Preference (SPUP) model produces synthetic data by sparsifying an initially dense user preference matrix generated by a standard matrix factorization model. The model incorporates aggregate statistics of the original data, such as user activity level and item popularity, as well as their interaction, to produce realistic data. We show empirically that our model can reproduce real-world datasets from different domains to a high degree of fidelity according to several measures. Our model can be used by both researchers and practitioners to generate new datasets or to extend existing ones, enabling the sound testing of new models and providing an improved form of bootstrapping in cases where limited data is available.
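The two-stage idea described in the abstract (generate a dense matrix from a factorization model, then sparsify it using user activity and item popularity) can be illustrated with a minimal sketch. This is not the authors' SPUP implementation: the function name `generate_sparse_preferences`, the Gaussian factors, the Pareto-distributed activity and popularity weights, the outer-product interaction, and the 1 to 5 rating scale are all assumptions chosen only to make the example concrete.

```python
import numpy as np


def generate_sparse_preferences(n_users, n_items, n_factors=5,
                                density=0.05, seed=0):
    """Illustrative only: dense low-rank ratings, then a sparsifying mask."""
    rng = np.random.default_rng(seed)

    # Stage 1 (assumed form): dense preferences from Gaussian latent factors.
    user_factors = rng.normal(0.0, 1.0, size=(n_users, n_factors))
    item_factors = rng.normal(0.0, 1.0, size=(n_items, n_factors))
    dense = user_factors @ item_factors.T

    # Hypothetical aggregate statistics: heavy-tailed activity and popularity.
    activity = rng.pareto(2.0, size=n_users) + 1.0
    popularity = rng.pareto(2.0, size=n_items) + 1.0

    # Stage 2 (assumed form): keep entry (u, i) with probability proportional
    # to activity[u] * popularity[i], rescaled toward the target density.
    weights = np.outer(activity, popularity)
    keep_prob = np.clip(weights * (density / weights.mean()), 0.0, 1.0)
    mask = rng.random((n_users, n_items)) < keep_prob

    # Discretize the dense scores to integer ratings in 1..5; 0 marks "unobserved".
    ratings = np.clip(np.rint(3.0 + dense), 1, 5).astype(int)
    return np.where(mask, ratings, 0), mask


if __name__ == "__main__":
    sparse, mask = generate_sparse_preferences(200, 300)
    print("observed fraction:", mask.mean())
```

In this sketch the product of the two weights stands in for the interaction between user activity and item popularity; in practice those statistics would be estimated from the original dataset rather than drawn at random.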


Notes

  1. The idea of using the combination of user budgets and item popularity has also been exploited for sampling preference matrices in the context of stochastic variational inference [17].

  2. Recent work has proposed the use of Poisson-observation matrix factorization models [18]. Using such models would alleviate the need for this discretization step but this is largely independent of our proposed approach.

References

  1. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30–37 (2009)

  2. Harper, F.M., Konstan, J.A.: The MovieLens datasets: history and context. ACM Trans. Interact. Intell. Syst. 5(4) (2015). Article no. 19

  3. ICAPS: IPC. http://www.icaps-conference.org/index.php/Main/Competitions

  4. Cassandra, T.: POMDP file repository. http://www.pomdp.org/examples/

  5. RL-GLUE: Reinforcement learning glue. http://glue.rl-community.org/

  6. Cointet, J.P., Roth, C.: How realistic should knowledge diffusion models be. J. Artif. Soc. Soc. Simul. 10(3), 1–11 (2007)

  7. Leskovec, J.: Dynamics of large networks. Ph.D. thesis, Carnegie Mellon University (2008)

  8. Rubin, D.B.: Discussion: statistical disclosure limitation. J. Off. Stat. 9(2), 461–468 (1993)

  9. Salakhutdinov, R., Mnih, A.: Probabilistic matrix factorization. In: NIPS, pp. 1257–1264 (2008)

  10. Pasinato, M., Mello, C.E., Aufaure, M.A., Zimbrão, G.: Generating synthetic data for context-aware recommender systems. In: BRICS-CCI CBIC (2013)

  11. Tso, K.H.L., Schmidt-Thieme, L.: Empirical analysis of attribute-aware recommender system algorithms using synthetic data. J. Comput. 1(4), 18–29 (2006)

  12. Caron, F., Fox, E.B.: Sparse graphs using exchangeable random measures. arXiv e-prints, January 2014

  13. Newman, M.E.J., Strogatz, S.H., Watts, D.J.: Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E 64(2), 026118 (2001)

  14. Hu, Y., Koren, Y., Volinsky, C.: Collaborative filtering for implicit feedback datasets. In: Eighth IEEE International Conference on Data Mining (ICDM 2008), pp. 263–272. IEEE (2008)

  15. Aldous, D.J.: Representations for partially exchangeable arrays of random variables. J. Multivar. Anal. 11(4), 581–598 (1981)

  16. Hoover, D.N.: Relations on probability spaces and arrays of random variables. Technical report, Institute for Advanced Study, Princeton, NJ (1979)

  17. Hernandez-Lobato, J.M., Houlsby, N., Ghahramani, Z.: Stochastic inference for scalable probabilistic modeling of binary matrices. In: ICML (2014)

  18. Gopalan, P., Hofman, J.M., Blei, D.M.: Scalable recommendation with hierarchical Poisson factorization. In: UAI (2015)

  19. Bertin-Mahieux, T., Ellis, D.P.W., Whitman, B., Lamere, P.: The million song dataset. In: Proceedings of 12th ISMIR (2011)

  20. Tang, J., Gao, H., Liu, H.: mTrust: discerning multi-faceted trust in a connected world. In: ACM International Conference on Web Search and Data Mining (2012)

  21. Tang, J., Gao, H., Liu, H., Das Sarma, A.: eTrust: understanding trust evolution in an online world. In: Proceedings of the 18th ACM SIGKDD, pp. 253–261. ACM (2012)

  22. Ziegler, C.-N., McNee, S.M., Konstan, J.A., Lausen, G.: Improving recommendation lists through topic diversification. In: Proceedings of WWW (2005)


Author information

Corresponding author

Correspondence to Matthew Smith.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Smith, M., Charlin, L., Pineau, J. (2017). A Sparse Probabilistic Model of User Preference Data. In: Mouhoub, M., Langlais, P. (eds) Advances in Artificial Intelligence. Canadian AI 2017. Lecture Notes in Computer Science, vol 10233. Springer, Cham. https://doi.org/10.1007/978-3-319-57351-9_36

  • DOI: https://doi.org/10.1007/978-3-319-57351-9_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57350-2

  • Online ISBN: 978-3-319-57351-9

  • eBook Packages: Computer Science, Computer Science (R0)
