
Using Implicit Preference Relations to Improve Recommender Systems

Journal on Data Semantics

Abstract

Our work focuses on making recommendations for small or medium-sized e-commerce portals, where we face a scarcity of explicit feedback, low user loyalty, short visit durations and a low number of visited objects. In this paper, we present a novel approach that uses a specific user behavior pattern as implicit feedback, forming binary relations between objects. Our hypothesis is that if a user selects a specific object from a list of displayed objects, this expresses his/her binary preference between the selected object and the others that were visible but ignored. We expand this relation with the content-based similarity of objects. We define the implicit preference relation (IPR) as a partial ordering of objects based on the similarity expansion of the ignored-selected preference relation. We propose a merging algorithm that utilizes the synergic effect of two approaches: the IPR partial ordering and a list of objects recommended by any other algorithm. We report on a series of offline experiments with various recommending algorithms on two real-world e-commerce datasets. The merging algorithm was able to improve the ranked lists of most of the evaluated algorithms in terms of nDCG. Furthermore, we provide access to the relevant datasets and source code for further research.


Notes

  1. By a visit we mean that a single user opens a single webpage at a certain time, so the VisitID comprises the identification of the user, the page and the temporal context.

  2. We need to distinguish such an informed decision from the case where the user did not select \(o_2\) because he/she was not aware of its existence.

  3. Noticeability(oid) is defined as a probabilistic sum of nVisibleAbs and VisibleRel. We use a probabilistic sum instead of, e.g., the average or the maximum, as we expect some mutual benefit if both nVisibleAbs and VisibleRel values are high. Other fuzzy-logic disjunctions could be used too, but as this is not the key part of the paper, we opted for this simple option (a minimal sketch of the probabilistic sum is given after these notes).

  4. http://www.antikvariat-ichtys.cz.

  5. http://www.slantour.cz.

  6. Only objects with completely orthogonal features have zero similarity.

  7. Visits taking less than 0.5 s were removed to omit accidental clicks.

  8. For an arbitrary fixed user u and methods \(M_1\) and \(M_2\), we compared the position of each preferred object and recorded whether it improved, deteriorated or remained the same.
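As an illustration of Note 3, here is a minimal sketch of the probabilistic sum (a fuzzy-logic disjunction); the function name is ours and both inputs are assumed to be normalized to [0, 1]:

```python
def probabilistic_sum(a: float, b: float) -> float:
    """Fuzzy-logic disjunction: a + b - a*b.

    Unlike max(a, b), the result grows when both inputs are high, which
    matches the expected mutual benefit of high nVisibleAbs and VisibleRel.
    Both arguments are assumed to be normalized to [0, 1].
    """
    return a + b - a * b

# e.g., probabilistic_sum(0.8, 0.6) == 0.92, whereas max(0.8, 0.6) == 0.8
```

Similarly, a minimal sketch of the per-object position comparison described in Note 8 (not the authors' code; rankings are assumed to be lists of object IDs, best first, that contain every preferred object):

```python
def compare_positions(ranking_m1, ranking_m2, preferred_objects):
    """Count, for one user, how many preferred objects improved,
    deteriorated or kept their position when moving from M1 to M2."""
    outcome = {"improved": 0, "deteriorated": 0, "same": 0}
    for obj in preferred_objects:
        p1, p2 = ranking_m1.index(obj), ranking_m2.index(obj)
        if p2 < p1:
            outcome["improved"] += 1
        elif p2 > p1:
            outcome["deteriorated"] += 1
        else:
            outcome["same"] += 1
    return outcome
```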

References

  1. Baltrunas L, Amatriain X (2009) Towards time-dependent recommendation based on implicit feedback. In: CARS 2009 (RecSys)

  2. Belluf T, Xavier L, Giglio R (2012) Case study on the business value impact of personalized recommendations on a large online retailer. In: Proceedings of the sixth ACM conference on Recommender systems. ACM, pp 277–280

  3. Chen J, Miller C, Dagher G (2014) Product recommendation system for small online retailers using association rules mining. In: Proceedings of the 2014 International Conference on Innovative Design and Manufacturing (ICIDM), pp 71–77

  4. Cho Y, Kim J, Ahn D (2005) A personalized product recommender for web retailers. In: Systems modeling and simulation: theory and applications, vol 3398. Springer, Berlin, Heidelberg, pp 296–305

  5. Claypool M, Le P, Wased M, Brown D (2001) Implicit interest indicators. In: IUI ’01. ACM, pp 33–40

  6. Cremonesi P, Garzotto F, Turrin R (2013) User-centric vs. system-centric evaluation of recommender systems. In: INTERACT 2013, LNCS, vol 8119. Springer, pp 334–351

  7. Desarkar M, Saxena R, Sarkar S (2012) Preference relation based matrix factorization for recommender systems. In: UMAP 2012, LNCS, vol 7379. Springer, pp 63–75

  8. Eckhardt A, Horváth T, Vojtáš P (2007) PHASES: a user profile learning approach for web search. In: WI 2007, IEEE, pp 780–783

  9. Fang Y, Si L (2012) A latent pairwise preference learning approach for recommendation from implicit feedback. In: CIKM ’12. ACM, pp 2567–2570

  10. Gorgoglione M, Panniello U, Tuzhilin A (2011) The effect of context-aware recommendations on customer purchasing behavior and trust. In: Proceedings of the fifth ACM conference on Recommender systems. ACM, pp 85–92

  11. Hidasi B, Tikk D (2013) Initializing matrix factorization methods on implicit feedback databases. J UCS 19:1834–1853


  12. Hu Y, Koren Y, Volinsky C (2008) Collaborative filtering for implicit feedback datasets. In: ICDM 2008. IEEE, pp 263–272

  13. Kaminskas M, Bridge D, Foping F, Roche D (2016) Product recommendation for small-scale retailers. In: Proceedings of EC-WEB 2015 Conference, LNBIP, vol 239. Springer

  14. Kelly D, Teevan J (2003) Implicit feedback for inferring user preference: a bibliography. SIGIR Forum, vol 37. ACM, pp 18–28

  15. Kobsa A, Koenemann J, Pohl W (2001) Personalised hypermedia presentation techniques for improving online customer relationships. Knowl Eng Rev 16:111–155


  16. Koren Y, Bell R, Volinsky C (2009) Matrix factorization techniques for recommender systems. Computer 42(8):30–37

  17. Lai Y, Xu X, Yang Z, Liu Z (2012) User interest prediction based on behaviors analysis. Int J Dig Content Technol Appl 6(13):192–204


  18. Lee DH, Brusilovsky P (2009) Reinforcing recommendation using implicit negative feedback. In: UMAP 2009, LNCS, vol 5535. Springer, pp 422–427

  19. Linden G, Smith B, York J (2003) Amazon.com recommendations: item-to-item collaborative filtering. Internet Comput IEEE 7:76–80


  20. Lops P, de Gemmis M, Semeraro G (2011) Content-based recommender systems: state of the art and trends. In: Recommender systems handbook. Springer, pp 73–105

  21. Ostuni VC, Di Noia T, Di Sciascio E, Mirizzi R (2013) Top-N recommendations from implicit feedback leveraging linked open data. In: RecSys 2013. ACM, pp 85–92

  22. Peska L (2014) IPIget: the component for collecting implicit user preference indicators. In: ITAT 2014, Ustav informatiky AV CR, pp 22–26. http://itat.ics.upjs.sk/workshops

  23. Peska L, Eckhardt A, Vojtas P (2011) UPComp—a PHP component for recommendation based on user behaviour. In: WI-IAT 2011, vol 3. IEEE Computer Society, pp 306–309

  24. Peska L, Vojtas P (2012) Evaluating various implicit factors in e-commerce. In: RUE (RecSys) 2012, CEUR, vol 910, pp 51–55

  25. Peska L, Vojtas P (2013) Negative implicit feedback in e-commerce recommender systems. In: WIMS 2013. ACM, pp 45:1–45:4

  26. Peska L, Vojtas P (2013) Enhancing recommender system with linked open data. In: FQAS 2013, LNCS, vol 8132. Springer, pp 483–494

  27. Peska L, Vojtas P (2014) Recommending for disloyal customers with low consumption rate. In: SofSem 2014, LNCS, vol 8327. Springer, pp 455–465

  28. Peska L, Vojtas P (2015) How to interpret implicit user feedback? In: Poster Proceedings of ACM RecSys 2015, CEUR, vol 1441

  29. Raman K, Shivaswamy P, Joachims T (2012) Online learning to diversify from implicit feedback. In: KDD 2012. ACM, pp 705–713

  30. Rendle S, Freudenthaler C, Gantner Z, Schmidt-Thieme L (2009) BPR: Bayesian personalized ranking from implicit feedback. In: UAI 2009. AUAI Press, pp 452–461

  31. Rubens N, Kaplan D, Sugiyama M (2011) Active learning in recommender systems. In: Recommender Systems Handbook. Springer, US, pp 735–767

  32. Yang B, Lee S, Park S, Lee S (2012) Exploiting various implicit feedback for collaborative filtering. In: WWW 2012. ACM, pp 639–640


Acknowledgments

This work was supported by Czech Grants No. SVV-2015-260222, P46 and GAUK-126313. Some additional materials were made available online:
– Secondhand Bookshop dataset: http://www.ksi.mff.cuni.cz/~peska/bookshop2015.zip
– Travel Agency dataset: http://www.ksi.mff.cuni.cz/~peska/travelAgency2015.zip
– IPIget component: https://github.com/lpeska/IPIget
– IPR source code: https://github.com/lpeska/Implicit-Preference-Relations

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ladislav Peska.

Appendices

Appendix 1: Secondhand Bookshop Dataset Statistics

In this section we provide details about the Secondhand Bookshop dataset. This information pertains to the most up-to-date version of the dataset available (end of September 2015). The experiments were conducted on an earlier version of the dataset (February 2015); however, the general statistics are quite stable, except for the total volume of interactions.

One of the main specifics of the Secondhand Bookshop domain is the limited availability of the products. Most of the books are available only as a single copy, so a direct consequence of a successful purchase is that the book becomes unavailable. This limited availability also affects the design and usage of the book attributes. Especially for cheap books, it is not cost-effective to fill in too many attributes, as the potential income from the sold book would barely cover the cost of the labor. In addition, a considerable portion of the books are old prints without, e.g., an ISBN, so the unique identification of books is problematic. The dataset consists of two parts, both available as exported SQL tables. The first part is the user behavior collected by the IPIget component. The second part contains content-based book attributes.

The IPIget component was deployed on the bookshop website in February 2014. After a year and a half of behavior tracking, the component had collected data about 89K visits made by 49K users on 18K different pages, of which 8947 were object detail pages. The dataset sparsity can be considered from multiple viewpoints. If we consider user feedback on page IDs, the sparsity is 99.9999 %. If we consider feedback on object details only, the sparsity is still 99.99985 %. However, if we distribute user feedback on category pages over all displayed objects, the sparsity drops to a more reasonable value of 99.997 %. Table 6 contains basic statistics of the dataset related to the different page types.
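For reference, these sparsity figures follow the usual definition (the notation below is ours): the fraction of the user-object matrix with no recorded feedback,

\[
\mathrm{sparsity} = 1 - \frac{|F|}{|U| \cdot |O|},
\]

where \(F\) is the set of (user, object) pairs with some recorded feedback, \(U\) is the set of users and \(O\) is the set of page IDs, object detail pages or displayed objects, depending on the chosen viewpoint.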

The average portion of a page visible within the browser window is 31.8 % (standard deviation \(=\) 15.7 %). Table 7 further shows that each page type has a considerable portion of its content hidden outside of the browser's visible area. This is an argument for collecting scrolling and related feedback together with the positions of various content blocks in order to estimate their noticeability.

Different behavior types vary in their frequency of occurrence, so Table 8 provides statistics about the occurrence of the collected implicit preference indicators. This table also shows the correlation between each indicator and the purchase indicator, which can be viewed as the ground truth of user preference.

The process of sending an order form requires the user to click on the “Buy” button, so we consider the high correlation between clicks and purchases to be a mere result of the GUI design. On the other hand, the correlations for mouseMovingTime, scrollingTime and pageViews seem to be non-trivial and worth further study. Also notable is the absence of correlation for dwellTime.
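For illustration, such indicator-purchase correlations can be computed from the exported behavior table roughly as sketched below; the file and column names are hypothetical and would need to be mapped to the actual export:

```python
import pandas as pd

# Hypothetical per-visit export of the behavior table; column names are illustrative.
behavior = pd.read_csv("bookshop_behavior.csv")

indicators = ["pageViews", "mouseMovingTime", "scrollingTime", "dwellTime", "clicks"]
purchase = behavior["purchase"]  # 0/1 purchase indicator per visit

# Pearson correlation of each implicit indicator with the purchase indicator.
correlations = {ind: behavior[ind].corr(purchase) for ind in indicators}
print(correlations)
```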

The second part of the dataset contains content-based attributes of the books. The bookshop domain is quite dynamic: tens of books are added every week, and a single successful purchase often makes the product unavailable for another buyer. Thus, we adopted a time-aware model of object attributes. The attributes are checked for updates on a daily basis; the validFrom and validTo columns define the dates when the book with the specified attributes was available. Table 9 contains a description of the dataset columns.
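To illustrate the time-aware attribute model, the following sketch (our own; the file and column names are illustrative) retrieves the attribute record of a book that was valid on a given date:

```python
from datetime import date

import pandas as pd

# Hypothetical export of the book attributes table; column names are illustrative.
attributes = pd.read_csv("bookshop_attributes.csv", parse_dates=["validFrom", "validTo"])

def attributes_valid_on(book_id, day: date) -> pd.DataFrame:
    """Return the attribute record(s) of the given book valid on the given day."""
    mask = (
        (attributes["bookID"] == book_id)
        & (attributes["validFrom"] <= pd.Timestamp(day))
        & (attributes["validTo"] >= pd.Timestamp(day))
    )
    return attributes[mask]

# e.g., attributes_valid_on(42, date(2015, 2, 1))
```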

The Bookshop Attributes dataset was collected from February, 2014 to September, 2015. We collected in total almost 14,000 records about 12,482 books. Compared with the user behavior dataset, we collected considerably more books. The reason is quite simple: given the total volumes of books and users, some books had never been visited.

Appendix 2: Travel Agency Dataset Statistics

In this section we provide details about the Travel Agency dataset. This information pertains to the most up-to-date version of the dataset available (end of September 2015); the experiments were conducted on an earlier version of the dataset (up to June 2015). The dataset consists of two parts, both available as exported SQL tables. The first part is the user behavior collected by the IPIget component. The second part contains content-based tour attributes. Before we continue, let us mention some specifics of the tours domain.

Tours are in general quite expensive, which makes their purchasing frequency low. Many users purchase tours only once per year or so. In addition, buying more than a single tour at a time is extremely rare. This differs from, e.g., consumable goods, where the purchasing frequency is higher and the user often purchases multiple items at once. The latency between purchases makes it difficult to track the user between two consecutive purchases: cookies identifying the user might have been deleted, the user may have changed his/her device, etc. Furthermore, the same tour is often repeated on different dates and at different prices; there might even be multiple base prices per tour, e.g., depending on the selected accommodation. Thus, we provide aggregated information about prices in the content-based attributes table.

The IPIget component was deployed on the Travel Agency website in September 2014. After one year of behavior tracking, the component had collected data about 564 thousand visits by 219 thousand users on 12,500 different pages, of which 2500 were object detail pages. The dataset sparsity on pages is 99.9998 %, the sparsity on object details is 99.9992 %, and if we distribute feedback on category pages over all displayed objects, the sparsity is 98.04 %. Table 10 contains basic statistics of the dataset related to the different page types.

The average visible page area is 51 % (standard deviation = 31 %). As with the Secondhand Bookshop, a large portion of each page type is hidden outside the browser's visible area (Table 11).

Finally, Table 12 contains the occurrence frequency and the correlation with the purchase indicator for the other collected behavior types. There are some correlations between purchases and pageViews, mouseMovingTime and scrollingTime; however, the dependence is quite weak. This might have been caused by the more variable content of the tours (e.g., in terms of the length of the textual description, the number of images, etc.), so some further transformation seems important (e.g., as proposed in [28]).

The content-based tour attributes were collected from the end of September, 2014 to October, 2015. We collected a total of almost 30,000 records about 1,663 tours. Note that we distinguish between different dates of the same tour, as they might substantially differ in price. Some users might also search only for specific dates. Table 13 contains a description of the dataset columns.

We would like to note for the interested reader some specifics of the attributes dataset that stem from the data structure of the underlying website. If the value of the MaxDiscount attribute is \(\le \) 100, it should be interpreted as a percentage discount on the price of the ordered services; otherwise, it is a fixed discount in CZK per person. The values of meal type, transportation type, accommodation type and accommodation category can be considered ordered variables, where a higher value stands for a better service (e.g., for mealType 0 = no meal, 1 = breakfast, 2 = half board, etc.). On the other hand, tour type identifiers are strictly categorical, with no ordering at all.
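A small sketch of this interpretation rule follows (the function and its arguments are our illustration; per-person handling may differ in the actual data):

```python
def effective_discount(max_discount: float, ordered_services_price: float) -> float:
    """Interpret the MaxDiscount attribute.

    Values <= 100 are a percentage discount on the price of the ordered
    services; larger values are a fixed discount in CZK per person.
    """
    if max_discount <= 100:
        return ordered_services_price * max_discount / 100.0
    return max_discount  # fixed amount in CZK per person

# e.g., effective_discount(15, 20000) == 3000.0 (15 % of 20,000 CZK)
#       effective_discount(500, 20000) == 500   (a fixed 500 CZK per person)
```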

Some of the attributes of the tours may change over time, especially price- and discount-related information. However, as tours are “soft” products, other attributes might also change over time. Thus, we introduced a temporal context for the attributes: ValidFrom and ValidTo. ValidFrom identifies the first occurrence of the tour with the specified attributes (i.e., it is either a new tour, or some of its attributes were changed). ValidTo defines the last date on which the tour with these attributes was available (i.e., the tour attributes might have been changed, or the tour was deleted, sold out or had already started and could no longer be bought). The ValidFrom and ValidTo context items are checked once per day.


Cite this article

Peska, L., Vojtas, P. Using Implicit Preference Relations to Improve Recommender Systems. J Data Semant 6, 15–30 (2017). https://doi.org/10.1007/s13740-016-0061-8

