Predicting user demographics based on interest analysis in movie dataset

Multimedia Tools and Applications

Abstract

With the growing volume of information generated on the web, most web service providers try to personalize their services. Users interact with web-based systems in many ways and express their interests and preferences by rating the items provided. In this paper, we propose a framework to predict users’ demographics from the ratings they register in a system. To the best of our knowledge, this is the first time that item ratings have been employed for the user demographic prediction problem, a problem that has been studied extensively in recommendation systems and service personalization. We apply the framework to the ratings in the MovieLens dataset and predict users’ age and gender. The experimental results show that using all ratings registered by users improves prediction accuracy by at least 16% compared with previously studied models. Moreover, by classifying items as popular or unpopular, we eliminate the ratings belonging to 95% of the items and still reach an acceptable level of accuracy, which significantly reduces the update cost in a time-varying environment. Besides this classification, we propose other methods to reduce data volume while keeping the predictions accurate.
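The popular/unpopular reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "popular" means "most frequently rated", which the abstract does not state, and the function names `popular_items` and `user_feature_vectors`, the `keep_fraction` parameter, and the toy ratings are all hypothetical. The resulting per-user vectors could then be passed to any classifier to predict age or gender.

```python
from collections import Counter


def popular_items(ratings, keep_fraction=0.05):
    """Return the ids of the most-rated `keep_fraction` of items.

    `ratings` is a list of (user_id, item_id, rating) tuples.
    ASSUMPTION: popularity is measured by rating count; the paper
    may define it differently.
    """
    counts = Counter(item for _, item, _ in ratings)
    # Always keep at least one item so the feature space is never empty.
    n_keep = max(1, int(len(counts) * keep_fraction))
    return {item for item, _ in counts.most_common(n_keep)}


def user_feature_vectors(ratings, items):
    """Build one rating vector per user over a fixed, sorted item list.

    Unrated items are encoded as 0.0. Users who rated none of the
    given items do not appear in the result.
    """
    index = {item: i for i, item in enumerate(sorted(items))}
    vectors = {}
    for user, item, rating in ratings:
        if item in index:
            row = vectors.setdefault(user, [0.0] * len(index))
            row[index[item]] = float(rating)
    return vectors


# Hypothetical toy ratings: m1 has three ratings, m2 two, m3 one.
toy_ratings = [
    ("u1", "m1", 5), ("u2", "m1", 4), ("u3", "m1", 3),
    ("u1", "m2", 2), ("u2", "m2", 1),
    ("u3", "m3", 4),
]

# Keep the top half of three items, which rounds down to one item.
kept = popular_items(toy_ratings, keep_fraction=0.5)
features = user_feature_vectors(toy_ratings, kept)
```

With the default `keep_fraction=0.05`, the sketch discards the ratings of 95% of the items, mirroring the proportion reported in the abstract. Because only the retained items need to be tracked, new ratings for the remaining items would not require the feature vectors to be rebuilt.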

Data Availability

The data used in this research are publicly available online and are cited within the text of this paper. Specifically, we used two publicly accessible datasets: the Internet Movie Database (IMDb) dataset and the MovieLens dataset. Both can be accessed through the links given in Section 2.1. Details of how we used these datasets, including data preprocessing and citation, are provided in the relevant sections of this paper. Researchers interested in reproducing or extending our findings are encouraged to refer to these sources for access to the data.

Notes

  1. https://grouplens.org/datasets/movielens/

  2. https://www.imdb.com/

  3. https://www.motionpictures.org/film-ratings/

  4. http://bit.ly/3ulZch1

Author information

Corresponding author

Correspondence to Marjan Kaedi.

Ethics declarations

Conflict of interest

The authors declare that there are no competing financial interests or personal relationships that could have influenced the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shafiloo, R., Kaedi, M. & Pourmiri, A. Predicting user demographics based on interest analysis in movie dataset. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18422-6

  • DOI: https://doi.org/10.1007/s11042-024-18422-6
