Abstract
Context
The game industry has grown steadily in recent years. Every day, millions of people play video games, not only as a hobby, but also in professional competitions (e.g., e-sports or speed-running) or as a business, by entertaining others (e.g., streamers). Streamers produce a large volume of gameplay videos every day, in which they comment live on what they experience. But no software is perfect, and video games are no exception: streamers may encounter several problems (such as bugs, glitches, or performance issues) while they play, and they are unlikely to report such issues explicitly to developers. These problems may negatively affect the user’s gaming experience and, in turn, harm the reputation of the game and of its producer.
Objective
In this paper, we propose and empirically evaluate GELID, an approach for automatically extracting relevant information from gameplay videos by (i) identifying video segments in which streamers experienced anomalies; (ii) categorizing them based on their type (e.g., logic or presentation); and clustering them based on (iii) the context in which they appear (e.g., level or game area) and (iv) the specific issue type (e.g., game crashes).
Method
We manually defined a training set for step 2 of GELID (categorization) and a test set for validating the four components of GELID in isolation. In total, we manually segmented, labeled, and clustered 170 videos related to 3 video games, producing a dataset of 604 segments.
Results
GELID achieves satisfactory results in steps 1 (segmentation) and 4 (specific issue clustering), but it shows limitations in step 3 (game context clustering) and, above all, in step 2 (categorization).
Data Availability
All the datasets produced and the scripts implemented to obtain the results reported in this paper (including our implementation of GELID) are available in our replication package (Guglielmi et al. 2023).
Notes
33 and 48 in the training set, 31 and 4 in the test set for performance and balance, respectively.
Note that the results differ from the ones reported in Table 6 because we did not run any preprocessing step here.
References
Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) Optics: Ordering points to identify the clustering structure. ACM Sigmod Rec 28(2):49–60
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc Ser B (Methodol) 57(1):289–300
Bradley AP (1997) The use of the area under the roc curve in the evaluation of machine learning algorithms. Pattern Recogn 30(7):1145–1159
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) Smote: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
Chen N, Lin J, Hoi SC, Xiao X, Zhang B (2014) Ar-miner: mining informative reviews for developers from mobile app marketplace. In: Proceedings of the 36th international conference on software engineering, pp 767–778
Choudhury S, Bhowal A (2015) Comparative analysis of machine learning algorithms along with classifiers for network intrusion detection. In: 2015 International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM), IEEE, pp 89–95
Cliff N (1993) Dominance statistics: Ordinal analyses to answer ordinal questions. Psychol Bull 114(3):494
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Measur 20(1):37–46
Ester M, Kriegel HP, Sander J, Xu X et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: kdd, vol 96, pp 226–231
Flach PA (2016) Roc analysis. In: Encyclopedia of machine learning and data mining, Springer, pp 1–8
Fukunaga K, Hostetler L (1975) The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans Inf Theory 21(1):32–40
Gnanambal S, Thangaraj M, Meenatchi V, Gayathri V (2018) Classification algorithms with attribute selection: an evaluation study using weka. Int J Adv Netw Appl 9(6):3640–3644
Guglielmi E, Scalabrino S, Bavota G, Oliveto R (2022) Towards using gameplay videos for detecting issues in video games. arXiv preprint arXiv:2204.04182
Guglielmi E, Scalabrino S, Bavota G, Oliveto R (2023) Replication package of "using gameplay videos for detecting issues in video games". https://figshare.com/s/3de4d6958a57073dfa1b
Guzdial M, Shah S, Riedl M (2018) Towards automated let’s play commentary. arXiv preprint arXiv:1809.09424
Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B (1998) Support vector machines. IEEE Intell Syst Their Appl 13(4):18–28
Ho TK (1995) Random decision forests. In: Proceedings of 3rd international conference on document analysis and recognition, IEEE, vol 1, pp 278–282
Jones SE (2008) The meaning of video games: Gaming and textual strategies. Routledge
Karvelis P, Gavrilis D, Georgoulas G, Stylios C (2018) Topic recommendation using doc2vec. In: 2018 International Joint Conference on Neural Networks (IJCNN), IEEE, pp 1–6
Lewis C, Whitehead J, Wardrip-Fruin N (2010) What went wrong: a taxonomy of video game bugs. In: Proceedings of the fifth international conference on the foundations of digital games, pp 108–115
Li C, Gandhi S, Harrison B (2019) End-to-end let’s play commentary generation using multi-modal video representations. In: Proceedings of the 14th International Conference on the Foundations of Digital Games, pp 1–7
Lin D, Bezemer CP, Hassan AE (2017) Studying the urgent updates of popular games on the steam platform. Empir Softw Eng 22:2095–2126
Lin D, Bezemer CP, Hassan AE (2019) Identifying gameplay videos that exhibit bugs in computer games. Empir Softw Eng 24(6):4006–4033
MacFarland TW, Yates JM (2016) Mann–Whitney U test. In: Introduction to nonparametric statistics for the biological sciences using R, pp 103–132
Mann HB, Whitney DR (1947) On a test of whether one of two random variables is stochastically larger than the other. The annals of mathematical statistics pp 50–60
Miller GA (1995) Wordnet: a lexical database for english. Commun ACM 38(11):39–41
Murphy-Hill E, Zimmermann T, Nagappan N (2014) Cowboys, ankle sprains, and keepers of quality: How is video game development different from software development? In: Proceedings of the 36th International Conference on Software Engineering, pp 1–11
Ozkok FO, Celik M (2017) A new approach to determine eps parameter of dbscan algorithm. Int J Intell Syst Appl Eng 5(4):247–251
Python (2023a) Opencv. https://opencv.org, [Online]
Python (2023b) spacy. https://spacy.io/, [Online]
Python (2023c) Video-kf. https://pypi.org/project/video-kf/, [Online]
Ramchoun H, Ghanou Y, Ettaouil M, Janati Idrissi MA (2016) Multilayer perceptron: Architecture optimization and training. Int J Interact Multimed Artif Intell 4(1):26–30
Rong X (2014) word2vec parameter learning explained. arXiv preprint arXiv:1411.2738
Santos RE, Magalhães CV, Capretz LF, Correia-Neto JS, da Silva FQ, Saher A (2018) Computer games are serious business and so is their quality: particularities of software testing in game development from the perspective of practitioners. In: Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, pp 1–10
Scalabrino S, Bavota G, Russo B, Di Penta M, Oliveto R (2017) Listening to the crowd for the release planning of mobile apps. IEEE Trans Software Eng 45(1):68–86
Shah S, Guzdial M, Riedl MO (2019) Automated let’s play commentary. arXiv preprint arXiv:1909.02195
Souček T, Lokoč J (2020) Transnet v2: An effective deep network architecture for fast shot transition detection
Steam (2023a) Conan exiles. https://store.steampowered.com/app/440900/Conan_Exiles/
Steam (2023b) Dayz. https://store.steampowered.com/app/221100/DayZ/
Steam (2023c) New world. https://store.steampowered.com/app/1063730/New_World/
Tang S, Feng L, Kuang Z, Chen Y, Zhang W (2018) Fast video shot transition localization with deep structured models. In: Asian Conference on Computer Vision, Springer, pp 577–592
Tian Y, Lo D, Lawall J (2014) Sewordsim: Software-specific word similarity database. In: Companion Proceedings of the 36th International Conference on Software Engineering, pp 568–571
Toy EJ, Kummaragunta JV, Yoo JS (2018) Large-scale cross-country analysis of steam popularity. In: 2018 International Conference on Computational Science and Computational Intelligence (CSCI), IEEE, pp 1054–1058
Truelove A, de Almeida ES, Ahmed I (2021) We’ll fix it in post: What do bug fixes in video game update notes tell us? In: 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), IEEE, pp 736–747
Wan T, Jun H, Zhang H, Pan W, Hua H (2015) Kappa coefficient: a popular measure of rater agreement. Shanghai Arch Psychiatry 27(1):62
Wang Z, Bovik A, Sheikh H, Simoncelli E (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612. https://doi.org/10.1109/TIP.2003.819861
Wen Z, Tzerpos V (2004) An effectiveness measure for software clustering algorithms. In: Proceedings. 12th IEEE International Workshop on Program Comprehension, 2004., IEEE, pp 194–203
Zhang Y, Jin R, Zhou ZH (2010) Understanding bag-of-words model: a statistical framework. Int J Mach Learn Cybern 1(1):43–52
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by: Jin Guo, Raula Gaikovina Kula
This article belongs to the Topical Collection: Special Issue on Registered Reports
About this article
Cite this article
Guglielmi, E., Scalabrino, S., Bavota, G. et al. Using gameplay videos for detecting issues in video games. Empir Software Eng 28, 136 (2023). https://doi.org/10.1007/s10664-023-10365-0
DOI: https://doi.org/10.1007/s10664-023-10365-0