Using gameplay videos for detecting issues in video games

Empirical Software Engineering

Abstract

Context

The game industry has been growing steadily in recent years. Every day, millions of people play video games, not only as a hobby but also in professional competitions (e.g., e-sports or speed-running) or to make a living by entertaining others (e.g., streamers). Streamers produce a large number of gameplay videos every day, in which they comment live on what they experience. However, no software, and thus no video game, is perfect: streamers may encounter various problems (such as bugs, glitches, or performance issues) while they play, and they are unlikely to report such issues explicitly to developers. These problems may negatively impact the user’s gaming experience and, in turn, harm the reputation of the game and of its producer.

Objective

In this paper, we propose and empirically evaluate GELID, an approach for automatically extracting relevant information from gameplay videos by (i) identifying video segments in which streamers experienced anomalies; (ii) categorizing them based on their type (e.g., logic or presentation); (iii) clustering them based on the context in which they appear (e.g., level or game area); and (iv) clustering them based on the specific issue type (e.g., game crashes).
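The four steps can be pictured as a simple pipeline. The Python sketch below is a toy illustration under stated assumptions, not GELID's implementation: it assumes each video has already been split into `Segment` records carrying the streamer's transcript and a game-area label (a hypothetical data model), detects anomalies with a hypothetical list of cue words, replaces the trained categorizer with keyword rules, and replaces both clustering steps with exact-key grouping. All names (`Segment`, `ISSUE_CUES`, `find_anomalies`, `categorize`, `run_pipeline`) are illustrative.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical cue words a streamer might say when something goes wrong.
# GELID's actual anomaly detection is NOT reproduced here.
ISSUE_CUES = {"bug", "glitch", "lag", "crash", "stuck", "froze"}


@dataclass
class Segment:
    video_id: str
    start_s: float   # segment start, in seconds
    end_s: float     # segment end, in seconds
    transcript: str  # streamer commentary spoken during the segment
    game_area: str   # hypothetical context label (e.g., a level name)


def tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def find_anomalies(segments):
    """Step (i): keep segments whose commentary mentions an issue cue."""
    return [s for s in segments if tokens(s.transcript) & ISSUE_CUES]


def categorize(segment):
    """Step (ii): toy keyword rules standing in for a trained classifier."""
    words = tokens(segment.transcript)
    if words & {"lag", "fps", "stutter"}:
        return "performance"
    if words & {"texture", "clipping", "flicker"}:
        return "presentation"
    return "logic"


def run_pipeline(segments):
    """Steps (iii)-(iv): group anomalies by game area, then by issue cue.

    Exact-key grouping is an assumption standing in for real clustering.
    """
    report = defaultdict(list)
    for s in find_anomalies(segments):
        # Pick the alphabetically first cue so the grouping is deterministic.
        issue = min(tokens(s.transcript) & ISSUE_CUES)
        report[(categorize(s), s.game_area, issue)].append(s)
    return dict(report)


if __name__ == "__main__":
    segs = [
        Segment("v1", 10, 25, "Wait, the game froze again!", "Desert"),
        Segment("v1", 40, 55, "Nice loot here", "Desert"),
        Segment("v2", 5, 20, "So much lag, fps dropped", "Harbor"),
    ]
    for key, group in run_pipeline(segs).items():
        print(key, [s.video_id for s in group])
    # ('logic', 'Desert', 'froze') ['v1']
    # ('performance', 'Harbor', 'lag') ['v2']
```

Exact-key grouping keeps the sketch self-contained; a more realistic version would swap in a learned classifier for step (ii) and a similarity-based clustering algorithm for steps (iii) and (iv).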

Method

We manually built a training set for step 2 of GELID (categorization) and a test set for validating each of GELID's four components in isolation. In total, we manually segmented, labeled, and clustered 170 videos related to three video games, obtaining a dataset of 604 segments.

Results

GELID achieves satisfactory results in steps 1 (segmentation) and 4 (specific issue clustering), but it shows limitations in step 3 (game context clustering) and, above all, in step 2 (categorization).

Figs. 1–4 (not reproduced here)

Data Availability

All the datasets produced and the scripts implemented to obtain the results reported in this paper (including our implementation of GELID) are available in our replication package (Guglielmi et al. 2023).

Notes

  1. https://twitch.tv

  2. https://youtu.be/ybvXzSLy9Ew?t=1448

  3. https://steamcommunity.com/

  4. https://developers.google.com/youtube/v3

  5. http://www.cs.waikato.ac.nz/ml/weka/

  6. 33 and 48 in the training set, and 31 and 4 in the test set, for the performance and balance categories, respectively.

  7. https://youtu.be/1duizy5DSOg?t=1540

  8. https://youtu.be/eDQIdqDC-sc?t=239

  9. Note that these results differ from those reported in Table 6 because we did not run any preprocessing step here.

References

  • Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) OPTICS: Ordering points to identify the clustering structure. ACM SIGMOD Rec 28(2):49–60

  • Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc Ser B (Methodol) 57(1):289–300

  • Bradley AP (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn 30(7):1145–1159

  • Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357

  • Chen N, Lin J, Hoi SC, Xiao X, Zhang B (2014) AR-miner: mining informative reviews for developers from mobile app marketplace. In: Proceedings of the 36th international conference on software engineering, pp 767–778

  • Choudhury S, Bhowal A (2015) Comparative analysis of machine learning algorithms along with classifiers for network intrusion detection. In: 2015 International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM), IEEE, pp 89–95

  • Cliff N (1993) Dominance statistics: Ordinal analyses to answer ordinal questions. Psychol Bull 114(3):494

  • Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Measur 20(1):37–46

  • Ester M, Kriegel HP, Sander J, Xu X et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, vol 96, pp 226–231

  • Flach PA (2016) ROC analysis. In: Encyclopedia of machine learning and data mining, Springer, pp 1–8

  • Fukunaga K, Hostetler L (1975) The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans Inf Theory 21(1):32–40

  • Gnanambal S, Thangaraj M, Meenatchi V, Gayathri V (2018) Classification algorithms with attribute selection: an evaluation study using WEKA. Int J Adv Netw Appl 9(6):3640–3644

  • Guglielmi E, Scalabrino S, Bavota G, Oliveto R (2022) Towards using gameplay videos for detecting issues in video games. arXiv preprint arXiv:2204.04182

  • Guglielmi E, Scalabrino S, Bavota G, Oliveto R (2023) Replication package of "using gameplay videos for detecting issues in video games". https://figshare.com/s/3de4d6958a57073dfa1b

  • Guzdial M, Shah S, Riedl M (2018) Towards automated let’s play commentary. arXiv preprint arXiv:1809.09424

  • Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B (1998) Support vector machines. IEEE Intell Syst Their Appl 13(4):18–28

  • Ho TK (1995) Random decision forests. In: Proceedings of 3rd international conference on document analysis and recognition, IEEE, vol 1, pp 278–282

  • Jones SE (2008) The meaning of video games: Gaming and textual strategies. Routledge

  • Karvelis P, Gavrilis D, Georgoulas G, Stylios C (2018) Topic recommendation using doc2vec. In: 2018 International Joint Conference on Neural Networks (IJCNN), IEEE, pp 1–6

  • Lewis C, Whitehead J, Wardrip-Fruin N (2010) What went wrong: a taxonomy of video game bugs. In: Proceedings of the fifth international conference on the foundations of digital games, pp 108–115

  • Li C, Gandhi S, Harrison B (2019) End-to-end let’s play commentary generation using multi-modal video representations. In: Proceedings of the 14th International Conference on the Foundations of Digital Games, pp 1–7

  • Lin D, Bezemer CP, Hassan AE (2017) Studying the urgent updates of popular games on the steam platform. Empir Softw Eng 22:2095–2126

  • Lin D, Bezemer CP, Hassan AE (2019) Identifying gameplay videos that exhibit bugs in computer games. Empir Softw Eng 24(6):4006–4033

  • MacFarland TW, Yates JM (2016) Mann–Whitney U test. In: Introduction to nonparametric statistics for the biological sciences using R, pp 103–132

  • Mann HB, Whitney DR (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 18(1):50–60

  • Miller GA (1995) WordNet: a lexical database for English. Commun ACM 38(11):39–41

  • Murphy-Hill E, Zimmermann T, Nagappan N (2014) Cowboys, ankle sprains, and keepers of quality: How is video game development different from software development? In: Proceedings of the 36th International Conference on Software Engineering, pp 1–11

  • Ozkok FO, Celik M (2017) A new approach to determine eps parameter of DBSCAN algorithm. Int J Intell Syst Appl Eng 5(4):247–251

  • Python (2023a) OpenCV. https://opencv.org, [Online]

  • Python (2023b) spaCy. https://spacy.io/, [Online]

  • Python (2023c) Video-kf. https://pypi.org/project/video-kf/, [Online]

  • Ramchoun H, Ghanou Y, Ettaouil M, Janati Idrissi MA (2016) Multilayer perceptron: Architecture optimization and training. Int J Interact Multimed Artif Intell 4(1):26–30

  • Rong X (2014) word2vec parameter learning explained. arXiv preprint arXiv:1411.2738

  • Santos RE, Magalhães CV, Capretz LF, Correia-Neto JS, da Silva FQ, Saher A (2018) Computer games are serious business and so is their quality: particularities of software testing in game development from the perspective of practitioners. In: Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, pp 1–10

  • Scalabrino S, Bavota G, Russo B, Di Penta M, Oliveto R (2017) Listening to the crowd for the release planning of mobile apps. IEEE Trans Software Eng 45(1):68–86

  • Shah S, Guzdial M, Riedl MO (2019) Automated let’s play commentary. arXiv preprint arXiv:1909.02195

  • Souček T, Lokoč J (2020) TransNet V2: An effective deep network architecture for fast shot transition detection

  • Steam (2023a) Conan Exiles. https://store.steampowered.com/app/440900/Conan_Exiles/

  • Steam (2023b) DayZ. https://store.steampowered.com/app/221100/DayZ/

  • Steam (2023c) New World. https://store.steampowered.com/app/1063730/New_World/

  • Tang S, Feng L, Kuang Z, Chen Y, Zhang W (2018) Fast video shot transition localization with deep structured models. In: Asian Conference on Computer Vision, Springer, pp 577–592

  • Tian Y, Lo D, Lawall J (2014) SEWordSim: Software-specific word similarity database. In: Companion Proceedings of the 36th International Conference on Software Engineering, pp 568–571

  • Toy EJ, Kummaragunta JV, Yoo JS (2018) Large-scale cross-country analysis of steam popularity. In: 2018 International Conference on Computational Science and Computational Intelligence (CSCI), IEEE, pp 1054–1058

  • Truelove A, de Almeida ES, Ahmed I (2021) We’ll fix it in post: What do bug fixes in video game update notes tell us? In: 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), IEEE, pp 736–747

  • Wan T, Jun H, Zhang H, Pan W, Hua H (2015) Kappa coefficient: a popular measure of rater agreement. Shanghai Arch Psychiatry 27(1):62

  • Wang Z, Bovik A, Sheikh H, Simoncelli E (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612. https://doi.org/10.1109/TIP.2003.819861

  • Wen Z, Tzerpos V (2004) An effectiveness measure for software clustering algorithms. In: Proceedings of the 12th IEEE International Workshop on Program Comprehension, IEEE, pp 194–203

  • Zhang Y, Jin R, Zhou ZH (2010) Understanding bag-of-words model: a statistical framework. Int J Mach Learn Cybern 1(1):43–52

Author information

Corresponding author

Correspondence to Emanuela Guglielmi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by: Jin Guo, Raula Gaikovina Kula

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Registered Reports

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Guglielmi, E., Scalabrino, S., Bavota, G. et al. Using gameplay videos for detecting issues in video games. Empir Software Eng 28, 136 (2023). https://doi.org/10.1007/s10664-023-10365-0

  • DOI: https://doi.org/10.1007/s10664-023-10365-0
