ABSTRACT
The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team. In 2022, the challenges comprised six vision-based tasks: (1) action spotting, focusing on retrieving action timestamps in long untrimmed videos, (2) replay grounding, focusing on retrieving the live moment of an action shown in a replay, (3) pitch localization, focusing on detecting line and goal part elements, (4) camera calibration, dedicated to retrieving the intrinsic and extrinsic camera parameters, (5) player re-identification, focusing on retrieving the same players across multiple views, and (6) multiple object tracking, focusing on tracking players and the ball through unedited video streams. Compared to the 2021 challenges, tasks (1-2) had their evaluation metrics redefined to consider tighter temporal accuracies, and tasks (3-6) were novel, including their underlying data and annotations. More information on the tasks, challenges, and leaderboards is available at https://www.soccer-net.org. Baselines and development kits are available at https://github.com/SoccerNet.