ABSTRACT
In this paper, we present the second release of VideoCLIP, an interactive CLIP-based video retrieval system that participated in the Video Browser Showdown 2023. While we continue to use the underlying architecture to map the content between image and text, we concentrate on improving the user experience for novice users. Specifically, we have implemented three different query modalities and redesigned the user interface in order to adapt to the context of the Interactive Video Retrieval for Beginners (IVR4B) workshop. These modifications ultimately aim to provide newcomers with a simple and efficient user experience to locate the desired videos.
- Giuseppe Amato, Paolo Bolettieri, Fabio Carrara, Fabrizio Falchi, Claudio Gennaro, Nicola Messina, Lucia Vadicamo, and Claudio Vairo. 2022. VISIONE at Video Browser Showdown 2022. In MultiMedia Modeling(Lecture Notes in Computer Science), Björn Þór Jónsson, Cathal Gurrin, Minh-Triet Tran, Duc-Tien Dang-Nguyen, Anita Min-Chun Hu, Binh Huynh Thi Thanh, and Benoit Huet (Eds.). Springer International Publishing, Cham, 543–548. https://doi.org/10.1007/978-3-030-98355-0_52Google ScholarDigital Library
- Giuseppe Amato, Paolo Bolettieri, Fabio Carrara, Fabrizio Falchi, Claudio Gennaro, Nicola Messina, Lucia Vadicamo, and Claudio Vairo. 2023. VISIONE at Video Browser Showdown 2023. In MultiMedia Modeling(Lecture Notes in Computer Science), Duc-Tien Dang-Nguyen, Cathal Gurrin, Martha Larson, Alan F. Smeaton, Stevan Rudinac, Minh-Son Dao, Christoph Trattner, and Phoebe Chen (Eds.). Springer International Publishing, Cham, 615–621. https://doi.org/10.1007/978-3-031-27077-2_48Google ScholarDigital Library
- Fabian Berns, Luca Rossetto, Klaus Schoeffmann, Christian Beecks, and George Awad. 2019. V3C1 Dataset: An Evaluation of Content Characteristics. In Proceedings of the 2019 on International Conference on Multimedia Retrieval (Ottawa ON, Canada) (ICMR ’19). Association for Computing Machinery, New York, NY, USA, 334–338. https://doi.org/10.1145/3323873.3325051Google ScholarDigital Library
- Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2020. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. https://doi.org/10.48550/ARXIV.2010.11929Google ScholarCross Ref
- Cathal Gurrin, Björn Þór Jónsson, Klaus Schöffmann, Duc-Tien Dang-Nguyen, Jakub Lokoč, Minh-Triet Tran, Wolfgang Hürst, Luca Rossetto, and Graham Healy. 2021. Introduction to the Fourth Annual Lifelog Search Challenge, LSC’21. In Proceedings of the 2021 International Conference on Multimedia Retrieval (Taipei, Taiwan) (ICMR ’21). Association for Computing Machinery, New York, NY, USA, 690–691. https://doi.org/10.1145/3460426.3470945Google ScholarDigital Library
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. https://doi.org/10.48550/ARXIV.1512.03385Google ScholarCross Ref
- Silvan Heller, Viktor Gsteiger, Werner Bailer, Cathal Gurrin, Björn Þór Jónsson, Jakub Lokoč, Andreas Leibetseder, František Mejzlík, Ladislav Peška, Luca Rossetto, Konstantin Schall, Klaus Schoeffmann, Heiko Schuldt, Florian Spiess, Ly-Duyen Tran, Lucia Vadicamo, Patrik Veselý, Stefanos Vrochidis, and Jiaxin Wu. 2022. Interactive video retrieval evaluation at a distance: comparing sixteen interactive video search systems in a remote setting at the 10th Video Browser Showdown. International Journal of Multimedia Information Retrieval 11 (2022), 1 – 18.Google ScholarCross Ref
- Nico Hezel, Konstantin Schall, Klaus Jung, and Kai Uwe Barthel. 2022. Efficient Search and Browsing of Large-Scale Video Collections with Vibro. In MultiMedia Modeling(Lecture Notes in Computer Science), Björn Þór Jónsson, Cathal Gurrin, Minh-Triet Tran, Duc-Tien Dang-Nguyen, Anita Min-Chun Hu, Binh Huynh Thi Thanh, and Benoit Huet (Eds.). Springer International Publishing, Cham, 487–492. https://doi.org/10.1007/978-3-030-98355-0_43Google ScholarDigital Library
- Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2019. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data 7, 3 (2019), 535–547.Google ScholarCross Ref
- Jakub Lokoč, Patrik Veselý, František Mejzlík, Gregor Kovalčík, Tomáš Souček, Luca Rossetto, Klaus Schoeffmann, Werner Bailer, Cathal Gurrin, Loris Sauter, Jaeyub Song, Stefanos Vrochidis, Jiaxin Wu, and Björn þóR Jónsson. 2021. Is the Reign of Interactive Search Eternal? Findings from the Video Browser Showdown 2020. ACM Trans. Multimedia Comput. Commun. Appl. 17, 3, Article 91 (jul 2021), 26 pages. https://doi.org/10.1145/3445031Google ScholarDigital Library
- Thao-Nhu Nguyen, Tu-Khiem Le, Van-Tu Ninh, Minh-Triet Tran, Thanh Binh Nguyen, Graham Healy, Sinéad Smyth, Annalina Caputo, and Cathal Gurrin. 2023. E-LifeSeeker: An Interactive Lifelog Search Engine for LSC’23. In Proceedings of the 6th Annual on Lifelog Search Challenge (Thessaloniki, Greece) (LSC’23). Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3592573.3593098Google ScholarDigital Library
- Thao-Nhu Nguyen, Bunyarit Puangthamawathanakun, Annalina Caputo, Graham Healy, Binh T. Nguyen, Chonlameth Arpnikanondt, and Cathal Gurrin. 2023. VideoCLIP: An Interactive CLIP-Based Video Retrieval System At VBS2023. In MultiMedia Modeling: 29th International Conference, MMM 2023, Bergen, Norway, January 9–12, 2023, Proceedings, Part I (Bergen, Norway). Springer-Verlag, Berlin, Heidelberg, 671–677. https://doi.org/10.1007/978-3-031-27077-2_57Google ScholarDigital Library
- Thao-Nhu Nguyen, Bunyarit Puangthamawathanakun, Graham Healy, Binh T. Nguyen, Cathal Gurrin, and Annalina Caputo. 2022. Videofall - A Hierarchical Search Engine for VBS2022. In MultiMedia Modeling, Björn Þór Jónsson, Cathal Gurrin, Minh-Triet Tran, Duc-Tien Dang-Nguyen, Anita Min-Chun Hu, Binh Huynh Thi Thanh, and Benoit Huet (Eds.). Springer International Publishing, Cham, 518–523.Google Scholar
- Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning(Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 8748–8763. https://proceedings.mlr.press/v139/radford21a.htmlGoogle Scholar
- Luca Rossetto, Klaus Schoeffmann, and Abraham Bernstein. 2021. Insights on the V3C2 Dataset. CoRR abs/2105.01475 (2021). arXiv:2105.01475https://arxiv.org/abs/2105.01475Google Scholar
- Konstantin Schall, Nico Hezel, Klaus Jung, and Kai Uwe Barthel. 2023. Vibro: Video Browsing with Semantic and Visual Image Embeddings. In MultiMedia Modeling, Duc-Tien Dang-Nguyen, Cathal Gurrin, Martha Larson, Alan F. Smeaton, Stevan Rudinac, Minh-Son Dao, Christoph Trattner, and Phoebe Chen (Eds.). Springer International Publishing, Cham, 665–670.Google Scholar
Index Terms
- Efficient Search with an Interactive Video Retrieval System for Novice Users in IVR4B
Recommendations
VISIONE for newbies: an easier-to-use video retrieval system
CBMI '23: Proceedings of the 20th International Conference on Content-based Multimedia IndexingThis paper presents a revised version of the VISIONE video retrieval system, which offers a wide range of search functionalities, including free text search, spatial color and object search, visual and semantic similarity search, and temporal search. ...
Merging storyboard strategies and automatic retrieval for improving interactive video search
CIVR '07: Proceedings of the 6th ACM international conference on Image and video retrievalThe Carnegie Mellon University Informedia group has enjoyed consistent success with TRECVID interactive search using traditional storyboard interfaces for shot-based retrieval. For TRECVID 2006 the output of automatic search was included for the first ...
K-Space Interactive Search
CIVR '08: Proceedings of the 2008 international conference on Content-based image and video retrievalIn this paper we will present the K-Space1 Interactive Search system for content-based video information retrieval to be demonstrated in the VideOlympics. This system is an extension of the system we developed as part of our participation in TRECVID ...
Comments