Published March 28, 2017 | Version v1
Dataset (Open Access)

Concept detection scores for the MED16train dataset (TRECVID MED task)

Description

We provide concept detection scores for the MED16train dataset, which is used in the TRECVID Multimedia Event Detection (MED) task. First, each video was decoded into a set of keyframes at fixed temporal intervals (2 keyframes per second). Then, we calculated concept detection scores for the following two concept sets: i) 487 sport-related concepts from the YouTube Sports-1M dataset [1], and ii) 345 TRECVID SIN concepts [3]. The scores were generated as follows:
1) For the 487 concepts of the Sports-1M dataset, a GoogLeNet network [4] originally trained on 5055 ImageNet concepts was fine-tuned following the extension strategy of [2], with one extension layer of dimension 128.
2) For the 345 TRECVID SIN concepts, a GoogLeNet network [4] pre-trained on the same 5055 ImageNet concepts was fine-tuned on these concepts, again following the extension strategy of [2], with one extension layer of dimension 1024 (a minimal illustration of this extension setup is sketched below).
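
The following is a minimal PyTorch sketch of what such an extension setup looks like: a pre-trained network used as a feature extractor, one new extension layer, and a new classification layer for the target concepts. This is an illustration under stated assumptions, not the authors' implementation; the dummy backbone is a stand-in, and the actual GoogLeNet weights, training procedure, and hyperparameters are those described in [2] and are not part of this record.

import torch
import torch.nn as nn

def extend(backbone: nn.Module, feat_dim: int, ext_dim: int, n_concepts: int) -> nn.Module:
    """Stack one extension layer of dimension `ext_dim` and a new
    classification layer on top of a pre-trained feature extractor.
    Which parts of the network are updated during training is a
    design choice discussed in [2]."""
    return nn.Sequential(
        backbone,                        # pre-trained network, used here as a feature extractor
        nn.Linear(feat_dim, ext_dim),    # the single extension layer
        nn.ReLU(inplace=True),
        nn.Linear(ext_dim, n_concepts),  # one output per target concept
        nn.Sigmoid(),                    # per-concept scores in [0, 1], as in the provided files
    )

# Illustrative check with a dummy backbone producing 1024-dim features:
dummy = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1024))
model = extend(dummy, feat_dim=1024, ext_dim=128, n_concepts=487)
print(model(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 487])

For the score files distributed here, ext_dim would be 128 with n_concepts = 487 (Sports-1M) and 1024 with n_concepts = 345 (TRECVID SIN).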

After unpacking the compressed file, two folders can be found, namely "Prob_sports_MED16train" and "Prob_SIN_MED16train", one for each concept set. For each concept set we provide one file per video of the MED16train dataset. Each file consists of N columns (N = 345 for the TRECVID SIN set and N = 487 for the Sports-1M set) and M rows, where M is the number of extracted keyframes for the corresponding video. Each column corresponds to a different concept, with all concept scores lying in the range [0,1]; the higher the score, the more likely it is that the corresponding concept appears in the keyframe. Two additional files, "sports_487_Classes.txt" and "SIN_345_Classes.txt", indicate the order of the concepts used in the concept score files.
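As a reading sketch, the following Python snippet loads one score file and prints the five highest-scoring Sports-1M concepts for the first keyframe. It assumes NumPy, whitespace-delimited values, and one concept name per line in the classes file; the file name "some_video.txt" is a hypothetical placeholder, since the per-video naming is not restated here.

import numpy as np

# Concept names, one per line, in the column order of the score files
# (assumed format of "sports_487_Classes.txt").
with open("sports_487_Classes.txt") as f:
    concepts = [line.strip() for line in f if line.strip()]

# Score matrix of one video: M keyframes x 487 concepts.
# "some_video.txt" is a placeholder; whitespace-separated values assumed.
scores = np.loadtxt("Prob_sports_MED16train/some_video.txt", ndmin=2)
assert scores.shape[1] == len(concepts) == 487

# Five highest-scoring concepts for the first keyframe.
for i in np.argsort(scores[0])[::-1][:5]:
    print(f"{concepts[i]}\t{scores[0, i]:.3f}")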

[1] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar and L. Fei-Fei, "Large-scale video classification with convolutional neural networks", In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 1725-1732, 2014.
[2] N. Pittaras, F. Markatopoulou, V. Mezaris and I. Patras, "Comparison of Fine-tuning and Extension Strategies for Deep Convolutional Neural Networks", Proc. 23rd Int. Conf. on MultiMedia Modeling (MMM'17), Reykjavik, Iceland, Springer LNCS vol. 10132, pp. 102-114, Jan. 2017.
[3] G. Awad, C. Snoek, A. Smeaton, and G. Quénot, "TRECVid semantic indexing of video: a 6-year retrospective", ITE Transactions on Media Technology and Applications, 4 (3). pp. 187-208, 2016.
[4] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke and A. Rabinovich, "Going deeper with convolutions", In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9, 2015.

Notes

Linked publications:
(1) N. Pittaras, F. Markatopoulou, V. Mezaris, I. Patras, "Comparison of Fine-tuning and Extension Strategies for Deep Convolutional Neural Networks", Proc. 23rd Int. Conf. on MultiMedia Modeling (MMM'17), Reykjavik, Iceland, Jan. 2017.
(2) F. Markatopoulou, A. Moumtzidou, D. Galanopoulos, T. Mironidis, V. Kaltsa, A. Ioannidou, S. Symeonidis, K. Avgerinakis, S. Andreadis, I. Gialampoukidis, S. Vrochidis, A. Briassouli, V. Mezaris, I. Kompatsiaris, I. Patras, "ITI-CERTH participation to TRECVID 2016", In TRECVID 2016 Workshop, Gaithersburg, MD, USA, 2016.

Files

Files (5.8 GB)
md5:e727a0453fd3cadc1993747728aaabb0

Additional details

Funding

MOVING – Training towards a society of data-savvy information professionals to enable open leadership innovation (Grant Agreement No. 693092), European Commission