Abstract
Concept detection for semantic annotation of video fragments (e.g. keyframes) is a popular and challenging problem. A variety of visual features is typically extracted and combined in order to learn the relation between feature-based keyframe representations and semantic concepts. In recent years the available pool of features has increased rapidly, and features based on deep convolutional neural networks in combination with other visual descriptors have significantly contributed to improved concept detection accuracy. This work proposes an algorithm that dynamically selects, orders and combines many base classifiers, trained independently with different feature-based keyframe representations, in a cascade architecture for video concept detection. The proposed cascade is more accurate and computationally more efficient, in terms of classifier evaluations, than state-of-the-art classifier combination approaches.
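The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general cascade idea, not the algorithm proposed in this work. It assumes that each stage is a (scorer, rejection threshold) pair and that a keyframe is rejected as soon as the running mean of the stage scores falls below the current stage's threshold. The names `cascade_score` and `toy_stages`, the running-mean combination rule and the thresholds are all hypothetical. The sketch illustrates why a cascade can save classifier evaluations: rejected keyframes never reach the later stages.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stage type: (scoring function, rejection threshold).
Stage = Tuple[Callable[[Sequence[float]], float], float]


def cascade_score(stages: List[Stage], keyframe: Sequence[float]) -> Tuple[float, int]:
    """Return (combined score, number of classifier evaluations) for one keyframe.

    Stages are applied in the given order. After each stage the running mean
    of the scores seen so far is compared with that stage's threshold; if it
    falls below the threshold, the keyframe is rejected early and the
    remaining classifiers are never evaluated.
    """
    if not stages:
        raise ValueError("the cascade needs at least one stage")
    total = 0.0
    evaluations = 0
    running_mean = 0.0
    for scorer, threshold in stages:
        total += scorer(keyframe)
        evaluations += 1
        running_mean = total / evaluations
        if running_mean < threshold:
            break  # early rejection: skip all later stages
    return running_mean, evaluations


# Toy stages (assumed for illustration): each "classifier" simply reads one
# component of the feature vector as its confidence score.
toy_stages: List[Stage] = [
    (lambda x: x[0], 0.5),
    (lambda x: x[1], 0.25),
]

if __name__ == "__main__":
    for keyframe in ([0.25, 1.0], [0.75, 0.25]):
        score, used = cascade_score(toy_stages, keyframe)
        print(f"keyframe={keyframe} score={score} evaluations={used}")
```

In this toy setup the first keyframe is rejected by the first stage, while the second passes through both stages. How the stages are selected and ordered is the subject of the paper and is not modelled here.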
Acknowledgements
This work was supported by the European Commission under contract FP7-600826 ForgetIT.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Markatopoulou, F., Mezaris, V., Patras, I. (2016). Ordering of Visual Descriptors in a Classifier Cascade Towards Improved Video Concept Detection. In: Tian, Q., Sebe, N., Qi, GJ., Huet, B., Hong, R., Liu, X. (eds) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol 9516. Springer, Cham. https://doi.org/10.1007/978-3-319-27671-7_73
DOI: https://doi.org/10.1007/978-3-319-27671-7_73
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27670-0
Online ISBN: 978-3-319-27671-7
eBook Packages: Computer Science (R0)