Minimum-risk temporal alignment of videos

Wang, Zhen; Piccardi, Massimo

doi:10.1007/s11042-017-5073-3

Published: 14 August 2017

Volume 77, pages 14891–14906, (2018)

Multimedia Tools and Applications

Abstract

Temporal alignment of videos is an important requirement of tasks such as video comparison, analysis and classification. Most approaches proposed to date for video alignment rely on dynamic programming algorithms whose parameters are tuned manually. In contrast, this paper proposes a model that learns its parameters automatically by minimizing a meaningful loss function over a given training set of videos and alignments. For learning, we exploit the structural SVM framework and extend it with an original scoring function that scores the alignment of two given videos, and a loss function that quantifies the accuracy of a predicted alignment. Experimental results on four video action datasets show that the proposed model outperforms a baseline and a state-of-the-art algorithm by a large margin in terms of alignment accuracy.
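
For readers unfamiliar with the dynamic programming approaches the abstract refers to, the following is a minimal sketch of classic dynamic time warping (DTW) between two sequences of per-frame feature vectors. It is not the model proposed in the paper: the function name `dtw_align`, the Euclidean frame distance, and the three allowed moves are illustrative assumptions, and they stand in for the kind of hand-chosen settings that a learned scoring function would replace.

```python
import math


def dtw_align(x, y):
    """Align two feature sequences with classic dynamic time warping.

    x, y: lists of equal-dimension feature vectors, one per frame.
    Returns (total_cost, path), where path is a list of (i, j) frame pairs.
    Generic illustration only; not the paper's learned alignment model.
    """
    n, m = len(x), len(y)
    # Frame-to-frame cost: Euclidean distance (an assumed, hand-picked choice).
    dist = [[math.dist(a, b) for b in y] for a in x]

    # acc[i][j] = cost of the best alignment of x[:i] with y[:j].
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
            acc[i][j] = dist[i - 1][j - 1] + best_prev

    # Backtrack from the end to recover the frame-level alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        moves = [
            (acc[i - 1][j - 1], i - 1, j - 1),  # advance both videos
            (acc[i - 1][j], i - 1, j),          # advance the first video only
            (acc[i][j - 1], i, j - 1),          # advance the second video only
        ]
        _, i, j = min(moves, key=lambda t: t[0])
    path.reverse()
    return acc[n][m], path


if __name__ == "__main__":
    video_a = [[0.0], [1.0], [2.0]]
    video_b = [[0.0], [1.0], [1.0], [2.0]]
    cost, path = dtw_align(video_a, video_b)
    print(cost, path)
```

In this toy example the second sequence repeats its middle frame, so the recovered path maps frame 1 of the first sequence to frames 1 and 2 of the second. The distance function and the move set here are fixed by hand; the abstract's point is that such parameters can instead be learned from training alignments.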

Author information

Authors and Affiliations

University of Technology Sydney, Broadway, NSW, Australia
Zhen Wang & Massimo Piccardi

Corresponding author

Correspondence to Zhen Wang.

About this article

Cite this article

Wang, Z., Piccardi, M. Minimum-risk temporal alignment of videos. Multimed Tools Appl 77, 14891–14906 (2018). https://doi.org/10.1007/s11042-017-5073-3

Received: 11 January 2017
Revised: 20 June 2017
Accepted: 31 July 2017
Published: 14 August 2017
Issue Date: June 2018
DOI: https://doi.org/10.1007/s11042-017-5073-3
