Abstract
Siamese network-based trackers are commonly used for tracking due to their balanced accuracy and speed. However, the fixed template from the initial frame is used to localize the target in the current frame, and never being updated in Siamese trackers. This loses the power of updating the appearance model online and easily induces tracking failures for the intrinsic properties of tracking processes such as constantly changing scenes and endless distractors. To address this problem, we propose a simple yet effective visual tracking framework for updating the appearance model in a novel formulation that introduces a peculiar mixup method both for training and inference phase of Siamese trackers (named PMUM-Siam). It consists of a template matching network and a mixup module. First, instead of center-cropping the inexact predictions for tracking, we use the template matching network which trained with predefined anchor boxes to learn to select the best candidate from similar distractors. Second, the mixup module is used to fuse and update the trade-off appearance model between the best candidate and the groundtruth. Our method greatly enhances the capability of target identification and target localization in Siamese trackers. To further demonstrate the generality of the proposed method, we integrate our PMUM-Siam into two representative Siamese trackers (SiamFC and SiamRPN+ +). Extensive experimental results and comparisons on five challenging object tracking benchmarks including OTB-2013, OTB-2015, OTB-50, VOT-2016, and VOT-2018 show that PMUM-Siam achieves leading performance with an average speed of 300 FPS.
Similar content being viewed by others
References
Li X, Hu W, Shen C, Zhang Z, Dick A, Hengel AVD (2013) A survey of appearance models in visual object tracking. ACM Trans Intell Syst Technol (TIST) 4(4):1–48
Smeulders AW, Chu DM, Cucchiara R, Calderara S, Dehghan A, Shah M (2013) Visual tracking: an experimental survey. IEEE Trans Pattern Anal Mach Intell 36(7):1442–1468
Wu Y, Lim J, Yang M-H (2013) Online object tracking: a benchmark. In: Proceedings of the IEEE conference computer vision pattern recognition, pp 2411–2418
Yang H, Shao L, Zheng F, Wang L, Song Z (2011) Recent advances and trends in visual tracking: a review. Neurocomputing 74(18):3823–3831
Sun S, Akhtar N, Song H, Mian A, Shah M (2019) Deep affinity network for multiple object tracking. IEEE Trans Pattern Anal Mach Intell 43(1):104–119
Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr PH (2016) Fully-convolutional siamese networks for object tracking. In: European conference on computer vision. Springer, pp 850– 865
Held D, Thrun S, Savarese S (2016) Learning to track at 100 fps with deep regression networks. In: European conference on computer vision. Springer, pp 749–765
Tao R, Gavves E, Smeulders AW (2016) Siamese instance search for tracking. In: Proceedings of the IEEE conference computer vision and pattern recognition, pp 1420–1429
Fan H, Ling H (2019) Siamese cascaded region proposal networks for real-time visual tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7952–7961
Li B, Yan J, Wu W, Zhu Z, Hu X (2018) High performance visual tracking with siamese region proposal network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8971–8980
Li B, Wu W, Wang Q, Zhang F, Xing J, Yan J (2019) Siamrpn+ +: evolution of siamese visual tracking with very deep networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4282–4291
Zhang Z, Peng H (2019) Deeper and wider siamese networks for real-time visual tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4591–4600
Bolme DS, Beveridge JR, Draper BA, Lui YM (2010) Visual object tracking using adaptive correlation filters. In: 2010 IEEE Computer society conference on computer vision and pattern recognition. IEEE, pp 2544–2550
Danelljan M, Hager G, Shahbaz Khan F, Felsberg M (2015) Learning spatially regularized correlation filters for visual tracking. In: Proceedings of the IEEE international conference on computer vision, pp 4310–4318
Danelljan M, Robinson A, Khan FS, Felsberg M (2016) Beyond correlation filters: learning continuous convolution operators for visual tracking. In: European conference on computer vision. Springer, pp 472–488
Henriques JF, Caseiro R, Martins P, Batista J (2014) High-speed tracking with kernelized correlation filters. IEEE Trans Pattern Anal Mach Intell 37(3):583–596
Kiani Galoogahi H, Fagg A, Lucey S (2017) Learning background-aware correlation filters for visual tracking. In: Proceedings of the IEEE international conference on computer vision, pp 1135–1143
Zhang L, Gonzalez-Garcia A, Weijer JVD, Danelljan M, Khan FS (2019) Learning the model update for siamese trackers. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4010–4019
Wu Y, Lim J, Yang M-H (2015) Object tracking benchmark. IEEE Trans Pattern Anal Mach Intell 37(9):1834–1848
Hadfield S., Bowden R., Lebeda K. (2016) The visual object tracking vot2016 challenge results. Lect. Notes Comput. Sci 9914:777–823
Kristan M, Leonardis A, Matas J, Felsberg M, Pflugfelder R, Cehovin Zajc L, Vojir T, Bhat G, Lukezic A, Eldesokey A et al (2018) The sixth visual object tracking vot2018 challenge results. In: Proceedings of the European conference on computer vision (ECCV) workshops, pp 0–0
Xu Y, Wang Z, Li Z, Yuan Y, Yu G (2020) Siamfc+ +: towards robust and accurate visual tracking with target estimation guidelines. In: Proceedings of the AAAI conference on artificial intelligence, vol 34, pp 12549–12556
Zhang Z, Peng H, Fu J, Li B, Hu W (2020) Ocean: object-aware anchor-free tracking. In: Computer Vision–ECCV 2020: 16th European conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXI 16. Springer, pp 771–787
Shen J, Tang X, Dong X, Shao L (2019) Visual object tracking by hierarchical attention siamese network. IEEE Trans Cybern 50(7):3068–3080
Chen Z, Zhong B, Li G, Zhang S, Ji R (2020) Siamese box adaptive network for visual tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6668–6677
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. Adv Neural Inf Process Syst 28:91–99
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:1097–1105
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Chen X, Yan B, Zhu J, Wang D, Yang X, Lu H (2021) Transformer tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8126–8135
Liu S, Liu D, Srivastava G, Poȧp D, Woźniak M (2021) Overview and methods of correlation filter algorithms in object tracking. Complex Intell Syst 7(4):1895–1917
Zhang J, Sun J, Wang J, Yue X-G (2021) Visual object tracking based on residual network and cascaded correlation filters. J Ambient Intell Humanized Comput 12(8):8427–8440
Danelljan M, Bhat G, Shahbaz Khan F, Felsberg M (2017) Eco: efficient convolution operators for tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6638–6646
Danelljan M, Bhat G, Khan FS, Felsberg M (2019) Atom: accurate tracking by overlap maximization. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4660–4669
Shewchuk JR, et al. (1994) An introduction to the conjugate gradient method without the agonizing pain. Carnegie-mellon University. Department of Computer Science Pittsburgh
Guo H, Mao Y, Zhang R (2019) Mixup as locally linear out-of-manifold regularization. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 3714–3722
Zhang H, Cisse M, Dauphin YN, Lopez-Paz D (2018) Mixup: beyond empirical risk minimization. In: International conference on learning representations
Thulasidasan S, Chennupati G, Bilmes JA, Bhattacharya T, Michalak S (2019) On mixup training: improved calibration and predictive uncertainty for deep neural networks. Adv Neural Inf Process Syst 32:13888–13899
Carratino L, Cissé M, Jenatton R, Vert J-P (2020) On mixup regularization. arXiv:2006.06049
Zhang L, Deng Z, Kawaguchi K, Ghorbani A, Zou J (2020) How does mixup help with robustness and generalization?. In: International conference on learning representations
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vision 115(3):211–252
Real E, Shlens J, Mazzocchi S, Pan X, Vanhoucke V (2017) Youtube-boundingboxes: a large high-precision human-annotated data set for object detection in video. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5296–5305
Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: European conference on computer vision. Springer, pp 740–755
Danelljan M, Hager G, Shahbaz Khan F, Felsberg M (2015) Convolutional features for correlation filter based visual tracking. In: Proceedings of the IEEE international conference on computer vision workshops, pp 58–66
Danelljan M, Hager G, Shahbaz Khan F, Felsberg M (2016) Adaptive decontamination of the training set: a unified formulation for discriminative visual tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1430–1438
Ma C, Huang JB, Yang X, Yang MH (2015) Hierarchical convolutional features for visual tracking. In: Proceedings of the IEEE international conference on computer vision, pp 3074–3082
Hong S, You T, Kwak S, Han B (2015) Online tracking by learning discriminative saliency map with convolutional neural network. In: International conference on machine learning. PMLR, pp 597–606
Ma C, Yang X, Zhang C, Yang MH (2015) Long-term correlation tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5388–5396
Bertinetto L, Valmadre J, Golodetz S, Miksik O, Torr PH (2016) Staple: complementary learners for real-time tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1401–1409
Qi Y, Zhang S, Qin L, Yao H, Huang Q, Lim J, Yang M-H (2016) Hedged deep tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4303–4311
Li Y, Zhu J (2014) A scale adaptive kernel correlation filter tracker with feature integration. In: European conference on computer vision. Springer, pp 254–265
Qi Y, Qin L, Zhang S, Huang Q, Yao H (2019) Robust visual tracking via scale-and-state-awareness. Neurocomputing 329:75–85
Nam H, Baek M, Han B (2016) Modeling and propagating cnns in a tree structure for visual tracking. arXiv:1608.07242
Bhat G, Johnander J, Danelljan M, Khan FS, Felsberg M (2018) Unveiling the power of deep tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 483–498
Zhu Z, Wang Q, Li B, Wu W, Yan J, Hu W (2018) Distractor-aware siamese networks for visual object tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 101–117
Xu T, Feng Z-H, Wu X-J, Kittler J (2019) Learning adaptive discriminative correlation filters via temporal consistency preserving spatial feature selection for robust visual object tracking. IEEE Trans Image Process 28(11):5596–5609
Zhu G, Porikli F, Li H (2016) Beyond local search: tracking objects everywhere with instance-specific proposals. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 943–951
Acknowledgements
The authors gratefully acknowledge the scientific support and High Performance Computing (HPC) resources provided by the Erlangen National High Performance Computing Center (NHR@FAU) of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU). The hardware is funded by the German Research Foundation (DFG). The work of Fei Wu was supported by the China Scholarship Council (CSC) from the Ministry of Education, China.
Funding
The first author is funded by the China Scholarship Council (CSC) from the Ministry of Education of P.R. China.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, F., Zhang, J., Xu, Z. et al. Updating Siamese trackers using peculiar mixup. Appl Intell 53, 22531–22545 (2023). https://doi.org/10.1007/s10489-023-04546-z
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-023-04546-z