Multi-view region proposal network predictive learning for tracking

Guo, Wen; Li, Dong; Liang, Bowen; Shan, Bin

doi:10.1007/s00530-022-01001-w

Regular Paper
Published: 21 September 2022

Volume 29, pages 333–346, (2023)

Multimedia Systems

Wen Guo¹ (ORCID: orcid.org/0000-0002-3691-4942), Dong Li¹, Bowen Liang¹ & Bin Shan¹

Abstract

Visual tracking is one of the most challenging problems in computer vision. Most state-of-the-art visual trackers suffer from three problems: non-diverse discriminative feature representations, a coarse object locator, and a limited number of positive samples. In this paper, a multi-view, multi-expert region proposal prediction algorithm for tracking is proposed to address all three problems within one framework. The algorithm integrates multiple views and exploits multiple sources of information, which effectively addresses the non-diverse feature representation problem. It builds multiple SVM classifier models on expanded bounding boxes and adds a region proposal network module to refine the predicted object location, which alleviates both the coarse object locator problem and the shortage of positive samples. A comprehensive evaluation on various benchmark sequences shows that combining a lightweight region proposal network predictive learning model with multi-view expert groups significantly improves tracking performance, and that the proposed approach outperforms other state-of-the-art visual trackers.
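
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of scoring candidate boxes with several view-specific linear classifiers and fusing their scores. It is not the authors' algorithm. The view names ("color", "hog"), the mean-score fusion rule, the helper names, and the feature values are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), top-left corner


def expand_box(box: Box, scale: float) -> Box:
    """Grow a box about its centre by `scale` (hypothetical helper)."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    nw, nh = w * scale, h * scale
    return (cx - nw / 2.0, cy - nh / 2.0, nw, nh)


def linear_score(weights: Sequence[float], bias: float,
                 features: Sequence[float]) -> float:
    """Decision value of a linear classifier: w . f + b."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def fuse_and_select(
    candidates: List[Box],
    view_features: Dict[str, List[Sequence[float]]],
    experts: Dict[str, Tuple[Sequence[float], float]],
) -> Tuple[Box, float]:
    """Pick the candidate with the highest mean score across views.

    view_features[view][i] is the feature vector of candidate i in that view;
    experts[view] is an assumed (weights, bias) pair for that view.
    """
    best_index, best_score = -1, float("-inf")
    for i in range(len(candidates)):
        scores = [
            linear_score(weights, bias, view_features[view][i])
            for view, (weights, bias) in experts.items()
        ]
        fused = sum(scores) / len(scores)  # assumed fusion rule: plain mean
        if fused > best_score:
            best_index, best_score = i, fused
    return candidates[best_index], best_score


# Toy usage with made-up numbers.
search_region = expand_box((10.0, 10.0, 20.0, 20.0), 2.0)
candidates = [(12.0, 11.0, 20.0, 20.0), (15.0, 14.0, 20.0, 20.0)]
view_features = {
    "color": [[0.2, 0.0], [0.9, 0.0]],
    "hog": [[0.0, 0.8], [0.0, 0.3]],
}
experts = {
    "color": ([1.0, 0.0], 0.0),
    "hog": ([0.0, 1.0], -0.5),
}
best_box, best_score = fuse_and_select(candidates, view_features, experts)
```

In this toy run the second candidate wins because its strong score in the "color" view outweighs its weaker "hog" score. A real tracker would learn the classifiers online and could weight or select experts rather than averaging them.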

Acknowledgements

The authors would like to thank the anonymous reviewers for their helpful comments on an earlier draft of this paper. The work was supported in part by the National Natural Science Foundation of China under Grant 62072286 and Grant 61572296.

Author information

Authors and Affiliations

School of Information and Electrical Engineering, Shandong Technology and Business University, Yantai, 264005, China
Wen Guo, Dong Li, Bowen Liang & Bin Shan

Corresponding author

Correspondence to Wen Guo.

Additional information

Communicated by B. Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Guo, W., Li, D., Liang, B. et al. Multi-view region proposal network predictive learning for tracking. Multimedia Systems 29, 333–346 (2023). https://doi.org/10.1007/s00530-022-01001-w

Received: 30 June 2021
Accepted: 19 August 2022
Published: 21 September 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s00530-022-01001-w
