Multi-view region proposal network predictive learning for tracking

  • Regular Paper
  • Multimedia Systems

Abstract

Visual tracking is one of the most challenging problems in computer vision. Most state-of-the-art visual trackers suffer from three problems: non-diverse discriminative feature representations, coarse object localization, and a limited number of positive samples. This paper proposes a multi-view, multi-expert region proposal prediction algorithm for tracking that addresses all three problems in a single framework. The algorithm integrates multiple views and exploits multiple sources of information, which effectively addresses the lack of diversity in discriminative feature representations. It builds multiple SVM classifier models on expanded bounding boxes and adds a region proposal network module that refines their output to predict the optimal object location, which alleviates both the coarse localization and the shortage of positive samples. The approach is evaluated comprehensively on various benchmark sequences. The results demonstrate that combining a lightweight region proposal network predictive learning model with multi-view expert groups significantly improves tracking performance, and that the proposed approach outperforms other state-of-the-art visual trackers.
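As a rough illustration of two ideas named in the abstract (expanded bounding boxes and combining several view-specific experts), the following minimal Python sketch may help. It is not the authors' implementation: the abstract does not state the fusion rule, the expansion factor, or how the region proposal network is applied, so the function names `expand_box` and `fuse_expert_scores`, the z-score normalization, and the weighted averaging are all assumptions made for illustration only. The expert scores are taken as given; in practice they might come from per-view SVM decision values.

```python
import numpy as np


def expand_box(box, scale, frame_shape):
    """Grow an (x, y, w, h) box about its centre by `scale`, clipped to the frame.

    Hypothetical helper: the paper's actual expansion rule is not given in the abstract.
    `frame_shape` is (height, width).
    """
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    new_w, new_h = w * scale, h * scale
    x0 = max(0.0, cx - new_w / 2.0)
    y0 = max(0.0, cy - new_h / 2.0)
    x1 = min(float(frame_shape[1]), cx + new_w / 2.0)
    y1 = min(float(frame_shape[0]), cy + new_h / 2.0)
    return (x0, y0, x1 - x0, y1 - y0)


def fuse_expert_scores(scores, weights=None):
    """Combine per-view expert scores for a set of candidate boxes.

    scores: array-like of shape (n_views, n_candidates), e.g. one row of
    classifier decision values per feature view.
    Returns (index of the best candidate, fused score per candidate).
    Assumption: scores are z-normalised per view and then averaged with
    `weights` (uniform by default); the paper may use a different rule.
    """
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean(axis=1, keepdims=True)
    std = scores.std(axis=1, keepdims=True)
    # Avoid division by zero for a view that scores every candidate equally.
    normed = (scores - mean) / np.where(std > 0, std, 1.0)
    if weights is None:
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    fused = np.asarray(weights, dtype=float) @ normed
    return int(np.argmax(fused)), fused


if __name__ == "__main__":
    # Search region twice the size of the previous target box.
    print(expand_box((40, 40, 20, 20), 2.0, (100, 100)))
    # Two views with very different score ranges, three candidate boxes.
    best, fused = fuse_expert_scores([[0.1, 0.9, 0.2], [10, 30, 50]])
    print(best, fused)
```

Normalizing each view before averaging keeps a view with a large numeric range from dominating the others; a refinement stage such as a region proposal network would then operate on the selected candidate, which this sketch leaves out.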

Acknowledgements

The author would like to thank the anonymous reviewers for their helpful comments on an earlier draft of this paper. The work was supported in part by the National Natural Science Foundation of China under Grant 62072286 and Grant 61572296.

Author information

Corresponding author

Correspondence to Wen Guo.

Additional information

Communicated by B. Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Guo, W., Li, D., Liang, B. et al. Multi-view region proposal network predictive learning for tracking. Multimedia Systems 29, 333–346 (2023). https://doi.org/10.1007/s00530-022-01001-w

  • DOI: https://doi.org/10.1007/s00530-022-01001-w
