Multiple frequency–spatial network for RGBT tracking in the presence of motion blur

  • Original Article
  • Neural Computing and Applications

Abstract

RGBT tracking combines visible and thermal infrared images to track targets, and it faces challenges from motion blur caused by camera and target movement. In this study, we observe that tracking under motion blur is significantly affected in both the frequency and spatial aspects. The sharp texture details of targets are represented as high-frequency information, yet existing trackers capture mainly low-frequency components and ignore the high-frequency information. To enhance the representation of sharp information in blurred scenes, we introduce multi-frequency and multi-spatial information into the network, called FSBNet. First, we construct a modality-specific asymmetric architecture and integrate an adaptive soft-threshold mechanism into a DCT-based multi-frequency channel attention adapter (DFDA), which adaptively integrates rich multi-frequency information. Second, we propose a masked frequency-based translation adapter (MFTA) to refine drifting failure boxes caused by camera motion. Moreover, we find that small targets are affected by motion blur more severely than larger ones, and we mitigate this issue by designing a cross-scale mutual conversion adapter (CFCA) between the frequency and spatial domains. Extensive experiments on the GTOT, RGBT234 and LasHeR benchmarks demonstrate the promising performance of our method in the presence of motion blur.
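
To ground the frequency-attention vocabulary used above, the following PyTorch-style sketch combines DCT-based multi-frequency channel attention in the spirit of FcaNet (Qin et al., 2021) with an adaptive soft threshold as in deep residual shrinkage networks (Zhao et al., 2020). It is an illustration under stated assumptions, not the paper's DFDA: the module name, the chosen frequency pairs, and the reduction ratio are all placeholders.

```python
import math

import torch
import torch.nn as nn


def dct_basis(pos, freq, length):
    # Value of the 1D DCT-II basis function `freq` at position `pos`.
    val = math.cos(math.pi * freq * (pos + 0.5) / length)
    return val if freq == 0 else val * math.sqrt(2.0)


class MultiFrequencyChannelAttention(nn.Module):
    """Channel attention pooled over several 2D DCT frequencies, followed
    by an adaptive per-channel soft threshold. Illustrative only."""

    def __init__(self, channels, height, width,
                 freq_pairs=((0, 0), (0, 1), (1, 0), (1, 1)), reduction=16):
        super().__init__()
        assert channels % len(freq_pairs) == 0
        group = channels // len(freq_pairs)
        # Each channel group is pooled with one (u, v) DCT basis instead of
        # global average pooling (which is only the (0, 0) component).
        weight = torch.zeros(channels, height, width)
        for i, (u, v) in enumerate(freq_pairs):
            for x in range(height):
                for y in range(width):
                    weight[i * group:(i + 1) * group, x, y] = (
                        dct_basis(x, u, height) * dct_basis(y, v, width))
        self.register_buffer("dct_weight", weight)
        self.attn_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
        # Separate head predicting a per-channel shrinkage threshold.
        self.thresh_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, h, w = x.shape  # h, w must match the sizes given at init
        pooled = (x * self.dct_weight).sum(dim=(2, 3))        # (b, c)
        y = x * self.attn_fc(pooled).view(b, c, 1, 1)
        # Adaptive soft threshold: shrink small responses toward zero.
        tau = (self.thresh_fc(pooled) * pooled.abs()).view(b, c, 1, 1)
        return torch.sign(y) * torch.clamp(y.abs() - tau, min=0.0)
```

The soft threshold suppresses weak channel responses, one plausible way to damp blur-induced noise while keeping the strong high-frequency cues the abstract emphasizes.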
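
The MFTA is described as refining boxes that drift under camera motion using a masked frequency-based translation. The standard frequency-domain tool for recovering a global shift between two image patches is phase correlation (Stone et al., 2001), sketched below with NumPy; the paper's frequency masking and any sub-pixel refinement are omitted, and the function name is ours.

```python
import numpy as np


def phase_correlation_shift(prev_patch, curr_patch):
    """Estimate the integer (dy, dx) shift of `curr_patch` relative to
    `prev_patch` via FFT phase correlation."""
    f_prev = np.fft.fft2(prev_patch)
    f_curr = np.fft.fft2(curr_patch)
    # Normalized cross-power spectrum: its phase encodes the translation.
    cross = f_curr * np.conj(f_prev)
    cross /= np.abs(cross) + 1e-8
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Shifts beyond half the patch wrap around; map them to negative offsets.
    h, w = corr.shape
    dy = dy - h if dy > h // 2 else dy
    dx = dx - w if dx > w // 2 else dx
    return int(dy), int(dx)
```

Applying the recovered (dy, dx) to the previous prediction would re-center a drifted box after an abrupt camera shift.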
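
Finally, the CFCA converts features across scales between the frequency and spatial domains. A textbook instance of such a cross-scale conversion, resizing a map by zero-padding its DCT spectrum, is sketched below with SciPy; the paper's adapter is more involved, so this serves only as background on the mechanism.

```python
import numpy as np
from scipy.fft import dctn, idctn


def dct_upsample(feat, out_h, out_w):
    # Upsample a 2D map by zero-padding its DCT-II coefficients: the low
    # frequencies are kept and the new high-frequency slots stay zero.
    h, w = feat.shape
    assert out_h >= h and out_w >= w
    coeffs = np.zeros((out_h, out_w))
    coeffs[:h, :w] = dctn(feat, norm="ortho")
    # Rescale so the mean intensity is preserved after resizing.
    scale = np.sqrt((out_h * out_w) / (h * w))
    return idctn(coeffs * scale, norm="ortho")
```

For example, `dct_upsample(np.random.rand(8, 8), 16, 16)` returns a smoothly interpolated 16×16 map.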

Data Availability

All data generated or analyzed during this study are included in this published article.


Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 41371342) and the National Key Research and Development Program of China (2016YFC0803000).

Author information

Corresponding authors

Correspondence to Xi Chen or Chu He.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fan, S., Chen, X., He, C. et al. Multiple frequency–spatial network for RGBT tracking in the presence of motion blur. Neural Comput & Applic 35, 24389–24406 (2023). https://doi.org/10.1007/s00521-023-09024-8
