DOI: 10.1145/3647649.3647696
ICIGP Conference Proceedings · Research article

Multi-Scale Feature Adaptive Fusion for Multi-Task Dense Prediction

Published: 03 May 2024

ABSTRACT

In dense scene analysis, multi-scale features are pivotal for effective contextual representation. Although such features are widely used in multi-task networks, the disparate scale requirements of different tasks make their effective fusion challenging. To address this, we present the Multi-Scale feature Adaptive Fusion Network (MAFNet), which incorporates a Scale-Adaptive Fusion (SAF) module. Unlike conventional methods that rely on simple feature concatenation, SAF dynamically learns the optimal combination of feature scales for each task. Complementing SAF, an Asymmetric Information Comparison Module (AICM) enhances inter-task feature interaction by distinguishing shared from task-specific features and applying attention mechanisms accordingly. Quantitative and qualitative evaluations on the PASCAL-Context and NYUD-v2 datasets confirm the superiority of our approach over existing state-of-the-art methods.
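The abstract does not spell out SAF's formulation. As a minimal sketch of what "dynamically learning the optimal feature scale combination for each task" could look like, the snippet below gates several same-resolution feature maps with a per-task softmax over learnable logits; the function names and the softmax-gated weighted sum are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(x - x.max())
    return e / e.sum()

def scale_adaptive_fusion(features, gate_logits):
    """Fuse multi-scale features with learned per-task weights.

    features: list of S arrays, each (C, H, W), assumed already
              resampled to a common resolution.
    gate_logits: (S,) learnable logits for one task; softmax turns
                 them into a convex combination over scales.
    """
    weights = softmax(gate_logits)
    return sum(w * f for w, f in zip(weights, features))

# Toy example: three scales, one task, uniform (untrained) logits.
feats = [np.full((2, 4, 4), v) for v in (1.0, 2.0, 3.0)]
fused = scale_adaptive_fusion(feats, np.array([0.0, 0.0, 0.0]))
```

In a trained network the gate logits would differ per task (e.g. segmentation favoring coarse scales, depth favoring fine ones), which is the behavior the paper contrasts with plain concatenation.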


Published in

ICIGP '24: Proceedings of the 2024 7th International Conference on Image and Graphics Processing
January 2024 · 480 pages
ISBN: 9798400716720
DOI: 10.1145/3647649

      Copyright © 2024 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed limited
