ABSTRACT
In dense scene analysis, multi-scale features are pivotal for effective contextual representation. Although such features are widely used in multi-task networks, the differing scale requirements across tasks make them difficult to fuse effectively. To address this, we present the Multi-Scale feature Adaptive Fusion Network (MAFNet), which incorporates a Scale-Adaptive Fusion (SAF) module. Unlike conventional methods that rely on simple feature concatenation, SAF dynamically learns the optimal combination of feature scales for each task. To complement SAF, we introduce an Asymmetric Information Comparison Module (AICM), which strengthens inter-task feature interaction by separating shared from task-specific features and applying attention selectively. Quantitative and qualitative evaluations on the PASCAL-Context and NYUD-v2 datasets confirm that our approach outperforms existing state-of-the-art techniques.
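The core idea behind SAF-style fusion — replacing plain concatenation with a learned, per-task weighting over feature scales — can be sketched as follows. This is a minimal illustration, not the paper's exact SAF: it assumes the multi-scale features have already been resized to a common resolution, and the softmax-gated weighted sum and all function names are our own illustrative choices.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scale_adaptive_fuse(features, scale_logits):
    """Fuse S same-shape feature maps with learned per-scale weights.

    features:     list of S arrays, each (C, H, W), one per scale,
                  already resized to a common spatial resolution
    scale_logits: (S,) learnable scale-selection logits for one task
    returns:      (C, H, W) fused feature map
    """
    w = softmax(np.asarray(scale_logits, dtype=float))  # (S,), sums to 1
    stacked = np.stack(features)                        # (S, C, H, W)
    # Contract the scale axis: a convex combination of the scale features,
    # in place of channel-wise concatenation.
    return np.tensordot(w, stacked, axes=1)

# Illustrative use: a task whose logits strongly prefer the first scale
# recovers (approximately) that scale's features.
fine = np.ones((2, 4, 4))
coarse = np.zeros((2, 4, 4))
fused = scale_adaptive_fuse([fine, coarse], [10.0, -10.0])
```

In a full network the logits would be produced per task (e.g., by a small head over pooled features) and trained end-to-end, so each task learns its own scale preference rather than sharing a fixed fusion rule.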
Index Terms
- Multi-Scale Feature Adaptive Fusion for Multi-Task Dense Prediction