ABSTRACT
In dense scene analysis, multi-scale features are pivotal for effective contextual representation. Although such features are widely used in multi-task networks, the differing scale requirements across tasks make them difficult to fuse effectively. To address this, we present the Multi-Scale feature Adaptive Fusion Network (MAFNet), which incorporates a Scale-Adaptive Fusion (SAF) module. Unlike conventional methods that rely on simple feature concatenation, SAF dynamically learns the optimal combination of feature scales for each task. To complement SAF, we introduce an Asymmetric Information Comparison Module (AICM), which strengthens inter-task feature interaction by separating shared from task-specific features and applying attention selectively. Quantitative and qualitative evaluations on the PASCAL-Context and NYUD-v2 datasets confirm that our approach outperforms existing state-of-the-art techniques.
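The core idea behind SAF-style fusion — replacing plain concatenation with a learned, per-task weighting over feature scales — can be sketched as follows. This is a minimal illustration, not the paper's exact SAF: it assumes the multi-scale features have already been resized to a common resolution, and the softmax-gated weighted sum and all function names are our own illustrative choices.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scale_adaptive_fuse(features, scale_logits):
    """Fuse S same-shape feature maps with learned per-scale weights.

    features:     list of S arrays, each (C, H, W), one per scale,
                  already resized to a common spatial resolution
    scale_logits: (S,) learnable scale-selection logits for one task
    returns:      (C, H, W) fused feature map
    """
    w = softmax(np.asarray(scale_logits, dtype=float))  # (S,), sums to 1
    stacked = np.stack(features)                        # (S, C, H, W)
    # Contract the scale axis: a convex combination of the scale features,
    # in place of channel-wise concatenation.
    return np.tensordot(w, stacked, axes=1)

# Illustrative use: a task whose logits strongly prefer the first scale
# recovers (approximately) that scale's features.
fine = np.ones((2, 4, 4))
coarse = np.zeros((2, 4, 4))
fused = scale_adaptive_fuse([fine, coarse], [10.0, -10.0])
```

In a full network the logits would be produced per task (e.g., by a small head over pooled features) and trained end-to-end, so each task learns its own scale preference rather than sharing a fixed fusion rule.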
Index Terms
- Multi-Scale Feature Adaptive Fusion for Multi-Task Dense Prediction