Abstract
Deep convolutional neural networks (DCNNs) are widely used in semantic segmentation. However, the filters of most regular convolutions in DCNNs are spatially invariant to local transformations, which reduces localization accuracy and limits segmentation performance. Dynamic convolution with pixel-level filters can improve localization accuracy through its region-awareness, but it is sensitive to objects with large scale variations. To address low localization accuracy and large scale variations simultaneously, we propose a filter-varying atrous convolution (FAC) that efficiently enlarges the per-pixel receptive fields for various objects. FAC consists mainly of a conditional-filter-generating network (CFGN) and a dynamic local filtering operation (DLFO). In the CFGN, a class probability map is used to generate the corresponding filters, making the FAC genuinely dynamic. In the DLFO, the position-by-position sliding convolution is replaced with a single dot-product operation, which greatly improves the efficiency of the algorithm. In addition, a dense scale module (DSM) is constructed to generate denser scales and larger receptive fields for exploring long-range contextual information. Finally, a dense-scale dynamic network (DsDNet) simultaneously enhances localization accuracy and reduces the effect of large scale variations of objects by assigning FAC to different spatial locations at dense scales. To accelerate network convergence and improve segmentation accuracy, our network employs two pixel-wise cross-entropy loss functions: one between the backbone and the DSM, and the other at the end of the network. Extensive experiments on the Cityscapes, PASCAL VOC 2012, and ADE20K datasets verify that DsDNet outperforms non-dynamic and multi-scale convolutional neural networks.
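To make the FAC idea concrete, the following is a minimal single-channel sketch in plain Python, not the authors' implementation. It assumes that per-pixel filters are a probability-weighted mix of one kernel per class (a hypothetical stand-in for the CFGN, whose architecture the abstract does not specify), and it expresses the DLFO as a gather of dilated neighbourhoods followed by one dot product per pixel. The function names `unfold_dilated`, `generate_filters`, and `dynamic_local_filtering` are illustrative only.

```python
def unfold_dilated(feature, k, dilation):
    """For every pixel, gather its k*k neighbourhood sampled at the given
    dilation (atrous) rate. Positions outside the map are zero-padded."""
    h, w = len(feature), len(feature[0])
    r = k // 2
    patches = []
    for y in range(h):
        row = []
        for x in range(w):
            patch = []
            for i in range(-r, r + 1):
                for j in range(-r, r + 1):
                    yy, xx = y + i * dilation, x + j * dilation
                    inside = 0 <= yy < h and 0 <= xx < w
                    patch.append(feature[yy][xx] if inside else 0.0)
            row.append(patch)
        patches.append(row)
    return patches


def generate_filters(class_prob, weight):
    """Assumed stand-in for the CFGN: each pixel's filter is the
    probability-weighted mix of one k*k kernel per class.
    class_prob[y][x] holds C probabilities; weight holds C rows of k*k taps."""
    taps = len(weight[0])
    return [[[sum(p * weight[c][t] for c, p in enumerate(probs))
              for t in range(taps)]
             for probs in row]
            for row in class_prob]


def dynamic_local_filtering(feature, filters, k, dilation):
    """Sketch of the DLFO: gather all neighbourhoods once, then take one
    dot product per pixel between its own filter and its own patch."""
    patches = unfold_dilated(feature, k, dilation)
    return [[sum(f * v for f, v in zip(filters[y][x], patches[y][x]))
             for x in range(len(feature[0]))]
            for y in range(len(feature))]


# Toy usage: two classes, a 3x3 feature map, 3x3 kernels, dilation rate 2.
feature = [[1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0],
           [7.0, 8.0, 9.0]]
identity = [0.0] * 9
identity[4] = 1.0            # class 0: pass the centre pixel through
box = [1.0] * 9              # class 1: sum over the dilated neighbourhood
weight = [identity, box]

class_prob = [[[1.0, 0.0] for _ in range(3)] for _ in range(3)]
class_prob[0][0] = [0.0, 1.0]  # only the top-left pixel belongs to class 1

filters = generate_filters(class_prob, weight)
out = dynamic_local_filtering(feature, filters, k=3, dilation=2)
```

In this toy case the top-left pixel receives the summing kernel, so its output is the sum of the four corner values reachable at dilation 2 (1 + 3 + 7 + 9 = 20.0), while every other pixel keeps its own value. In a tensor framework the gather step would typically be a single unfold-style operation and the per-pixel products a single batched multiply-and-sum, which is where the efficiency gain described above would come from.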
Acknowledgements
This work was supported in part by the National Key Research and Development Project of China (2017YFE0100700), in part by the National Natural Science Foundation of China (Grant No. 41871340), in part by the Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University (No. 2020KEY003), in part by the Beijing Municipal Science & Technology Commission (No. Z201100003920003), in part by the Key Research Project of Shanghai Agricultural Science and Technology (Grant No. SASTI-2018-2-1), in part by the Fundamental Research Funds for the Central Universities, and in part by the Shanghai "Science and Technology Innovation Action Plan" Project (22002400300).
Ethics declarations
Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Jie Jiang, Xi Chen, Robert Laganière, Qingli Li, Min Liu, Honggang Qi, Yong Wang and Min Zhang contributed equally to this work.
About this article
Cite this article
Li, Z., Jiang, J., Chen, X. et al. Dense-scale dynamic network with filter-varying atrous convolution for semantic segmentation. Appl Intell 53, 26810–26826 (2023). https://doi.org/10.1007/s10489-023-04935-4
DOI: https://doi.org/10.1007/s10489-023-04935-4