Abstract
Deep Convolutional Neural Networks (DCNNs) extract features from natural images well. However, the classification functions in existing CNN architectures are simple and cannot handle important spatial information in the way that many well-known traditional variational image segmentation models do. Priors such as spatial regularization, volume priors and shape priors cannot be handled by existing DCNNs. We propose a novel Soft Threshold Dynamics (STD) framework that integrates many spatial priors of classic variational models into DCNNs for image segmentation. The novelty of our method is to interpret the softmax activation function as a dual variable in a variational problem, so that many spatial priors can be imposed in the dual space. From this viewpoint, we build an STD-based framework that enables the outputs of DCNNs to satisfy priors such as spatial regularization, volume preservation and a star-shape prior. The proposed method is a general mathematical framework and can be applied to any image segmentation DCNN with a softmax classification layer. To show the efficiency of our method, we applied it to the popular DeepLabV3+ image segmentation network, and the experimental results show that our method works efficiently on data-driven image segmentation DCNNs.
Liu was supported by the National Key Research and Development Program of China (No. 2017YFA0604903) and the National Natural Science Foundation of China (No. 11871035). The work of Tai was supported by Hong Kong Baptist University through grants RG(R)-RC/17-18/02-MATH, HKBU 12300819, NSF/RGC Grant N-HKBU214-19, ANR/RGC Joint Research Scheme (A-HKBU203-19) and RC-FNRA-IG/19-20/SCI/01.
Appendices
Calculating Subgradient of \({\mathcal {R}}\)
Since \({\mathcal {R}}\) is smooth, \(\partial {\mathcal {R}}(\varvec{u})=\{\delta {\mathcal {R}}(\varvec{u})\}\). Let us first calculate the directional derivative
Here \({\hat{k}}\) is the conjugate function of k, and the last equation follows from the fact that \({\hat{k}}=k\) when k is a symmetric kernel function, \(k(x)=k(-x)\), such as a Gaussian kernel. Therefore \(\delta {\mathcal {R}}(\varvec{u})= \lambda ((k*(1-\varvec{u}))e-k*(e\varvec{u}))\) according to the variational equation \(\frac{\text {d} {\mathcal {R}}(\varvec{u}+\tau \varvec{v})}{\text {d}\tau }\Big |_{\tau =0}=\langle \varvec{v}, \delta {\mathcal {R}}(\varvec{u})\rangle \). Hence \(\varvec{p}=\lambda ((k*(1-\varvec{u}))e-k*(e\varvec{u})).\)
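The directional-derivative computation referred to in this subsection can be sketched as follows. This is a reconstruction under an assumption: the form of \({\mathcal {R}}\) in the first line is not stated in this appendix and is inferred from the stated result \(\delta {\mathcal {R}}(\varvec{u})\); \(\varvec{v}\) denotes an arbitrary test direction and \({\hat{k}}(x)=k(-x)\).

\[
\begin{aligned}
{\mathcal {R}}(\varvec{u}) &= \lambda \sum _{i=1}^{I}\int _{\Omega } e(x)\,u_i(x)\,\big (k*(1-u_i)\big )(x)\,\text {d}x \quad \text {(assumed form)},\\
\frac{\text {d} {\mathcal {R}}(\varvec{u}+\tau \varvec{v})}{\text {d}\tau }\Big |_{\tau =0}
&= \lambda \sum _{i=1}^{I}\int _{\Omega } \Big [ e\,v_i\,\big (k*(1-u_i)\big ) - e\,u_i\,\big (k*v_i\big ) \Big ]\,\text {d}x\\
&= \lambda \sum _{i=1}^{I}\int _{\Omega } v_i\,\Big [ e\,\big (k*(1-u_i)\big ) - {\hat{k}}*(e\,u_i) \Big ]\,\text {d}x\\
&= \lambda \sum _{i=1}^{I}\int _{\Omega } v_i\,\Big [ e\,\big (k*(1-u_i)\big ) - k*(e\,u_i) \Big ]\,\text {d}x .
\end{aligned}
\]

The second equality moves the convolution from \(v_i\) onto \(e\,u_i\) by exchanging the order of integration, which is where \({\hat{k}}\) appears; the third uses \({\hat{k}}=k\).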
Proof of Theorem 1
Proof
According to (7)
and thus
Since \({\mathcal {R}}(\varvec{u})\) is concave, by the definition of subgradient for a concave function, \(\forall \varvec{p}^{t_1}\in \partial {\mathcal {R}}(\varvec{u}^{t_1}),\) one can have
Therefore,
which completes the proof. \(\square \)
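The chain of inequalities behind this argument can be sketched as follows, under two assumptions that the text above does not confirm: that (7) is a linearized update of the form \(\varvec{u}^{t_1+1}=\arg \min _{\varvec{u}}\{{\mathcal {F}}(\varvec{u})+\langle \varvec{p}^{t_1},\varvec{u}\rangle \}\), and that Theorem 1 asserts that the total energy \({\mathcal {F}}+{\mathcal {R}}\) does not increase along the iteration. The symbol \({\mathcal {F}}\) for the non-linearized part of the energy is introduced here for illustration only.

\[
\begin{aligned}
{\mathcal {F}}(\varvec{u}^{t_1+1})+\langle \varvec{p}^{t_1},\varvec{u}^{t_1+1}\rangle
&\leqslant {\mathcal {F}}(\varvec{u}^{t_1})+\langle \varvec{p}^{t_1},\varvec{u}^{t_1}\rangle
&&\text {(minimality of } \varvec{u}^{t_1+1}\text {)},\\
{\mathcal {R}}(\varvec{u}^{t_1+1})
&\leqslant {\mathcal {R}}(\varvec{u}^{t_1})+\langle \varvec{p}^{t_1},\varvec{u}^{t_1+1}-\varvec{u}^{t_1}\rangle
&&\text {(concavity of } {\mathcal {R}}\text {)},\\
{\mathcal {F}}(\varvec{u}^{t_1+1})+{\mathcal {R}}(\varvec{u}^{t_1+1})
&\leqslant {\mathcal {F}}(\varvec{u}^{t_1})+{\mathcal {R}}(\varvec{u}^{t_1})
&&\text {(sum of the two lines)}.
\end{aligned}
\]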
Proof of Proposition 3
Proof
By introducing Lagrangian multipliers \(\varvec{q}, \widehat{\varvec{q}}\) associated with the constraints \(\int _{\Omega }u_i(x){\text {d} x}=V_i\) and \(\sum _{i=1}^I u_{i}(x)=1,\forall x\in \Omega \), we have the related Lagrangian function
Then
The derivative of \({\mathcal {L}}\) with respect to \(u_i\)
therefore, by the first-order optimality condition
Furthermore, using the condition
we can obtain
Substituting this into the saddle problem of \({\mathcal {L}}(\varvec{u},\varvec{q},\widehat{\varvec{q}})\), we can obtain
which completes the proof. \(\square \)
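How multipliers of this kind can be computed numerically may be easier to see in a small discrete example. The following Python sketch assumes an entropic (softmax-type) setting: the pixelwise constraint \(\sum _i u_i(x)=1\) is enforced by the softmax normalization, and one multiplier per class is adjusted by a Sinkhorn-style scaling until the class masses match the target volumes. The function names, the sign convention for `q`, the parameter `eps` and the toy data are assumptions made for illustration, not the paper's implementation.

```python
import math


def _softmax_rows(scores, q, eps):
    """Row-wise softmax of (scores + q) / eps, computed stably."""
    out = []
    for row in scores:
        z = [(s + qi) / eps for s, qi in zip(row, q)]
        m = max(z)  # subtract the maximum to avoid overflow in exp
        w = [math.exp(v - m) for v in z]
        total = sum(w)
        out.append([v / total for v in w])
    return out


def volume_softmax(scores, volumes, eps=1.0, iters=300):
    """Softmax whose class masses are driven towards the target volumes.

    scores[x][i] : score of class i at pixel x
    volumes[i]   : target mass of class i; the volumes must be positive
                   and sum to the number of pixels
    Returns (u, q): the soft labels and the per-class multipliers
    (q is a hypothetical dual variable for the volume constraints).
    """
    n_pix, n_cls = len(scores), len(volumes)
    q = [0.0] * n_cls
    for _ in range(iters):
        u = _softmax_rows(scores, q, eps)
        mass = [sum(u[x][i] for x in range(n_pix)) for i in range(n_cls)]
        # Sinkhorn-style update: raise q_i if class i has too little mass.
        q = [q[i] + eps * (math.log(volumes[i]) - math.log(mass[i]))
             for i in range(n_cls)]
    return _softmax_rows(scores, q, eps), q


if __name__ == "__main__":
    toy_scores = [[2.0, 0.0], [1.5, 0.0], [1.0, 0.0],
                  [0.2, 0.0], [0.0, 1.0], [0.0, 2.0]]
    u, q = volume_softmax(toy_scores, volumes=[2.0, 4.0])
    print("class masses:", [sum(r[i] for r in u) for i in range(2)])
    print("multipliers q:", q)
```

Each row of the returned `u` sums to one by construction, while the class masses match the prescribed volumes only up to the tolerance reached after `iters` updates.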
Proof of Theorem 2
Proof by contradiction. Suppose \(u^{-1}[\gamma ,+\infty )\) is not a star-shaped domain.
Then \(\exists y\in \partial u^{-1}[\gamma ,+\infty )\), \(\exists \zeta _1\in (0,1)\), s.t. \(z=(1-\zeta _1)y+\zeta _1c\notin u^{-1}[\gamma ,+\infty )\) according to the definition of star-shaped domain. It implies \(u(z)<\gamma \) by the definition of the \(\gamma \)-super level set.
Let \(f(\zeta )=u((1-\zeta )y+\zeta c), \zeta \in [0,1)\), then \(f\in C^1\) and \(f^{'}(\zeta )=\langle \nabla u((1-\zeta )y+\zeta c) , c-y\rangle \).
Since \(\langle \nabla u(x),\varvec{s}(x)\rangle \geqslant 0\) for all x, we have \(\langle \nabla u((1-\zeta )y+\zeta c),c-[(1-\zeta )y+\zeta c]\rangle \geqslant 0\) by taking \(x=(1-\zeta )y+\zeta c\) and \(\varvec{s}(x)=c-[(1-\zeta )y+\zeta c]\). Rearranging this inequality, one gets \(\langle \nabla u((1-\zeta )y+\zeta c),(1-\zeta )(c-y)\rangle \geqslant 0\).
So \(\langle \nabla u((1-\zeta )y+\zeta c),c-y\rangle \geqslant 0\) since \(1-\zeta >0\). This implies \(f^{'}(\zeta )\geqslant 0\) for \(\zeta \in [0,1)\). Here \(f^{'}(0)\) stands for the right derivative \(f^{'}_{+}(0)\).
Therefore, \(f(\zeta )\) is monotonically increasing on \([0,1)\). However, \(f(0)=u(y)\geqslant \gamma \) and \(f(\zeta _1)=u(z)<\gamma \) with \(0<\zeta _1<1\), which contradicts the monotonicity of \(f\). The proof is completed.
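The statement of Theorem 2 can be illustrated numerically. The following Python sketch samples the segment from each point of a \(\gamma \)-super level set to the center \(c\) and checks that \(u\) stays at or above \(\gamma \) along it. It is a sampled sanity check, not a proof; the two test functions, the grid, the sample count and the tolerance are assumptions chosen for illustration.

```python
def segment_in_superlevel(u, y, c, gamma, samples=50, tol=1e-12):
    """Check u >= gamma at sampled points of the segment from y to c."""
    for k in range(samples + 1):
        zeta = k / samples
        px = (1 - zeta) * y[0] + zeta * c[0]
        py = (1 - zeta) * y[1] + zeta * c[1]
        if u(px, py) < gamma - tol:
            return False
    return True


def is_star_shaped(u, c, gamma, grid):
    """Sampled test: is {u >= gamma} star-shaped with respect to c?"""
    members = [p for p in grid if u(*p) >= gamma]
    return all(segment_in_superlevel(u, p, c, gamma) for p in members)


CENTER = (0.5, -0.25)


def u_radial(x, y):
    # Satisfies <grad u(x), c - x> >= 0 everywhere for c = CENTER.
    return -((x - CENTER[0]) ** 2 + 2.0 * (y - CENTER[1]) ** 2)


def u_two_bumps(x, y):
    # Two separate bumps: the gradient condition fails for c = (2, 0).
    return max(-((x - 2.0) ** 2 + y ** 2), -((x + 2.0) ** 2 + y ** 2))


GRID = [(i * 0.25, j * 0.25) for i in range(-16, 17) for j in range(-16, 17)]

if __name__ == "__main__":
    print(is_star_shaped(u_radial, CENTER, -1.0, GRID))
    print(is_star_shaped(u_two_bumps, (2.0, 0.0), -1.0, GRID))
```

For `u_two_bumps` the super level set at \(\gamma =-1\) consists of two disjoint discs, so the segment from a point in the left disc to the center of the right disc leaves the set.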
Cite this article
Liu, J., Wang, X. & Tai, XC. Deep Convolutional Neural Networks with Spatial Regularization, Volume and Star-Shape Priors for Image Segmentation. J Math Imaging Vis 64, 625–645 (2022). https://doi.org/10.1007/s10851-022-01087-x
DOI: https://doi.org/10.1007/s10851-022-01087-x