
Frequency disentangled residual network

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

Residual networks (ResNets) have been utilized for various computer vision and image processing applications. The residual connection improves network training through better gradient flow. However, a residual block consists of several convolutional layers with trainable parameters, which makes it prone to overfitting. Moreover, existing residual networks do not exploit high- and low-frequency information suitably, which further limits their generalization capability. In this paper, a frequency-disentangled residual network (FDResNet) is proposed to tackle these issues. Specifically, FDResNet includes separate connections in the residual block for the low- and high-frequency components, respectively. By disentangling the low- and high-frequency components, the proposed model increases generalization ability. Moreover, computing the low- and high-frequency components with fixed filters further reduces overfitting. The proposed model is tested on the benchmark CIFAR-10/100, Caltech, and TinyImageNet datasets for image classification, and its performance is also evaluated in an image retrieval framework. The proposed model outperforms its counterpart residual model. The effects of kernel size and standard deviation are also evaluated, and the impact of frequency disentangling is analyzed using saliency maps.
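
Below is a minimal sketch of how a frequency-disentangled residual block could be implemented in PyTorch, based on the description above: a fixed (non-trainable) Gaussian low-pass filter extracts the low-frequency component, the high-frequency component is taken as its complement, and each component flows through its own trainable path before merging with the identity shortcut. The class and parameter names, the depthwise-Gaussian split, and the summation merge are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of a frequency-disentangled residual block (assumed design,
# not the authors' code). Fixed Gaussian weights live in a buffer, so the
# disentangling step itself adds no trainable parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F


def gaussian_kernel(size: int, sigma: float) -> torch.Tensor:
    """Build a fixed (non-trainable) 2D Gaussian kernel."""
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-coords**2 / (2 * sigma**2))
    kernel = torch.outer(g, g)
    return kernel / kernel.sum()


class FreqDisentangledBlock(nn.Module):
    def __init__(self, channels: int, lp_size: int = 5, sigma: float = 1.0):
        super().__init__()
        # Fixed low-pass filter: a depthwise Gaussian registered as a buffer,
        # so it is never updated by the optimizer.
        k = gaussian_kernel(lp_size, sigma).expand(channels, 1, lp_size, lp_size)
        self.register_buffer("lp_weight", k.contiguous())
        self.channels = channels
        # Separate trainable paths for the two frequency components.
        self.low_conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
        )
        self.high_conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Disentangle: low frequency via fixed depthwise Gaussian smoothing,
        # high frequency as the complement of the low-pass response.
        low = F.conv2d(x, self.lp_weight,
                       padding=self.lp_weight.shape[-1] // 2,
                       groups=self.channels)
        high = x - low
        # Process each component on its own connection, then merge with the
        # identity shortcut as in a standard residual block.
        return x + self.low_conv(low) + self.high_conv(high)
```

Because the Gaussian weights are a buffer rather than parameters, the split itself contributes no learnable capacity, which is consistent with the abstract's claim that fixed filters help avoid overfitting.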


Notes

  1. https://www.cs.toronto.edu/~kriz/cifar.html.

  2. http://www.vision.caltech.edu/Image_Datasets/Caltech256/.

  3. https://www.kaggle.com/c/tiny-imagenet.

  4. In this notation, used here and elsewhere, the first value in the brackets is the kernel size for high-pass filtering and the second value is the kernel size for low-pass filtering (see the illustrative snippet after this list).
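
The following is a hypothetical reading of this notation, e.g. a pair (3, 5) meaning a 3×3 high-pass kernel and a 5×5 low-pass kernel. The Gaussian low-pass and the identity-minus-Gaussian high-pass construction are assumptions for illustration; the exact filters used in FDResNet may differ.

```python
# Hypothetical illustration of the (high, low) kernel-size notation.
import torch

def gaussian_kernel(size: int, sigma: float) -> torch.Tensor:
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-coords**2 / (2 * sigma**2))
    k = torch.outer(g, g)
    return k / k.sum()

k_high, k_low = 3, 5                         # the bracketed pair, e.g. "(3, 5)"
low_pass = gaussian_kernel(k_low, sigma=1.0) # 5x5 low-pass (Gaussian) kernel
delta = torch.zeros(k_high, k_high)
delta[k_high // 2, k_high // 2] = 1.0        # discrete impulse (identity filter)
high_pass = delta - gaussian_kernel(k_high, sigma=1.0)  # 3x3 high-pass kernel
```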


Acknowledgements

This research was funded by the Global Innovation and Technology Alliance (GITA) on behalf of the Department of Science and Technology (DST), Govt. of India, under the India–Taiwan joint project with Project Code GITA/DST/TWN/P-83/2019. We would like to thank NVIDIA for supporting this research with GPUs and Google Colab for providing the free computational resources used in the experiments.

Author information


Contributions

SRS and RRY contributed equally in conducting the experiments and computing the results. SRD conceptualized the idea, wrote the paper, and supervised the work. RKS and W-TC edited the paper and provided supervision.

Corresponding author

Correspondence to Shiv Ram Dubey.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Communicated by B. Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Singh, S.R., Yedla, R.R., Dubey, S.R. et al. Frequency disentangled residual network. Multimedia Systems 30, 9 (2024). https://doi.org/10.1007/s00530-023-01232-5
