
Don’t Be So Dense: Sparse-to-Sparse GAN Training Without Sacrificing Performance

Published in the International Journal of Computer Vision

Abstract

This paper does not describe a novel method. Instead, it studies an incremental, yet must-know baseline given the recent progress in sparse neural network training and Generative Adversarial Networks (GANs). GANs have attracted surging interest since their introduction, owing to the high quality of the data they generate. Yet while GANs achieve increasingly impressive results, the resource demands associated with their large model size hinder their use in resource-limited scenarios. For inference, existing model compression techniques can reduce model complexity while maintaining comparable performance. The training efficiency of GANs, however, has been explored far less, because of their fragile training process. In this paper we explore, for the first time, the possibility of directly training sparse GANs from scratch, without any dense or pre-training steps. Even more unconventionally, our proposed method enables directly training sparse unbalanced GANs with an extremely sparse generator from scratch. Instead of training full GANs, we start with sparse GANs and dynamically explore the parameter space of the generator throughout training. Such a sparse-to-sparse training procedure progressively enhances the capacity of the highly sparse generator while adhering to a fixed, small parameter budget, which yields appealing training and inference efficiency gains. Extensive experiments with modern GAN architectures validate the effectiveness of our method. Our sparsified GANs, trained from scratch in a single run, outperform those obtained by expensive iterative pruning and re-training. Perhaps most importantly, we find that directly training sparse GANs from scratch can be a much more efficient solution than inheriting parameters from expensive pre-trained GANs. For example, training with only an 80% sparse generator and a 70% sparse discriminator, our method achieves even better performance than the dense BigGAN.
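
The abstract only sketches the procedure at a high level. As a rough illustration of what sparse-to-sparse training of a generator can look like, the minimal PyTorch sketch below starts from random sparse masks and periodically drops the weakest active weights while regrowing an equal number of previously inactive ones, so the parameter budget stays fixed throughout training. All helper names (make_masks, apply_masks, prune_and_regrow), the drop fraction, and the gradient-based regrowth criterion are illustrative assumptions borrowed from generic dynamic sparse training, not the authors' implementation; see the paper and the repositories cited in the Notes for the actual method.

import torch


def make_masks(generator, sparsity=0.8):
    # Random binary masks with the target sparsity for every weight tensor;
    # biases and other 1-D parameters are left dense.
    masks = {}
    for name, param in generator.named_parameters():
        if param.dim() > 1:
            masks[name] = (torch.rand_like(param) > sparsity).float()
    return masks


@torch.no_grad()
def apply_masks(generator, masks):
    # Zero out the pruned positions (call this after every optimizer step).
    for name, param in generator.named_parameters():
        if name in masks:
            param.mul_(masks[name])


@torch.no_grad()
def prune_and_regrow(generator, masks, drop_frac=0.05):
    # Drop the weakest active weights and regrow the same number of
    # previously inactive ones, so the count of active parameters never changes.
    for name, param in generator.named_parameters():
        if name not in masks:
            continue
        mask = masks[name]
        k = max(1, int(drop_frac * int(mask.sum().item())))
        inactive_before = mask == 0  # regrowth candidates

        # 1) Prune: deactivate the k active weights with the smallest magnitude.
        prune_scores = torch.where(mask.bool(), param.abs(),
                                   torch.full_like(param, float("inf")))
        drop_idx = torch.topk(prune_scores.flatten(), k, largest=False).indices
        mask.view(-1)[drop_idx] = 0.0

        # 2) Regrow: activate k previously inactive positions, here chosen by
        #    gradient magnitude (random scores if no gradient is available).
        grad = param.grad.abs() if param.grad is not None else torch.rand_like(param)
        grow_scores = torch.where(inactive_before, grad,
                                  torch.full_like(param, -float("inf")))
        grow_idx = torch.topk(grow_scores.flatten(), k, largest=True).indices
        mask.view(-1)[grow_idx] = 1.0
        param.view(-1)[grow_idx] = 0.0  # newly grown weights start at zero

    apply_masks(generator, masks)

In a full training loop one would call apply_masks on both networks after every optimizer step (the discriminator with its own, possibly different, sparsity level, as in the sparse unbalanced setting) and prune_and_regrow on the generator every few hundred updates, so the same small number of active parameters is maintained from the first iteration to the last.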




Notes

  1. Sparse unbalanced GANs refer to scenarios where the sparsity levels of the generator and the discriminator are not well matched (a short configuration example follows these notes).

  2. https://github.com/ajbrock/BigGAN-PyTorch.

  3. The training configurations and hyperparameters of BigGAN and SNGAN are obtained from the open-source implementations https://github.com/ajbrock/BigGAN-PyTorch and https://github.com/VITA-Group/GAN-LTH, respectively.

  4. Our sparse StyleGAN2 is based on the official repository https://github.com/NVlabs/stylegan3.
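
To make Note 1 concrete: an unbalanced configuration like the one quoted in the abstract (an 80% sparse generator paired with a 70% sparse discriminator) simply assigns different sparsity levels to the two networks. The snippet below reuses the illustrative make_masks and apply_masks helpers from the sketch after the abstract; the placeholder networks are toy stand-ins, not the BigGAN/SNGAN/StyleGAN2 models used in the paper.

import torch.nn as nn

# Toy stand-in networks purely for illustration.
generator = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 784))
discriminator = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 1))

g_masks = make_masks(generator, sparsity=0.80)      # extremely sparse generator
d_masks = make_masks(discriminator, sparsity=0.70)  # less sparse discriminator
apply_masks(generator, g_masks)
apply_masks(discriminator, d_masks)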


Acknowledgements

Li Shen is supported by Science and Technology Innovation 2030— “Brain Science and Brain-like Research” Major Project (No. 2021ZD0201402 and No. 2021ZD0201405). Shiwei Liu is in part supported by the NSF AI Institute for Foundations of Machine Learning (IFML). Part of this work used the Dutch national e-infrastructure with the support of the SURF Cooperative using Grant No. NWO2021.060.

Author information


Corresponding author

Correspondence to Li Shen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, S., Tian, Y., Chen, T. et al. Don’t Be So Dense: Sparse-to-Sparse GAN Training Without Sacrificing Performance. Int J Comput Vis 131, 2635–2648 (2023). https://doi.org/10.1007/s11263-023-01824-8


  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11263-023-01824-8
