Abstract
Video colorization is a challenging and highly ill-posed problem. Although recent years have witnessed remarkable progress in single-image colorization, relatively little research effort has been devoted to video colorization, and existing methods often suffer from severe flickering artifacts (temporal inconsistency) or unsatisfactory colorization. We address this problem from a new perspective, by jointly considering colorization and temporal consistency in a unified framework. Specifically, we propose a novel temporally consistent video colorization (TCVC) framework. TCVC effectively propagates frame-level deep features in a bidirectional way to enhance the temporal consistency of colorization. Furthermore, TCVC introduces a self-regularization learning (SRL) scheme to minimize the differences between predictions obtained at different time steps. SRL does not require any ground-truth color video for training and can further improve temporal consistency. Experiments demonstrate that our method not only produces visually pleasing colorized videos, but also achieves clearly better temporal consistency than state-of-the-art methods. A video demo is provided at https://www.youtube.com/watch?v=c7dczMs-olE, and code is available at https://github.com/lyh-18/TCVC-Temporally-Consistent-Video-Colorization.
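The abstract summarizes two mechanisms: bidirectional propagation of frame-level deep features, and a self-regularization learning (SRL) loss that encourages predictions for the same frames to agree across different time steps. As a rough illustration only, the PyTorch sketch below shows one plausible shape of these two ideas; all module names, layer choices, and tensor layouts here are our assumptions, not the authors' implementation (see the GitHub repository above for the released code).

```python
# Minimal sketch (NOT the authors' released code) of the two ideas named in
# the abstract: bidirectional propagation of frame-level features, and a
# self-regularization loss comparing predictions made with different
# temporal contexts. All module names and shapes are hypothetical.
import torch
import torch.nn as nn

class BidirectionalPropagation(nn.Module):
    """Fuses each frame's features with recurrent states propagated
    forward and backward along the time axis (hypothetical layout)."""
    def __init__(self, channels: int):
        super().__init__()
        self.fwd = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.bwd = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(3 * channels, channels, 3, padding=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (N, T, C, H, W) frame-level deep features
        n, t, c, h, w = feats.shape
        state = feats.new_zeros(n, c, h, w)
        forward_states = []
        for i in range(t):  # forward pass over time
            state = torch.relu(self.fwd(torch.cat([feats[:, i], state], dim=1)))
            forward_states.append(state)
        state = feats.new_zeros(n, c, h, w)
        backward_states = [None] * t
        for i in reversed(range(t)):  # backward pass over time
            state = torch.relu(self.bwd(torch.cat([feats[:, i], state], dim=1)))
            backward_states[i] = state
        # Fuse the original features with both propagation directions.
        out = [self.fuse(torch.cat([feats[:, i], forward_states[i],
                                    backward_states[i]], dim=1))
               for i in range(t)]
        return torch.stack(out, dim=1)

def self_regularization_loss(pred_a: torch.Tensor,
                             pred_b: torch.Tensor) -> torch.Tensor:
    """SRL-style loss: two colorizations of the same frames, produced
    under different time steps / temporal contexts, should agree."""
    return torch.mean(torch.abs(pred_a - pred_b))
```

In this sketch the SRL term is simply an L1 penalty between two colorizations of the same clip produced under different temporal contexts, so it can be minimized without any ground-truth color video, matching the property claimed in the abstract.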
Availability of data and materials
Data and code are available at https://github.com/lyh-18/TCVC-Temporally-Consistent-Video-Colorization. A video demo is provided at https://www.youtube.com/watch?v=c7dczMs-olE.
Acknowledgements
This work was partially supported by grants from the National Natural Science Foundation of China (61906184), the Joint Lab of CAS–HK, and the Shanghai Committee of Science and Technology, China (20DZ1100800, 21DZ1100100).
Ethics declarations
The authors declare that they have no competing interests relevant to the content of this article.
Additional information
Yihao Liu received his B.S. degree from the University of Chinese Academy of Sciences, Beijing, in 2018. He is now working towards a Ph.D. degree in the Multimedia Laboratory, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, supervised by Prof. Yu Qiao and Prof. Chao Dong. His research interests include computer vision and image/video enhancement.
Hengyuan Zhao received his B.S. degree from Nanjing University of Posts and Telecommunications, Nanjing, in 2020. He is now working towards a Ph.D. degree at the Show Lab, National University of Singapore, supervised by Prof. Mike Shou. He formerly worked as a research intern at VIS, Baidu Inc., and at SenseTime Inc. He also worked as a research intern in the Multimedia Laboratory, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, supervised by Prof. Yu Qiao and Prof. Chao Dong. His research interests include computer vision and image/video processing and generation.
Kelvin C. K. Chan is a research scientist at Google. Before joining Google, he was a Ph.D. student at MMLab@NTU under the supervision of Prof. Chen Change Loy. He received his M.Phil. degree in mathematics as well as his B.Sc. and B.Eng. degrees from The Chinese University of Hong Kong. His current research interest focuses on low-level vision and multimodal content generation.
Xintao Wang is currently a researcher at the Applied Research Center (ARC), Tencent PCG. He received his Ph.D. degree from the Department of Information Engineering, The Chinese University of Hong Kong, in 2020. He was selected as an outstanding reviewer at CVPR 2019 and an outstanding reviewer (honorable mention) at BMVC 2019. He won first place in several international super-resolution challenges, including NTIRE2019, NTIRE2018, and PIRM2018. His research interests focus on low-level vision problems, including super-resolution and image and video restoration.
Chen Change Loy (Senior Member, IEEE) received his Ph.D. degree in computer science from Queen Mary University of London in 2010. He is currently an associate professor with the School of Computer Science and Engineering, Nanyang Technological University, Singapore. He is also an adjunct associate professor with The Chinese University of Hong Kong. Prior to joining NTU, he served as a research assistant professor with the MMLab of The Chinese University of Hong Kong from 2013 to 2018. He was a postdoctoral researcher with Queen Mary University of London and Vision Semantics Limited from 2010 to 2013. He serves as an associate editor of IEEE Transactions on Pattern Analysis and Machine Intelligence and of the International Journal of Computer Vision. He also serves/served as an area chair of major conferences such as ICCV 2021, CVPR 2021, CVPR 2019, and ECCV 2018. His research interests include image/video restoration and enhancement, generative tasks, and representation learning.
Yu Qiao (Senior Member, IEEE) is currently a professor with the Shenzhen Institute of Advanced Technology (SIAT), Chinese Academy of Sciences, and the director of the Institute of Advanced Computing and Digital Engineering. He has published more than 180 articles in international journals and conferences, including T-PAMI, IJCV, T-IP, T-SP, CVPR, and ICCV. His research interests include computer vision, deep learning, and bioinformation. He received the First Prize of the Guangdong Technological Invention Award and the Jiaxi Lv Young Researcher Award from the Chinese Academy of Sciences. His group was first runner-up in scene recognition at the ImageNet Large Scale Visual Recognition Challenge 2015, and winner in video classification at the ActivityNet Large Scale Activity Recognition Challenge 2016. He served as the program chair of IEEE ICIST 2014.
Chao Dong is currently an associate professor at the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences. He received his Ph.D. degree from The Chinese University of Hong Kong in 2016. In 2014, he first introduced the deep learning method SRCNN into the super-resolution field. This seminal work was chosen as one of the top ten "Most Popular Articles" of TPAMI in 2016. His team has won several championships in international challenges, including NTIRE2018, PIRM2018, NTIRE2019, NTIRE2020, AIM2020, and NTIRE2022. He worked at SenseTime from 2016 to 2018 as the team leader of the Super-Resolution Group. In 2021, he was chosen as one of the World's Top 2% Scientists. In 2022, he was recognized as an AI 2000 Most Influential Scholar Honorable Mention in computer vision. His current research interests focus on low-level vision problems, such as image/video super-resolution, denoising, and enhancement.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
About this article
Cite this article
Liu, Y., Zhao, H., Chan, K.C.K. et al. Temporally consistent video colorization with deep feature propagation and self-regularization learning. Comp. Visual Media 10, 375–395 (2024). https://doi.org/10.1007/s41095-023-0342-8