Abstract
Video colorization is a challenging and highly ill-posed problem. Although recent years have witnessed remarkable progress in single-image colorization, relatively little research effort has been devoted to video colorization, and existing methods often suffer from severe flickering artifacts (temporal inconsistency) or unsatisfactory colorization. We address this problem from a new perspective, by jointly considering colorization and temporal consistency in a unified framework. Specifically, we propose a novel temporally consistent video colorization (TCVC) framework. TCVC effectively propagates frame-level deep features in a bidirectional way to enhance the temporal consistency of colorization. Furthermore, TCVC introduces a self-regularization learning (SRL) scheme to minimize the differences between predictions obtained at different time steps. SRL does not require any ground-truth color video for training and can further improve temporal consistency. Experiments demonstrate that our method not only produces visually pleasing colorized videos, but also achieves clearly better temporal consistency than state-of-the-art methods. A video demo is provided at https://www.youtube.com/watch?v=c7dczMs-olE, and code is available at https://github.com/lyh-18/TCVC-Temporally-Consistent-Video-Colorization.
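The abstract summarizes two mechanisms: bidirectional propagation of frame-level deep features, and a self-regularization learning (SRL) loss that encourages predictions for the same frames to agree across different time steps. As a rough illustration only, the PyTorch sketch below shows one plausible shape of these two ideas; all module names, layer choices, and tensor layouts here are our assumptions, not the authors' implementation (see the GitHub repository above for the released code).

```python
# Minimal sketch (NOT the authors' released code) of the two ideas named in
# the abstract: bidirectional propagation of frame-level features, and a
# self-regularization loss comparing predictions made with different
# temporal contexts. All module names and shapes are hypothetical.
import torch
import torch.nn as nn

class BidirectionalPropagation(nn.Module):
    """Fuses each frame's features with recurrent states propagated
    forward and backward along the time axis (hypothetical layout)."""
    def __init__(self, channels: int):
        super().__init__()
        self.fwd = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.bwd = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(3 * channels, channels, 3, padding=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (N, T, C, H, W) frame-level deep features
        n, t, c, h, w = feats.shape
        state = feats.new_zeros(n, c, h, w)
        forward_states = []
        for i in range(t):  # forward pass over time
            state = torch.relu(self.fwd(torch.cat([feats[:, i], state], dim=1)))
            forward_states.append(state)
        state = feats.new_zeros(n, c, h, w)
        backward_states = [None] * t
        for i in reversed(range(t)):  # backward pass over time
            state = torch.relu(self.bwd(torch.cat([feats[:, i], state], dim=1)))
            backward_states[i] = state
        # Fuse the original features with both propagation directions.
        out = [self.fuse(torch.cat([feats[:, i], forward_states[i],
                                    backward_states[i]], dim=1))
               for i in range(t)]
        return torch.stack(out, dim=1)

def self_regularization_loss(pred_a: torch.Tensor,
                             pred_b: torch.Tensor) -> torch.Tensor:
    """SRL-style loss: two colorizations of the same frames, produced
    under different time steps / temporal contexts, should agree."""
    return torch.mean(torch.abs(pred_a - pred_b))
```

In this sketch the SRL term is simply an L1 penalty between two colorizations of the same clip produced under different temporal contexts, so it can be minimized without any ground-truth color video, matching the property claimed in the abstract.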
Availability of data and materials
Data and code are available at https://github.com/lyh-18/TCVC-Temporally-Consistent-Video-Colorization. A video demo is provided at https://www.youtube.com/watch?v=c7dczMs-olE.
Acknowledgements
This work was partially supported by grants from the National Natural Science Foundation of China (61906184), the Joint Lab of CAS–HK, and the Shanghai Committee of Science and Technology, China (20DZ1100800, 21DZ1100100).
Ethics declarations
The authors declare that they have no competing interests relevant to the content of this article.
Additional information
Yihao Liu received his B.S. degree from the University of Chinese Academy of Sciences, Beijing, in 2018. He is now working towards a Ph.D. degree in the Multimedia Laboratory, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, supervised by Prof. Yu Qiao and Prof. Chao Dong. His research interests include computer vision and image/video enhancement.
Hengyuan Zhao received his B.S. degree from Nanjing University of Posts and Telecommunications, Nanjing, in 2020. He is now working towards a Ph.D. degree at the Show Lab, National University of Singapore, supervised by Prof. Mike Shou. He formerly worked as a research intern at VIS, Baidu Inc., and at SenseTime Inc. He also worked as a research intern in the Multimedia Laboratory, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, supervised by Prof. Yu Qiao and Prof. Chao Dong. His research interests include computer vision and image/video processing and generation.
Kelvin C. K. Chan is a research scientist at Google. Before joining Google, he was a Ph.D. student at MMLab@NTU under the supervision of Prof. Chen Change Loy. He received his M.Phil. degree in mathematics as well as his B.Sc. and B.Eng. degrees from The Chinese University of Hong Kong. His current research interest focuses on low-level vision and multimodal content generation.
Xintao Wang is currently a researcher at the Applied Research Center (ARC), Tencent PCG. He received his Ph.D. degree from the Department of Information Engineering, The Chinese University of Hong Kong, in 2020. He was selected as an outstanding reviewer at CVPR 2019 and an outstanding reviewer (honorable mention) at BMVC 2019. He won first place in several international super-resolution challenges, including NTIRE2019, NTIRE2018, and PIRM2018. His research interests focus on low-level vision problems, including super-resolution and image and video restoration.
Chen Change Loy (Senior Member, IEEE) received his Ph.D. degree in computer science from Queen Mary University of London in 2010. He is currently an associate professor with the School of Computer Science and Engineering, Nanyang Technological University, Singapore. He is also an adjunct associate professor with The Chinese University of Hong Kong. Prior to joining NTU, he served as a research assistant professor with the MMLab of The Chinese University of Hong Kong from 2013 to 2018. He was a postdoctoral researcher with Queen Mary University of London and Vision Semantics Limited from 2010 to 2013. He serves as an associate editor of IEEE Transactions on Pattern Analysis and Machine Intelligence and of the International Journal of Computer Vision. He also serves/served as an area chair of major conferences such as ICCV 2021, CVPR 2021, CVPR 2019, and ECCV 2018. His research interests include image/video restoration and enhancement, generative tasks, and representation learning.
Yu Qiao (Senior Member, IEEE) is currently a professor with the Shenzhen Institute of Advanced Technology (SIAT), Chinese Academy of Sciences, and the director of the Institute of Advanced Computing and Digital Engineering. He has published more than 180 articles in international journals and conferences, including T-PAMI, IJCV, T-IP, T-SP, CVPR, and ICCV. His research interests include computer vision, deep learning, and bioinformation. He received the First Prize of the Guangdong Technological Invention Award and the Jiaxi Lv Young Researcher Award from the Chinese Academy of Sciences. His group was first runner-up in scene recognition at the ImageNet Large Scale Visual Recognition Challenge 2015, and winner in video classification at the ActivityNet Large Scale Activity Recognition Challenge 2016. He served as the program chair of IEEE ICIST 2014.
Chao Dong is currently an associate professor at the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences. He received his Ph.D. degree from The Chinese University of Hong Kong in 2016. In 2014, he first introduced the deep learning method SRCNN into the super-resolution field. This seminal work was chosen as one of the top ten "Most Popular Articles" of TPAMI in 2016. His team has won several championships in international challenges, including NTIRE2018, PIRM2018, NTIRE2019, NTIRE2020, AIM2020, and NTIRE2022. He worked at SenseTime from 2016 to 2018 as the team leader of the Super-Resolution Group. In 2021, he was chosen as one of the World's Top 2% Scientists. In 2022, he was recognized as an AI 2000 Most Influential Scholar Honorable Mention in computer vision. His current research interests focus on low-level vision problems, such as image/video super-resolution, denoising, and enhancement.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
About this article
Cite this article
Liu, Y., Zhao, H., Chan, K.C.K. et al. Temporally consistent video colorization with deep feature propagation and self-regularization learning. Comp. Visual Media 10, 375–395 (2024). https://doi.org/10.1007/s41095-023-0342-8