ABSTRACT
Image inpainting has achieved remarkable progress and inspired abundant methods, and its critical bottleneck is identified as how to fill the masked regions with semantically consistent high-frequency structure and low-frequency texture information. Deep models are powerful at capturing both, yet they remain constrained to local spatial regions. In this paper, we delve globally into texture and structure information to capture the semantics for image inpainting. In contrast to existing methods restricted to independent local patches, the texture information of each patch is reconstructed from all other patches across the whole image, so as to match the coarsely filled information, especially the structure information, over the masked regions. Unlike current decoder-only, pixel-level transformers for image inpainting, our model adopts a transformer pipeline with both an encoder and a decoder. On one hand, the encoder captures the texture semantic correlations of all patches across the image via a self-attention module. On the other hand, an adaptive patch vocabulary is dynamically established in the decoder for the filled patches over the masked regions. Building on this, a structure-texture matching attention module anchored on the known regions marries the best of these two worlds for progressive inpainting via a probabilistic diffusion process. From the perspective of texture and structure information, our model is orthogonal to popular architectures for image inpainting, such as Convolutional Neural Networks (CNNs), attention mechanisms, and transformers. Extensive experiments on standard benchmarks validate its superiority. Our code is available online.
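The core idea of the encoder, reconstructing each patch's texture representation from all other patches via self-attention, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the patch embeddings and the projection matrices `W_q`, `W_k`, `W_v` are random stand-ins for what a trained model would learn, and the function names are hypothetical.

```python
import numpy as np

def patch_self_attention(patches, d_k=32, seed=0):
    """Scaled dot-product self-attention over flattened image patches.

    patches: (N, D) array, one row per patch embedding.
    Returns (N, d_k) attended features: each patch's texture
    representation is a weighted mixture of ALL patches, i.e. a
    global (image-wide) rather than local correlation.
    """
    n, d = patches.shape
    rng = np.random.default_rng(seed)
    # Random projections as placeholders for learned query/key/value weights.
    W_q, W_k, W_v = (rng.standard_normal((d, d_k)) / np.sqrt(d) for _ in range(3))
    Q, K, V = patches @ W_q, patches @ W_k, patches @ W_v
    scores = Q @ K.T / np.sqrt(d_k)                # (N, N) patch-to-patch affinities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over all patches
    return weights @ V                             # every output row mixes every patch

# 16 patch embeddings of dimension 64
out = patch_self_attention(np.arange(16)[:, None] * np.ones((16, 64)) * 0.01)
print(out.shape)  # (16, 64) patches attended into (16, 32) features -> prints (16, 32)
```

Because the attention weights span the full (N, N) affinity matrix, a masked patch can draw texture evidence from arbitrarily distant known patches, which is the global behavior the abstract contrasts with locally constrained deep models.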