ABSTRACT
Demand for automatic recognition of food images is growing in everyday life. Food images are diverse in form, with small differences between classes and large differences within classes, which makes them difficult to recognize. This paper proposes a food image recognition method based on generative self-supervised learning. First, we use a BEiT-based pre-trained model, trained with a generative self-supervised learning method, as the feature extraction network to extract both the global semantics and the local detail features of food images. We then fine-tune a fully connected network (MLP) for classification and recognition through supervised learning. Tested on Food-101, currently a mainstream public food image dataset, the model achieves a top-1 accuracy of 85.99%. The experimental results show that this method can significantly reduce the computation of pixel-level representation while extracting both the global and the detailed features of the image, achieving good food image classification and recognition performance. The method shows good robustness, generalization and flexibility, and therefore has practical application value.
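The abstract relies on BEiT's generative pre-training without describing it. In the BEiT paper, an image is split into 16×16 patches (a 14×14 grid for a 224×224 input), some patches are masked in rectangular blocks, and the model learns to predict discrete visual tokens at the masked positions. The sketch below illustrates only that blockwise masking step, in plain Python. The defaults (78 of 196 patches, i.e. about 40%, blocks of at least 16 patches, aspect ratio between 0.3 and 1/0.3) are assumptions based on our reading of the BEiT paper, not settings reported in this abstract, and the function name `blockwise_mask` is hypothetical.

```python
import math
import random


def blockwise_mask(height=14, width=14, num_masking_patches=78,
                   min_block=16, aspect=(0.3, 1 / 0.3), seed=None):
    """Choose patch positions to mask, in rectangular blocks.

    Hypothetical sketch of BEiT-style blockwise masking (defaults are
    assumptions): blocks of random area and aspect ratio are added until
    num_masking_patches positions are masked or no valid block fits.
    Returns a set of (row, col) patch positions.
    """
    rng = random.Random(seed)
    log_lo, log_hi = math.log(aspect[0]), math.log(aspect[1])
    masked = set()

    def add_block(budget):
        # Try a few random rectangles; accept the first one that adds
        # between 1 and `budget` new patches (overlap is allowed).
        for _ in range(10):
            area = rng.uniform(min_block, budget)
            ratio = math.exp(rng.uniform(log_lo, log_hi))  # log-uniform aspect ratio
            h = int(round(math.sqrt(area * ratio)))
            w = int(round(math.sqrt(area / ratio)))
            if not (0 < h < height and 0 < w < width):
                continue  # block does not fit on the grid, try again
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            block = {(i, j) for i in range(top, top + h)
                     for j in range(left, left + w)}
            new = block - masked
            if 0 < len(new) <= budget:
                masked.update(new)
                return len(new)
        return 0

    while len(masked) < num_masking_patches:
        if add_block(num_masking_patches - len(masked)) == 0:
            break  # no block could be placed; stop instead of looping forever
    return masked


if __name__ == "__main__":
    mask = blockwise_mask(seed=0)
    print(len(mask), "of", 14 * 14, "patches masked")
    # Draw the 14x14 patch grid: '#' = masked, '.' = visible.
    for i in range(14):
        print("".join("#" if (i, j) in mask else "." for j in range(14)))
```

Because whole rectangles are masked rather than isolated patches, a masked patch usually has masked neighbours, so it cannot be recovered just by copying adjacent visible patches. The returned count never exceeds `num_masking_patches` but may fall slightly short if no further block fits.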
Index Terms
- Food Image Recognition Method Based on Generative Self-supervised Learning