
Foodnet: multi-scale and label dependency learning-based multi-task network for food and ingredient recognition

  • Review
  • Published in Neural Computing and Applications

Abstract

Image-based food pattern classification poses the challenges of non-fixed spatial distribution and ingredient occlusion for mainstream computer vision algorithms. However, most current approaches classify food and ingredients by directly extracting abstract features of the entire image through a convolutional neural network (CNN), ignoring both the relationship between food and ingredients and the problem of ingredient occlusion. To address these issues, we propose FoodNet for joint food and ingredient recognition, a multi-task architecture with a multi-scale relationship learning module (MSRL) and a label dependency learning module (LDL). As ingredients normally co-occur in an image, the LDL module exploits ingredient dependencies to alleviate the ingredient occlusion problem. The MSRL module aggregates multi-scale information of food and ingredients, then uses two relational matrices to model the food-ingredient matching relationship and obtain a richer feature representation. Experimental results show that FoodNet achieves strong performance on the Vireo Food-172 and UEC Food-100 datasets; notably, it reaches state-of-the-art results on ingredient recognition for both benchmarks. The source code will be made available at https://github.com/visipaper/FoodNet.
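The two ideas in the abstract, a shared backbone feeding separate food and ingredient heads, and a label-dependency step that propagates confidence between co-occurring ingredients, can be illustrated with a minimal NumPy sketch. This is not the authors' FoodNet architecture: the feature dimension, head weights, co-occurrence matrix `A`, and mixing weight `alpha` are all hypothetical stand-ins, and the refinement rule shown is a generic co-occurrence-based propagation, not the published MSRL/LDL modules.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy sizes: 2048-d backbone features, 5 food classes, 6 ingredient labels.
D, N_FOOD, N_ING = 2048, 5, 6

# Shared backbone features for a batch of 3 images (stand-in for CNN output).
feats = rng.standard_normal((3, D)) * 0.01

# Two task-specific linear heads on the shared representation (multi-task).
W_food = rng.standard_normal((D, N_FOOD)) * 0.01
W_ing = rng.standard_normal((D, N_ING)) * 0.01

food_logits = feats @ W_food   # (3, 5): single-label, cross-entropy task
ing_logits = feats @ W_ing     # (3, 6): multi-label, binary cross-entropy task

# Hypothetical ingredient co-occurrence matrix A, where A[i, j] approximates
# P(ingredient j present | ingredient i present), estimated from training-set
# label statistics and row-normalized. Random values here, for illustration.
A = rng.random((N_ING, N_ING))
np.fill_diagonal(A, 0.0)
A = A / A.sum(axis=1, keepdims=True)

# Label-dependency refinement: confidence on clearly visible ingredients is
# propagated to frequently co-occurring (possibly occluded) ingredients.
alpha = 0.5
refined = ing_logits + alpha * (sigmoid(ing_logits) @ A)

print(food_logits.shape, refined.shape)
```

A trained system would learn `A` (or a relational module replacing it) jointly with the heads; the sketch only shows why co-occurrence statistics help when an ingredient is occluded but its usual companions are visible.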


Data Availability

The datasets VireoFood-172 and UEC Food-100 used to train and evaluate the neural networks are publicly available at http://vireo.cs.cityu.edu.hk/VireoFood172/ and http://foodcam.mobi/dataset.html.



Author information


Corresponding author

Correspondence to Yong Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: FoodNet structure information

The detailed structure of FoodNet is shown in Tables 12 and 13.

Table 12 Structure details of the FoodNet based on ResNet50
Table 13 Structure details of the FoodNet based on DenseNet161

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shuang, F., Lu, Z., Li, Y. et al. Foodnet: multi-scale and label dependency learning-based multi-task network for food and ingredient recognition. Neural Comput & Applic 36, 4485–4501 (2024). https://doi.org/10.1007/s00521-023-09349-4

