
Foodnet: multi-scale and label dependency learning-based multi-task network for food and ingredient recognition

  • Review
  • Published in Neural Computing and Applications

Abstract

Image-based food pattern classification poses the challenges of non-fixed spatial distribution and ingredient occlusion for mainstream computer vision algorithms. However, most current approaches classify food and ingredients by directly extracting abstract features of the entire image through a convolutional neural network (CNN), ignoring both the relationship between food and ingredients and the problem of ingredient occlusion. To address these issues, we propose FoodNet for joint food and ingredient recognition, a multi-task architecture with a multi-scale relationship learning module (MSRL) and a label dependency learning module (LDL). As ingredients normally co-occur in an image, the LDL module exploits ingredient dependencies to alleviate the ingredient occlusion problem. The MSRL module aggregates multi-scale information of food and ingredients, then uses two relational matrices to model the food-ingredient matching relationship and obtain a richer feature representation. Experimental results show that FoodNet achieves strong performance on the Vireo Food-172 and UEC Food-100 datasets; notably, it reaches state-of-the-art results on ingredient recognition for both benchmarks. The source code will be made available at https://github.com/visipaper/FoodNet.
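The two ideas in the abstract, a shared backbone feeding separate food and ingredient heads, and a label-dependency step that propagates confidence between co-occurring ingredients, can be illustrated with a minimal NumPy sketch. This is not the authors' FoodNet architecture: the feature dimension, head weights, co-occurrence matrix `A`, and mixing weight `alpha` are all hypothetical stand-ins, and the refinement rule shown is a generic co-occurrence-based propagation, not the published MSRL/LDL modules.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy sizes: 2048-d backbone features, 5 food classes, 6 ingredient labels.
D, N_FOOD, N_ING = 2048, 5, 6

# Shared backbone features for a batch of 3 images (stand-in for CNN output).
feats = rng.standard_normal((3, D)) * 0.01

# Two task-specific linear heads on the shared representation (multi-task).
W_food = rng.standard_normal((D, N_FOOD)) * 0.01
W_ing = rng.standard_normal((D, N_ING)) * 0.01

food_logits = feats @ W_food   # (3, 5): single-label, cross-entropy task
ing_logits = feats @ W_ing     # (3, 6): multi-label, binary cross-entropy task

# Hypothetical ingredient co-occurrence matrix A, where A[i, j] approximates
# P(ingredient j present | ingredient i present), estimated from training-set
# label statistics and row-normalized. Random values here, for illustration.
A = rng.random((N_ING, N_ING))
np.fill_diagonal(A, 0.0)
A = A / A.sum(axis=1, keepdims=True)

# Label-dependency refinement: confidence on clearly visible ingredients is
# propagated to frequently co-occurring (possibly occluded) ingredients.
alpha = 0.5
refined = ing_logits + alpha * (sigmoid(ing_logits) @ A)

print(food_logits.shape, refined.shape)
```

A trained system would learn `A` (or a relational module replacing it) jointly with the heads; the sketch only shows why co-occurrence statistics help when an ingredient is occluded but its usual companions are visible.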


Data Availability

The datasets VireoFood-172 and UEC Food-100 used to train and evaluate the neural networks are publicly available at http://vireo.cs.cityu.edu.hk/VireoFood172/ and http://foodcam.mobi/dataset.html.



Author information


Corresponding author

Correspondence to Yong Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: FoodNet structure information

The detailed structure of FoodNet is shown in Tables 12 and 13.

Table 12 Structure details of the FoodNet based on ResNet50
Table 13 Structure details of the FoodNet based on DenseNet161

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shuang, F., Lu, Z., Li, Y. et al. Foodnet: multi-scale and label dependency learning-based multi-task network for food and ingredient recognition. Neural Comput & Applic 36, 4485–4501 (2024). https://doi.org/10.1007/s00521-023-09349-4

