A novel approach to estimate the weight of food items based on features extracted from an image using boosting algorithms

Managing daily nutrition is a prominent concern in contemporary society. Image-based dietary assessment systems and applications have made it easier to manage individuals' nutritional information and dietary habits over time. Determining food weight or volume is a vital part of these systems for assessing food quantities and nutritional information. This study presents a novel methodology for estimating the weight of food by extracting features from images and training advanced boosting regression algorithms on them. A unique dataset of 23,052 annotated food images of Mediterranean cuisine, covering 226 different dishes with a reference object placed next to each dish, was used to train the proposed pipeline. Then, using features extracted from the annotated images, such as food area, reference object area, food id, food category, and food weight, we built a dataframe with 24,996 records. Finally, we trained the weight estimation model by applying cross validation, hyperparameter tuning, and boosting regression algorithms such as XGBoost, CatBoost, and LightGBM. Between the predicted and actual weight values for each food in the proposed dataset, the proposed model achieves a mean weight absolute error of 3.93 g, a mean absolute percentage error of 3.73%, and a root mean square error of 6.05 g for the 226 food items of the Mediterranean Greek Food database (MedGRFood), opening new perspectives for image-based food weight and nutrition estimation models and systems.


Related work
Dietary applications and systems are built on two vital parts: (i) the food image dataset, which is the essential starting point for data analysis and model training, and (ii) the volume estimation subsystem, which is crucial for accurately estimating the quantity or weight of food items 11. Figure 1 shows a vision-based nutritional assessment system and its main components, including those developed for this research: the food image database and the food weight calculation subsystem. The food image dataset plays a crucial role in the development of a reliable dietary assessment system and has a direct impact on the effectiveness of its subsystems. A dataset can be characterized by the quantity of images and classes it encompasses, the specific cuisine type it represents, the source from which the images are obtained, and the intended use of the image database (i.e., segmentation, classification, or volume estimation tasks). The Food524DB 12 dataset comprises a total of 247,636 food images that cover a wide range of international cuisine. These images are categorized into 524 distinct food classes and were obtained from various existing databases. On the other hand, the UECFoodPix-Complete 13 dataset focuses specifically on Japanese cuisine. It consists of 10,000 food images that have been annotated and categorized into 102 food classes, and it is particularly well suited to image segmentation tasks. Depending on the method used to determine the meal's volume or weight, the image dataset for volume or weight estimation tasks may also include the depth map of the food images. Additionally, other details such as food weight, camera features, or camera viewing angle are required to calculate the food volume 14.
In vision-based dietary assessment systems, the most challenging tasks are volume or weight estimation and nutrient estimation. The challenges of estimating the amount of food through image analysis are primarily attributed to the controlled environment required for food image capture, the need for multiple images, the difficulty of estimating the volume of food with weak textural features, and the variability in dataset creation methods across different studies. These factors make it hard to determine food quantities accurately from images, as different systems interpret and approach the task in different ways. Current methods for estimating volume can be divided into five main categories 15: (i) approaches based on stereo vision techniques 16, (ii) approaches based on pre-built shape templates 17, (iii) approaches based on perspective transformation 18, (iv) approaches based on depth cameras 9, and (v) approaches based on deep learning 19. In general, each of the five approaches has technical characteristics that limit its applicability. These limitations include the requirement for multiple images, a limited number of shape templates, dependence on additional devices (e.g., a depth camera), and the challenges associated with generating a three-dimensional food point cloud.

Results
The boosting regression models were validated using the validation subset. We performed the runs 10 different times for each model, choosing the training and validation sets at random each time. The average results and a comparison of tenfold cross validation are shown in Table 1 for the models (XGBoost, CatBoost, and LightGBM) developed in this study. The proposed model, using the XGBoost algorithm, achieves an overall MWAE of 3.93 g, an overall MAPE of 3.73%, and an overall RMSE of 6.05 g per food item on the MedGRFood database. The model with the CatBoost algorithm achieves an overall MWAE of 16.15 g, an overall MAPE of 16.44%, and an overall RMSE of 22.19 g, and the model with the LightGBM algorithm achieves an overall MWAE of 13.93 g, an overall MAPE of 12.94%, and an overall RMSE of 21.26 g. The findings of this study are highly promising, since they present a novel approach to estimating the weight of food from images that differs from the existing methods discussed in the relevant literature 15 (Table 2). The better results of the XGBoost model in comparison to the other two models are most likely due to XGBoost's ability to handle datasets with limited features as well as its improved ability to effectively optimize the hyperparameters.
In Fig. 2, we show the MWAE, MAPE, and RMSE metrics for each run of the XGBoost regression model over the random training and validation splits. The best values of the evaluation indices are observed in the fifth run, while the worst values are observed in the sixth run. Figure 2 shows the superiority of the XGBoost algorithm in relation to the two other boosting algorithms employed: its results are consistently better than the average performance of the other algorithms. Furthermore, it confirms the very good overall outcomes achieved by the suggested pipeline on our generated dataset. Figure 3 presents the overall density distribution of the continuous actual and predicted values of the weight estimation models, where the superiority of the model based on the XGBoost algorithm is depicted (blue line). The distributions of actual and predicted values show more variation for foods weighing between 200 and 300 g and for foods weighing more than 700 g. In contrast, the predicted and actual values converge, resulting in a lower variance, for food items that have a low weight. Similarly, this convergence is also observed for food items that weigh more than 300 g.
In Fig. 4, we present the actual and predicted weight values compared to each of the dataset features generated for the proposed model. The largest residuals are for food items belonging to the categories with ids six, eight, and twelve (grain, vegetable, and miscellaneous products). These categories contain foods in liquid form, such as soups, whose weight is very difficult to calculate exactly because of the depth of the dish that contains them. In contrast, foods that do not fall into these categories show improved accuracy in predicted weight because of their more distinct shapes. Furthermore, looking at the predicted versus actual values relative to the area of the food in pixels, we observe a larger deviation for foods with a larger surface area, which is also confirmed by the plot for the feature relating the reference area to the food area. Images featuring large food or reference areas exhibit a greater degree of variability in weight calculation. This trend can be attributed to the use of a wide viewing angle during image capture. As a consequence, a larger number of pixels from the food or reference card area are included, resulting in predicted values with greater variance. A protocol that slightly constrains the shooting angle and distance during image capture could plausibly reduce the observed deviations in weight prediction and lead to improved outcomes. Finally, Fig. 5 presents the distribution of MAPE across the various food categories. We notice a large dispersion in vegetables, which include very light foods whose pieces may overlap when photographed (i.e., raw glistrida, spinach salad, parsley), leading to weight predictions with a large deviation.

Discussion
In the field of food image databases, the application of deep learning techniques to food recognition tasks has produced databases that aim to include a large number of images for each food category. The existing databases are limited in the number of food classes they include, which depends on the dietary preferences and practices of the researchers who construct them. Collecting food images and building food image databases has become less difficult, mainly due to the widespread practice of downloading and sharing images on social media platforms, which makes it possible to collect images from different sources. Nevertheless, developing an extensive database that incorporates not only the nutritional information of food but also its constituent ingredients or weight remains a demanding task.
In this study, we presented an updated version of the MedGRFood database that includes more images and food categories with recorded food weight 11. The MedGRFood food image database focuses primarily on Mediterranean cuisine, which limits its application to a wider range of culinary traditions. However, the process of annotating images and creating a dataset containing the features extracted from each annotated image provides an innovative perspective on how to approach similar problems. The dataset generated in this study is, to our knowledge, the first to present this structure. It was generated in response to the question, "How can the problem of estimating the weight or amount of food be approached as a regression problem?". This question inspired the identification of the following features: food area, food reference area, food name id, category name id, and weight, which act as the variables of the regression problem.
Estimating volume is one of the major difficulties in vision-based dietary assessment systems. Depth cameras for scale and quantity calculation, as well as the capture of multiple images for 3D reconstruction of food, come with constraints that limit their wide adoption. Moreover, food estimation methodologies based on geometric patterns allow volume to be calculated only for a limited number of food items with identifiable geometric shapes. Deep learning methods for food volume estimation have attracted significant interest in recent years due to their promising results. However, the relevant literature 14 indicates that these techniques do not outperform the methodologies currently in use.
Table 2 compares recent food volume or weight approaches with the proposed study. The proposed study compares favorably in terms of the number of foods whose weight can be calculated, in that it requires only one image without additional devices and without a specific acquisition method, and in that it can be applied to both solid and liquid foods without limitations on the shape of the plate or the type of reference object needed. This distinction stems from the study's method for calculating food weight based on an annotated image. The generated dataset enables us to formulate the calculation of food weight as a regression problem, and thus to treat it as a food weight estimation problem rather than a food quantity estimation problem. To address this, we built and trained boosting regression algorithms. The outcomes obtained from the implemented model, using the XGBoost algorithm, show a notable advancement over the existing methodologies 14. The decision to consider only algorithms from the boosting family was based on their potential efficiency in addressing regression problems and their ability to surpass the performance of traditional ML and deep learning algorithms. They can deal with multiple categorical features and handle outliers better than other algorithms. Additionally, boosting algorithms show reduced bias, mitigate the risk of overfitting, and enable the optimization of regression models across a wide range of parameters. Moreover, whereas related studies usually report the results of a single metric, we report the results of all the metrics used in image-based food volume or weight estimation tasks. In addition, although similar studies make a clear distinction and estimate the quantity of solid foods only 16, the present study offers a holistic approach without any such distinction. This approach offers a promising solution to the basic challenge of accurately estimating food weights from images. The proposed methodology requires just one image, preferably captured from above or with a low viewing angle, and demands accurate segmentation and classification of the food items present on the plate. This process provides useful information about the food itself, its category, and the area in pixels covered by the food and the reference object. The next steps of our research include evaluating the proposed system on an external food dataset, as boosting algorithms tend to underperform on value ranges different from the ones they were trained on. Building and training more complex models for food weight estimation, using Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models, is also among our priorities.

Conclusions
In this study, the architecture and overall concept of a model for estimating the weight of food by extracting features from annotated images were presented. The appropriate dataframe was created, and the model is based on a boosting regression algorithm. The proposed methodology and model provide an innovative approach and solution to the problem of calculating food weight from images. Combined with the database of nutrients and macronutrients 20 and integrated into a dietary assessment system, the model aims to support health professionals in identifying dietary risks and consumers in following a healthy and balanced diet. Both perspectives play a significant role in the prevention of malnutrition, as well as several other diseases and conditions related to nutrition.

Food image dataset
The MedGRFood 21 image database was used in the current study for both training and evaluating the weight estimation model. The MedGRFood dataset is a recently introduced database of food images focused on Mediterranean cuisine. It comprises a total of 51,840 images of Mediterranean dishes, categorized into 160 classes, making it suitable for various classification tasks. The dataset also includes an additional subset of 23,052 food images, categorized into 226 categories with at least 100 images for each food. All of these images were collected systematically in a controlled environment, with a reference object (card or coin) placed next to the dish. This subset is particularly useful for tasks related to volume estimation and contains images of Mediterranean dishes, such as pastitsio, moussaka, seafood dishes, nuts, fruits, etc. (Fig. 6). Figure 7 shows the distribution of food items for each food category in the MedGRFood image database used in this study. For this research, it was necessary to systematically annotate the entire collection of 23,052 images within the selected subset. The annotations cover several key aspects: the category of the food, the specific name of the food item, the cuisine associated with the dish, the presence of any reference objects, and the weight of the food item in grams. The database includes a wide range of images for each food item, capturing different viewing angles and distances between the camera and the dish. The open-source CVAT 22 annotation tool was used for food image annotation. Figure 8 shows examples of CVAT-annotated images and the attributes imported. The annotated images were then exported in COCO format 23, generating a JSON file that, with appropriate processing, yields a two-dimensional data structure of dimensions 24,996 × 12 with the features of the images. Because food items that consist of multiple pieces produce one record per piece, the dataframe contains more records than the total number of images available in the MedGRFood database. The dataframe comprises 24,996 records arranged as rows, each containing 12 different features represented as columns. These features include the food name, food name id, category name, category name id, weight, food area in pixels, and reference area in pixels, creating a unique dataset of food features that is suitable for training machine learning regression models. The fields food name and food name id, and the fields category name and category name id, are directly linked through the Greek Composition dataset by the Hellenic Health Foundation 24. For example, the food name "pastitsio" has food name id 274 and category name id 12. Table 3 shows the structure of the data created from the annotated images exported in COCO format.
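The flattening of a COCO export into one record per annotated food piece can be sketched as follows. This is only an illustration, not the processing code used in the study: the label names `food` and `reference` and the attribute keys `food_name` and `weight` are assumptions about how the annotations might be organized, while `images`, `annotations`, `categories`, `image_id`, `category_id`, and `area` are standard COCO fields.

```python
import json

def coco_to_records(coco):
    """Flatten a COCO-style dict into one record per annotated food piece."""
    images = {img["id"]: img for img in coco["images"]}
    labels = {cat["id"]: cat["name"] for cat in coco["categories"]}

    # Pixel area of the reference object in each image
    # (label name "reference" is hypothetical).
    reference_area = {}
    for ann in coco["annotations"]:
        if labels[ann["category_id"]] == "reference":
            reference_area[ann["image_id"]] = ann["area"]

    records = []
    for ann in coco["annotations"]:
        if labels[ann["category_id"]] != "food":  # hypothetical label name
            continue
        attrs = ann.get("attributes", {})  # hypothetical attribute keys below
        img = images[ann["image_id"]]
        records.append({
            "image_id": ann["image_id"],
            "file_name": img["file_name"],
            "food_name": attrs.get("food_name"),
            "weight": attrs.get("weight"),
            "food_area": ann["area"],
            "reference_area": reference_area.get(ann["image_id"]),
        })
    return records

# A tiny made-up export: one image with two food pieces and one reference object.
example = json.loads("""
{
  "images": [{"id": 1, "file_name": "img_0001.jpg", "width": 640, "height": 480}],
  "categories": [{"id": 1, "name": "food"}, {"id": 2, "name": "reference"}],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1, "area": 20000,
     "attributes": {"food_name": "pastitsio", "weight": 150}},
    {"id": 2, "image_id": 1, "category_id": 1, "area": 18000,
     "attributes": {"food_name": "pastitsio", "weight": 140}},
    {"id": 3, "image_id": 1, "category_id": 2, "area": 5000, "attributes": {}}
  ]
}
""")

records = coco_to_records(example)
```

In this made-up example a single image yields two records, which mirrors why the dataframe has more rows than the database has images.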

Data manipulation
The next step in the proposed methodology is data manipulation, in which the raw data is systematically organized into a structured format that is easier to read and analyze. This step improves productivity and helps in identifying and analyzing patterns and trends, among other benefits. Since a large part of the images were captured with a card as the reference object and the rest with a 2-euro coin, the first step is to convert the reference area field so that all records refer to the 2-euro coin. Knowing that the ratio between the areas covered by a reference card (8.5 cm × 5.5 cm) and the 2-euro coin is 8.9, we convert the card reference areas into equivalent 2-euro coin areas. Next, we create a new field, the ratio of the reference area to the food area, which is unique for each type of food since it depends directly on the distance from which the image was taken and the perimeters of the areas of interest. Then, certain attributes were excluded from the dataframe, namely image id, food name, category name, file name, height, width, and cuisine. Consequently, the resulting dataframe contains a total of 24,996 records and 6 columns. Figure 9 illustrates the features and their respective associations within the generated dataframe.
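A minimal sketch of these manipulation steps for a single record is given below. The field names (`reference_type`, `reference_area`, `ref_to_food_ratio`, and so on) are hypothetical, and the sketch assumes that the conversion divides the card area by the stated ratio of 8.9 to obtain the equivalent coin area.

```python
CARD_TO_COIN_AREA_RATIO = 8.9  # card area / 2-euro coin area, as stated in the text

def normalize_record(record):
    """Express the reference area in coin units, add the area ratio,
    and keep only the six columns used for modelling."""
    ref_area = record["reference_area"]
    if record["reference_type"] == "card":  # hypothetical field and value
        ref_area = ref_area / CARD_TO_COIN_AREA_RATIO
    return {
        "food_name_id": record["food_name_id"],
        "category_name_id": record["category_name_id"],
        "weight": record["weight"],
        "food_area": record["food_area"],
        "reference_area": ref_area,
        "ref_to_food_ratio": ref_area / record["food_area"],
    }

# Made-up record photographed next to a reference card.
raw = {
    "image_id": 1, "file_name": "img_0001.jpg", "food_name": "pastitsio",
    "category_name": "miscellaneous", "cuisine": "Greek",
    "height": 480, "width": 640,
    "food_name_id": 274, "category_name_id": 12, "weight": 150,
    "food_area": 20000, "reference_area": 8900, "reference_type": "card",
}
clean = normalize_record(raw)
```

Records that already use the coin as reference pass through with their reference area unchanged, so all rows end up on the same scale.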

Boosting algorithms
Once the dataframe has been appropriately manipulated, the result is a dataset suitable for machine learning regression techniques. The task of estimating weight can be defined as a regression problem, in particular one of value forecasting. In this case, the requested value, denoted y, represents the weight of the food and is the dependent variable, while the remaining features are the independent variables. In the current study, boosting machine learning algorithms were employed to calculate food weight. These algorithms were chosen for their strong capabilities in solving regression problems, their ability to improve predictive accuracy, their capacity to handle diverse data types, and their flexibility in optimizing a range of loss functions.
Boosting is a widely used ensemble learning technique in which a series of weak learners are fit sequentially to a given dataset. This iterative process aims to improve the overall predictive performance of the ensemble by focusing on the instances that were previously mispredicted. By iteratively adjusting the weights assigned to each instance, boosting emphasizes the difficult instances, allowing subsequent learners to focus on these challenging cases. This iterative nature enables the ensemble to learn from the mistakes made by previous learners, leading to a more accurate and robust final prediction. Each subsequent weak learner is trained with the objective of minimizing the errors that arise from the previous learner 25.
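The idea that each weak learner corrects the errors of its predecessors can be illustrated with a small toy example. The sketch below is plain gradient boosting with squared error on a single feature, using one-split regression trees (stumps) fitted to the current residuals; it is not XGBoost, CatBoost, or LightGBM, and it does not use the instance reweighting variant of boosting. The data values are invented for illustration.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression tree to the residuals (squared error)."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return lambda xi: mean
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Sequentially fit stumps to the residuals of the current ensemble."""
    base = sum(y) / len(y)  # initial prediction: the mean target
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + learning_rate * stump(xi) for xi, pi in zip(x, preds)]

    def predict(xi):
        return base + learning_rate * sum(s(xi) for s in stumps)
    return predict

# Toy data: food area in pixels against weight in grams (invented values).
food_area = [1000, 2000, 3000, 4000, 5000, 6000]
weight = [50, 95, 160, 210, 240, 310]

model = boost(food_area, weight)
baseline = sum(weight) / len(weight)
mae_baseline = sum(abs(w - baseline) for w in weight) / len(weight)
mae_boosted = sum(abs(w - model(a)) for a, w in zip(food_area, weight)) / len(weight)
```

On the training points the boosted ensemble ends up with a lower mean absolute error than the constant mean predictor, because every round removes part of the remaining residual.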

Extreme gradient boosting
The first algorithm used to compute food weight was the Extreme Gradient Boosting 26 (XGBoost) algorithm. XGBoost is an optimized implementation of gradient boosting (Table 4) that has gained significant popularity due to its efficiency and scalability. It offers a range of enhancements over the traditional gradient boosting technique, including regularization techniques, efficient handling of sparse data, parallel computing for improved performance, and an accuracy that surpasses other machine learning algorithms in various predictive modeling cases. XGBoost is an ensemble learning technique that combines multiple weak learners into a robust and powerful learner through a training process that builds multiple decision trees. Each tree is trained on a subset of the data, and the predictions from each tree are combined to form the final prediction. The structure of the proposed XGBoost regression algorithm is shown as a decision tree in Fig. 10. The numbers in the ellipses represent the feature thresholds defined during the decision tree's construction.

The last steps of the gradient boosting procedure (Table 4) at boosting iteration $m$ are:

4. Fit a regression tree to the targets, giving terminal regions $R_{jm}$, $j = 1, \dots, J$, where $j$ is a terminal node (i.e., a leaf) in the tree and $J$ represents the total number of leaves.

5. Compute, for each terminal node $j$, the value that minimizes the loss function on that node:

$$\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L\left(y_i, F_{m-1}(x_i) + \gamma\right), \quad j = 1, \dots, J,$$

where $F_m$ is the prediction after tree $m$.

6. Update the model:

$$F_m(x) = F_{m-1}(x) + \nu \sum_{j=1}^{J} \gamma_{jm} \, \mathbf{1}\left(x \in R_{jm}\right),$$

where $\nu$ is the learning rate, ranging between 0 and 1.

Categorical boosting
The next algorithm employed in our study was the Categorical Boosting 27 (CatBoost) algorithm. CatBoost is a depth-wise gradient boosting technique developed to address the challenges of handling categorical features effectively. It combines gradient boosting with one-hot encoding, which allows categorical variables to be handled efficiently and reduces the need for extensive preprocessing steps. CatBoost incorporates a range of techniques to address the overfitting that can occur during the boosting process. It incorporates regularization, specifically "l2" and "l1" regularization, applied to the leaf values of the trees, to prevent overfitting and enhance generalization. Additionally, the method employs feature selection techniques (i.e., border count), which further contribute to preventing overfitting and improving generalization.

Light gradient boosting machine
The final boosting algorithm we used was the Light Gradient Boosting Machine 28 (LightGBM). LightGBM is a distributed gradient boosting framework (Table 4) that uses tree-based learning and is designed to be highly efficient and scalable. The decision trees in LightGBM are constructed using a technique called "leaf-wise" tree growth. In contrast to conventional depth-wise (level-wise) approaches, where trees are expanded by splitting the nodes at each level, LightGBM selects individual leaf nodes to split. The algorithm chooses the leaf node with the highest delta loss for splitting. This approach generates trees that are both more informative and deeper in structure. Its speed in handling large amounts of data is the reason it is named "light". It introduces several innovative techniques, such as Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), which enable it to handle large-scale data and high-dimensional feature spaces. Finally, LightGBM addresses overfitting by integrating regularization techniques, specifically L1 and L2 regularization.

Hyperparameters tuning
Hyperparameters are the adjustable variables that govern the learning process of a machine learning model. Through them, users control how the model learns the association between input data and the related predictions. Optimizing a model's hyperparameters is a crucial step in addressing a particular problem, as it enables the development of an optimal model by identifying the most appropriate combinations of hyperparameters. The resulting model should yield the best results by minimizing the loss function. To tune the hyperparameters of the proposed weight estimation models, we used the Optuna framework 29. Optuna offers a convenient and efficient way to apply various advanced optimization techniques to hyperparameter tuning. By default, Optuna employs a Bayesian 30 optimization algorithm (TPE). However, it offers the flexibility to switch to the alternative algorithms available within the framework. Figure 11 presents the hyperparameter importance of the proposed CatBoost algorithm.

Training and evaluation
To train and evaluate the boosting algorithms, we split the data into training and validation subsets. For the validation subset, we randomly selected 10% of the records from each food after first shuffling the existing records for each of them. Thus, two sub-datasets are obtained from the existing dataset, the first containing 22,527 records used for training and evaluation and the second containing 2469 records used for validation. Model testing was performed on the validation subset, which was not considered during the training phase and, consequently, did not influence the feature selection process. To account for randomness and obtain reliable and stable results, the runs were repeated with random training and validation splits. Additional features were also generated for the boosting algorithms. For each food item, these are: the average weight, the average food area, the average reference area, the weight standard deviation, and the ratio of the average reference area to the average food area.
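The per-food 10% hold-out described above can be sketched as follows. The function name, the `food_name_id` key, the fixed seed, and the rounding rule for the number of validation records are assumptions made for illustration, so the exact subset sizes reported in the text are not reproduced here.

```python
import random
from collections import defaultdict

def split_per_food(records, val_fraction=0.10, seed=0):
    """Shuffle the records of each food and hold out a fraction for validation."""
    rng = random.Random(seed)
    by_food = defaultdict(list)
    for rec in records:
        by_food[rec["food_name_id"]].append(rec)

    train, val = [], []
    for food_records in by_food.values():
        shuffled = food_records[:]      # copy so the input is left untouched
        rng.shuffle(shuffled)
        n_val = round(len(shuffled) * val_fraction)  # assumed rounding rule
        val.extend(shuffled[:n_val])
        train.extend(shuffled[n_val:])
    return train, val

# Made-up data: two foods with 100 records each.
records = [{"food_name_id": fid, "record": i}
           for fid in (274, 275) for i in range(100)]
train, val = split_per_food(records)
```

Splitting within each food, rather than over the whole dataframe, keeps every dish represented in both subsets. Changing `seed` gives a different random split, which is how repeated runs could be produced.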
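Figure 13 shows the feature importance for the proposed XGBoost model. All features affect the training of the XGBoost algorithm almost equally, except the Reference_area feature, which clearly has less influence. As evaluation metrics for the proposed model, we used for each food item the Mean Weight Absolute Error (MWAE):

$$\mathrm{MWAE} = \frac{1}{n} \sum_{i=1}^{n} \left| W_{pred,i} - W_{real,i} \right|,$$

the Mean Absolute Percentage Error (MAPE):

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \frac{\left| W_{pred,i} - W_{real,i} \right|}{W_{real,i}},$$

and the Root Mean Square Error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( W_{pred,i} - W_{real,i} \right)^2},$$

where $W_{pred}$ is the predicted weight of the food, $W_{real}$ is the real weight, and $n$ is the number of records for each food item in the generated dataset. In total, we estimate the weight of 226 different dishes from the MedGRFood image dataset, using the overall versions of these metrics (MWAE overall, MAPE overall, and RMSE overall) reported in the Results.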
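As a worked illustration of the three metrics, the sketch below computes them for one food item, assuming the standard definitions of mean absolute error, mean absolute percentage error, and root mean square error. The function names and the sample weights are made up for the example.

```python
import math

def mwae(w_pred, w_real):
    """Mean weight absolute error, in grams."""
    return sum(abs(p - r) for p, r in zip(w_pred, w_real)) / len(w_real)

def mape(w_pred, w_real):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(p - r) / r for p, r in zip(w_pred, w_real)) / len(w_real)

def rmse(w_pred, w_real):
    """Root mean square error, in grams."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(w_pred, w_real)) / len(w_real))

# Invented weights (grams) for three records of one food item.
w_real = [100.0, 200.0, 300.0]
w_pred = [110.0, 190.0, 300.0]

item_mwae = mwae(w_pred, w_real)   # (10 + 10 + 0) / 3
item_mape = mape(w_pred, w_real)   # 100 * (0.10 + 0.05 + 0) / 3
item_rmse = rmse(w_pred, w_real)   # sqrt((100 + 100 + 0) / 3)
```

MAPE weights an error of 10 g on a 100 g item twice as heavily as the same error on a 200 g item, which helps explain why very light foods can show a large percentage deviation even when the absolute error is small.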

Implementation
The workflows were executed on the high-performance computing infrastructure (HCI), which has been explicitly designed for data-intensive tasks as part of the PRECIOUS project. The HCI currently includes 576 Intel(R) Xeon(R) Gold 5220R physical cores, 86,000 CUDA cores, 4.6 TB RAM, and 0.5 PB raw storage. We used the Python programming language in the Anaconda environment to implement the dietary assessment system, installing the libraries required for the food weight estimation system.

Figure 1. A dietary assessment system that includes the proposed methodology for estimating the weight of food.

Figure 2. MWAE, MAPE, and RMSE metrics for each run of the XGBoost regression model.

Figure 3. Density distribution of actual vs predicted values between the used boosting algorithms.

Figure 4. Predicted vs. actual values for food_name_id, category_name_id, food_area and reference_area features in the generated dataset using the XGBoost algorithm.

Figure 6. Examples of food images from the MedGRFood dataset used in this research.

Figure 7. Distribution of food items for each food category in MedGRFood database.

Figure 8. Annotated images using CVAT annotation tool with imported attributes.

Figure 9. Correlation table between the features of the generated dataframe.

Figure 10. A decision tree structure of the proposed XGBoost algorithm.

Figure 12. The proposed weight calculation pipeline during the training and validation steps.

Figure 13. Feature importance of the XGBoost algorithm.

Table 1. Average results of the proposed boosting algorithms.

Table 2. Presentation of food volume and weight approaches including the proposed study.

Table 3. Structure of the generated dataframe dataset.