Smartphone Apps for Tracking Food Consumption and Recommendations: Evaluating Artificial Intelligence-based Functionalities, Features and Quality of Current Apps

The advancement of artificial intelligence (AI) and the significant growth in the use of food consumption tracking and recommendation apps in the app stores have created a need for an evaluation system, as minimal information is available about the evidence-based quality and technological advancement of these apps. Electronic searches were conducted across three major app stores, and the selected apps were evaluated by three independent raters. A total of 473 apps were found, and 80 of them were selected for review based on inclusion and exclusion criteria. An app rating tool was devised to evaluate the selected apps. The rating tool assesses the apps' essential features, AI-based advanced functionalities, and software quality characteristics required for food consumption tracking and recommendations, as well as their usefulness to general users. Users' comments from the app stores were collected and evaluated to better understand their expectations and perspectives. Following the evaluation, design considerations that emphasize automation-based approaches using AI are proposed. According to our assessment, most mobile apps in the app stores do not satisfy the overall requirements for tracking food consumption and recommendations. "Foodvisor" is the only app that can automatically recognize food items and compute the recommended volume and nutritional information of those items. However, these features need to be improved in food consumption tracking and recommendation apps. This study provides both researchers and developers with insight into current state-of-the-art apps, along with design guidelines and the necessary information on essential features and software quality characteristics for designing and developing a better app.


Introduction
Food is one of the most basic requirements of human life. It is often regarded as much more than a means of survival, and proper food intake is essential for human health and fitness. Our health depends closely on the type or amount of food we consume (Min et al., 2019). Healthy food consumption is explored in numerous fields, such as sociology, psychology, nutrition sciences, and medicine (Mai & Hoffmann, 2017). Food choices are negatively influenced by a busy lifestyle, bad habits, and low self-control (Brug et al., 1995; Koenigstorfer et al., 2014).
Excessively unhealthy lifestyles and bad dietary habits, such as increased intake of high-energy, high-fat food, lead to various health issues (Ng et al., 2014) and are among the main reasons for the obesity problem. According to the World Health Organization (WHO), more than 1.9 billion adults (aged over 18) are overweight, and more than 650 million people suffer from obesity (Chu et al., 2018). Many chronic diseases such as hypertension, type 2 diabetes mellitus, cardiovascular disease, and stroke are linked to obesity and excess weight (Speiser et al., 2005). This problem is becoming a significant health concern. The intake of highly caloric, inexpensive, larger portion sizes and nutrient-dense foods promoted by environmental changes, coupled with decreased physical activity and increased sedentary behaviors, is a significant causative factor for obesity (Beal et al., 2013).
In recent years, the use of smartphones to track food consumption or compute the nutritional value of foods has expanded due to the increasing number of food consumption tracking and recommendation apps in the app stores and the great potential of smartphones as a useful tool (Kalinowska et al., 2021). Many apps in the app stores now focus on health and fitness. In the major app stores, there were 32,500 mobile health apps available in 2017, and this number continues to rise (Ferrara et al., 2019). Apps can play an important role in simplifying the tracking of health-related behaviors and weight management (Chen et al., 2015). Moreover, the usage of smartphones and the rapid development of artificial intelligence (AI) technologies have enabled new food identification systems for dietary assessment, which are significant for the prevention and treatment of chronic diseases such as type 2 diabetes mellitus and cardiovascular disease, and for overcoming health issues such as obesity (Min et al., 2019). Furthermore, food intake behaviour (e.g., assessment of calorie intake, nutritional analysis, and eating habits) can be analyzed if food items or categories are recognized.
Recently, AI and machine learning-based mobile food recognition methods have also been implemented. For example, He et al. (2014) used AI techniques to identify food from an image. The bag of visual words (BoW) model has been used to represent food images as distributions of visual words, with a support vector machine (SVM) model used for classification (Farinella et al., 2014). Furthermore, Anthimopoulos et al. (2014) applied SVM, artificial neural network, and random forest classifiers to 5000 food images organized into 11 classes and described in terms of different bag-of-features. The convolutional neural network (CNN) has also been used in some studies (Christodoulidis et al., 2015; Kawano & Yanai, 2014). Ming et al. (2018) proposed a photo-based dietary tracking system that employed deep learning-based image recognition algorithms to recognize food and analyze nutrition. Estimating an individual's food and calorie intake requires calculating food portion size or volume. In several studies, different types of methods (i.e., single image-based or multiple image-based) have been used to estimate food volume from food images (Kong & Tan, 2012; Sun et al., 2010; Dehais et al., 2016; Fang et al., 2018; Meyers et al., 2015). To achieve quantitative food intake estimation, one study combined visual recognition and 3D reconstruction (Puri et al., 2009). Both Android smartphone and web-based applications have been implemented to recognize food and estimate the calorific and nutritional content of foods automatically without any user input (Zhang et al., 2015).
Food recommendation is a significant domain for people as well as society (Min et al., 2019). Incorporating health into recommendations is mostly a recent concern (Rokicki et al., 2018; Nag et al., 2017; Yang et al., 2017). Mokdara et al. (2018) proposed integrating a deep neural network with a recommendation system focusing on Thai food. The system considers not only users' food choices but also their health. Based on individual customer behaviors, tastes, and eating history, it assists consumers in making food selection decisions. In addition, a food recommendation system has been built to recommend food to diabetic patients based on nutrition and food characteristics (Phanich et al., 2010).
Reviews of various health-related apps have been conducted in many different studies. A prior study reviewed diet tracking apps common in the Apple App Store and Google Play Store (Ferrara et al., 2019). Franco et al. (2016) analyzed the main features of the most popular nutrition apps and compared their strategies and technologies for dietary assessment and user feedback. Another study reviewed nutritional tracking mobile applications specifically for diabetes patients (Darby et al., 2016). Rivera et al. (2016) characterized the use of evidence-based methods, the participation of health care experts, and the clinical assessment of commercial smartphone applications for weight loss or weight control. In this study, we reviewed apps from three commercial app stores (Google Play, Apple App Store, and the Microsoft Store) to evaluate food consumption tracking and recommendation apps for all users, not just diabetes patients, pregnant women, or children. To the best of our knowledge, no research has thoroughly examined the current commercial mobile app market landscape to analyze and scientifically evaluate apps linked to food consumption tracking and recommendations. The speedy growth of such apps in the app stores and their fast acceptance by the general population necessitate an assessment of this rapidly expanding market.
In this study, we have conducted a critical review of food consumption tracking and recommendation apps accessible in the three major commercial app stores (i.e., Google Play Store, Apple App Store, and Microsoft Store). We found a total of 473 apps in our initial search; after applying our exclusion criteria, we selected 80 apps for our study. We devised an app rating tool by adopting and extending existing app rating tools, and three raters used it to assess the selected apps. The rating tool was examined through internal consistency, and the rating quality of the raters through inter- and intra-rater reliability. We also analysed user comments from the app stores to better understand users' expectations and perspectives. Finally, we discuss the limitations of the reviewed apps and potential design considerations from the perspectives of both developers and researchers.
The rest of the paper is organized as follows. Section 2 describes the methodology of our work, including the app search procedure, the measures used in app selection, and our devised app rating tool. In Section 3, we present the results of our study, which include the overall assessment of the apps, internal consistency of our rating tool, intra- and inter-rater reliability, analysis of app store ratings and our measured ratings, assessment of functionality criteria, and analysis of users' comments from app stores. In Section 4, principal findings (including the limitations of the reviewed apps and design considerations) and the limitations of this study are discussed. Finally, Section 5 concludes the paper and outlines future research directions.

App search procedure
We have performed an electronic search to identify the relevant apps from three major commercial app stores, i.e., Google Play Store, Apple App Store, and Microsoft Store. Following similar approaches used in previous studies (Rivera et al., 2016; Kabir et al., 2021), a keyword-based search process was used. The guidelines for Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) (Tricco et al., 2018) were followed to ensure transparency and clarity in reporting, as well as the ability for other researchers to replicate the search process. The keywords used in the search were specifically selected by studying the names of several prominent food consumption tracking and recommendation apps, so that the search would yield the same result if the same keywords were used at the same time and from the same location (Stawarz et al., 2015). The keywords used for searching were "Food consumption", "calorie consumption", "daily food consumption", "nutrition consumption", and "track food consumption".
The investigators worked together to conduct the search, screening, and final inclusion process. All three app stores were searched using the same set of keywords to minimize variances and maintain uniformity. Three investigators independently conducted the same search using the same terms numerous times before compiling the final list of food consumption tracking and recommendation applications for inclusion. For each app store they searched, each investigator created their list of apps. They screened the apps using the inclusion and exclusion criteria (described in Section 2.3). This search and selection process was carried out by each investigator using their smartphone. The investigators' different lists were combined to create the final app list to be examined and analyzed for this study.
Conflicts between the lists were resolved by a group discussion among all the investigators.
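The list-merging step described above can be illustrated with a minimal sketch. It assumes that each investigator's screened list is a plain list of app names, and it treats any app that does not appear on every list as a conflict to be settled by discussion. The function name and the app names are hypothetical and are not taken from the study.

```python
from collections import Counter


def merge_app_lists(lists):
    """Split apps into those on every investigator's list and those on only some."""
    # Count how many investigators selected each app (set() ignores repeats within a list).
    counts = Counter(app for apps in lists for app in set(apps))
    agreed = sorted(app for app, n in counts.items() if n == len(lists))
    disputed = sorted(app for app, n in counts.items() if n < len(lists))
    return agreed, disputed


# Hypothetical app names, for illustration only.
investigator_lists = [
    ["App A", "App B", "App C"],
    ["App A", "App C", "App D"],
    ["App A", "App B", "App C"],
]

agreed, disputed = merge_app_lists(investigator_lists)
print("Selected by all investigators:", agreed)
print("To be resolved by group discussion:", disputed)
```

Under this assumption, the final list is the agreed set plus whichever disputed apps the group decides to keep.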

Raters
Expert raters were selected to rate all the apps. They included three final-year Bachelor of Computer Science students with two years of mobile application development experience. In addition, two computer science graduates with two years of mobile application development experience rated three apps named "Weight Loss Coach & Calorie Counter -Nutright", "Foodzilla! Nutrition Assistant, Food Diary, Recipe" and "Fitatu Calorie Counter -Free Weight Loss Tracker" for measuring internal consistency. All the apps on the investigators' final app list were rated separately by the raters. Their responses were collected in a response form (Google Forms), and the raters' data were extracted from the spreadsheet attached to the form.

Measures used in apps' selection
The methodology we used for the identification, screening, eligibility, and selection of apps is shown in Figure 1.
During the search process, a keyword-based search technique was used on the three app stores separately, yielding 473 apps. Eight apps were removed as duplicates because identical apps from the same developer or publisher were available in multiple app stores. Before being eliminated, these apps were tested on their respective platforms (Android, iOS, and Windows) to see if they had the same functionality. The remaining 465 apps were screened based on their title and description. After that, the apps were chosen based on their description in the app store. This was the first stage of the screening process. If the app description stated that it tracked food consumption or provided food recommendations, it was included in the study. For further curation, we considered the following inclusion criteria: (1) apps that can track food consumption, (2) apps that compute food portion size and estimate nutritional values, (3) apps that present consumption history visually, (4) apps that recommend food to the users. We also looked for apps that allow users to contribute new food items' names to their database. These criteria were applied to the selected apps to ensure that they met our study's requirements. If an app contained any of these features, we included it in our app list. In the primary screening, apps were excluded for one or several of the following reasons: (1) apps whose functionality was limited to water consumption tracking or eBooks, (2) apps that were solely focused on pregnancy and baby food, diabetes, fitness and exercise, (3) food-related games, (4) apps in non-English languages, (5) apps that are not relevant to our study, such as those related only to fasting, food photo sharing, health tips, or recipe suggestions. Sugar trackers, step counters, blood test guides, protein trackers, and wine consumption trackers were also excluded.
In the secondary screening, all 81 remaining apps were downloaded and evaluated by each rater individually. One app was removed at this point because it was malfunctioning. In the end, 80 apps (70 from the Google Play Store, 6 from the Microsoft Store, and 4 from the Apple App Store) were selected, analyzed, and reviewed for this study.
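The counts reported for the selection process can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated above; the number of apps excluded in the primary screening (384) is not stated in the text and is derived here as 465 minus 81. The variable names are illustrative.

```python
# Figures reported in the selection process.
identified = 473          # apps returned by the keyword search
duplicates_removed = 8    # same app published in multiple stores
downloaded = 81           # apps remaining after the primary screening
malfunctioning = 1        # removed during the secondary screening
per_store = {"Google Play Store": 70, "Microsoft Store": 6, "Apple App Store": 4}

screened = identified - duplicates_removed   # apps screened by title and description
excluded_primary = screened - downloaded     # derived, not stated in the text
included = downloaded - malfunctioning       # apps in the final review

print(f"Screened: {screened}")
print(f"Excluded in primary screening (derived): {excluded_primary}")
print(f"Included: {included}")
print(f"Per-store total: {sum(per_store.values())}")
```

The per-store breakdown sums to the same total as the flow, which confirms that the reported numbers are internally consistent.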

App rating tool
We have devised a rating tool for evaluating the selected apps and determining their appropriateness and usability.
We reviewed research on software quality aspects such as usability, reliability, functionality, and efficiency (Koepp et al., 2020; Poon & Friesen, 2015; Friesen et al., 2013; Vos-Draper, 2013). Our goal was to build a rating tool by adopting and extending existing rating tools such as the mobile application rating scale (MARS) (Stoyanov et al., 2015), uMARS, the end-user version of MARS (Stoyanov et al., 2016), FinMARS, a version of MARS for financial apps (Huebner et al., 2019), a mobile app rating tool for foot measurement, and a mobile app rating tool for child sexual abuse education (Pritha et al., 2021). Our rating tool adopts the evaluation features relevant to analyzing food consumption tracking and recommendations. The finalized rating tool, with the updated sub-scales and their respective criteria, is presented in Appendix A.
We have devised the app rating tool by clustering the domains according to app quality criteria. We used a Likert scale which is a popular instrument (Wu, 2007) ranging from 1 to 5 representing very bad to very good, respectively.
For example, if an app can recognise food items from photos and also from an app database, we consider it the highest quality feature and rate the app as 5 for this feature. If an app can recognise a food item from a photo but not from a database, we rate the app as 4. We rate the app as 3 when it can recognise food from barcode scanning, and 2 when it can recognise food from a database or allows manual entry. Lastly, if an app cannot recognise food by any means, we rate the app as 1. We applied this evaluation technique to every question of our food app rating tool.
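The food recognition example can be written as a small scoring rule. This is a minimal sketch rather than the authors' implementation: the function name and boolean parameters are hypothetical, and it assumes that an app offering several recognition methods receives the highest tier it qualifies for (for instance, barcode plus database lookup scores 3).

```python
def rate_food_recognition(photo: bool, barcode: bool, database_or_manual: bool) -> int:
    """Map an app's food recognition capabilities to a 1-5 Likert score."""
    if photo and database_or_manual:
        return 5  # photo recognition plus database lookup
    if photo:
        return 4  # photo recognition without database lookup
    if barcode:
        return 3  # barcode scanning (assumed to outrank database-only entry)
    if database_or_manual:
        return 2  # database lookup or manual entry only
    return 1      # no means of recognising food


print(rate_food_recognition(photo=True, barcode=False, database_or_manual=True))
print(rate_food_recognition(photo=False, barcode=True, database_or_manual=True))
print(rate_food_recognition(photo=False, barcode=False, database_or_manual=False))
```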
We also added a rating option labeled "Unknown" because we were facing problems accessing certain information.
For example, we could not find whether an app was open source or not, so we labeled those apps' sources as unknown.
We used descriptive answers for app metadata items. These items are store name, app name, app rating, developer name, applicable age group, and app sub-category. We used the Likert scale for the rest of the questions. In the following subsections, we describe all the sub-scales of our app rating tool.

App metadata
Metadata is data that provides information about other data. App metadata has been clustered with the general information of the apps which were gathered from the respective app stores. Table 1 reports metadata of our reviewed apps such as platform, country of origin, business model (free/paid), app rating, and number of downloads.

App category
All the included apps were divided into sub-categories such as nutrition tracker, calorie tracker, food tracker, diet, fitness, and others, based on their main aim and functionalities (see Figure 2). In the nutrition tracker sub-category, the apps focused on tracking nutrition, but some also tracked calorie consumption. The nutrition tracker sub-category contained 28.75% (23/80) of the total apps. In the calorie tracker sub-category, 25 out of 80 apps (31.25%) only tracked calorie consumption. Food tracker apps focused on tracking food names only.
They barely tracked nutrition or calorie consumption. Only 10% (8/80) of the total apps were listed in the food tracker sub-category. Our evaluation procedure found some apps that can track nutrition or calorie consumption but mainly focus on suggesting a diet plan for users. We placed these apps in the diet sub-category; 18.75% (15/80) of the apps were in this category (e.g., "Keto Manager: Keto Diet Tracker & Carb Counter" app). We also found that 4 out of 80 apps (5%) focused on improving their users' fitness by tracking exercise, suggesting workout routines, and so on. However, those apps could also track calorie or nutrition consumption (e.g., "Fitstyle -Home Workout, Fitness & Diet Plan"). We placed 5 out of the 80 apps (6.25%) in the others category because most of them focused on multiple features, and some of these features matched our key features. For example, the app "MealLogger-Photo Food Journal" is like social media for health-conscious people. Another example is "Health Mate -Calorie Counter & Weight Loss", which can track heart rate and sleep routine, show food insights, and so on. Most apps were in the calorie tracker category, followed by the nutrition tracker and diet plan-focused categories. The fitness-focused and diet plan-focused apps could also track calories, nutrition, or food, and show food consumption history.
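The reported category shares follow directly from the counts. The short sketch below recomputes them from the figures given in the text; the dictionary keys are shorthand labels for the sub-categories.

```python
# Number of reviewed apps per sub-category, as reported in the text.
category_counts = {
    "nutrition tracker": 23,
    "calorie tracker": 25,
    "food tracker": 8,
    "diet": 15,
    "fitness": 4,
    "others": 5,
}

total = sum(category_counts.values())
shares = {name: 100 * count / total for name, count in category_counts.items()}

# Print categories from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<18} {category_counts[name]:>2}/{total}  {share:.2f}%")
```

The six counts sum to the 80 reviewed apps, so every app falls into exactly one sub-category.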

Aesthetics
Visual appeal is one of the key factors for the success of any app. This sub-scale criterion is as important as the core functionalities and performance of an app (Chetrari, 2017). Many apps in the app stores have similar functionalities, so visual appeal is often the main difference between them. In today's competitive marketplace, whether an app will be successful depends largely on the layout and organisation of the user interface components integrated into the app. This tendency is also seen in food consumption tracking and recommendation apps, where the visual outlook and a distinct, organised layout determine the marketability of the apps. In this regard, we considered several aspects for evaluating the aesthetics of an app, including layout consistency and readability, content resolution, visual appeal, and group targeting according to app content.

General features
General features (such as data sharing options, sign-up, etc.) are crucial for enhancing the user experience of food consumption tracking and recommendation apps. The log-in or sign-up feature is important for preserving users' food consumption history when they change devices. Features such as data export and share options are crucial for users who want to use the data for other purposes (e.g., sharing with a nutritionist/dietitian). Sending regular notifications is also considered an important feature because it reminds users when to consume food. Furthermore, tutorial or on-boarding facilities are considered a desirable feature since they demonstrate the operations of the app.
The relevance of content customisation and the amount of visual information provided in apps have also been noted in recent years for boosting an app's user value. Therefore, they were also included in the general features. Moreover, a subscription package is also regarded as a useful factor because it can help support the development of a better user experience.

Performance and efficiency
One of the key features contributing to an app's acceptance by users is its efficiency and performance. Efficiency relates to how fast an app functions and gives results on a device. The performance of an app includes its effect on battery life, device heating, and so on. However, the performance metrics may differ depending on the mobile device hardware configuration. These measures play a crucial role in an app's characterisation and have been included as a sub-scale for rating food consumption tracking and recommendation apps.

Usability
The usability of mobile apps has become a significant issue because many software products currently running on smartphones previously ran on desktops and laptops (Hussain et al., 2017). Users do not like apps that have a poor standard of usability and lack appropriate user-centered design. It is vital to test the usability of food consumption tracking and recommendation apps to identify whether they have sufficient characteristics to capture the interest of their target user groups. Nowadays, the attention of app users is divided in two ways: one is interaction with the app, and the other is interaction with the environment (Kallio et al., 2005). Navigation and ease of use are also crucial indicators of app usefulness. The screen sequences in an app guide users through the various views, allowing them to receive the needed information from the app (Georgieva et al., 2011). Because of the differences in user behavior and user experience, the usability of an app in real life differs from the usability of an app in laboratory settings (Kallio et al., 2005). To assess the usability of the apps, we focused on the following criteria: (i) the app can be used quickly and efficiently, (ii) the app's navigation activity is not disturbed, (iii) the gestural design and screen links (e.g., navigation panels, buttons, arrows, etc.) are consistent across all the app pages, (iv) the app provides an engaging experience by encouraging user input and providing feedback as appropriate.

Functionality
In a food consumption tracking and recommendation app, the functionalities provided are of significant importance. The relative utility of two apps is determined by their integrated functionalities. Several core functionalities are directly or indirectly involved with food consumption tracking and recommendations. These are food recognition, volume estimation, nutrition estimation, visualization of food consumption history, the ability to add new food, and a food recommendation system. Table 2 summarizes the definition of the rating scores used to measure functionalities. The key functionality we looked for is whether an app could recognize a food item. Novel food recognition systems for dietary evaluation have been enabled by the increasing use of smartphones and the advancement of artificial intelligence and computer vision technologies. Various studies have already been conducted in this domain (Zheng et al., 2017; He et al., 2015; Ravì et al., 2015; Nayak et al., 2020; Chopra & Purwar, 2021). The first thing users need to do in food consumption tracking and recommendation apps is track their food consumption. Thus, it is essential for an app to allow its users to input food details. Otherwise, the app would not be able to track users' food consumption history and provide recommendations accordingly. The functionality of adding food details (also called "food recognition") can range from automatic recognition from a food photo (taken by the user), to scanning the barcode on a food packet, to manually entering information into the app or selecting from a database. Recognizing food from images is the most advanced technology in the field of food recognition. There are some recent studies on food recognition systems based on image recognition (Ming et al., 2018; Aguilar et al., 2019; Knez & Šajn, 2020; Liu et al., 2021; Mezgec & Koroušić Seljak, 2017; Jiang et al., 2020).
Another functionality is volume estimation of users' consumed food. It is crucial to calculate food portion size or food volume to determine users' nutrition intake (Min et al., 2019). Numerous studies have been conducted to compute food volume automatically (Tahir & Loo, 2021; Dehais et al., 2016; Fang et al., 2018; Yang et al., 2019; Tay et al., 2020). Furthermore, volume estimation is essential to compute food nutritional value, which is an important predictor of immunological responses (Chandra, 1997). Some studies have focused on estimating nutritional value from food images using artificial intelligence (Kirk et al., 2021; Zhang et al., 2015; Meyers et al., 2015; Pouladzadeh et al., 2014; Boland & Bronlund, 2019; Michel & Burbidge, 2019).
Studies (Mezgec & Koroušić Seljak, 2017; Meyers et al., 2015; Boland & Bronlund, 2019; Liu et al., 2021) show that the most advanced technique to recognize food items and estimate their volume and nutritional value is automatic detection from an image. Automatic detection is considered good for these three aspects. So, in our modified rating scale, a rating of 5 is given to those apps that can recognize food and estimate its volume and nutritional value automatically from an image, and that also give users the flexibility to make the app recognize food items manually. In this rating scale, manual food recognition means users must manually enter the food item's name and search for it in the respective app database. A rating of 4 is given to those apps that provide these functionalities only through automatic image recognition. Furthermore, barcodes are already widely employed in industries and commercial sectors such as transportation, technology, food production, and so on (Sriram et al., 1996). Hence, a rating of 3 is given for food recognition where users need to scan the barcode of a food packet to recognize it. However, barcode scanning is not suitable for all types of food, as many food items do not come with a barcode (e.g., unpacked food). For portion and nutrition estimation, the barcode scanning feature is also given a rating of 3. Barcode scanning has some drawbacks: foods are not always consumed the way they are packed, and their nutritional value changes after cooking. It is also impossible to accurately estimate food volume, as a user may not consume the exact amount of food present in the package. On the other hand, the barcode feature relieves users from manually inputting food data. Hence, a rating of 3 does justice to this feature. Where the only way an app can support these functionalities is through manual user input, the app was given a rating of 2.
If an app is unable to support these functionalities, it is rated as 1.
Increasing a person's knowledge and understanding of their eating patterns and motives for eating may aid in beneficial dietary changes. In addition, data visualization may be useful in determining correlations between dietary and behavioral aspects (Hingle et al., 2013). That is why visualization of consumption history is another significant criterion for measuring functionalities. It helps users to visualize their food or calorie consumption history in charts or graphs. This functionality lets users see what they are consuming and improve their food habits. It is also a great advantage for users who need to show their food consumption habits to a dietitian. We rated an app 5 if it gives the user the flexibility to see their food consumption history in more than three ways, such as yearly, monthly, weekly, daily, and so on. A rating of 4 is given to those apps that offer this feature in three ways (daily, monthly, weekly). Finally, apps with two-way visualization are rated 3, and those with one-way visualization are rated 2.
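The visualization criterion depends only on how many time granularities an app offers, so it can be sketched as a simple lookup. The function name is hypothetical. The score of 1 for an app with no history visualization is not stated in this passage; it is an assumption made by analogy with the other functionality criteria.

```python
def rate_history_visualization(num_views: int) -> int:
    """Score consumption-history visualization by the number of views offered
    (e.g., daily, weekly, monthly, yearly)."""
    if num_views < 0:
        raise ValueError("num_views must be non-negative")
    if num_views > 3:
        return 5
    if num_views == 3:
        return 4
    if num_views == 2:
        return 3
    if num_views == 1:
        return 2
    return 1  # assumption: no visualization at all scores 1


for views in range(5):
    print(views, "view(s) ->", rate_history_visualization(views))
```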
Food recommendation is an excellent way to encourage users to eat healthy foods (Wang et al., 2021). A recommendation system uses information from a user's profile and compares it to come up with a list of relevant suggestions (Vivek et al., 2018). Food recommendation is also important when users want to know what they should consume given their current health status. If an app can suggest appropriate food names, it helps immensely. For this reason, we considered the food recommendation option as another specific functionality of food consumption tracking and recommendation apps. For a user, it is very beneficial to know which type of food they need to consume to fulfill their body's nutritional needs. Hence, any app that can automatically suggest foods based on nutritional components is rated as 5, as nutritional components are the most important factors for a balanced diet and good health (Elsweiler et al., 2015).
In another work, the recommendation is made based on calorie count (Ge et al., 2015). Calories are energy, and the number of calories tells us very little about a dish's nutritional content, both in terms of macro-nutrients (protein, carbohydrates, and fat) and micro-nutrients (vitamins, minerals, and phytonutrients like antioxidants). That is why a rating of 4 is given to those apps that can recommend food to users based on calorie estimation. A rating of 3 was given to the apps that suggest food based on the users' preferences. If a user prefers meat or fruit items, the app suggests food from those preferred domains, and in this case health issues are not considered. Finally, the apps that suggest food items that are generally good for the human body without considering any specific factors, such as recommending fruits, vegetables, milk, eggs, and so on, are rated 2 for the food recommendation function.
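The recommendation criterion can be summarized as a mapping from the basis of the recommendation to a score. The sketch below is illustrative only: the labels and the function name are hypothetical. Two behaviours are assumptions not stated in the text: an app with no recommendation feature scores 1, and an app that recommends on several bases receives the highest applicable score.

```python
# Score for each basis of recommendation, following the scale described above.
RECOMMENDATION_SCORES = {
    "nutrition": 5,    # based on nutritional components
    "calories": 4,     # based on calorie estimation
    "preferences": 3,  # based on user preferences, ignoring health
    "generic": 2,      # generally healthy foods, no specific factors
}


def rate_food_recommendation(bases):
    """Return the 1-5 score for an app given the bases of its recommendations."""
    unknown = set(bases) - set(RECOMMENDATION_SCORES)
    if unknown:
        raise ValueError(f"Unknown recommendation basis: {sorted(unknown)}")
    # Assumptions: take the best applicable tier; no recommendation feature scores 1.
    return max((RECOMMENDATION_SCORES[b] for b in bases), default=1)


print(rate_food_recommendation(["calories", "preferences"]))
print(rate_food_recommendation([]))
```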
There is a wide variety of food worldwide. It is extremely challenging to create a database with all the food names with their nutritional values. Thus, there should be an option for users to add new food items to the apps' food database. We determined that the functionality of adding new food items to the database should be present in a food consumption tracking and recommendation app, as this option can benefit future users of the apps. Apps that allow users to automatically add new food items from their users' community discussion are rated as 5. The databases of these apps can become enriched with numerous entries of food items due to this feature. Apps that allow users to add food items manually, i.e., manually entering a food item's details in the device database, were rated a 3. We found many apps that do not allow the user to add new food items to their database, and we rated these apps 1 in this functionality category.

Transparency
Most mobile apps depend on social and personal information to work properly. Various businesses that profit from customized services commonly target this information (Brug et al., 1995). Often, the app developers or publishers sell private information to third parties without the permission of the users, which violates users' privacy. The apps must follow strict and precise data protection and regulation laws, such as asking users if they consent to their private data being accessed. The apps should clearly state how and why users' data is being collected, even if the users are unaware of the direct effects of such acts. In the case of food consumption tracking and recommendation apps, constraints such as "do not share private data", "considering user consent in case of sharing", and "verification of the developer" should be observed, which will assist users in determining whether the source of the app can be trusted or not. Furthermore, it is a matter of investigation to see if the software can meet the goals indicated in the store description.

Subjective quality
App subjective quality refers to users' perspectives of the app. We used several metrics to assess the subjective quality of individual applications, including a personal app score, willingness to pay for an app depending on its functionalities, willingness to recommend an app, and a review of positive and negative feedback about the app. An overview of the app's offerings can be gained by looking at the reactions of users who downloaded and used the application. However, this is a subjective viewpoint, and this method of assessing an app's performance prior to download is ineffective for apps with few or no user comments or ratings on the app stores. Nowadays, users tend to comment on the app store in more detail and with critical points, making it easier for new users to find useful apps. Therefore, the subjective quality of apps is an optional but valid criterion for finding effective and preferable apps.

Perceived impact of app on users
The impact on a user's perception after using an app can be used to assess the app's potential. Features such as awareness, attitude and behavior changes, and help-seeking attitude can be used to check this potential. It is essential to identify whether an app can spread awareness among users. Our desired apps should be able to alert people about health issues or the impact of poor food habits. Another aspect is knowledge-enhancing behavior.
Apps should increase a user's knowledge about food items, for example, the nutritional value of foods and their positive impacts on health and the body. Users may also learn which foods they need to avoid, or the harmful effects of particular food items, from the app. The main impact of a food consumption tracking app is whether it can change users' attitudes toward improving their diet. The app can play a vital role in encouraging users to consume healthy food and maintain good food routines. Furthermore, users' approaches to seeking health and food-related help can be perceived as another impact of the app on users. Our study therefore also assessed the perceived impact of the applications on users, and whether these features were present in the applications.

Inter-rater and intra-rater reliability
Inter-rater reliability is a way of quantifying the level of agreement between two or more raters who rate an item (in this case, an app) independently based on a set of criteria (Lange, 2011). We used the intra-class correlation (ICC) method to assess inter-rater reliability. ICC is one of the most widely used statistics for evaluating inter-rater reliability if a study includes two or more raters (Sawa & Morikawa, 2007). In our study, all apps were rated by the same three raters. Thus, we used the ICC two-way mixed model, as it is recommended when the raters are fixed and each of the apps is rated by all raters (Koo & Li, 2016). Based on the 95% confidence interval of the ICC estimate, values smaller than 0.5, between 0.5 and 0.75, between 0.75 and 0.9, and higher than 0.90 suggest poor, moderate, good, and excellent reliability, respectively (Koo & Li, 2016). The ICC score of our 80 apps was calculated as 0.90 (95% CI ranging from 0.89 to 0.91), showing a good level of inter-rater reliability.
Intra-rater reliability is estimated to measure how consistent an individual is at measuring a set of criteria. This is a reliability estimation in which the same evaluation is performed by the same rater on more than one occasion.
To measure the intra-rater reliability of the three raters, we randomly selected three apps from our included list of 80 apps. The three selected apps represented three levels of quality (as per their overall rating score): low, average, and high. Those three apps were: "Calorie Counter -MyNetDiary, Food Diary Tracker", "Stupid Simple Macros IIFYM Tracker", and "Nutrition Tracker". The three raters reviewed these three apps twice in two months. All three raters showed a good level of intra-rater reliability between their two ratings; their two-way mixed ICC values were 0.89 (95% CI 0.85-0.93), 0.8 (95% CI 0.72-0.86), and 0.88 (95% CI 0.83-0.92), respectively.
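For readers who want to reproduce this kind of reliability estimate, the following is a minimal sketch of a two-way mixed ICC computed from ANOVA mean squares. It assumes the consistency definition and reports both the single-rater and the average-rater form, because the text does not state which form was used; the function names and the example ratings are hypothetical. The interpretation bands follow the thresholds quoted above from Koo & Li (2016).

```python
from statistics import mean


def icc_two_way_mixed(ratings):
    """Consistency ICC for a two-way mixed model.

    ratings: list of rows, one row per rated target (e.g. an app),
             one column per rater.
    Returns (single_rater_icc, average_rater_icc).
    """
    n = len(ratings)     # number of rated targets
    k = len(ratings[0])  # number of raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Partition the total sum of squares into targets, raters and residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    single = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    average = (ms_rows - ms_error) / ms_rows
    return single, average


def interpret_icc(icc):
    """Reliability bands as quoted in the text from Koo & Li (2016)."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# Hypothetical scores: four apps (rows) rated by three raters (columns).
example = [[4, 4, 5], [2, 3, 2], [5, 5, 5], [3, 3, 4]]
single, average = icc_two_way_mixed(example)
print(round(single, 2), interpret_icc(single))
```

The same function can be applied to intra-rater data by treating the two rating occasions of one rater as the two columns.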

Internal consistency of modified scale
Internal consistency measures the degree of inter-relationships or homogeneity among the items on a test (in our case, the questions/items used in a sub-scale/assessment criterion), such that the items are consistent with one another and measure the same thing (Christmann & Van Aelst, 2006). We used Cronbach's alpha, which is the most popular means of calculating internal consistency (Cronbach, 1951). Cronbach's alpha (α) reliability coefficient indicates internal consistency and ranges between 0 and 1, with 0.9 ≤ α as excellent, 0.8 ≤ α < 0.9 as good, 0.7 ≤ α < 0.8 as acceptable, 0.6 ≤ α < 0.7 as questionable, 0.5 ≤ α < 0.6 as poor, and α < 0.5 as unacceptable (Gliem & Gliem, 2003). The closer the value is to 1, the higher the internal consistency. We randomly chose three apps, "Weight Loss Coach & Calorie Counter -Nutright", "Foodzilla! Nutrition Assistant, Food Diary, Recipe" and "Fitatu Calorie Counter -Free Weight Loss Tracker", to compute internal consistency. Table 3 reports the internal consistency of the sub-scales of our devised rating scale: aesthetics, performance, usability, subjective quality, transparency and perceived impact. We excluded two sub-scales, general and functionality, as their items are not meant to be collective measures of the construct. The overall internal consistency of our modified scale was high at alpha 0.93, which is regarded as excellent by prior studies (Ursachi et al., 2015).
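As an illustration of the statistic, the sketch below computes Cronbach's alpha with the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores) and labels the result with the bands quoted above from Gliem & Gliem (2003). The function names and the example scores are hypothetical, and the use of sample variances is an assumption; the text does not describe the software used.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores: one row per rated observation, one column per item
            of the sub-scale.
    """
    k = len(scores[0])  # number of items
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def interpret_alpha(alpha):
    """Bands as quoted in the text from Gliem & Gliem (2003)."""
    if alpha >= 0.9:
        return "excellent"
    if alpha >= 0.8:
        return "good"
    if alpha >= 0.7:
        return "acceptable"
    if alpha >= 0.6:
        return "questionable"
    if alpha >= 0.5:
        return "poor"
    return "unacceptable"


# Hypothetical scores: five observations (rows) on a four-item sub-scale.
example = [
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
]
alpha = cronbach_alpha(example)
print(round(alpha, 2), interpret_alpha(alpha))
```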

Overall assessment of the apps
The sub-scale ratings of all 80 apps with their mean and standard deviation are reported in Table 4. The rating of each sub-scale is computed by taking the mean of the scores of all items in that sub-scale. The scores an app received in different sub-scales were used to calculate its overall mean (and standard deviation), indicating overall quality.
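The aggregation described above can be sketched in a few lines. The item scores below are invented for illustration, the function names are hypothetical, and the use of the sample standard deviation is an assumption, as is the choice of which sub-scales enter the overall mean.

```python
from statistics import mean, stdev

# Hypothetical item scores (1 to 5) for a single app; values are invented.
item_scores = {
    "aesthetics": [4, 5, 4, 5],
    "performance": [5, 5, 5, 5],
    "usability": [4, 4, 5, 5],
    "functionality": [2, 2, 1, 3, 1, 3],
}


def subscale_ratings(items):
    """Each sub-scale rating is the mean of its item scores."""
    return {name: mean(scores) for name, scores in items.items()}


def overall_quality(ratings):
    """Overall mean and (sample) standard deviation across sub-scales."""
    values = list(ratings.values())
    return mean(values), stdev(values)


ratings = subscale_ratings(item_scores)
overall_mean, overall_sd = overall_quality(ratings)
print(ratings)
print(overall_mean, round(overall_sd, 2))
```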
As discussed in Section 2.4.3, we analyze various items such as layout, graphics, visual appeal, and appropriateness for the targeted audiences to measure the aesthetics of an app. In terms of layout, graphics, and appropriateness, more than 90% of the apps are rated above 4 out of 5, and 81.25% (65/80) of the apps are rated above 4 in visual appeal. One-fifth of the apps (16/80) do not have the customization feature. In this sub-scale, the app "Health Mate" received the highest score (4.86) and 5 apps (6.25%) received the lowest score of 1.
While rating the apps, we found that most of the apps scored 4 to 5 in the performance sub-scale, which means they are responsive, their components work well, they do not crash, and their battery power and memory consumption are reasonable. The app "Nutrition Tracker" is the only app to score 3 in this sub-scale, as it had component and feature issues.
In the usability sub-scale, 80% of the apps (64/80) were very easy to use, 77.5% (62/80) had high navigational accuracy, 72.5% (58/80) featured a very good quality of gestural design, and 63.75% (51/80) were rated high in interactivity and user feedback, i.e., scored 5 out of 5 in all these areas. Overall, most of the apps scored high in this sub-scale: 83.75% (67/80) of the apps scored between 4 and 5, and the rest, i.e., 16.25% (13/80), scored between 2.5 and 3.75. The lowest rated apps in this sub-scale were "FoodImage" and "WAIE What AM I Eating -v2", which both scored 2.5.
Functionality is an essential sub-scale in our rating scale as it measures the extent (from not at all (1) and manually (2) to fully automatically (5)) to which an app supports food recognition, food volume estimation, nutritional value estimation, food consumption history/pattern visualization, food recommendations, and the ability to add new food information in the app. In our reviewed apps, 78 out of 80 apps (97.5%) scored below 3 (i.e., average). The app "Foodvisor: Calorie Counter, Food Diary & Diet Plan" (3.67) is one of the two apps that scored three and above. This result indicates that although there are many apps in the market, and significant advances in artificial intelligence and image processing technologies, there is still a lack of smart food computing apps with the desired automatic features.
The transparency sub-scale is measured based on an app's description in the app store, its credibility (from a legitimate source), evidence, goals, and policy in accessing and sharing user data. Eleven out of 80 apps (13.75%) scored between 4 and 5. Among them, four apps ("Calorie Counter -MyNetDiary, Food Diary Tracker", "Calorie Counter -MyFitnessPal", "Calorie Counter by Lose It! for Diet & Weight Loss", and "Lifesum -Diet Plan, Macro Calculator & Food Diary") shared the highest score of 4.40. The two apps that received the lowest score (1.60) were "DietYuk" and "WAIE What Am I Eating -v2".
The subjective quality sub-scale is measured based on an individual's willingness to use, recommend, and pay for the apps, and the overall star rating given by the individual (in this case, the rater). In this sub-scale, only 8 out of 80 apps (10%) scored 4 and higher. The app "WW Weight Watchers Reimagined" scored the highest score of 4.5. The two apps that scored 1 (the lowest) were "DietYuk" and "WAIE What Am I Eating -v2".
The perceived impact on user sub-scale measures the effectiveness of an app in changing users' attitudes toward a balanced diet and healthy life. This has been evaluated based on whether the app provides a diet plan considering an individual's eating habits, community or forum to share information, seek help, etc. Only 7 out of 80 apps (8.75%) scored between 4 and 5; the only app that scored 5 is "WW Weight Watchers Reimagined". On the other hand, five apps (6.25%) scored 1 in this sub-scale.
In addition, we found some other useful features while reviewing those apps. Tracking weight, tracking exercise and steps, and tracking water consumption are the most common features. Overall, seven out of 80 apps (8.75%) scored mean values between 4 and 5. However, none of those seven apps scored highest in the functionality sub-scale. The app that scored highest in functionality ("Foodvisor: Calorie Counter, Food Diary & Diet Plan") received an overall score of 3.79. Out of these seven top scoring apps, "WW Weight Watchers Reimagined" received the highest score of 4.38. However, this app scored lowest in functionality among its sub-scale scores. Of all the apps, 78.75% (63/80) scored above average (between 3 and 5) and only one ("WAIE What Am I Eating -v2") scored below 2. Figure 3 shows the sub-scale specific scores and the total mean score of all the apps. The total mean score of all the apps was 3.44 out of 5 with a 95% CI ranging from 3.33 to 3.55. Significant discrepancies were found within the sub-scales, most notably functionality, perceived impact, and general features, which received the lowest mean scores of 2.15, 2.74 and 2.86, respectively. Performance, usability, and aesthetics, on the other hand, received the highest mean scores of 4.92, 4.46 and 4.34, respectively. Transparency and subjective quality were two other sub-scales that received average mean scores of 3.05 and 2.99, respectively. In summary, the apps are highly lacking in functionality and very good in performance.

Analysis of app store ratings and our measured ratings
The Pearson correlation between app store ratings and our measured ratings is 0.46, which is considered a moderate strength of association (Liang et al., 2019). Besides, the standard deviation between the average app store ratings and our measured ratings for the examined apps was 0.49. Given that our rating scale's score is an aggregated mean of the different sub-scale ratings required to determine the consistency and parameters of food consumption tracking applications, this variance is not too low. Figure 4 shows the app store ratings and our measured ratings for 15 of the 80 apps. Here, we considered the apps with 100+ user ratings in the app store, and randomly selected five apps from each measured rating range (4 to 5, 3 to below 4, 2 to below 3 and 1 to below 2). Figure 4 shows a clear consistency between the app store ratings and our measured ratings.
Figure 4: Consistency between app store rating and measured rating
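A comparison of this kind can be reproduced with a plain implementation of the Pearson correlation coefficient, as in the sketch below. The function name and the two rating lists are hypothetical and do not correspond to the apps reviewed here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical ratings for five apps (invented values).
store_ratings = [4.6, 4.1, 3.8, 4.4, 3.2]
measured_ratings = [4.2, 3.5, 3.6, 3.9, 2.8]
print(round(pearson_r(store_ratings, measured_ratings), 2))
```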

Assessment of functionality sub-scale
We have identified six criteria for measuring the functionality of the food consumption tracking and recommendation apps: food recognition, food portion computing, nutritional value estimation, history visualization, food recommendation, and the ability to add food into their databases, as discussed in Section 2.4.7. Figure 5 presents the results of our functionality assessment of 80 reviewed apps based on the six measurement criteria.
Recognizing the food items from the recorded entries of the users is one of the key criteria for assessing the functionality of food tracking and recommendation apps. Among 80 reviewed apps, three could recognize food items both automatically and manually. These apps are "Foodzilla! Nutrition Assistant, Food Diary, Recipe", "Bitesnap: Photo Food Tracker & Calorie Counter", and "Foodvisor: Calorie Counter, Food Diary & Diet Plan". Forty-one out of the 80 apps (51.25%) have a barcode feature to recognize food items. Thirty out of the 80 apps (37.5%) need manual input from users, as they do not have any automatic food recognition features.
Another criterion that we have tested in the apps is whether they can compute food volume. Most of the apps cannot compute volume directly from an image, and thus, users have to enter the food volume manually. Among all the reviewed apps, 87.5% (69/80) had the feature of computing food volume manually. We found only one app, "Foodzilla! Nutrition Assistant, Food Diary, Recipe", that provides both manual and automatic food volume estimation. We have also searched for the nutrition value estimation functionality in the reviewed apps. "Foodzilla! Nutrition Assistant, Food Diary, Recipe" and "Bitesnap: Photo Food Tracker & Calorie Counter" are the apps in which users can estimate nutrition value manually and automatically. Besides, we have found only one app, "Foodvisor: Calorie Counter, Food Diary & Diet Plan", that can compute food nutritional value through images. Thirty-eight apps can estimate the nutritional value of 100 g of food from their database using barcode scanning. In 37.5% of apps (30/80), users are required to input nutritional values manually.
Another major criterion that we examined in the reviewed apps is food consumption history visualization.
Most of the apps can show the records of users' daily consumption. Some apps can visualize weekly and monthly food consumption, and some can show yearly food consumption history. Only a few apps can visualize records within a time range defined by the users. More than half of the apps, 58.75% (47/80), can show the history in only one of the ways mentioned above. Fifteen out of the 80 apps (18.75%) can show the consumption history in two ways, 6.25% (5/80) can portray the consumption record in three ways, and 7.5% (6/80) can visualize the records in more than three ways (e.g., daily, weekly, monthly, yearly).
Recommending foods according to the user's requirements is one of the most important criteria of food consumption tracking and recommendation apps. However, in our study, we have found only one app that can recommend food based on the nutritional consumption of the user, named "WW Weight Watchers Reimagined". Only 3.75% of the apps (3/80) recommend foods that are good for health in general, but they do not consider any individual user requirements, such as the consumption history of nutrition or calories. These three apps are "Carb Manager: Keto Diet Tracker & Macros Counter", "AI Nutrition Tracker: Macro Diet & Calorie Counter", and "TrackEats".
We have also reviewed the apps against the criterion of the ability to add new food items. As we have witnessed, not all food consumption tracking and recommendation apps have rich food item databases, so users cannot find the name of the food they have consumed in most of the apps. In such cases, some apps allow the user to add new food item names and their nutritional values to the database. In the manual process, 71.25% of the apps (57/80) let users add food items. When this task is performed manually, some apps allow a new food item to be added to the global database, while others add food items to the local database. We found that some apps (e.g., "Calorie Counter -MyFitnessPal", "Calorie Counter, Carb Manager & Keto by Freshbit") allow users to add new food names and nutritional information into their global database, which means other users can access that newly added food item information.
On the other hand, some apps (e.g., "Calorie Counter by FatSecret", "Keto.app -Keto diet tracker") allow users to add new food items only to the app's local database and not to the main database. Therefore, if users add any new food items, only that specific user can see the added food items in the app. Users have to provide total nutritional values, and sometimes an image or barcode number, to add the food into the app's database. The rest of the apps do not allow users to add foods to their database. Table 5 reports the percentage of functionality measurement criteria fulfilled by the apps evaluated from the three app stores. It provides an overall quality rating of all the apps for the different platforms (app stores). The results in Table 5 show that the Google Play store (i.e., Android platform) contains a large number of apps compared to the Apple and Microsoft app stores. It also shows that the food recommendation criterion is rarely present among the 80 apps: only 4 apps could recommend foods to users.

Analysis of user reviews from the app stores
User reviews or comments in the app stores play a vital role in identifying any app's quality (Guzman et al., 2018).
These reviews often provide detailed information about the features of the apps and their pros and cons. Thus, many people rely on users' reviews before downloading an app, as these reviews act as quality indicators of apps (Vasa et al., 2012). In the commercial app world, developers review the positive and negative comments to help improve their apps.
Considering the importance of user review analysis, in this study, we collected users' comments for the apps from their respective app stores. While collecting comments, we classified them into two categories based on the users' ratings for the apps: a comment is classified as a "positive" review if the commenter rated the app four stars or above; otherwise the comment is classified as a "negative" review. We collected 900 user reviews, among which 55% were positive and the rest were negative. Usually, in positive reviews, users tend to point out the features they liked most in the apps, and how they benefited from using the apps. On the contrary, the negative reviews mainly reflect the limitations, faulty features or inaccurate descriptions of the apps. We used a word-cloud for visualizing comments to gain better insights from the users' reviews. Figure 6 depicts the word-cloud of the positive and negative reviews. The word "like" is the most frequent in the positive reviews. This word is frequently used along with some other frequent words such as "track", "calories", "exercise", and "intake" because users like these components of the apps (i.e., weekly history visualization, tracking calorie intake, tracking exercise). Many users commented that they loved the apps for having an option to add food recipes into the database and recommend a diet plan. Therefore, we have identified the words "love", "add", "recipes", "database", "recommend", "diet", and "plan" as more recurring words. Users expressed their feelings using words such as "great", "good", and "much" because some apps' designs are very simple and they were easy to use. As a result, we saw the words "simple", "easy", and "use" in the good comments. Users like the apps that have a barcode scanner feature to estimate nutritional values like macros and protein. Hence the words "barcode", "nutrition", "macros", and "protein" occur frequently in the positive reviews.
Many apps provide some diet plans that can help users to lose weight. Thus, we see "helpful" and "lose" in the users' positive reviews. Some users said that they wanted to recommend the apps to their friends. Some users expressed in the good comments that the apps helped them to set goals and track their health progress, including tracking their weight. Most of the users said that they liked the free version of most of the apps. For this reason we see the words "free" and "version" in our word-cloud of positive reviews. Figure 6b depicts the most frequently used words in the negative reviews. Here, we witnessed words such as "money", "paid", "refund", "subscription", "waste" and "premium". Users are not satisfied because components do not work properly in the paid apps or in the apps with premium subscription packages. For these categories of apps, users felt that spending money on these apps was a total waste and they wanted a refund. Frequent ads in the free apps disappoint the users, and for this reason we see words such as "free", "ads", and "disappointed" in the word-cloud. In some apps, the barcode scanners do not work properly. Therefore, "barcode" and "scan" occur in the users' negative reviews. Some users complained about problems they experienced while trying to log in from another phone. A few users expressed confusion about the design and layout while they were using the apps. In some comments we noticed problems such as users becoming frustrated by an app crashing, apps not letting users change their passwords, login issues when changing devices even though the email address is the same, and wrong estimations of calories or nutrition. As a result, they uninstalled the apps. That is why, in the word-cloud, we see words like "frustrating", "useless", "hard", "crashing", "change", "bother", "issue", "password", "email", "wrong", and "uninstalled".
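The review labelling rule and the word counts that underlie a word-cloud can be sketched as follows. This is a minimal illustration under stated assumptions: the tokenization rule, the stop-word list, the function names, and the sample reviews are all hypothetical, since the text does not describe the tooling used to build Figure 6.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the text does not say which one was used.
STOP_WORDS = {"the", "a", "an", "and", "to", "i", "it", "is", "of", "for"}


def label_review(stars):
    """Four stars or above is 'positive'; anything lower is 'negative'."""
    return "positive" if stars >= 4 else "negative"


def word_frequencies(reviews):
    """Count word occurrences per label.

    reviews: iterable of (stars, text) pairs.
    Returns a dict mapping 'positive' / 'negative' to a Counter.
    """
    counts = {"positive": Counter(), "negative": Counter()}
    for stars, text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        kept = [t for t in tokens if t not in STOP_WORDS]
        counts[label_review(stars)].update(kept)
    return counts


# Invented sample reviews, for illustration only.
sample = [
    (5, "I like the barcode scanner, easy to track calories"),
    (4, "Easy to use and I like it"),
    (2, "Barcode scan is useless and the app keeps crashing"),
    (1, "Waste of money, I want a refund"),
]
freq = word_frequencies(sample)
print(freq["positive"].most_common(3))
print(freq["negative"].most_common(3))
```

The resulting counters can be passed to any word-cloud renderer, which typically scales each word by its frequency.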

Limitations of reviewed apps
We outline the major shortcomings of the reviewed apps for food consumption tracking and recommendation below:
(i) Lack of automatic food computation features: According to our findings, most of the apps are designed for tracking users' food consumption. To track food consumption, the apps need to recognize the food, estimate the food's portion size, and estimate nutritional values. However, in most of the apps, these tasks are done manually. Therefore, users must manually input these values to track their daily consumption. We find that only a few apps can recognize food directly from images, but these apps also have some issues. For instance, they first anticipate the ingredients of the food items, which is not always accurate. As a result, calorie and nutrient value estimations performed by the apps are inaccurate. In some cases, we noticed that if the food image has multiple food items, then the apps focused only on the main item, and the rest of the items were ignored. For example, in a rice dish with side vegetables and sauces, most apps only recognize the rice, as it is the main item with the biggest portion. However, it is not enough to recognize only one food item among so many items. Moreover, in the case of low-quality images, this feature of the apps stops working. We have observed underestimation when it comes to computing food volumes. Most of the apps require manual entry for food volumes; only a few have the option to compute volume, which is often error-prone as well.
(ii) Scarce existing database: We spotted a deficiency in the existing food databases used in the commercial apps. We found two types of databases. The first is an international database that contains only foods common to any country (for example, fruits, meat, fish, and vegetables) together with food items specific to the app's country of origin. The second lists only regional foods, such as Asian or European foods. For this reason, when a user searches for foods in the app's database manually or through barcode scanning, they cannot always find the foods. In the case of region-specific database apps, users from different regions cannot find their country-specific foods. Also, the apps that focused only on international food items had a small list of country-specific food items in their databases, so the users could not find their cultural foods. Some of the apps give users an option to add foods. Even though users can add foods in the apps by using this feature, the newly added foods are confined to that particular device or individual account, and other users cannot access that food item. We have found a few apps that allow users to add food items to their main databases. In these cases, users are asked to send several images of the food product and, after a verification process, the app's server adds the food item into the main database.
(iii) Lack of evidence-based apps: Evidence-based strategies are critical for food consumption tracking and recommendation apps. Food directly contributes to human health; thus, any information related to food needs to be verified, as wrong information can cause health risks or diseases. Hence, food consumption tracking and recommendation apps necessitate the involvement of dietitians, health experts, or nutritionists. This involvement raises the level of evidence or integrity of these apps. However, evidence-based strategies were mostly absent in the mobile apps we reviewed. Evidence-based apps are those that have been verified in a scientific peer-reviewed journal. There are only 12 apps that are based on evidence (e.g., "Lose it!", "My Fitness Pal", and "Lifesum -Diet Plan, Macro Calculator & Food Diary"). When an app suggests a diet plan to a user, the involvement of an expert or dietitian is a must, but we rarely found this association in an app. We also searched for apps that can recommend food to users. We found four apps with this recommendation feature ("Carb Manager: Keto Diet Tracker & Macros Counter", "AI Nutrition Tracker: Macro Diet & Calorie Counter", "TrackEats", and "WW Weight Watchers Reimagined"), but even in these apps the involvement of a specialist was missing.

Design considerations
We have illustrated some future directions from app developers' and users' perspectives. We have also suggested some functionality improvements based on our findings.
(i) Food computing automation improvement: The recognition of food items, estimation of volume and nutrition from photos should exist in the apps and need to function properly. Various deep learning algorithms have been proposed to recognize food items from images (Min et al., 2019). There has been extensive research in the field of food recognition (Chopra & Purwar, 2021;Nayak et al., 2020;Jiang et al., 2020;Mezgec & Seljak, 2021;Van Asbroeck & Matthys, 2020). Yang et al. (2010) used the semantic texton forest to categorize all image pixels and then extracted the pairwise feature distribution as visual features. Nowadays, CNNs such as AlexNet (Kagaya et al., 2014), GoogLeNet (Wu et al., 2016), Network-In-Networks (NIN) (Tanno et al., 2016), Inception V3 (Hassannejad et al., 2016), ResNet (Ming et al., 2018), and their combinations, are frequently employed for feature extraction in food identification (McAllister et al., 2018;Pandey et al., 2017). In a realworld scenario, there will be many food items present in one image. There are some studies on multi-label ingredient recognition (Bolanos & Radeva, 2016;Chen et al., 2018). Aguilar et al. (2018) introduced a semantic food detection framework that consists of three parts: food segmentation (Chopra & Purwar, 2021), food detection, and semantic food detection. Nevertheless, while reviewing the apps, we observed the absence of automatic food recognition, volume estimation, and food recommendation features in most of the apps. For food volume estimation (Tahir & Loo, 2021), one study proposed a three-stage system to calculate portion sizes using only two images of a dish acquired by mobile devices (Dehais et al., 2016). Besides the CNNs, the generative adversarial networks are also used for food portion estimation (Fang et al., 2018). Kong & Tan (2012) presented a mobile phone-based system, DietCam, which only requires users to take three images or a short video of the meal. 
Some research have focused on food recommendation systems . During the recommendations, users' food preferences and health requirements should be considered (Wang et al., 2021).
In Phanich et al. (2010), the authors introduced the Food Recommendation System (FRS) for diabetic patients, utilizing food clustering analysis. In the context of nutrition (Kirk et al., 2021) and food features, their system offered the appropriate substitute foods. In Jiang et al. (2019), the authors revealed a personalized health-aware food recommendation system that could recognize the ingredients in market micro-videos, profile users' health status from their social media accounts, and offer personalized healthy foods to users. Hence, as we observed, implementing these automatic features in food consumption tracking and recommendation apps is essential yet challenging for developers. Therefore, developers should focus on these features to implement in future app development.
(ii) Use of enriched food database: Food databases should be enriched to identify any food items. Many food databases are available, like -"Food-101" which is a public dataset consisting of 101,000 images with 101 categories (Bossard et al., 2014), "Food201 segmented dataset" which has 201 food classes consisting of 12,093 data items (Meyers et al., 2015), and "Recipe1M+" which is a new structured database with over one million cooking recipes and 13 million food photos (Marin et al., 2019). However, we have seen that some of the apps' databases contain only international food items like fruit, vegetable, meat or fish. These items are common in all countries. But some apps were country of origin specific. Their database mostly had the food items of the apps' specific country. So, users from other countries find it difficult to use these apps, as there are fewer known food items in the apps' database. We barely found any apps with a highly enriched database with a vast number of food items. As we have seen, the databases used in the presented food consumption tracking and recommendation apps are not very rich. In the future, developers can utilize the datasets mentioned above to improve food consumption tracking and recommendation apps.
(iii) Expert involvement: Dietitians are professionals in diet and nutrition who have had extensive training and experience. They are also well known for providing successful lifestyle strategies for weight management via health behavior consulting (Jensen et al., 2014; Millen et al., 2014). Moreover, the inclusion of health care experts in the development of medical urology apps has been found to have a favorable impact on app downloads, implying that working with health care experts gives users greater confidence in the apps' safety and legitimacy (Jospe et al., 2015). Additionally, because dietitians use smartphone health applications and other mHealth technology in patient care (Lieffers et al., 2014; Chen et al., 2017; Jospe et al., 2015), their involvement in food consumption tracking and recommendation apps is also expected. Unfortunately, the involvement of expert dietitians or nutritionists is insufficient in the existing apps, so developers should consider increasing the involvement of medical experts in this app domain.
(iv) Improvement of software qualities: Because the user interface connects the customer to the service they require, it is one of the most critical aspects to consider when designing and developing a commercial app (Faghih et al., 2014). The user interface design determines whether a software application will stand or fall. Navigation through the app and its services must be user-friendly; otherwise, the user will be unable to find their way around the program (Ross & Gao, 2016). That is why usability needs to be considered in an app's design. Furthermore, the machine learning methods employed for food automation activities should be chosen so that app performance (total battery life impact, chance of device heating) does not suffer. Transparency is a critical component of every mobile app, and no app will be trusted unless it has sufficient credibility (Corral et al., 2014). Thus, the transparency sub-scale, which covers user consent, the accuracy of the store description, the validity of the source, and the practicality of fulfilling the stated goals, must be considered in full. In the case of free apps, advertisements are troublesome for users. Therefore, apps should be made ad-free or show fewer advertisements.
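As an illustration of the substitute-food idea discussed under consideration (i), the following minimal Python sketch suggests a replacement food by nutrient similarity. It is not the method of Phanich et al. (2010): a simple nearest-neighbour lookup stands in for their clustering analysis. The food list, the nutrient values, and the function names are all hypothetical and serve only to make the example runnable.

```python
import math

# Hypothetical nutrient values per 100 g: (carbohydrate g, sugar g, fibre g).
# The numbers are illustrative only and are not taken from any food database.
FOODS = {
    "white rice": (28.0, 0.1, 0.4),
    "brown rice": (23.0, 0.4, 1.8),
    "white bread": (49.0, 5.0, 2.7),
    "wholemeal bread": (41.0, 6.0, 7.0),
    "cola": (10.6, 10.6, 0.0),
    "unsweetened tea": (0.3, 0.0, 0.0),
}

SUGAR = 1  # index of the sugar value in each nutrient tuple


def distance(a, b):
    """Euclidean distance between two nutrient tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def suggest_substitute(food, foods=FOODS):
    """Return the most similar food with strictly less sugar, or None.

    Similarity is measured on the raw nutrient values; a real system
    would normalise the features and use a far larger database.
    """
    target = foods[food]
    candidates = [
        (distance(target, values), name)
        for name, values in foods.items()
        if name != food and values[SUGAR] < target[SUGAR]
    ]
    if not candidates:
        return None  # nothing in the database has less sugar
    return min(candidates)[1]


if __name__ == "__main__":
    for item in ("cola", "white bread", "unsweetened tea"):
        print(item, "->", suggest_substitute(item))
```

The sugar constraint is one assumed example of a health rule for diabetic users. In practice such rules, and the nutrient data behind them, would need to come from a dietitian and an enriched food database, as considerations (ii) and (iii) argue.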

Limitations of this study
A limitation of this study is that we only evaluated English-language apps and did not consider apps with regional access restrictions. In addition, the three raters used three different mobile devices with different operating systems, so some apps behaved differently across devices, and the raters' scores differed for some criteria in our app rating scale.
Our rating tool also includes a sub-scale named perceived impact on the user, which consists of awareness induction behavior, knowledge-enhancing behavior, change of attitude toward improving a balanced diet, intention to change, balanced-diet-related help-seeking behavior, and behavior change of the users. As these criteria are mainly qualitative, their evaluation depended on the raters' subjective judgment. Finally, since our search and evaluation, some of the apps might have been removed from the app stores or updated with enhanced functionality, and new apps might have been added.

Conclusion
In this study, we conducted a critical review of mobile apps from three popular app stores. Our search identified a total of 473 related apps, from which we selected and evaluated 80 apps using our app rating tool. We devised this tool specifically for analyzing food consumption tracking and recommendation apps by adopting and extending existing mobile app rating scales. Using it, we evaluated the selected 80 apps and identified their design faults. According to our evaluation, most of the existing mobile apps in the app stores do not meet the essential requirements for tracking food consumption and providing recommendations.
Although a few apps had some of the expected features, none offered all the required functionalities. In most of the apps, tracking required manual data input, and the databases used in the apps are not enriched.
We also observed that there are very few evidence-based apps. Because numerous studies have addressed automatic food recognition, food portion size estimation, and nutritional value assessment, these capabilities should be included in modern food consumption tracking and recommendation apps. Likewise, there has been much research on food recommendation, yet this feature is absent from most of the evaluated apps and needs to be included in future apps. Because these apps suggest diet plans, recommend foods to users, and estimate nutrient values, an expert dietitian or nutritionist should be involved in their development. The databases should also be enriched, since multiple food datasets are now available. Software qualities (aesthetics, general features, performance, usability) also play a vital role in commercial apps, so developers need to take them into account.
Nonetheless, the analysis provided here covers a variety of general quality features and specific functional features that can be used in food consumption tracking and recommendation apps to provide consumers with a realistic and evidence-based experience. Studies show how people use smartphones to improve their fitness and obesity literacy, as well as the overall status of the commercial product market for food consumption tracking and recommendation apps.
This study opens the door for future research on the implementation, effectiveness, and performance measurement of food computing apps.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Funding Sources
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.