Recommendation Algorithm Based on Probabilistic Matrix Factorization with Adaboost

Abstract: A current problem in diet recommendation systems is matching food preferences with nutritional requirements while taking into account individual characteristics, such as body weight, and individual health conditions, such as diabetes. Current dietary recommendations employ association rules, content-based collaborative filtering, and constraint-based methods, which have several limitations. These limitations stem from the existence of special user groups and an imbalance among complex attributes. Building on existing research into dietary recommendation algorithms, we combine the Adaboost classifier with probabilistic matrix factorization and present a personalized diet recommendation algorithm. The probabilistic matrix factorization method extracts the implicit factors linking individual food preferences and nutritional characteristics. From these, we retain the features with strong influence and discard those with little influence. The algorithm fully considers the constraint relationship between users' attributes and the nutritional characteristics of foods. Because it considers many complex factors, the recommended food result set meets both health standards and users' dietary preferences. Our results show that our method performed better than others at matching preferred foods with dietary requirements, benefiting user health as a result. A comparison of our algorithm with others demonstrated that our method offers high accuracy and interpretability.


Introduction
The steady growth of networks, accompanied by a surge of information, has led to the expansion of recommendation services in various fields. These services help users navigate large volumes of information more precisely [Liu, Meng, Ding et al. (2019); Yin, Ding and Wang (2019)]. Diet recommendations are of particular interest because they must weigh patients' nutritional needs against their dietary preferences. Current methods employ association rules, content-based collaborative filtering, and constraint-based methods. However, these traditional methods have several limitations: (1) they easily miss valuable food information; (2) they overlook many particular groups of users, such as patients with diabetes, heart disease, obesity, and hypertension; (3) they do not balance health requirements with users' acceptance; and (4) the sparsity of food data for users with special needs leads to low-quality recommendations. The main contribution of our work is to investigate and address these limitations. Building on existing research, we propose combining probabilistic matrix factorization with Adaboost, incorporating dietary guidelines and users' individualized criteria. Recommendation quality improves because the method draws on the user's own attributes and on food characteristics. In addition to considering the recommendation results, we classify the implicit factors that contribute to the correlation between dietary preferences and food nutritional content to discover the factors that contribute most. The association between clients and foods leads to personal diet recommendations that account for both health needs and client preferences. Our experimental results show that, compared with the traditional methods, the proposed algorithm effectively balances nutritional requirements and client preferences, especially for diabetic diets.
The rest of this paper is organized as follows: Section 2 presents problems relating to the diabetic diet and the inadequacies of traditional diet recommendation methods. Section 3 presents our proposed algorithm. Section 4 presents our experimental results and an analysis of them. Section 5 presents our conclusions.

Basic theory

Content-based recommendation algorithm
Content-based recommendation models consider users' preferences and food attributes separately. These models then compare historical details of a user's diet with food models [Oh and Kim (2017); Bobadilla, Ortega, Hernando et al. (2013)]. The content-based recommendation algorithm is an unsupervised learning process that requires no human involvement, so it is highly automated and capable of considering complex objects. However, the learning algorithm is unable to assist newly added users, who have no dietary history to compare against. Hence, content-based recommendation systems have scalability issues.

Association-based recommendation algorithm
The most common representative of association-based recommendation algorithms is the Apriori algorithm [Liu, Yu, Wei et al. (2018); Djenouri and Comuzzi (2017)]. This algorithm generates strong association rules from frequent itemsets, keeping only the rules that satisfy the minimum support and minimum confidence. Association analysis results in a network that describes data relationships, and it is carried out once the minimum support and confidence threshold parameters are set. Whenever the support or confidence of a candidate does not satisfy the preset threshold, the algorithm prunes it. A manually set threshold inevitably causes some contributing food factors, such as users' dietary preferences, to be lost during the pruning process. Thus, the rule extraction needed for association-based recommendation systems is more complicated. This method also does not handle synonyms well, such as product names or recommended content attributes, and its results are not well individualized.
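To make the pruning effect concrete, the following minimal sketch mines rules between pairs of foods only, which is a simplification of Apriori rather than the full algorithm. The meal records, the function names `support` and `strong_rules`, and the threshold values are all hypothetical and chosen purely for illustration.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def strong_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for food pairs."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # pruned: the pair is not frequent enough
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

# Hypothetical meal records; each set is one recorded meal.
meals = [
    {"oats", "milk"},
    {"oats", "milk", "apple"},
    {"oats", "apple"},
    {"rice", "tofu"},
    {"milk", "apple"},
]

loose = strong_rules(meals, min_support=0.2, min_confidence=0.5)
strict = strong_rules(meals, min_support=0.4, min_confidence=0.6)
```

With the stricter thresholds, the rice and tofu rules disappear even though that pair always occurs together in the sample data, which is the kind of loss described above.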

Collaborative filtering recommendation algorithm
The collaborative filtering recommendation algorithm [Zhang, Yang, Xu et al. (2017); Wen and Shu (2014); Qi, Li and Zhou (2017)] requires the user's ratings or behaviors. After calculating the similarity between users or items according to score records, a nearest neighbor dataset is obtained according to the similarity score. Then, recommendations are made by selecting the target users or items in the nearest neighbor dataset. In the case of complex item attributes, the similarity can no longer be calculated from a user's ratings or behavior alone. In this event, the algorithm does not reassign the weights of attributes, even though item attributes differ in how much they matter to user preferences. Therefore, it is difficult for the collaborative filtering recommendation approach to deal with complex object attributes.

Constraint-based recommendation algorithm
The constraint-based recommendation algorithm [Porcel, Ching-López, Bernabé-Moreno et al. (2017); Brown (2015); Felfernig, Schippel, Leitner et al. (2013)] relies on item properties, recommending items from the item set that match users' preferences and requirements. To meet both diet criteria and food preferences, the users' dietary structures are constrained rigidly. If there are conflicting demands or no solution, it may be difficult to address the problem using constraint solving. Although the constraint-based recommendation algorithm improves on the traditional methods, it obtains a solution by gradually relaxing the constraints surrounding conflicting requirements. While demand conflicts and null solutions can be avoided in this way, the relaxation of constraints may allow inappropriate foods, matched to users' preferences rather than to their nutritional needs, to enter the recommended dataset.
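The relaxation behavior can be illustrated with a short sketch. It assumes that constraints are ordered from most to least important and that the least important remaining constraint is dropped whenever no food satisfies them all. The function `recommend_with_relaxation`, the food records, and the constraint ordering are hypothetical and are not taken from the cited systems.

```python
def recommend_with_relaxation(foods, constraints):
    """constraints: list of (name, predicate), most important first.

    Returns the matching foods and the names of the constraints still active.
    """
    active = list(constraints)
    while True:
        matches = [f for f in foods if all(pred(f) for _, pred in active)]
        if matches or not active:
            return matches, [name for name, _ in active]
        active.pop()  # relax the least important remaining constraint

# Hypothetical foods: no item is both liked and low in sugar.
foods = [
    {"name": "cake", "sugar_g": 30, "liked": True},
    {"name": "celery", "sugar_g": 1, "liked": False},
]
constraints = [
    ("liked", lambda f: f["liked"]),
    ("low_sugar", lambda f: f["sugar_g"] <= 5),
]

matches, still_active = recommend_with_relaxation(foods, constraints)
```

In this toy case the nutritional constraint is the one that gets relaxed, so a high-sugar food enters the result set, which mirrors the weakness noted above.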

Our algorithm

Problem definition
We define the dietary preferences set for users as U={u1, u2, …, un} and the food attributes set as V={v1, v2, …, vm}, as shown in Tab. 1.
In the characteristic correlation matrix (CCM), the n rows represent the n dietary preference characteristics of a diabetic patient, and the m columns represent the nutritional characteristics of foods [Felfernig, Schippel, Leitner et al. (2013); Xu and Xie (2017)]. If a diabetic patient has a preference for a food, the corresponding value of r is entered in the matrix. By calculating r, we can find combinations of users and foods satisfying both preferences and requirements.

Probabilistic matrix factorization
The basic assumption of probabilistic matrix factorization [Shi, Zheng and Yang (2017); Wang, Liu, Xia et al. (2017); Ren, Song, E et al. (2017)] is that only a small number of implicit factors contribute to the correlation between a user's preferences and an item's attributes. In the case of the personalized diet recommendation algorithm, there is a small number of implicit factors contributing to the correlation between user's dietary preferences and food characteristics.
We extract the users' diet preferences to form a low-dimensional matrix $U \in \mathbb{R}^{K \times M}$ and the food attributes to form a low-dimensional matrix $V \in \mathbb{R}^{K \times N}$, where $K \le \min\{M, N\}$. We use the inner product of U and V to constitute a "Users-Food Characteristics Correlation" matrix. Given the users' diet preference characteristic vector $U_i$ and the food's feature vector $V_j$, the correlation between users' preferences and food attributes is quantified as $r_{ij}$, which has the distribution:
$$p(r_{ij} \mid U_i, V_j, \sigma^2) = \mathcal{N}(r_{ij} \mid U_i^{T} V_j, \sigma^2),$$
where $\mathcal{N}(x \mid \mu, \sigma^2)$ is the density function of the Gaussian distribution with mean $\mu$ and variance $\sigma^2$. If each $r_{ij}$ is independent, the conditional probability of the observable correlation strength is as follows:
$$p(R \mid U, V, \sigma^2) = \prod_{i=1}^{M} \prod_{j=1}^{N} \left[ \mathcal{N}(r_{ij} \mid U_i^{T} V_j, \sigma^2) \right]^{I_{ij}},$$
where $I_{ij}$ is an indicator function. If user i has an action on food j, the value of $I_{ij}$ is 1; otherwise it is 0.
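Maximizing this conditional probability amounts to minimizing the squared error over the observed entries only. The sketch below does so with stochastic gradient steps and adds an L2 penalty `reg`, which corresponds to the zero-mean Gaussian priors of standard probabilistic matrix factorization and is an assumption here. The function names, hyperparameters, and the small rating table are hypothetical, and the factors are stored one row per user or food, i.e., transposed relative to the K-row matrices in the text.

```python
import math
import random

def train_pmf(ratings, n_users, n_foods, k=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Fit U (users x k) and V (foods x k) on the observed entries only.

    ratings maps (i, j) -> r_ij; a missing key means I_ij = 0.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_foods)]
    for _ in range(epochs):
        for (i, j), r in ratings.items():
            # Residual between the observed r_ij and the inner product U_i . V_j
            err = r - sum(U[i][f] * V[j][f] for f in range(k))
            for f in range(k):
                u, v = U[i][f], V[j][f]
                U[i][f] += lr * (err * v - reg * u)
                V[j][f] += lr * (err * u - reg * v)
    return U, V

def rmse(ratings, U, V):
    """Root-mean-square error over the observed entries."""
    sq = [(r - sum(a * b for a, b in zip(U[i], V[j]))) ** 2
          for (i, j), r in ratings.items()]
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical observed correlation strengths for 3 users and 4 foods.
ratings = {(0, 0): 5, (0, 1): 4, (1, 0): 4, (1, 2): 2,
           (2, 1): 1, (2, 2): 5, (2, 3): 4}

U0, V0 = train_pmf(ratings, 3, 4, epochs=0)   # untrained starting point
U, V = train_pmf(ratings, 3, 4)
before, after = rmse(ratings, U0, V0), rmse(ratings, U, V)
```

Unobserved pairs never enter the loop, which plays the role of the indicator $I_{ij}$; their predicted values can then be read off as the inner product of the learned rows.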

Adaboost classifier factor
Adaboost is an iterative algorithm that trains different weak classifiers using the same training set and combines them to form a stronger one [Dinakaran and Thangaiah (2017); Zhang and Yang (2016)]. If the distribution correlation between a user's dietary preference and a food attribute is considered as a weak classifier, the Adaboost algorithm will filter it into a strong classifier.
Using the training dataset with weight distribution $D_m$ to learn, the basic classifier is obtained:
$$G_m(x): \mathcal{X} \rightarrow \{-1, +1\}.$$
We then calculate the classification error rate of $G_m(x)$ on the training dataset as follows:
$$e_m = P(G_m(x_i) \neq y_i) = \sum_{i} w_{mi} \, I(G_m(x_i) \neq y_i),$$
where $w_{mi}$ is the weight of sample i under $D_m$ and $y_i$ is its label. The error rate $e_m$ on the training dataset is thus the sum of the weights of the samples misclassified by $G_m(x)$. We calculate the coefficient $\alpha_m$ of $G_m(x)$, which represents the importance of $G_m(x)$ in the final set of relevance degrees $r_{ij}$. Because the algorithm proposed in this paper combines $G_m(x)$ and $r_{ij}$ into $G_m(r_{ij})$ in the following description, $\alpha_m$ also shows the importance of $G_m(r_{ij})$ in the final classifier. The purpose is to calculate the weight of each relevance degree in the final relevance degree set to determine whether both nutritional and individual preferences are met. A basic relevance set with a smaller error affects the diabetic patient more in the final set. $\alpha_m$ is set manually by consulting attending physicians and nutritionists. As the previous degree of association has been classified according to a certain feature set, there is an indirect effect on users. Namely, it affects users when the feature set of the association degree approaches the threshold. Using the new weight distribution of the training set, the calculations during the next iteration become:
$$D_{m+1} = (w_{m+1,1}, \ldots, w_{m+1,i}, \ldots), \qquad w_{m+1,i} = \frac{w_{mi}}{Z_m} \exp(-\alpha_m y_i G_m(x_i)).$$
This update increases the weights of the relevance samples misclassified by $G_m(x)$ and decreases the weights of those classified correctly. In this way, we find the samples that are more difficult to classify and thus have no effect on diabetics. Therefore, we can remove those factors. Here, $Z_m$ is a normalization factor introduced by Adaboost that makes $D_{m+1}$ a probability distribution:
$$Z_m = \sum_{i} w_{mi} \exp(-\alpha_m y_i G_m(x_i)).$$
Finally, we assign weights to all of the associations and classify them. Each category is again weighted and classified, which means that the sets are combined as follows:
$$f(x) = \sum_{m} \alpha_m G_m(x).$$
The final classification of relevance is then expressed as follows:
$$G(x) = \operatorname{sign}(f(x)) = \operatorname{sign}\left( \sum_{m} \alpha_m G_m(x) \right).$$
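The iteration described above can be sketched with one-dimensional threshold classifiers (decision stumps) as the weak learners. Note the assumptions: the coefficient is computed with the standard Adaboost formula $\alpha_m = \frac{1}{2}\ln\frac{1 - e_m}{e_m}$ rather than set by hand, and the samples, labels, and function names are hypothetical, with +1 standing for an association that should be kept and -1 for one that should be discarded.

```python
import math

def best_stump(xs, ys, w):
    """Pick the threshold/polarity pair with the lowest weighted error e_m."""
    best = None
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x < thr else -polarity for x in xs]
            err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best

def adaboost(xs, ys, rounds=3):
    n = len(xs)
    w = [1.0 / n] * n                          # D_1: uniform sample weights
    model = []
    for _ in range(rounds):
        e_m, thr, pol = best_stump(xs, ys, w)
        e_m = min(max(e_m, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - e_m) / e_m)
        preds = [pol if x < thr else -pol for x in xs]
        # Misclassified samples gain weight, correct ones lose weight.
        w = [wi * math.exp(-alpha * y * p) for wi, y, p in zip(w, ys, preds)]
        z = sum(w)                             # normalization factor Z_m
        w = [wi / z for wi in w]               # D_{m+1}
        model.append((alpha, thr, pol))
    return model

def classify(model, x):
    """G(x) = sign(sum_m alpha_m * G_m(x))."""
    f = sum(a * (p if x < t else -p) for a, t, p in model)
    return 1 if f >= 0 else -1

# Hypothetical 1-D relevance scores and labels.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(xs, ys, rounds=3)
```

No single stump separates these labels, but the weighted combination of three stumps classifies every training sample correctly.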

Algorithm description
In the probabilistic matrix factorization, each $r_{ij}$ is independent, and all of the non-zero $r_{ij}$ are classified by an Adaboost classifier. The final classifier $G(x)$ obtained by the Adaboost algorithm is therefore introduced into the probabilistic matrix factorization, namely the "Users-Food Characteristics Correlation" matrix, where it takes the place of $r_{ij}$ and processes it, generating $G(r_{ij})$. Each type of association contributes to users' personalized diet and corresponds to different indicators. The algorithm balances the weight relationship between personal preferences and nutritional needs. By combining Eqs. (2) and (8), that is, by substituting the classifier output for $r_{ij}$ in the conditional probability, we obtain the following:
$$p(R \mid U, V, \sigma^2) = \prod_{i=1}^{M} \prod_{j=1}^{N} \left[ \mathcal{N}(G(r_{ij}) \mid U_i^{T} V_j, \sigma^2) \right]^{I_{ij}}.$$
Our algorithm does not require a manually set threshold, thus avoiding the loss of valuable data due to pruning. The problem of non-modeling (i.e., null solutions) cannot occur even when new users are added, because the association between food attributes and users' preferences is bound. At the same time, our algorithm analyzes loosely constrained, multi-attribute problems without empty solutions. Hence, it always recommends its best results within the range available to diabetics, satisfying both nutritional requirements and personal preferences. When the screening and classifying of all of the non-zero $r_{ij}$ values are completed, the final set of associations is the upper bound of error. The screening and classifying processes exclude non-compliant relevance values, reducing the error e in the process of elimination. The standard Adaboost bound on the training error, with S the set of training samples and $f(x) = \sum_m \alpha_m G_m(x)$, is:
$$\frac{1}{|S|} \sum_{i \in S} I(G(x_i) \neq y_i) \le \frac{1}{|S|} \sum_{i \in S} \exp(-y_i f(x_i)) = \prod_{m} Z_m.$$
Although there are many different types of algorithms for diet recommendation, the content-based and collaborative filtering algorithms are the most common. Specifically, content-based dietary recommendation requires both basic food attributes (calories, protein, fat, and carbohydrates) and users' dietary records. Collaborative filtering requires each food's score, weight, and caloric value. We show the steps for the three algorithms in Algorithms 1-3.

Algorithm 1: Diabetic diet recommendation algorithm based on collaborative filtering
Input: Scoring matrix, caloric value, and weight of food
Output: Result set of food for simulation
1) Take K nearest neighbors.
2) Divide the dataset into test and training sets, using 1/m of the data as the test set. Using different K ≤ m−1, obtain different test and training sets under the same random seed condition.
3) Calculate the similarity between foods under simulated conditions and calculate the quantified similarity between vectors ItemA and ItemB to form the similarity matrix.
4) Sort the similarity matrix in the simulation and calculate the degree of interest matrix using K nearest neighbors.
5) Recommend the results according to the interest matrix.
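Steps 3 to 5 of Algorithm 1 can be sketched as follows. The sketch assumes cosine similarity between food score vectors and omits the train/test split, the caloric values, and the food weights. The score matrix and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two score vectors (0.0 if either is all zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def item_similarity(scores):
    """scores[u][i] is the score of user u for food i (0 means unrated)."""
    n_items = len(scores[0])
    cols = [[row[i] for row in scores] for i in range(n_items)]
    return [[cosine(cols[a], cols[b]) if a != b else 0.0
             for b in range(n_items)] for a in range(n_items)]

def recommend(scores, user, k=2, top_n=2):
    """Rank unrated foods by interest accumulated from the K nearest foods."""
    sim = item_similarity(scores)
    rated = [i for i, s in enumerate(scores[user]) if s > 0]
    interest = {}
    for i in rated:
        neighbours = sorted(range(len(sim)), key=lambda j: sim[i][j],
                            reverse=True)[:k]
        for j in neighbours:
            if scores[user][j] == 0:  # only recommend foods not yet rated
                interest[j] = interest.get(j, 0.0) + sim[i][j] * scores[user][i]
    return sorted(interest, key=interest.get, reverse=True)[:top_n]

# Hypothetical score matrix: 4 users x 4 foods.
scores = [
    [5, 4, 0, 0],
    [4, 5, 3, 0],
    [0, 0, 4, 5],
    [5, 0, 4, 0],
]
result = recommend(scores, user=0)
```

For user 0, only food 2 is close enough to the foods already rated to enter the interest matrix, so it is the single recommendation.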

Introduction to the data set
To highlight the accuracy of our algorithm in food recommendation compared with others, we selected a particular group of diabetic users and collected their data. However, diabetic patients can be non-compliant, as they may go to another hospital to receive treatment after leaving one. When this occurred, the patient's data was distributed across multiple hospitals, which led to incomplete data. In addition to collecting data from various hospitals, we interviewed and surveyed many patients. This collation and analysis was intended to make the data as reasonable as possible. For patients, we also analyzed the corresponding food characteristics including quantified measures for taste, texture, calories, fats, proteins, and carbohydrates. We also measured patients' physical indexes, including weight, blood pressure, glycosylated hemoglobin level, plasma glucose level, and working intensity.

Experimental evaluation indicator
Recommendation algorithms are traditionally evaluated according to accuracy rate, recall rate, and coverage rate [Wang, Gong, Li et al. (2016); Han and Yamana (2017)]. However, for the health field, we calculate only the precision rate and diversity:
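As a hedged illustration, the two indicators can be computed as below. The definitions are assumptions rather than the exact formulas of this study: precision is taken as the share of recommended foods that the user actually chose, and diversity as the share of all suitable foods that appear in the recommended set. The food names are hypothetical.

```python
def precision(recommended, chosen):
    """Share of recommended foods that the user actually chose."""
    rec = set(recommended)
    if not rec:
        return 0.0
    return len(rec & set(chosen)) / len(rec)

def diversity(recommended, suitable):
    """Share of suitable foods covered by the recommended set."""
    pool = set(suitable)
    if not pool:
        return 0.0
    return len(set(recommended) & pool) / len(pool)

# Hypothetical example data.
recommended = ["oats", "tofu", "apple", "rice"]
chosen = ["oats", "apple", "milk"]
suitable = ["oats", "tofu", "apple", "rice", "milk", "celery", "fish", "beans"]

p = precision(recommended, chosen)
d = diversity(recommended, suitable)
```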

Comparison of experimental methods
We compared the content-based diet recommendation algorithm (CB), the algorithm using collaborative filtering (CF), and our algorithm. We varied the number of nearest neighbors K as 20, 40, 60, 80, 100, 120, 140, 160, 180, and 200 to represent an increasing number of users. We also set K as 5, 10, 20, 40, 80, 160, and 320 to compare the diversity of the result sets recommended by each algorithm. Because our interest is in diabetic diets, we compared the accuracy of the health and personalization indicators of each algorithm. We selected a diabetic patient at random for simulation, with 4 hours as the basic time interval. We compared the recommended food with the food selected by the participating users to calculate the accuracy of the diet's compatibility with user preferences. For nutritional requirements and health indicators, we compared the nutritional content of recommended foods with user needs to calculate accuracy. Then we analyzed the relationship between diversity and accuracy to determine their trend. Finally, we tracked the test data results. By simulating the physical indexes of users (only plasma glucose and body weight), we compared each algorithm's impact on users' physical indexes.

Experiments and result analysis
We performed time-based random generation to simulate the physical indicators of each diabetic patient and their diet preferences. According to the dietary preferences, we modeled the users with the recommended food result set and calculated the diversity of the food result sets. The extent to which each algorithm personalized its recommendations in this study is shown in Figs. 1 and 2. Fig. 1 shows that the food result set recommended by our algorithm was favored by simulated users with a relatively high average score. The content-based diabetic diet recommendation algorithm had a large gap when K=80 due to its inability to handle newly added users. Fig. 2 shows that the diversity of our algorithm was higher than that of the others, indicating better support for complex attributes. Our algorithm simulated users' physical indicators and diet preferences, and we analyzed the recommended results comprehensively. The other two algorithms did not perform well with complex properties. Fig. 3 shows that the personalized accuracy rate of our algorithm was lower than that of the others. This occurred because diet preferences drive users to choose their favorite ingredients in the selection process, regardless of the ingredients' nutrient proportion and value. We emphasize that the diversity and personalization results from our recommendation algorithm are two different concepts, reflecting different needs. A comparison of Figs. 2 and 3 shows that diversity is the relationship between the recommended food result set and all of the food suitable for diabetics, whereas individualization is the relationship between users' food preferences and the recommendation set. Fig. 4 shows the good therapeutic effect of our algorithm. The other algorithms placed too much emphasis on the patients' dietary preferences at the expense of nutritional suitability.
In this study, we set the initial body weight and plasma glucose level of each patient to 59 kg and 8.5 mmol/L, respectively, with patients taking food. The results show that although all three algorithms reduced the plasma glucose level, our algorithm performed better than the other two. By rating the plasma glucose level (excellent: 4.4-6.1; good: ≤7.0; poor: >7.0; in mmol/L), we conclude that our algorithm reduced the plasma glucose of participating users to healthier levels. By the end of the second day, a turning point appeared. Our algorithm increased users' body weight and maintained the body weight balance for some time, while the other algorithms only minimized weight loss.
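The rating bands quoted above can be written as a small helper. The function name is hypothetical, and the handling of values below 4.4 mmol/L is an assumption: the text only states "good ≤ 7.0", so such values are labeled "good" here by a literal reading of the thresholds.

```python
def rate_plasma_glucose(mmol_per_l):
    """Rate a plasma glucose level using the bands quoted in the text."""
    if 4.4 <= mmol_per_l <= 6.1:
        return "excellent"
    if mmol_per_l <= 7.0:
        # Assumption: values below 4.4 also land here by a literal
        # reading of "good <= 7.0"; the text does not address them.
        return "good"
    return "poor"

initial_rating = rate_plasma_glucose(8.5)  # the simulated starting level
```

Under these bands, the simulated starting level of 8.5 mmol/L rates as poor, so any algorithm that brings the level to 7.0 mmol/L or below moves the patient into a better band.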

Conclusion
Building on existing research into dietary recommendation algorithms, we incorporate the Adaboost classifier into our algorithm to strengthen the probabilistic matrix factorization. The algorithm fully considers the constraint relationship between users' own attributes and the nutritional characteristics of foods. Because it considers many complex factors, the recommended food result set meets both the health standards and users' dietary preferences. Therefore, our algorithm improves personalized diet recommendations, tailoring them to both patient preferences and nutritional needs. We anticipate that improved data collection methods using wireless networks [Wang, Zuo, Shen et al. (2015); Ullah, Abdullah, Kaiwartya et al. (2017)] will further improve the portability and scalability of our algorithm in the future.
Funding Statement: This work was supported in part by the National Natural Science Foundation of China (51679105, 51809112, 51939003, 61872160); "Thirteenth Five Plan" Science and Technology Project of Education Department, Jilin Province (JJKH20200990KJ).

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.