Service Recommendation with High Accuracy and Diversity



Introduction
In recent years, web services have developed rapidly and play an increasingly important role in e-commerce and virtual reality applications. As the number of web services on the Internet grows, people can access Internet information anytime and anywhere. However, people must deal with a large volume of information resources, which makes it difficult to quickly find valuable services that interest them. In other words, service selection is complicated in the age of big data [1][2][3][4]. Therefore, precise recommendation of web services is a key issue in service computing. Recommender systems have been widely used in many applications, such as https://Amazon.com, https://TiVo.com, and https://Netflix.com [5]. Web service recommendation is the process of actively identifying suitable web services and recommending them to users. The most common method is traditional collaborative filtering [6].
Collaborative filtering usually infers users' preferences from their historical usage records and then automatically recommends the most appropriate service items to them [7]. However, this method mainly focuses on improving recommendation accuracy, which may lead to redundant services in a limited top-K recommendation list. Worse, such results may reduce users' satisfaction and do not help explore users' potential preferences for other services. For example, assume that a certain service category contains services with similar or related functions that match a user's interests and have better quality of service than services in other categories. An ordinary service recommendation method may fill the final recommended list with services from this category only; from the user's point of view, however, recommended services with similar functions are redundant. This phenomenon is called overfitting. Accordingly, a recommender system should also pay attention to the diversity of service recommendations while ensuring high accuracy, so that other categories of services that users may be interested in can be included in the top-K recommended list [3,8].
Fortunately, diversification methods not only avoid redundancy but also expand the range of users' choices, which helps mitigate the uncertainty in predicting users' preferences [9]. However, there is a trade-off between accuracy and diversity [10]: high accuracy is often obtained by safely recommending the most popular and appropriate items, which clearly reduces diversity. Conversely, higher diversity can be achieved by uncovering and recommending highly idiosyncratic or personalized items, for which less data is available per user; such items are harder to predict, which may lower recommendation accuracy. Therefore, it is crucial for recommender systems to provide an optimal recommendation list that takes both accuracy and diversity into account and keeps a balance between them [11][12][13][14]. This is also the main research direction of this paper. The main contributions of this paper are as follows:
(i) A new web service recommendation method that attends to both accuracy and diversity is proposed.
(ii) When providing users with a top-K service recommendation list, our method remedies the shortcomings of traditional service recommendation methods and effectively addresses the problem of overfitting.
(iii) Our method balances the two indicators of accuracy and diversity to achieve the best recommendation effect and improve users' satisfaction.
The remainder of this paper is organized as follows. Section 2 describes a web service recommendation scenario and, based on it, further describes the motivation and research content of this paper. Section 3 presents the framework and specific steps of the proposed web service recommendation method, named DivMTID. Section 4 presents a case study in which a specific case is solved by DivMTID. Section 5 summarizes the paper, draws conclusions, and outlines future work.

Research Scenario and Motivation
In this section, we describe the research scenario and motivation of this paper, on which all of our subsequent work is based.

Research Scenario.
Here, we use Figure 1 to describe the research scenario of this paper. Suppose that a website has many different types of modules (entertainment, military, sports, life, finance, cars, games, films, shopping, etc.) and that each module contains many different web services. Assume that a user has used M web services across all modules, recorded as $WS_{u1}, WS_{u2}, \ldots, WS_{uM}$. Within a single module, the used services are recorded as $WS_{u1}, WS_{u2}, \ldots, WS_{ux}$, where x varies from module to module. Meanwhile, the candidate service set contains N candidate web services, recorded as $WS_1, WS_2, \ldots, WS_N$. Each web service is described by a Web Services Description Language (WSDL) document. For reference, the symbols used in this paper and their meanings are listed in Table 1.

Motivation.
In this subsection, we use the example in Figure 2 to illustrate the motivation of our proposal. Suppose the recommender system intends to recommend a list of web services to a user. To recommend appropriate web services, the system first calculates the similarity between the user's historical web services and the candidate web services and then generates the top-K recommended list. In the similarity calculation and recommendation steps, we face the following challenges:
(1) Calculating the similarity between historical and candidate web services requires establishing a relationship between the historical records and the candidate service set. Moreover, an effective method is needed to predict the relative scores of the candidate services and to filter them.
(2) Because the diversity of the recommended list is frequently neglected, the web services in the list may be similar to each other, which may lead to overfitting, fail to explore users' potential preferences, and ultimately reduce users' satisfaction.
Considering the above issues, we propose a novel web service recommendation method named DivMTID, which aims to achieve both accuracy and diversity in the recommendation results; it is presented in detail in the following sections.

A Diversified Service Recommendation Method Based on TF-IDF
Under the research scenario of Section 2, this paper proposes a new web service recommendation method named DivMTID, which is based on the TF-IDF algorithm. It applies cosine similarity to TF-IDF weight vectors derived from WSDL documents to calculate the ranking score of each candidate service, and then uses a diversity algorithm to select the best web services from the candidates to form the top-K recommended list. In this way, it takes both the accuracy and the diversity of the recommendation results into account. Table 2 lists the basic framework of DivMTID, which consists of four steps.

2 Wireless Communications and Mobile Computing
3.1. Step 1: Explore Users' Preferences Approximately.
In step 1, we first approximately locate users' preferences from their historical rating records. To give more effective personalized service recommendations, we need to figure out what users like and why they like it. In other words, more effective preference representations may help recommendation algorithms perform better. In most service recommendation methods, a user's score on a web service represents only the user's opinion of that service; a single score record cannot fully determine the user's preferences. However, a user's historical score records can be used to locate the user's preferences approximately. We can use the rating scores of web services to establish correlations with metadata and thereby overcome the common limitation of expressing preferences with a single score.
For example, under the scenario described in Section 2, if a user rated all web services in the military module 5 and all web services in the finance module 2, the recommender system should infer that the user prefers the military module and should recommend more candidate web services related to military than to finance.
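We establish the correlation between historical scores and the metadata module information through equation (1), which uses a user's score records for web services to calculate the user's preference degree for each module:
$$M_j = \frac{1}{n_{\text{service-used}}} \sum_{r = r_{\min}}^{r_{\max}} r \cdot n^{r}_{\text{service-rated}} \qquad (1)$$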
In equation (1), $M_j$ represents the degree of a user's preference for module j, and $r_i$ represents the user's historical rating scores for the used web services, whose possible values r run from $r_{\min}$ to $r_{\max}$. $n^{r}_{\text{service-rated}}$ represents the number of web services in metadata module j that the user rated r, and $n_{\text{service-used}}$ represents the total number of web services the user has used in metadata module j.
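A quick numeric check of equation (1) on the military/finance example above, assuming for illustration that each module contains five used services, all rated (this count is an assumption, not stated in the text), and reading equation (1) as the rating-weighted count of services divided by $n_{\text{service-used}}$:

```latex
M_{\text{military}} = \frac{5 \cdot n^{5}_{\text{service-rated}}}{n_{\text{service-used}}} = \frac{5 \cdot 5}{5} = 5,
\qquad
M_{\text{finance}} = \frac{2 \cdot n^{2}_{\text{service-rated}}}{n_{\text{service-used}}} = \frac{2 \cdot 5}{5} = 2
```

Under this reading, $M_j$ is simply the user's average rating in module j, so a preference threshold anywhere from 2 up to (but not including) 5 would retain military and filter out finance.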
We calculate the user's preference degree for each module with equation (1) to locate the user's preferences approximately. A threshold a is set, and any module whose calculated result exceeds a is defined as a preference module of the user. For example, in the scenario of Section 2, let the threshold be 3. If, after calculation, the modules with results greater than 3 are military, finance, cars, and shopping, then the top-K recommended list should mainly consist of web services under these modules; the modules below the threshold are automatically filtered out. Finally, we put all the web services belonging to the preference modules together to form a set P. This completes step 1; its pseudocode is given in Algorithm 1.

Table 1: Symbols and their meanings.
$t_j$: the j-th word in the corpus
$\omega$: the weight vector of a web service
$CosSim_{i,j}$: the similarity level of web service i and web service j
$Score_j$: the predicted ranking score of candidate web service j

3.2. Step 2: Calculate TF-IDF Weight Vectors of Web Services.
The task of step 1 in DivMTID is to determine users' preferences by filtering out the web services in modules with low historical rating scores, which saves considerable running time in the subsequent steps. However, step 1 cannot determine exactly what kind of services users like, what characteristics highly rated web services share, or how to select the best web services from so many candidates.
Step 2 is designed to solve these problems. Assume that step 1 filters out L web services in total, so that set P contains the remaining M − L web services.
As mentioned, each web service in set P has a corresponding WSDL document, as does each candidate service. All meaningful words in the WSDL documents of all these services form a corpus. The well-known TF-IDF algorithm [8,15] is then used to assess the importance of each word in the corpus for each web service. A word's importance increases with the number of times it appears in the document and decreases with the number of documents in the corpus in which it appears. The details are as follows.
tf represents the term frequency, i.e., how often a word appears in a WSDL document. It is given by
$$tf_{i,j} = \frac{n(t_j, WSDL_i)}{|WSDL_i|} \qquad (2)$$
where $n(t_j, WSDL_i)$ represents the number of times that $t_j$ appears in the $WSDL_i$ document, and $|WSDL_i|$ represents the number of words in the $WSDL_i$ document.
idf represents the inverse document frequency. It is the logarithm of the ratio of the total number of WSDL documents to the number of documents containing the word:
$$idf_j = \log \frac{|WSDL|}{|\{WSDL_i : t_j \in WSDL_i\}|} \qquad (3)$$
where $|WSDL|$ represents the total number of WSDL documents and $|\{WSDL_i : t_j \in WSDL_i\}|$ represents the number of documents containing the word $t_j$.

Table 2: The framework of DivMTID.
Step 1: explore users' preferences approximately. By establishing the relationship between a user's historical score records and the metadata module information, the preference degree of each module is calculated and the user's preferences are approximately explored.
Step 2: calculate TF-IDF weight vectors of web services. The TF-IDF algorithm is used to calculate the importance of the words in the corpus to each web service; this importance is represented by a TF-IDF weight vector that distinguishes web services from one another.
Step 3: predict the ranking scores of candidate services. The similarity between candidate web services and historically used web services is calculated with cosine similarity, and the ranking scores of the candidate web services are predicted.
Step 4: create a diversified web service recommended list. K different web services are selected according to different index numbers to form multiple recommended lists. The list-diversity of each list is then calculated, and the list with the highest value becomes the web service recommended list that is finally recommended to the user.

Algorithm 1: Explore users' preferences approximately.
1. for each module j do  // $M_j$ starts at 0
2.   $n_{\text{service-used}}$ = count($WS_{ui}$)  // used services in module j
3.   for r = $r_{\min}$ to $r_{\max}$ do
4.     $n^{r}_{\text{service-rated}}$ = count($WS_{ui}$ rated r)
5.     $M_j = M_j + n^{r}_{\text{service-rated}} \cdot r$
6.   end for
7.   $M_j = M_j / n_{\text{service-used}}$
8.   if $M_j > a$
9.     then add $\{WS_{ui} \mid WS_{ui} \in j\}$ to P
10.  end if
11. end for
12. return P
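The following Python sketch mirrors Algorithm 1 under the reading of equation (1) as the rating-weighted count of services divided by $n_{\text{service-used}}$, i.e., the average rating per module. The input layout (a list of `(service_id, module, rating)` tuples), the function name `explore_preferences`, and the treatment of a null rating as 0 that still counts as "used" are assumptions made for illustration.

```python
from collections import defaultdict


def explore_preferences(history, a, r_min=1, r_max=5):
    """Sketch of Algorithm 1 (hypothetical helper): return (M, P).

    history: list of (service_id, module, rating) for services the user used.
             A null rating is passed as 0 (as in the case study), so it counts
             as used but adds nothing to the numerator (an assumption).
    a: preference threshold; modules with M_j > a are kept.
    """
    by_module = defaultdict(list)
    for service, module, rating in history:
        by_module[module].append((service, rating))

    M, P = {}, []
    for module, records in by_module.items():
        n_used = len(records)  # n_service-used
        total = 0
        for r in range(r_min, r_max + 1):
            # n^r_service-rated: services in this module rated exactly r
            n_rated = sum(1 for _, rating in records if rating == r)
            total += n_rated * r
        M[module] = total / n_used
        if M[module] > a:
            P.extend(service for service, _ in records)  # add module's services to P
    return M, P
```

Every module in `history` has at least one record, so the division by `n_used` is always defined.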
We use TF-IDF to assess the importance of the words in the corpus for each web service. If a word appears frequently in the WSDL document of one web service but rarely in the WSDL documents of other services, we consider the word highly important and representative for that web service, so it can be used to classify and distinguish different services.
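A minimal Python sketch of equations (2) and (3), assuming each WSDL document has already been reduced to a list of meaningful words and using the natural logarithm (the text does not state the base). Equation (4), which gives idf a higher weight, is not reproduced here; the `idf_power` parameter is a hypothetical stand-in that raises idf to a power, and its default of 1.0 yields the standard tf × idf product.

```python
import math
from collections import Counter


def tf_idf_vectors(docs, corpus, idf_power=1.0):
    """Return one TF-IDF weight vector per document.

    docs: list of token lists, one per WSDL document.
    corpus: ordered list of words t_1..t_n defining the vector positions.
    idf_power: hypothetical knob; values > 1.0 weight idf more heavily
               (an assumed stand-in for the paper's equation (4)).
    """
    n_docs = len(docs)  # |WSDL|
    # Document frequency |{WSDL_i : t_j in WSDL_i}| for each corpus word.
    df = {w: sum(1 for d in docs if w in d) for w in corpus}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)  # |WSDL_i|
        vec = []
        for w in corpus:
            tf = counts[w] / length if length else 0.0  # equation (2)
            idf = math.log(n_docs / df[w]) if df[w] else 0.0  # equation (3)
            vec.append(tf * idf ** idf_power)
        vectors.append(vec)
    return vectors
```

Keeping the corpus as an explicit ordered list guarantees that every service's vector uses the same word positions, which the cosine similarity in step 3 relies on.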
Because WSDL documents are generally short, this paper gives a higher weight to the idf value. The common way to implement TF-IDF is to give the same weight to term frequency and inverse document frequency. However, this paper gives a higher weight to idf, not only to normalize the inherent bias of the tf measure in short documents but also to better exclude common words that frequently appear across the web services in the corpus [16]. This improves the ability to classify and differentiate web services and thus the accuracy with which a user's preferences are captured. In equation (4), ω denotes the result: the TF-IDF weight of word $t_j$ for a web service, i.e., the importance of word $t_j$ for that service. By computing this weight with equation (4) for every word in the corpus, we form the weight vector of a web service. We calculate the TF-IDF weight vectors of all web services in set P, denoted $\omega_i$, $i = u1, u2, \ldots, u(M-L)$. Similarly, the TF-IDF weight vectors of all candidate web services are calculated and denoted $\omega_j$, $j = 1, 2, \ldots, N$. This completes step 2; its pseudocode is given in Algorithm 2.

Algorithm 3: Predict the ranking scores of candidate services.
Input: $\omega_i, \omega_j$: weight vectors of services; $r_i$: the rating scores; b: the threshold.
…
        then add $WS_j$ to Y
9.  end if
10. end for
11. return Y

3.3. Step 3: Predict the Ranking Scores of Candidate Services.
To evaluate the similarity between two web services, we calculate the cosine similarity [17] of their TF-IDF weight vectors and define the similarity level between web services i and j as $CosSim_{i,j}$. We choose cosine similarity to measure the distance between services for two reasons: (1) cosine similarity is not limited by the dimensionality of the vectors; (2) cosine similarity has high accuracy and is intuitive enough to describe the similarity calculation. The value of $CosSim_{i,j}$ is calculated by
$$CosSim_{i,j} = \frac{\omega_i \cdot \omega_j}{|\omega_i|\,|\omega_j|} \qquad (5)$$
In equation (5), $|\omega_i|$ and $|\omega_j|$ are the Euclidean lengths of the weight vectors $\omega_i$ and $\omega_j$, and $\omega_i \cdot \omega_j$ is their dot product. Because cosine similarity effectively evaluates how similar two vectors are, it also measures how similar two web services are. We then calculate $CosSim_{i,j}$ for every pair formed by a candidate web service j and a web service i in set P.
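Equation (5) can be sketched in plain Python as follows. The function names are illustrative, and returning 0.0 when a vector is all zeros is an assumed convention, since equation (5) is undefined in that case.

```python
import math


def cos_sim(w_i, w_j):
    """Cosine similarity of two equal-length weight vectors, as in equation (5).

    Returns 0.0 when either vector is all zeros (an assumed convention).
    """
    if len(w_i) != len(w_j):
        raise ValueError("weight vectors must have the same length")
    dot = sum(a * b for a, b in zip(w_i, w_j))  # omega_i . omega_j
    norm_i = math.sqrt(sum(a * a for a in w_i))  # |omega_i|
    norm_j = math.sqrt(sum(b * b for b in w_j))  # |omega_j|
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0
    return dot / (norm_i * norm_j)


def similarity_matrix(history_vectors, candidate_vectors):
    """CosSim[i][j] for every service i in set P and every candidate j."""
    return [[cos_sim(h, c) for c in candidate_vectors] for h in history_vectors]
```

Because TF-IDF weights are non-negative, every value lies between 0 and 1.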
Using the values of $CosSim_{i,j}$, we obtain the similarity between each candidate web service and the user's historical web services, from which we calculate the ranking score of each candidate web service, defined as $Score_j$, by equation (6). In equation (6), λ is a parameter and $r_i$ is the user's rating of historical web service i. Multiplying the user's rating by $CosSim_{i,j}$ weights each similarity value by how much the user liked the corresponding historical service. Accumulating these weighted values gives the ranking score of each candidate service. Finally, we sort the scores and set a threshold b; all candidate web services with a ranking score greater than b form a set Y, from which the web services in the top-K recommended list are selected. This completes step 3; its pseudocode is given in Algorithm 3.
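Equation (6) itself is not reproduced in the text, so the sketch below assumes the form suggested by the description, $Score_j = \lambda \sum_{i \in P} r_i \cdot CosSim_{i,j}$; where λ actually enters equation (6) is an assumption, and `lam` is a hypothetical scale factor. The threshold filter then builds set Y as described.

```python
def ranking_scores(cos_sim_matrix, ratings, lam=1.0):
    """Score_j for each candidate j under an ASSUMED form of equation (6):

        Score_j = lam * sum_i r_i * CosSim[i][j], summed over services i in P.

    cos_sim_matrix[i][j]: similarity of history service i and candidate j.
    ratings[i]: the user's rating r_i of history service i.
    """
    n_candidates = len(cos_sim_matrix[0]) if cos_sim_matrix else 0
    return [
        lam * sum(r * row[j] for r, row in zip(ratings, cos_sim_matrix))
        for j in range(n_candidates)
    ]


def filter_by_threshold(candidates, scores, b):
    """Set Y: candidates whose score exceeds b, sorted by score (descending)."""
    kept = [(c, s) for c, s in zip(candidates, scores) if s > b]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept
```

Weighting by $r_i$ means that a candidate resembling a service the user rated 5 gains more than one resembling a service rated 2.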

3.4. Step 4: Create a Diversified Web Service Recommended List.
The purpose of threshold b is to ensure the accuracy of the top-K recommended list, which is conventionally formed by selecting the first K services in descending order of $Score_j$. Although this ensures high accuracy, it reduces diversity. It may also cause overfitting, which hinders the exploration of users' potential preferences [18][19][20][21]. Therefore, we need a method that balances accuracy and diversity.

Algorithm 4: Create a diversified web service recommended list.
Input: Y: set Y; K: the length of the recommended list; $CosSim_{i,j}$: the similarity between service i and service j.
Output: a diversified web service recommended list.
1. f = |Y|  // f denotes the number of web services in set Y
2. Sort(Y)
3. Create indexes for the f web services
4. for j = 1 to $\binom{f}{K}$ do  // K < f
5.   Form a list of K web services according to different index numbers
6.   Calculate the list-diversity according to equation (7)
7. end for
8. return the list with the highest list-diversity value
Step 4 makes the recommendations more diverse while maintaining high accuracy. First, we index all candidate web services in set Y and select K services according to different index numbers, forming multiple recommended lists. Next, we define the diversity of the web services in a recommended list as its list-diversity and calculate the list-diversity of each recommended list by equation (7). Finally, we select the recommended list with the highest list-diversity value as the top-K recommended list for the user.
The list-diversity is the average dissimilarity between each pair of web services in a recommended list:
$$\text{list-diversity}(Y) = \frac{1}{N(N-1)} \sum_{i \in Y} \sum_{j \in Y,\, j \neq i} \left(1 - CosSim_{i,j}\right) \qquad (7)$$
In equation (7), Y denotes the recommended list being evaluated, drawn from set Y, and N = |Y|; $CosSim_{i,j}$ represents the similarity between every two candidate web services in the list. This completes step 4; its pseudocode is given in Algorithm 4, where K is the length of the recommended list.
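Algorithm 4 can be sketched with `itertools.combinations`, which enumerates every K-element list drawn from set Y. The sketch assumes equation (7) is the average of $1 - CosSim_{i,j}$ over pairs, as the text describes; because cosine similarity is symmetric, averaging over unordered pairs gives the same value as averaging over ordered ones. The `sim` callback and function names are hypothetical.

```python
from itertools import combinations


def list_diversity(services, sim):
    """Average pairwise dissimilarity (1 - CosSim) of one recommended list.

    sim(a, b) returns CosSim for two services (assumed symmetric).
    """
    n = len(services)
    if n < 2:
        return 0.0
    total = sum(1.0 - sim(a, b) for a, b in combinations(services, 2))
    # combinations() visits each unordered pair once: n(n-1)/2 pairs.
    return total / (n * (n - 1) / 2)


def most_diverse_list(y, k, sim):
    """Enumerate all C(|Y|, K) lists and return (best_list, its diversity)."""
    if not 0 < k <= len(y):
        raise ValueError("K must satisfy 0 < K <= |Y|")
    best, best_div = None, -1.0
    for candidate in combinations(y, k):
        div = list_diversity(candidate, sim)
        if div > best_div:  # ties keep the earlier list in enumeration order
            best, best_div = list(candidate), div
    return best, best_div
```

Exhaustive enumeration is exact but grows combinatorially with |Y|; it stays cheap only because threshold b keeps set Y small.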

Case Study
To illustrate the specific steps of DivMTID and to further demonstrate its effectiveness, this section presents a case study.
Suppose that there are nine modules: entertainment, military, sports, life, finance, cars, games, films, and shopping. We assume that each module contains five different web services and that there are ten candidate web services. A user has rated the web services he has used (ratings range from 1 to 5; a missing rating is recorded as null and treated as 0). Table 3 shows the user's historical rating records. Our task is to provide the user with a top-K web service recommended list. We set the threshold a to 3.

4.1. Step 1: Explore Users' Preferences Approximately.
We use equation (1) to calculate the user's preference degree for each module and approximately locate the user's preferences. The resulting preference degrees $M_j$ are shown in Table 4.
Because the threshold a is set to 3, the sports, life, and films modules, whose $M_j$ values are greater than 3, are the user's approximate preference modules. The web services under these three modules form set P.

4.2. Step 2: Calculate TF-IDF Weight Vectors of Web Services.
After approximately exploring the user's preferences, we calculate the weight vectors of web services utilizing the WSDL documents of all services in the set P and the WSDL documents of all candidate services. Table 5 shows the WSDL documents of all web services in the set P, and Table 6 shows the WSDL documents of all candidate services.
A corpus is built from all meaningful words in the WSDL documents of the services in set P and of the candidate services: shooting, gymnastics, diving, marriage, cooking, Ang Lee, Hollywood, action movie, video, article, picture, long, short, fast, and slow.

Table 6: WSDL documents of all candidate services.
…
Web 2: Cooking, cooking, picture
Web 3: Shooting, video, short, fast
Web 4: Marriage, video, long, slow
Web 5: Diving, diving, picture
Web 6: Gymnastics, article, long
Web 7: Hollywood, picture
Web 8: Hollywood, video, short, fast
Web 9: Action movie, article
Web 10: Shooting, shooting, article, long

The resulting TF-IDF weight vectors include, for the films module: $\omega_{u1} = (0, 0, 0, 0, 0, 7.01, 0, 0, 0, 0.54, 0, 0, 0, 0, 0)$, …

4.3. Step 3: Predict the Ranking Scores of Candidate Services.
According to equation (5), the cosine similarity of the TF-IDF weight vectors is calculated for each candidate web service paired with each historically used web service in set P, giving the $CosSim_{i,j}$ values of each candidate service. The ranking score of each candidate web service is then calculated by equation (6); the results are shown in Table 7. We set the threshold b to 8, and all candidate web services with a ranking score higher than 8 form set Y, which contains Web 3, Web 8, Web 4, Web 2, and Web 1.
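To see why some candidates in set Y overlap, the sketch below compares a few rows of Table 6 using plain word-count vectors. This is deliberately not the paper's TF-IDF weighting (equation (4) is not applied); it corresponds to equation (2) without the idf factor, since dividing by $|WSDL_i|$ rescales a vector and leaves cosine similarity unchanged. The words are lower-cased, and the Web 1 row is omitted because it is not listed above.

```python
import math
from collections import Counter

# Selected candidate WSDL documents from Table 6 (lower-cased).
TABLE6 = {
    "Web 2": ["cooking", "cooking", "picture"],
    "Web 3": ["shooting", "video", "short", "fast"],
    "Web 4": ["marriage", "video", "long", "slow"],
    "Web 8": ["hollywood", "video", "short", "fast"],
}


def count_cosine(doc_a, doc_b):
    """Cosine similarity of raw word-count vectors (not TF-IDF weights)."""
    a, b = Counter(doc_a), Counter(doc_b)
    dot = sum(a[w] * b[w] for w in a)  # missing words count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


for x, y in [("Web 3", "Web 8"), ("Web 3", "Web 4"), ("Web 3", "Web 2")]:
    print(x, y, round(count_cosine(TABLE6[x], TABLE6[y]), 2))
```

This prints 0.75, 0.25, and 0.0 for the three pairs: Web 3 and Web 8 share three of their four words, so a list containing both is less diverse, whereas Web 3 and Web 2 share none.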

4.4. Step 4: Create a Diversified Web Service Recommended List.
Suppose K = 3. We then need to build a diversified recommended list containing three web services for the user.
Step 4 indexes all candidate web services in set Y and selects three web services according to different index numbers to form multiple recommended lists. The list-diversity of each recommended list is calculated by equation (7). Finally, the recommended list with the highest list-diversity value is selected as the top-3 list recommended to the user. The results are shown in Table 8.
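Since set Y contains five services and K = 3, the number of candidate lists that Algorithm 4 enumerates, and the number of service pairs averaged by equation (7) within each list, follow directly:

```latex
\binom{5}{3} = \frac{5!}{3!\,2!} = 10 \text{ candidate lists},
\qquad
\binom{3}{2} = 3 \text{ pairs per list}
```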
As Table 8 shows, two recommended lists share the top rank. When two lists have the same list-diversity value, we use accuracy to break the tie: we compare the sums of the ranking scores obtained in step 3 for the services in each list, and the list with the higher sum is preferred. Consequently, we choose the list containing Web 3, Web 2, and Web 1 as the top-3 web service recommended list.
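The tie-breaking rule can be expressed as a lexicographic sort key: list-diversity first, then the sum of the step-3 ranking scores. The function name and argument shapes below are hypothetical; ties are detected by exact equality of the diversity values, and a tolerance may be preferable when the values are computed in floating point.

```python
def choose_list(lists, diversity, score):
    """Pick the list with the highest list-diversity; break ties by score sum.

    lists: candidate recommended lists (sequences of service ids).
    diversity: function mapping a list to its list-diversity (equation (7)).
    score: dict mapping service id -> Score_j from step 3.
    """
    # Tuples compare element by element, so the score sum only matters
    # when two lists have exactly the same diversity.
    return max(lists, key=lambda lst: (diversity(lst), sum(score[s] for s in lst)))
```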

Conclusions and Future Work
This paper presents a new web service recommendation method called DivMTID. First, the method uses users' historical ratings of web services to explore their preferences approximately. Second, it uses the TF-IDF algorithm to calculate the weight vector of each web service. Third, it uses cosine similarity to calculate the similarity between candidate web services and the user's historical web services and predicts the ranking score of each candidate. Finally, it selects the top-K list with the highest list-diversity. DivMTID takes both the accuracy and the diversity of web service recommendation into account and achieves high diversity of recommendation results while ensuring high accuracy. It balances the influence of accuracy and diversity on the recommendation results, avoiding redundant recommendations and addressing the problem of overfitting. DivMTID is an effective, accurate, and diverse service recommendation method that is worth promoting and applying. However, the specific influence of this method on many aspects of the recommender system has not yet been measured. Therefore, in future work, we will conduct more experiments on the method's influence on each index of the recommender system.
In addition, we will take time and space factors into consideration and improve the algorithm in many aspects, such as privacy [22][23][24][25]. We will also further improve the performance and effectiveness of the algorithm [26][27][28] by incorporating new approaches such as blockchain and edge computing [29][30][31][32].

Data Availability
Our study does not require any dataset; all the data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.