Online Optimization of Collaborative Web Service QoS Prediction Based on Approximate Dynamic Programming

Recently, with the increasing demand for web services on the World Wide Web used in the Internet of Things (IoT), there has been growing interest in efficient web service quality evaluation approaches based on prediction strategies to obtain accurate quality-of-service (QoS) values. However, web service quality changes significantly under an unpredictable network environment, and such changes pose challenging obstacles to web service QoS prediction. Most traditional web service QoS prediction approaches are implemented using only a set of static model parameters chosen with the help of the designer's a priori knowledge. Unlike these traditional approaches, the algorithm in this paper incorporates an approximate dynamic programming- (ADP-) based online parameter tuning strategy into the QoS prediction approach. Through online learning and optimization, the proposed approach provides QoS prediction with automatic parameter tuning capability, and neither prior knowledge nor identification of the prediction model is required. Therefore, near-optimal QoS prediction performance can be achieved. Experimental studies are carried out to demonstrate the effectiveness of the proposed ADP-based prediction approach.


Introduction
Recently, with the increasing presence and adoption of cloud computing, the idea of "anything as a service (XaaS)" is becoming more and more popular. XaaS enables consumers to use software in the form of "Use and Not Have". Therefore, it plays an important role in the applications of the Internet of Things (IoT). However, with the emergence of a huge number of cloud services, it is increasingly difficult to choose an appropriate service in accordance with the users' demands. A number of web service composition and web service selection approaches have been proposed. This has led to the development of service-oriented computing (SOC) [1][2][3][4].
Obviously, considering services only by their function is no longer enough to meet users' requirements. Service recommendation based on nonfunctional indexes (e.g., quality of service (QoS)) has therefore become one of the attractive research fields in SOC [5,6]. QoS represents the real user experience of a cloud service. Generally speaking, the QoS data from the user or server includes availability, response time, throughput, delay, delay variation, and loss. When recommending a service based on QoS data, one of the biggest problems is that the available QoS data is incomplete [7][8][9][10][11][12]. The QoS values of web services can be collected from the server side or the client side. At the server side, QoS values are usually provided and collected by the service providers. Here, we focus only on the QoS values measured at the client side. Due to the influence of unpredictable network connections and the complex user application environment on the Internet, QoS values vary widely at the client side. Thus, web service evaluation is conducted to obtain detailed and accurate QoS values at the client side. However, because there are huge numbers of web services on today's Internet, it would take too much time to evaluate all the web service candidates for service users. Therefore, it might be difficult or even impractical to fulfil the above web service evaluation task at the client side.
To obtain accurate web service QoS values when there are not sufficient service evaluations at the client side, effective approaches based on prediction strategies have been widely studied. The traditional QoS prediction algorithms are simple: they generally take the arithmetic average of the known QoS values and use this average to predict the unknown QoS values. The major drawback of such methods is that they ignore personalized factors, which may lead to low prediction accuracy. In view of this, collaborative filtering based approaches have been proposed to make personalized QoS value predictions for service users. Specifically, the collaborative filtering technique automatically predicts the QoS values of the current user by collecting information from other similar users or items. In general, the collaborative filtering based QoS prediction approaches can be categorized into three major groups. The first group is the user-based collaborative filtering method using the Pearson correlation coefficient (PCC), namely, UPCC. It is a very classical method, in which the QoS value prediction is implemented by employing similar users [13,14]. The second group is the item-based collaborative filtering method using PCC, namely, IPCC. It is widely used in industrial companies (e.g., Amazon). This approach employs similar web services (i.e., items) for the QoS value prediction [15]. The third group is the probabilistic matrix factorization (PMF) based collaborative prediction. This idea was proposed by Salakhutdinov and Mnih [16], where the user preference matrix is fitted by a product of two lower-rank matrices. This method may perform well on some large, sparse, and imbalanced data sets. Furthermore, through the use of fusion techniques [17,18], some advanced methods have also been developed. For instance, Zheng et al. proposed a neighborhood-integrated matrix factorization (NIMF) approach that fuses the neighborhood-based and model-based collaborative filtering approaches to improve the prediction accuracy [19].
Although the approaches mentioned above play important roles in applications, they also have some limitations. Most of the QoS prediction approaches depend on the designers' a priori knowledge about the prediction model parameters. Information about a prediction model, or more specifically precise knowledge about a prediction system, is quite difficult to obtain in some practical applications, so model parameter identification through experiments is usually needed. However, this is time consuming for large-scale experiments. This practical limitation poses challenging obstacles to the applicability of those QoS prediction methods. For instance, in [19], a series of experiments were conducted for the purpose of identifying the parameters used in QoS prediction, but this was computationally intensive at the scale of the web service QoS data set. Furthermore, it was not suitable for online parameter tuning under the complex network environment of the Internet.
In this paper, an optimal online parameter tuning methodology based on approximate dynamic programming (ADP) is proposed to improve the QoS prediction approach, and the experiment is conducted on a large-scale web service QoS data set that has some characteristics of big data. Because of the fast adaptability and approximation capabilities of the neural network (NN) in its model-free reinforcement learning scheme, ADP using NNs is a powerful tool for computing the optimal solution of a multistage dynamic decision process while avoiding the "curse of dimensionality" [20][21][22][23]. Under the ADP scheme with actor-critic architecture, a critic network is designed for approximating the cost function and an action network is designed for generating optimal actions [24]. We propose a model-free online ADP learning algorithm for parameter tuning used in collaborative web service QoS prediction. By designing an ADP architecture and defining an appropriate reinforcement signal, the ADP tunes the QoS prediction model parameters to achieve satisfactory QoS prediction. In addition, for big data applications with large-scale data sets, some available QoS prediction approaches are time consuming because they identify model parameters by trial and error with a painstakingly handcrafted exhaustive search. In contrast, the ADP-based method has automatic model parameter tuning capability, and it may not be very computationally intensive at the scale of the web service QoS data set. In this way, the ADP-based method may be a worthwhile alternative for big data analytics in this case.
Different from the traditional QoS prediction approaches, our methodology has the following advantages: (1) the optimal parameter tuner used in the prediction approach can be implemented online; (2) no prior knowledge or identification of the prediction model is required; (3) near-optimal performance may be achieved while the parameters of the prediction model adapt to changes in the network environment; and (4) the proposed structure and the associated algorithm can also be extended to other types of QoS prediction model. This paper is organized as follows. In Section 2, we introduce a collaborative QoS prediction framework using the PMF model. An ADP-based parameter tuner for the QoS prediction model is proposed in Section 3. Experiment results are presented in Section 4. The conclusion is provided in Section 5.

QoS Prediction Model and Parameter Analysis
2.1. Collaborative QoS Prediction Model. A collaborative QoS prediction framework is shown in Figure 1, where the service users are encouraged to share their individually observed past web service QoS information. In this collaborative framework, a service user obtains the QoS prediction service from the centralized server only if he/she contributes some QoS values. Meanwhile, the more web service QoS values a service user contributes, the more user features can be mined from the contributed data, and the higher the achievable QoS value prediction accuracy. This is the essence of the collaborative framework [19]. Furthermore, after providing the local QoS values to the server, the service users can get the prediction results via the following three phases. In the first phase, the system calculates the users' similarities using PCC and determines a set of top-$K$ similar users (i.e., neighbors) for the current user. In the second phase, by employing the information of those neighbors, the system designs a collaborative filtering model to predict the missing web service QoS values in the user-item matrix, where each element in this matrix is the value of a certain QoS property of a web service observed by a service user. In the last phase, an ADP-based parameter tuner is used to adjust the key parameters of the above collaborative filtering model to find near-optimal results. Details of this phase are presented in the next section.
There is an $m \times n$ user-item matrix $R$, where $m$ and $n$ are the numbers of service users and web services, respectively. Here, we use PCC to compute the similarities between different service users. Generally, because PCC considers the differences in the user value style, it can achieve high accuracy [13]. According to the computation approach of PCC, the similarity between two service users $i$ and $k$ can be expressed as

$$\mathrm{PCC}(i,k) = \frac{\sum_{j \in J} (R_{ij} - \bar{R}_i)(R_{kj} - \bar{R}_k)}{\sqrt{\sum_{j \in J} (R_{ij} - \bar{R}_i)^2}\,\sqrt{\sum_{j \in J} (R_{kj} - \bar{R}_k)^2}}, \tag{1}$$

where $J$ is the subset of web services that are invoked by both service user $i$ and user $k$, and $R_{ij}$ is the element of the matrix $R$ that holds the value of a certain client-side QoS property of web service $j$ observed by service user $i$. In addition, $\bar{R}_i$ and $\bar{R}_k$ are the average QoS values of different web services observed by service users $i$ and $k$, respectively. With the PCC values computed in (1), we can identify a set of top-$K$ similar users. For a service user $i$, a set of similar users $T(i)$ is defined as [19]

$$T(i) = \{\, k \mid k \in \text{top-}K(i),\ \mathrm{PCC}(i,k) > 0,\ k \neq i \,\}, \tag{2}$$

where top-$K(i)$ represents a set of top-$K$ similar users for the user $i$.
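The similarity and neighbor-selection steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `pcc` and `top_k_neighbors`, the dictionary-based storage of observed QoS values, and the choice to return a similarity of 0 when two users share no service are all assumptions made for the example.

```python
import math

def pcc(ratings_i, ratings_k):
    """Pearson correlation between two users.

    Each argument maps a service id to the QoS value the user observed;
    services the user never invoked are simply absent (assumed layout).
    """
    common = set(ratings_i) & set(ratings_k)  # services invoked by both users
    if not common:
        return 0.0  # assumption: no overlap means no measurable similarity
    mean_i = sum(ratings_i.values()) / len(ratings_i)
    mean_k = sum(ratings_k.values()) / len(ratings_k)
    num = sum((ratings_i[j] - mean_i) * (ratings_k[j] - mean_k) for j in common)
    den_i = math.sqrt(sum((ratings_i[j] - mean_i) ** 2 for j in common))
    den_k = math.sqrt(sum((ratings_k[j] - mean_k) ** 2 for j in common))
    if den_i == 0.0 or den_k == 0.0:
        return 0.0  # constant observations carry no correlation information
    return num / (den_i * den_k)

def top_k_neighbors(user, all_ratings, k):
    """Return up to k most similar users with strictly positive PCC."""
    scored = []
    for other, ratings in all_ratings.items():
        if other == user:
            continue
        s = pcc(all_ratings[user], ratings)
        if s > 0:
            scored.append((s, other))
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]
```

Filtering on strictly positive similarity means a user may end up with fewer than `k` neighbors, which is the behavior the neighbor-set definition implies.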
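Then, a neighborhood-integrated matrix factorization approach is employed to make predictions in the user-item matrix [19]. The $m \times n$ user-item matrix $R$ is fitted by a matrix $X = U^{T}V$ using the matrix factorization algorithm, where $U \in \mathbb{R}^{l \times m}$, $V \in \mathbb{R}^{l \times n}$, and $l$ represents the rank of the matrix $X$ [16]. With this factorization of the matrix $R$, the web service QoS values of a user can be predicted by making a linear combination of the factor vectors in $V$. Actually, each column of $V$ can be regarded as a linear predictor for a web service. Therefore, the prediction can be implemented by minimizing the following sum-of-squared-errors objective function $\mathcal{L}$ with quadratic regularization terms [19]:

$$\mathcal{L}(U,V) = \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{n} I_{ij} \Big( R_{ij} - \Big( \alpha\, U_i^{T} V_j + (1-\alpha) \sum_{k \in T(i)} S_{ik}\, U_k^{T} V_j \Big) \Big)^{2} + \frac{\lambda_U}{2} \|U\|_F^{2} + \frac{\lambda_V}{2} \|V\|_F^{2}, \tag{3}$$

where $I_{ij}$ equals 1 if user $i$ has observed a QoS value for web service $j$ and 0 otherwise, $U_i$ and $V_j$ denote the $i$th column of $U$ and the $j$th column of $V$, $\alpha \in [0, 1]$ is a weight coefficient designed to balance the usage of information from the user and from the user's neighbors, $\lambda_U$ and $\lambda_V$ are two parameters used to avoid overfitting, and $\|\cdot\|_F$ represents the Frobenius norm. In addition, $S_{ik}$ represents the normalized similarity between users $i$ and $k$, which is defined as

$$S_{ik} = \frac{\mathrm{PCC}(i,k)}{\sum_{k' \in T(i)} \mathrm{PCC}(i,k')}. \tag{4}$$

By applying a gradient descent rule for $U_i$ and $V_j$, the objective function $\mathcal{L}(U,V)$ is minimized and a local minimum can be found.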
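To make the role of the weight coefficient concrete, the following Python sketch computes a single neighborhood-integrated prediction. It assumes the usual form from [19]: the user's own latent-factor estimate is blended with a similarity-weighted estimate from the neighbors. The names `normalized_similarity` and `predict`, and the storage of factor vectors in dictionaries, are hypothetical choices for the example.

```python
def normalized_similarity(pcc_to_neighbors):
    """Scale the positive PCC values of a user's neighbors so they sum to 1."""
    total = sum(pcc_to_neighbors.values())
    if total == 0:
        return {}
    return {k: s / total for k, s in pcc_to_neighbors.items()}

def dot(a, b):
    """Inner product of two equally long factor vectors."""
    return sum(x * y for x, y in zip(a, b))

def predict(i, j, U, V, S_i, alpha):
    """Predicted QoS value of service j for user i.

    U[i] and V[j] are latent factor vectors (assumed layout: dict of lists),
    S_i maps each neighbor of user i to its normalized similarity,
    alpha in [0, 1] balances the user's own factors against the neighbors'.
    """
    own = dot(U[i], V[j])
    neighbors = sum(w * dot(U[k], V[j]) for k, w in S_i.items())
    return alpha * own + (1.0 - alpha) * neighbors
```

With `alpha = 1.0` the neighbor term vanishes and only the user's own factors are used; with `alpha = 0.0` the prediction comes entirely from the neighbors.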

Parameter Analysis.
In the objective function (3), the weight coefficient $\alpha \in [0, 1]$ determines how much the objective function relies on the users themselves and on their similar users: the QoS prediction is made using only the information from the current user when $\alpha = 1$, and the missing QoS value is predicted purely from the information of the similar users (i.e., the current user's neighbors) when $\alpha = 0$.
In addition, the top-$K$ value determines the number of similar users employed in collaborative QoS prediction. Generally speaking, too large a top-$K$ value may hurt the prediction accuracy, because some dissimilar users may then be involved in the prediction computation. Meanwhile, too small a top-$K$ value may often be computationally debilitating.
Therefore, an appropriate $\alpha$ and an appropriate top-$K$ value may help to improve the QoS prediction accuracy with low computational complexity. Here, we design an ADP-based parameter tuner which aims to find satisfactory parameters $\alpha$ and top-$K$ through online optimization.

ADP-Based Parameter Tuner
3.1. System Architecture. For this QoS prediction problem, the available collaborative filtering approaches (e.g., [19]) set the parameters of the system model according to the designer's experience. Such a set of static parameters lacks adaptability. Thus, in an unpredictable network environment, the prediction performance obtained with only predefined static parameters is unsatisfactory.
The QoS prediction framework using our ADP-based parameter tuner is illustrated in Figure 2, which aims to address the above issue in the design of the QoS prediction model. Here, the proposed ADP-based tuner is employed to update two key parameters in the QoS prediction model, that is, $\alpha$ and the top-$K$ value.
Moreover, the ADP-based parameter tuner in Figure 2 is implemented in accordance with the actor-critic architecture of ADP [21]. It is illustrated in Figure 3, where the solid lines are signal flow and the dashed lines are the paths for the NNs' parameter tuning. As mentioned above, there are two NNs in the ADP-based tuner. One is designed as an action network that takes the system state $x(t)$ as input and outputs the control signal $u(t)$ at time $t$. The other is designed as a critic network that takes $x(t)$ and $u(t)$ as inputs to generate $\hat{J}(t)$. The discount factor is $\gamma \in (0, 1)$. In the ADP-based tuner, $r(t)$ is a reinforcement signal which is provided by the QoS prediction model. Specifically, for the prediction problem, the reinforcement signal can be defined as a function of a measure of the prediction results at time $t$ (e.g., the mean absolute error (MAE)). A small MAE is encouraged. Actually, $r(t)$ is generated to indicate the effectiveness of the control $u(t)$ and to speed up the convergence.
Considering the reinforcement signal $r(t)$ as an instantaneous cost function, the cost function can be defined as

$$J(t) = \sum_{k=1}^{\infty} \gamma^{k-1}\, r(t+k). \tag{6}$$

The optimal parameter tuning problem is to find a control policy that minimizes the cost function. So the optimal cost function can be defined as

$$J^{*}(t) = \min_{u(t)} \big\{ r(t+1) + \gamma J^{*}(t+1) \big\}. \tag{7}$$
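As a small numerical illustration of the discounted cost and its recursive form, the Python sketch below evaluates the cost for a finite list of future reinforcement signals. The function name `discounted_cost`, the symbol `gamma` for the discount factor, and the truncation to a finite horizon are assumptions of the example; the paper's cost is defined over an infinite horizon.

```python
def discounted_cost(future_r, gamma):
    """Discounted sum r(t+1) + gamma*r(t+2) + gamma^2*r(t+3) + ...

    future_r lists the reinforcement signals from time t+1 onward
    (finite horizon, an assumption made for illustration).
    """
    cost = 0.0
    # Work backward so each step applies cost(t) = r(t+1) + gamma * cost(t+1).
    for r in reversed(future_r):
        cost = r + gamma * cost
    return cost
```

Evaluating the sum backward mirrors the recursion that the critic network is trained to satisfy, which is why a one-step consistency error is enough to train it online.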
Here, the output of the critic network (i.e., $\hat{J}(t)$) is to approximate the cost function $J(t)$. When the critic network is well trained, $\hat{J}(t)$ should satisfy the equation $\hat{J}(t-1) = r(t) + \gamma \hat{J}(t)$, or $\gamma \hat{J}(t) - [\hat{J}(t-1) - r(t)] = 0$, in accordance with (7). The output of the action network (i.e., $u(t)$) is to minimize the difference between the approximate function $\hat{J}(t)$ and the desired objective, denoted by $U_c(t)$, which is set to 0 without loss of generality in this paper. Under this actor-critic architecture with convergence guarantees, a near-optimal control policy is achieved.
In addition, the regulatory signals are generated by applying a modulation transform (8) to $u(t) \triangleq (u_1(t), u_2(t))$, where the two modulation factors are positive real numbers. Next, we provide the implementation details of our ADP-based parameter tuner.

The Action Network and the Critic Network.
In our implementation, a feedforward NN with one hidden layer is used to design both the action network and the critic network. The action network is illustrated in Figure 4, where $(x_1(t), x_2(t))$ are the inputs, $(u_1(t), u_2(t))$ are the outputs, $w^{(1)}_{a,ij}$ is the weight connecting the $j$th input node to the $i$th hidden node, and $w^{(2)}_{a,ki}$ is the weight connecting the $i$th hidden node to the $k$th output node. In addition, $v(t)$ is the input vector of the output nodes, and $h(t)$ and $g(t)$ are the input and output vectors of the hidden nodes, respectively. Let $\phi(\cdot)$ be a hyperbolic tangent threshold function used in the hidden layer, which can be defined as

$$\phi(x) = \frac{1 - e^{-x}}{1 + e^{-x}}.$$

In the action network, the approximate error and the objective function to be minimized are defined as

$$e_a(t) = \hat{J}(t) - U_c(t), \qquad E_a(t) = \frac{1}{2} e_a^{2}(t).$$

The critic network is illustrated in Figure 5, where $w^{(1)}_{c,ij}$ is the weight connecting the $j$th input node to the $i$th hidden node and $w^{(2)}_{c,i}$ is the weight connecting the $i$th hidden node to the output node. Moreover, $q(t)$ and $p(t)$ are the input and output vectors of the hidden nodes, respectively. Similarly, the error and the objective function to be minimized can be expressed as follows:

$$e_c(t) = \gamma \hat{J}(t) - \big[\hat{J}(t-1) - r(t)\big], \qquad E_c(t) = \frac{1}{2} e_c^{2}(t).$$
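The forward pass of such a one-hidden-layer action network can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the activation is taken to be the commonly used form (1 - e^(-x)) / (1 + e^(-x)), the same activation is assumed at the output nodes, bias terms are omitted, and the names `phi` and `action_forward` are hypothetical.

```python
import math

def phi(x):
    """Assumed activation (1 - exp(-x)) / (1 + exp(-x)).

    This equals tanh(x / 2); the tanh form is used because it does not
    overflow for large negative x.
    """
    return math.tanh(0.5 * x)

def action_forward(x, w1, w2):
    """Map a state vector x to a control vector u.

    w1[i][j]: weight from input node j to hidden node i.
    w2[k][i]: weight from hidden node i to output node k.
    """
    h = [sum(w * xj for w, xj in zip(row, x)) for row in w1]  # hidden-node inputs
    g = [phi(hi) for hi in h]                                  # hidden-node outputs
    v = [sum(w * gi for w, gi in zip(row, g)) for row in w2]  # output-node inputs
    return [phi(vk) for vk in v]                               # controls in (-1, 1)
```

Because every output lies strictly between -1 and 1, some scaling step is needed before the controls can adjust parameters with different ranges, which is the role the text assigns to the modulation transform.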

Weight Online Learning Rules.
In our proposed online learning and optimization strategy, not requiring prior knowledge means that the weights of both the action network and the critic network in the ADP-based parameter tuner can be randomly initialized in the initialization phase of collaborative QoS prediction. Once a system state is observed, an action is produced by propagating the state through the weights of the action network. A good action is encouraged, while a bad action is punished.
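The weight online learning rule for the action network is a gradient-based adaptation designed as

$$w_a(t+1) = w_a(t) + \Delta w_a(t), \qquad \Delta w_a(t) = l_a(t) \left[ -\frac{\partial E_a(t)}{\partial w_a(t)} \right], \qquad \frac{\partial E_a(t)}{\partial w_a(t)} = \frac{\partial E_a(t)}{\partial \hat{J}(t)} \frac{\partial \hat{J}(t)}{\partial u(t)} \frac{\partial u(t)}{\partial w_a(t)},$$

where $w_a(t)$ is the weight vector in the action network and $l_a(t) > 0$ represents the learning rate of the action network at time $t$.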
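Here, the output $u(t)$ can be written as

$$u_k(t) = \phi\big(v_k(t)\big), \qquad v_k(t) = \sum_{i=1}^{N_{ah}} w^{(2)}_{a,ki}(t)\, g_i(t), \qquad g_i(t) = \phi\big(h_i(t)\big), \qquad h_i(t) = \sum_{j=1}^{2} w^{(1)}_{a,ij}(t)\, x_j(t),$$

where $N_{ah}$ is the number of hidden nodes in the action network.

International Journal of Distributed Sensor Networks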
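The weight online learning rule for the critic network is also a gradient-based adaptation designed as

$$w_c(t+1) = w_c(t) + \Delta w_c(t), \qquad \Delta w_c(t) = l_c(t) \left[ -\frac{\partial E_c(t)}{\partial w_c(t)} \right], \qquad \frac{\partial E_c(t)}{\partial w_c(t)} = \frac{\partial E_c(t)}{\partial \hat{J}(t)} \frac{\partial \hat{J}(t)}{\partial w_c(t)},$$

where $w_c(t)$ is the weight vector in the critic network and $l_c(t) > 0$ represents the learning rate of the critic network at time $t$.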
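Then, the output $\hat{J}(t)$ is written as

$$\hat{J}(t) = \sum_{i=1}^{N_{ch}} w^{(2)}_{c,i}(t)\, p_i(t), \qquad p_i(t) = \phi\big(q_i(t)\big), \qquad q_i(t) = \sum_{j=1}^{2} w^{(1)}_{c,ij}(t)\, x_j(t) + \sum_{k=1}^{2} w^{(1)}_{c,i(2+k)}(t)\, u_k(t),$$

where $N_{ch}$ is the number of hidden nodes in the critic network.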
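One gradient step of the critic network can be sketched in Python as below. The sketch assumes the temporal-difference style error that is standard in actor-critic ADP designs (discounted current estimate minus the previous estimate corrected by the reinforcement signal), a linear output node, and the activation (1 - e^(-x)) / (1 + e^(-x)), whose derivative is (1 - phi^2) / 2. The names `critic_forward` and `critic_step` and all numeric values in the usage are hypothetical.

```python
import math

def phi(x):
    """Assumed activation (1 - exp(-x)) / (1 + exp(-x)), equal to tanh(x / 2)."""
    return math.tanh(0.5 * x)

def critic_forward(z, w1, w2):
    """z concatenates state and control; returns (J_hat, hidden outputs p)."""
    q = [sum(w * zj for w, zj in zip(row, z)) for row in w1]
    p = [phi(qi) for qi in q]
    return sum(w * pi for w, pi in zip(w2, p)), p

def critic_step(z, j_prev, r, w1, w2, gamma, lr):
    """One gradient-descent step on E_c = 0.5 * e_c**2.

    e_c = gamma * J_hat(t) - (J_hat(t-1) - r(t))   (assumed error form)
    Returns the updated weights and the error measured before the update.
    """
    j_hat, p = critic_forward(z, w1, w2)
    e_c = gamma * j_hat - (j_prev - r)
    # dE_c/dw2_i = gamma * e_c * p_i
    new_w2 = [w - lr * gamma * e_c * pi for w, pi in zip(w2, p)]
    # dE_c/dw1_ij = gamma * e_c * w2_i * 0.5 * (1 - p_i^2) * z_j
    new_w1 = [
        [w - lr * gamma * e_c * w2_i * 0.5 * (1.0 - pi * pi) * zj
         for w, zj in zip(row, z)]
        for row, w2_i, pi in zip(w1, w2, p)
    ]
    return new_w1, new_w2, e_c
```

Repeating `critic_step` on the same sample until the error falls below a threshold, or until a maximum number of inner iterations is reached, corresponds to the internal cycles and training error thresholds described for the tuner.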
In the implementation of the ADP-based tuner, it should be noted that weight normalization is performed in both networks. Finally, the ADP-based parameter tuner algorithm is summarized in Algorithm 1. Here, $N_c$ and $N_a$ are the internal cycles of the critic network and the action network, respectively. In addition, $T_c$ and $T_a$ are the internal training error thresholds for the critic network and the action network, respectively.

Data Set Description.
To test the performance of our approach in a real-world setting, we use a common data set collected from 339 service users (in 30 countries) on 5,825 web services (in 73 countries) [25,26]. This experiment focuses on the response time of different web services and service users. Response time is one of the representative QoS values, which is defined as the time duration between a service user sending a request and receiving the corresponding response.
In this experiment, there is a total of 1,974,675 real-world web service invocation records. After processing those records, there is a 339 × 5,825 user-item matrix, where each element represents the response time value observed by a user on a web service [19].

Algorithm Settings.
The parameters used in the experiments are summarized in Table 1, where the notations are defined as follows:
$l_a(0)$: initial learning rate of the action network,
$l_c(0)$: initial learning rate of the critic network,
$l_a(t)$: learning rate of the action network at time $t$, which is decreased by 0.05 every 5 time steps until it reaches $l_a(f)$ and stays there thereafter,
$l_c(t)$: learning rate of the critic network at time $t$, which is decreased by 0.05 every 5 time steps until it reaches $l_c(f)$ and stays there thereafter.
Here, the weights in the action and the critic networks are trained using their internal cycles or their internal training error thresholds.
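The step-wise learning-rate decay described above can be written as a one-line schedule. In this Python sketch the function name `learning_rate` and the example values passed to it are hypothetical; only the decrement of 0.05 every 5 time steps and the floor at a final value come from the text.

```python
def learning_rate(t, initial, final, drop=0.05, every=5):
    """Learning rate at time step t.

    Starts at `initial`, decreases by `drop` every `every` time steps,
    and is held at `final` once that floor is reached.
    """
    return max(final, initial - drop * (t // every))
```

For instance, with an assumed initial rate of 0.3 and an assumed final rate of 0.005, the rate stays at 0.3 for steps 0 to 4, drops to about 0.25 at step 5, and is clamped at 0.005 after enough steps.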

Experiment Results for Case 1
4.3.1. Statement of Evaluation.
In this experiment, the mean absolute error (MAE) and root-mean-squared error (RMSE) metrics are used to measure the prediction quality of our approach in the comparative study. MAE is defined by

$$\mathrm{MAE} = \frac{\sum_{i,j} \big| R_{ij} - \hat{R}_{ij} \big|}{N},$$

where $R_{ij}$ is the observed QoS value of web service $j$ for service user $i$, $\hat{R}_{ij}$ represents the predicted QoS value of web service $j$ observed by service user $i$, and $N$ is the number of predicted values. And RMSE is defined by

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i,j} \big( R_{ij} - \hat{R}_{ij} \big)^{2}}{N}}.$$

Here, the MAE measures how close the predicted QoS values are to the observed QoS values. Compared to the MAE, the RMSE penalizes large errors more severely.
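Both metrics are straightforward to compute over paired lists of observed and predicted values. The Python sketch below is a minimal illustration; the function names `mae` and `rmse` and the flat-list input format are assumptions of the example.

```python
import math

def _check(actual, predicted):
    """Reject empty or mismatched inputs before computing a metric."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")

def mae(actual, predicted):
    """Mean absolute error over paired observed and predicted values."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root-mean-squared error; squaring weights large errors more heavily."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

When the errors are unequal in size, the RMSE exceeds the MAE on the same data, which reflects the heavier penalty on large deviations.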
After 170 time steps, $\alpha$ and $K$ almost reach steady states, which indicates that the learning process of ADP has converged and that near-optimal $\alpha$ and $K$ are obtained. Moreover, to evaluate the prediction accuracy, we compare our approach with the following approaches:
(i) UPCC (user-based collaborative filtering method using PCC): it only employs similar users for the QoS value prediction [13,14];
(ii) IPCC (item-based collaborative filtering method using PCC): it only employs similar web services for the QoS value prediction [15];
(iii) UIPCC: it employs both similar users and similar web services for the QoS value prediction [27];
(iv) NMF (nonnegative matrix factorization): it is also a collaborative filtering method based on matrix factorization; unlike other matrix factorization methods, it enforces the constraint that the factorized factors must be nonnegative [28];
(v) NIMF (neighborhood-integrated matrix factorization): it fuses the neighborhood-based and model-based collaborative filtering approaches for the QoS value prediction [19].
The experimental results are shown in Table 2. From Table 2, we can observe that our method obtains better prediction accuracy (smaller MAE and RMSE values) than the UPCC, IPCC, UIPCC, and NMF methods in almost all cases for response time. The performance of our method is slightly poorer than that of the NIMF method, because the NIMF method can find the globally optimal values of $\alpha$ and $K$ through the global search in [19], while our method obtains near-optimal values through the ADP-based online parameter tuning methodology without any a priori knowledge from the designers. However, the NIMF method needs to conduct an exhaustive trial-and-error search for the optimal parameters in a large parameter space, whereas our method tunes the parameters automatically through online learning and optimization. In this regard, our method may reduce the computing time and improve the computational efficiency of the prediction. Moreover, when the matrix density is 20%, the MAE and RMSE values are smaller than when the matrix density is 10%, since a denser matrix provides more information for predicting the missing values.

Experiment Results for Case 2.
To create a more challenging scenario, we design a changing network environment by adding random disturbance to the 1,974,675 web service invocation results of the data set used in this experiment. If the network environment of some similar users changes, those similar users may no longer be trustworthy, and the initial parameters of the prediction model are then no longer appropriate. With the online optimization of the ADP-based tuner, our method automatically adjusts the parameters of the prediction model to conform to the changes in the new network environment.
In this case, we make QoS predictions for a service user under the changed network environment.Here, we compare our ADP-based prediction method with the NIMF method [19].
In this case, we consider a disturbance applied to the similar users. In the experiment, we artificially replace part of the similar users with a series of unrelated users in order to reduce the PCC values among the similar users. As our ADP-based method perceives the changes happening in the similar users, it adjusts the number of similar users employed by reducing the top-$K$ value (i.e., the value $K$ in our approach) and reduces the usage of information from similar users by increasing the value $\alpha$ that appears in (3).
In Figure 8, we check the response time of the first 500 missing web services observed by one service user under the changed network environment. The horizontal axis of this figure is the serial number of the missing services for one user. The red line with spots represents the actual QoS value in every missing entry, and the blue line with stars represents the prediction data obtained by using the NIMF method with fixed prediction model parameters. In addition, the green line with diamonds represents the prediction results obtained by using our ADP-based online method with optimized prediction model parameters. There is a large error between the actual data and the data predicted by the NIMF method in the changed network environment. Meanwhile, the prediction method with the ADP-based parameter tuner achieves satisfactory QoS prediction performance.

Conclusion
In this paper, we propose an ADP-based parameter tuner for a QoS prediction model that uses a collaborative filtering algorithm. Due to its online adaptability and its approximation capabilities through successive iterations, this ADP-based parameter tuner shows satisfactory performance in the large-scale experiment studied in this paper. Moreover, our approach can easily adapt to changes in the network environment without requiring prior knowledge or identification of the prediction model. Our method also may not be computationally intensive at the scale of the web service QoS data set in a complex and changing network environment. Furthermore, with the help of the flexible structure and learning algorithm of ADP, the proposed method can be extended to other types of QoS prediction models and achieve near-optimal performance. Our experimental studies validate the effectiveness of the proposed ADP-based parameter tuner.

Figure 1: The proposed collaborative QoS prediction framework.

Figure 2: The QoS prediction framework with ADP-based parameter tuner.

Figure 3: Schematic diagram for implementation of ADP-based parameter tuner.
$N_a$: internal cycle of the action network, $N_c$: internal cycle of the critic network, $T_a$: internal training error threshold for the action network, $T_c$: internal training error threshold for the critic network.

Figure 7: A typical trajectory of  during a successful learning trial for the ADP-based parameter tuner.

Figure 8: The comparison of prediction results.

Algorithm 1 (closing step): return prediction result and MAE.
Table 1: Summary of parameters used in ADP-based tuner for QoS prediction.