A Database-Driven Algorithm for Building Top-k Service-Based Systems

This work aims to automatically build the top-k (k being the number of suggested results) lightweight service-based systems (LitSBSs) from user-given keywords. Compared with our previous work, we use a single score (oscore) to evaluate both the keyword matching degree and the QoS performance of a service, so that we can find top-k LitSBSs that have a high keyword matching degree and good QoS performance at the same time. In addition, to guarantee the quality of the top-k LitSBSs found and to improve time efficiency, we redesign the database-driven algorithm (LitDB). We add a preferential-service selection step to LitDB, which prioritizes high-quality services (those with a high keyword matching degree and good QoS performance). We design comprehensive experiments to demonstrate the time performance of LitDB.


Introduction
Service composition is a technology for developing service-based systems (SBSs) by composing existing services [1]. Traditional service composition methods are too complex for non-expert users, so more and more works [2,3] have been devoted to simplifying the service composition process. Meanwhile, research [4,5] on keyword-query-based service composition algorithms has emerged. These algorithms automatically build SBSs that reflect users' preferences, so non-expert users can build complex SBSs easily just by providing a few keywords. Several studies [6][7][8] develop keyword-query-based algorithms to find lightweight SBSs (LitSBSs). Among the SBSs that satisfy given query keywords, a LitSBS has the fewest component services. In addition, compared with a non-LitSBS, a LitSBS is easier to manage, execute, monitor, debug, deploy and scale. Some keyword-based algorithms use relational databases (DBs) to store services, since DB query techniques are mature and robust. In addition, using a DB helps guarantee the time efficiency of query functions.
Although many works use a DB to store services, some problems remain. For example, the solutions in [9,10] both store all possible SBSs in the database in advance, which requires a lot of time and storage space. To address this issue, we proposed a database-driven algorithm to build LitSBSs in our previous work [11]. We first evaluate the keyword matching degree of each candidate service and use a utility function to calculate its keyword matching score (kscore). We then design a service composition algorithm to find the top-k LitSBSs. Finally, we re-sort these top-k LitSBSs by their QoS quality scores (qscore). However, our previous work cannot find top-k LitSBSs that satisfy both keyword matching degree and QoS quality at the same time. In other words, we have to split the search into two separate processes: (1) finding the top-k LitSBSs with the highest kscores, and (2) re-sorting these top-k LitSBSs by their qscores.
In this work, we integrate qscore and kscore into a new evaluation score, oscore, so the separate QoS ranking step can be removed. In addition, in order to find the top-k high-quality service compositions that meet users' needs, we propose the concept of preferential services. In the matched keyword table generation step, we select the preferential services with high oscores to ensure that they are composed first.
The particular contributions of this work are as follows:
(1) oscore is used to integrate kscore and qscore, which guarantees both the QoS and the keyword matching degree of the LitSBSs found.
(2) We add preferential-service selection to the matched service table generation step, so that advantageous services are given priority.
(3) Extensive experiments are conducted to illustrate the time performance of the fast service composition algorithm.
The structure of this paper is as follows: Section 2 defines the problem we aim to solve; Section 3 details the LitDB algorithm; Section 4 describes the experiments and presents the results; and Section 5 concludes the paper.

Service Database
We design a service database to represent the service library. We call it service database L. It includes four tables of different kinds: the service table (TS), the input table (TI), the output table (TO) and the parameter table (TP). All the details can be found in our previous work [11].

The Keyword Matching Score
We define a keyword matching score, kscore, to measure the matching degree between given query keywords and a service:
$$\mathrm{kscore}(s, E_s, q_r) = \sum_{e_i \in E_s} \mathrm{kscore}(s, e_i, q_r) \tag{1}$$
where $\mathrm{kscore}(s, e_i, q_r)$ is defined by Formula 2:
$$\mathrm{kscore}(s, e_i, q_r) = \frac{1 + \ln(1 + \ln tf)}{(1 - \sigma) + \sigma \frac{dl_i}{avdl}} \cdot \ln\frac{N + 1}{df} \tag{2}$$
where $tf$ is the frequency of $q_r$ in $e_i$; $df$ is the number of tuples in $E_i$ containing keyword $q_r$, and $e_i$ is the value union of the $i$-th attribute of services in TS; $dl_i$ is the number of characters in $e_i$; $avdl = (dl_1 + \cdots + dl_m)/m$; $N$ is the number of services in TS; and $\sigma$ is a constant (usually 0.2).
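As an illustration of how the symbols defined above (tf, df, dl_i, avdl, N, σ) can combine into a score, the following is a minimal sketch. It assumes the pivoted-normalization form commonly used in IR-style keyword search over relational data, which those symbols suggest; the function names `attribute_kscore` and `service_kscore` and the tuple layout of `attribute_stats` are hypothetical and not taken from the paper.

```python
import math


def attribute_kscore(tf, df, dl, avdl, n, sigma=0.2):
    """Score of one attribute value e_i for one keyword q_r (assumed form).

    tf: frequency of the keyword in e_i
    df: number of tuples containing the keyword
    dl: number of characters in e_i
    avdl: average attribute length
    n: number of services in TS
    """
    if tf <= 0 or df <= 0:
        return 0.0  # keyword absent: no contribution
    tf_part = 1.0 + math.log(1.0 + math.log(tf))
    length_norm = (1.0 - sigma) + sigma * dl / avdl
    idf_part = math.log((n + 1) / df)
    return tf_part / length_norm * idf_part


def service_kscore(attribute_stats, avdl, n, sigma=0.2):
    """Sum the per-attribute scores of one service, as in Formula 1.

    attribute_stats: iterable of (tf, df, dl) tuples, one per attribute.
    """
    return sum(attribute_kscore(tf, df, dl, avdl, n, sigma)
               for tf, df, dl in attribute_stats)
```

Under this assumed form, an attribute of average length that contains the keyword once contributes only the inverse-document-frequency term.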

The Quality Score of a Service
The following basic formula is a score function evaluating the overall QoS performance of a service.
For a positive QoS attribute of a service s that belongs to a matched keyword table $MT_{K_j}$ (discussed in detail in Section 4.1), the value is calculated by Formula 4.
For a negative QoS attribute of a service s that belongs to $MT_{K_j}$, the value is calculated by Formula 5.

Integrated Score
oscore measures the overall performance of a service by considering both its keyword matching degree and its QoS performance:
$$\mathrm{oscore} = f(\mathrm{kscore}, \mathrm{qscore}) \tag{6}$$
where $f$ is an aggregation function of kscore and qscore. For example, we can use the weighted average of kscore and qscore, or define $f$ as oscore = kscore + qscore.
To get the top-k LitSBSs, we design an algorithm that automatically queries the service database and then recommends the top-k LitSBSs to users, based on the input query keywords and QoS constraints.
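The two aggregation choices named for f (a plain sum and a weighted average) can be sketched as follows. The function name `oscore` and its `weight` parameter are illustrative assumptions; the paper does not fix a particular weight.

```python
def oscore(kscore, qscore, weight=None):
    """Aggregate kscore and qscore into a single score.

    weight=None  -> plain sum: kscore + qscore
    weight in [0, 1] -> weighted average: weight * kscore + (1 - weight) * qscore
    """
    if weight is None:
        return kscore + qscore
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * kscore + (1.0 - weight) * qscore
```

If kscore and qscore live on different scales, the weighted form gives a simple way to keep one of them from dominating the other.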

LitDB: An Algorithm for Building Top-k LitSBSs
A database-driven algorithm, LitDB, is proposed to efficiently build top-k LitSBSs according to user-given keywords. Fig. 1 shows the process of this algorithm. LitDB includes three main stages: (1) Keyword matching searches for services that contain the query keywords. (2) Matched service table generating builds a matched service table (MT) for each query keyword, so that all found services related to a query keyword are put into its MT. In each MT, the preferential services are selected first and sorted in descending order of oscore; the non-preferential services are then sorted after them. (3) The service composition algorithm finds the top-k LitSBSs quickly and efficiently.

Matched Service Table Generating
According to the oscores, a matched keyword table (MT) is generated for each query keyword. For instance, Tab. 1 shows the MTs for the query keywords Car hire, Flight and Insurance quote. Services in an MT are ranked in descending order of oscore.
Table 1: Matched keyword tables for query keywords

Preferential Services
qscore is designed to measure the QoS performance of a service, and kscore evaluates the matching degree between a query keyword and a service. The higher the kscore and qscore of a service, the better the service matches the query keywords and the better its QoS performance. As shown in Fig. 2, each service is represented as a point in a two-dimensional space. Service a is a preferential service because no other service has both a higher qscore and a higher kscore than a. Services b, c and d satisfy the same condition, so they are also preferential services. Since preferential services perform better than other services, we should guarantee, to save time, that they are selected for composition first. Before generating the MT for a keyword, we first select the preferential services and sort them in descending order of oscore. The other services are then ranked after the preferential services, also in descending order of oscore.
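The selection rule and MT ordering described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Service` record, `is_preferential` and `build_mt` are hypothetical names, and "higher" is read as strictly higher in both kscore and qscore.

```python
from typing import NamedTuple


class Service(NamedTuple):
    name: str
    kscore: float
    qscore: float
    oscore: float


def is_preferential(service, services):
    """True if no other service has both a higher kscore and a higher qscore."""
    return not any(other.kscore > service.kscore and other.qscore > service.qscore
                   for other in services)


def build_mt(services):
    """Order one MT: preferential services first, then the rest.

    Each group is sorted in descending order of oscore.
    """
    preferential = [s for s in services if is_preferential(s, services)]
    others = [s for s in services if not is_preferential(s, services)]

    def by_oscore(s):
        return s.oscore

    return (sorted(preferential, key=by_oscore, reverse=True)
            + sorted(others, key=by_oscore, reverse=True))
```

Note that a preferential service can have a lower oscore than a non-preferential one and still be ranked ahead of it, which is what distinguishes this ordering from a plain sort by oscore. The pairwise check is quadratic in the size of the MT, which is acceptable for a sketch.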

Service Composition Algorithm
We develop three service composition algorithms that search the MTs described above for the top-k LitSBSs matching the query keywords Q. We first introduce the most basic algorithm, called Intuitive. We then introduce a second algorithm, called Enhanced, which achieves better time performance than Intuitive by adding a pruning strategy. Finally, we introduce the most useful algorithm, called Fastk, which has better time performance than the former two.

Intuitive Algorithm
Intuitive is the most basic algorithm and is based on exhaustive search. Algorithm 1 shows the process of Intuitive. Each combination of services obtained by searching the MTs is passed to the BuildSBS function to check whether the combination can be composed. If it can, we put the resulting service composition in R. When the search finishes, we rank all LitSBSs in R in descending order of oscore. Finally, we take the top-k LitSBSs in R with the highest oscores.
Function BuildSBS (Algorithm 2) is used to check whether a set of services can be composed. We use Γ to limit the number of component services in an SBS, to avoid wasting too much time searching for a very large SBS. In this algorithm, an expansion rule is designed to expand a service composition: if a service in TS can be combined with a service in a service composition SC, then SC is expanded with it. An SQL query can be used to search for such a service. The result returned by this function must have the smallest size among all possible results.
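The exhaustive search of Intuitive can be sketched as follows. This is only an illustration under assumptions not stated in the text: each MT is a list of (service name, oscore) pairs, the oscore of a LitSBS is taken to be the sum of its components' oscores, and `can_compose` is a hypothetical stand-in for BuildSBS that only answers yes or no.

```python
import itertools


def intuitive_top_k(mts, k, can_compose):
    """Try every combination of one service per MT and keep the best k.

    mts: list of MTs, each a list of (service_name, oscore) pairs
    can_compose: callable taking a tuple of service names, returning bool
                 (hypothetical stand-in for BuildSBS)
    """
    found = []
    for combo in itertools.product(*mts):
        names = tuple(name for name, _ in combo)
        if can_compose(names):
            total = sum(score for _, score in combo)  # assumed aggregation
            found.append((total, names))
    # Rank all composable combinations by descending score, keep the top k.
    found.sort(key=lambda item: item[0], reverse=True)
    return found[:k]
```

Because every combination is passed to the composability check, the cost grows with the product of the MT sizes, which is what motivates pruning.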

Enhanced Algorithm
The Enhanced algorithm is proposed to improve the time efficiency of the Intuitive algorithm by using an upper bound. We call this upper bound the most top service composition value (MTSCV), defined in Definition 3. Algorithm 3 shows the whole process of Enhanced. It first finds k LitSBSs and then calculates MTSCV for the remaining tuples in the MTs. If the current MTSCV is no larger than the lowest oscore of the k LitSBSs found, the algorithm stops and returns the k LitSBSs; otherwise, it continues searching for LitSBSs among the remaining tuples.
MTSCV represents the upper bound of the oscores of all possible LitSBSs in the remaining tuples (i.e., excluding the tuples in the k LitSBSs found). The inputs of Algorithm 3 are the query keywords Q, k, and a set of MTs ($MT_1, \ldots, MT_v$). R keeps the possible LitSBSs, always sorted in descending order of oscore. The algorithm first finds k LitSBSs (lines 2-9 in Algorithm 3). Then, for the services waiting to be composed, it calculates their bound MTSCV. If their MTSCV exceeds the lowest oscore of the k LitSBSs in R, the BuildSBS function is used to check whether they can be composed (lines 10-16 in Algorithm 3). After that, if the combination's oscore is higher than that of one of the k LitSBSs found, the LitSBS with the lowest oscore is replaced (lines 17-18 in Algorithm 3). At last, we get the top-k LitSBSs in R.
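The pruning idea can be sketched as follows. Since the exact definition of MTSCV is not reproduced here, the sketch substitutes a simple stand-in bound: the oscore of the current tuple of the first MT plus the best oscore of every other MT. It also assumes that the oscore of a LitSBS is the sum of its components' oscores and uses a hypothetical `can_compose` callable in place of BuildSBS; none of these choices is confirmed by the paper.

```python
import heapq
import itertools


def enhanced_top_k(mts, k, can_compose):
    """Top-k search that skips work using an upper bound (assumed form).

    mts: list of MTs, each a list of (service_name, oscore) pairs
    can_compose: hypothetical stand-in for BuildSBS
    """
    if k <= 0 or any(not mt for mt in mts):
        return []
    # Sort each MT by descending oscore so the bound below is valid.
    mts = [sorted(mt, key=lambda t: t[1], reverse=True) for mt in mts]
    best_of_rest = sum(mt[0][1] for mt in mts[1:])
    heap = []  # min-heap of (score, names); heap[0] is the current k-th best
    for first in mts[0]:
        bound = first[1] + best_of_rest  # stand-in for MTSCV
        if len(heap) == k and bound <= heap[0][0]:
            break  # no remaining combination can enter the top k
        for rest in itertools.product(*mts[1:]):
            combo = (first, *rest)
            score = sum(s for _, s in combo)
            if len(heap) == k and score <= heap[0][0]:
                continue  # cannot beat the k-th best: skip the costly check
            names = tuple(n for n, _ in combo)
            if can_compose(names):
                if len(heap) < k:
                    heapq.heappush(heap, (score, names))
                else:
                    heapq.heapreplace(heap, (score, names))
    return sorted(heap, reverse=True)
```

The saving comes from calling the composability check only for combinations that could still change the result, and from stopping as soon as the bound falls to the k-th best score.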

Fastk Algorithm
Finally, we design Fastk, which has better time performance than Enhanced. Algorithm 4 shows the whole process of Fastk. The inputs of Algorithm 4 are the query keywords Q, k, and a set of MTs ($MT_1, \ldots, MT_v$). We set a stack $N(MT_i)$ for each $MT_i$ to store the processed tuples of $MT_i$. R stores the possible top-k LitSBSs, always sorted in descending order of oscore. P keeps the final top-k LitSBSs, and the top unprocessed tuple of $MT_i$ is stored in $h(MT_i)$. We then check whether the top tuples of the MTs can be composed. If so, a LitSBS is built and put into R (line 4 in Algorithm 4). Next, we calculate the MTSCV of each service in $h(MT_i)$, $\forall i \in \{1, \ldots, v\}$, and move the service with the highest MTSCV to $N(MT_i)$ (lines 9-10 in Algorithm 4). Then, for each service combination, we use the BuildSBS function to check whether it can be composed (lines 11-12 in Algorithm 4). Finally, the LitSBSs in R whose oscores are ≥ the current MTSCV are moved to P (line 14 in Algorithm 4). The operations in lines 9-14 of Algorithm 4 are repeated until P contains k LitSBSs.

Experiment
The performance of both Fastk and Enhanced is tested in this section on the WSC-2009-web challenge datasets [9]. The experiments are run on a server with a 2.60 GHz core CPU and 16 GB of RAM, running Windows 10 x64 Enterprise. We perform each experiment five times to obtain the average execution time.

Dataset
The time efficiency is evaluated with respect to the value of k (from 1 to 10) and the number of services (from 1000 to 9000 in five datasets). Each test is conducted five times to get the average performance of the two algorithms. The five datasets we use are from the WSC-2009-web challenge datasets [12]. Each service contains several pieces of information: a service ID, four or five input and output parameters, a service name, and two QoS properties (response time and throughput). We use all of this information in our experiments. Since our main purpose is to compare the time efficiency of Fastk with that of Enhanced, we simulate the oscores for each query keyword in each dataset by randomly drawing them from {1, 2, 3, …, 10} instead of actually calculating them. Based on these oscores, we create a matched service table (MT) for each query keyword.

Effect of Ns (Numbers of Services)
We set k = 1 and the number of query keywords (Nq) = 2 to compare the time performance of Fastk with that of Enhanced when Ns varies from 1000 to 9000. The average execution time is shown in Fig. 3. The average execution time of both algorithms rises as Ns increases. When Ns rises from 1000 to 7000, Fastk gradually becomes faster than Enhanced, and the average execution time of both algorithms increases slowly. When Ns ranges from 7000 to 9000, the average execution time of both algorithms rises dramatically. When Ns = 9000, the average execution time of Enhanced is almost three times that of Fastk.

Effect of k
We fix Ns = 1000 and Nq = 2 to test the time performance of Fastk and Enhanced as k varies from 1 to 15. The average execution time is shown in Fig. 4. When k increases from 1 to 9, the average execution time of both algorithms changes dramatically. When k ranges from 10 to 15, the average execution time of both algorithms changes only slightly. On the whole, the average execution time follows an increasing trend.
From the above results, we can see that Fastk has better time performance than Enhanced as the number of services rises, as illustrated in Fig. 3. When k changes, however, the difference in time performance between Fastk and Enhanced is not clear.

Conclusion
We integrate kscore, which evaluates the keyword matching degree, and qscore, which evaluates the QoS of a service, into a new score, oscore. Thus we can find the top-k LitSBSs with both a high keyword matching degree and good QoS performance at the same time. In addition, we redesign the database-driven algorithm by adding a preferential-service selection step. This step guarantees that high-quality services are considered first throughout the algorithm. The experimental results show the effectiveness of our algorithm.