Towards Service Composition based on Hybrid Bio-Inspired Cloud-based QoS Provisioning Approach

It has recently become a critical issue to provide software development in a service-based conceptual style for business companies . As a powerful technology for service-oriented computing, the composition of web services is investigated. This oﬀered great opportunities to improve IT industries and business processes by forming new value-added services that satisfy the user’s complex requirements. Unfortunately, many challenges are facing the service composition process. These include the diﬃculties to satisfy the user’s complex demands, maintaining the performance to be matched with the quality of service (QoS) requirements, and search space reduction for QoS missing or changeable values. Accordingly, this paper proposes a cloud-based QoS provisioning service composition (CQPC) framework to address these challenges. To prove the concept and the applicability of the CQPC framework, a Hybrid Bio-Inspired QoS provisioning (HBIQP) technique is presented for the operation of the CQPC framework modules. The solution space is reduced via utilizing skyline concepts to have faster execution time and keep only reliable and most interesting services. The CQPC framework is equipped with two proposed algorithms: (i) the modiﬁed highly accurate prediction (MHAP) algorithm to enhance the prediction of QoS values of the services participating in the composition process, (ii) the MapReduce fruit ﬂy Particle swarm Optimization (MR-FPSO) algorithm to handle composing web services for large scale of data in the cloud environment. The experimental results demonstrate the worthiness of the HBIQP technique to meet the performance metrics more than other state-of-the-art techniques in terms of average ﬁtness value, accuracy, and execution time.


Introduction
Undoubtedly,service-oriented architecture (SOA) has significant use in enterprise information technology (IT) environments due to its technical and cost benefits. The use of SOA is recognized as one of the major trends in complex organizations for so many reasons as follows:(i) it helps the solution architects and designers to conduct better capacity planning, (ii) define better solutions for resource sharing, and (iii) adopt easier service migration solutions in terms of the deployment and maintenance [1]. With the exponential growth of web services available in the cloud infrastructure, selecting the one that best fits the requirements of users is a challenging issue. Choosing the web service that fulfills the different needs of users is the core of service-oriented computing in the software developing environment.
With the rapid development of the Web, more services are created that provide specific functions. Recently, the convergence between the Internet of Things (IoT), big data ,and cloud computing has received significant attention in both academia and industry. The cloud will act as a transitional layer between IoT entities and applications to efficiently manage resources. The IoT consists of virtually linked objects or intelligent devices that are physically distinguishable and suitable for sensing, processing, and interaction with the circumstances. The integration of smart device services will create an application based on the IoT domain. For the operation of IoT application tasks, smart devices are equipped with sensors. These sensors collect data around the area they are deployed. The real-time data obtained from sensor devices is converted into new data sources to improve the accuracy of predictive services and to optimize network operations for efficient automation. The IoT-based cloud infrastructure prototype that allows access to the IoT is defined as sensing as a service (SaS). The SaS enables ubiquitous management of remote sensors through web services and facilitates the efficient delivery of sensor data on a demand basis to satisfy the users' requirements.
The future of Internet services has concentrated on "its capacity to build a framework has the ability for delivering the service for a significant number of users efficiently". The primary mission is to move services based on "pay as you go," from conventional form to the form of web applications so that we can supply Quality of Service (QoS). Accordingly, efficient information dissemination has become the new production factor, notably in terms of developing more web services over the network. To this end, organizations start to provide their core business in the form of services over the web. So, the concept of web services becomes very important ,and a challenge for many researchers [2]. The best service composition and service recommendation in the SOA integration context are two important aspects of applications that mainly depend on web services. Service composition is an application of web services, where a single available service may not able to guarantee user satisfaction. For this reason, many services are joined together to create a new composite service with a new value-added function. The proper creation of the composite service is a challenging issue. Additionally, several services may have the same functional attribute but differ in non-functional attributes. Hence, the service selection to participate in the workflow of composite service is another challenging issue.
In the service composition, the time and optimality dimensions are the most common factors. It is important to minimize the time to create a composite service. The execution time is highly related to the large available solution space. Besides, it is crucial to guarantee that the global composite service satisfies the users' requirements. Hence, the composition problem should be solved by taking into consideration these two factors. Composing the services is considered as an optimization problem with a large solution space of candidate services with time constraints.
Due to the Internet environment nature, the QoS of candidate services may be missed or changed along the time. The QoS values affect the optimality of composite service. Services with unstable QoS value leads to inefficient composite service. Therefore, stable services and accurate prediction of QoS values are needed. To summarize, optimality, execution time and QoS values are essential parameters to guarantee user satisfaction. Also, achieving the scalability of available services in a certain amount of time is a composition challenge.
Accordingly, this paper addresses the following research issues: Selecting the most appropriate candidate service that best participates in the global end-to-end solution.
Reducing the search space of candidate services to reduce the execution time for composite service.
Providing an accurately predicted value for the QoS of service to cope with the changes in the network environment.
Measuring the final solution optimality. In this study, a novel cloud-based QoS provisioning Service composition (CQPC) framework is proposed to adopt the whole process of service composition. The CQPC framework consists of five modules as follows: (i)the normalizer module gives a certain range for all QoS attributes and historical user orders. (ii) the Clusters generator module reduces the search space and prepares a set of QoS predictive values for the next module. (iii) the QoS predictor module is responsible for predicting the QoS values that may be changed or missed because of the dynamic changes of the network. (iv)the service reduction module reduces the available candidate services to improve the composition process. Finally, (vi) the composition process is done, and a list of candidate composite services is generated through the service composer and evaluator module.
Moreover, this study presented a Hybrid Bio-Inspired QoS-provisioning (HBIQP) technique that manages the operation of the CQPC framework modules. First, the service users are clustered using the fuzzy C-Means clustering module. Then, entropy with correlation variation statistical methods is used to eliminate the unreliable services with values that have a larger deviation by using in stable service selector module. Many existing algorithms for service selection can be used [3] [4] [5]. However, after filtering solution space based on uncertainty, it is still large and has a slow execution time. The skyline service sub-module is used to reduce the solution space. The main objective is to provide faster execution time and keep only reliable and most interesting services.
To this end, some important issues should be addressed, such as the lack of monitoring of the uncertainty and unreliability of services due to the dynamic nature of the environment. QoS attributes could be changed or even becomes unavailable. The use of inaccurate values for QoS leads to inaccurate results in the composition process. For this reason, A modified highly accurate prediction (MHAP) algorithm is proposed. MHAP algorithm operates in the QoS predictor module to predict the QoS values based on the available historical information. The MapReduce fruit fly particle swarm optimization (MR-FPSO) algorithm is proposed in the service composer module to handle the large scale of data in the cloud environment. The proposed algorithms perform the composition process in an acceptable time cost with more accuracy while satisfying the users' needs.
The paper is organized as follows: in section 2 the related state-of-the-art studies and researches are reviewed. Section 3 describes the preliminary concepts of web services. Section 4 explains various modules of the proposed CQPC framework for service composition. In section 5, the proposed HBIQ is illustrated. In section 6 the simulation results that compare the proposed HBIQP technique with the other state-of-the-art techniques are discussed. Finally, section 7 outlines conclusions and future work.

Literature Review
Recently, there has been extensive research on creating reliable services, selection of services ,and the composition process. This section reviews the related studies to the service computing that has been widely studied, and it explores several issues and challenges. Rehman et al. [6] and Hossain et al. [7], proposed a multi-phase process and apply multiple criteria decision making to find the value of QoS. Besides, they used the weights assigned to each QoS. However, the global end-to-end constraints provided by the user were not satisfied with the results although the time cost was acceptable. Jin et al. [8] performed the composition of services in the cloud platform using a genetic algorithm besides the correlation-aware model.
Mahmoud et al. [9] proposed a Map-Reduce Non-Dominated Sorting Genetic Algorithm (MRNSGA-II) to handle the composition problem by applying multiobjective optimization NSGA-II over Map Reduce structure beside using clustering technique. MRNSGA-II limitations were that it can't handle large scale candidate services efficiently. Besides, it is limited to search for similar services according to similar previous user requests. Seghir and Khababa [10], proposed a method that combines two evolutionary algorithms. The fruit fly optimization algorithm is combined with the genetic algorithm to take advantage of each one to find the composite service. Chen et al. [11], investigated a method that uses partial selection to select the composed services based on QoS awareness. They proved their effectiveness theoretically besides a large number of simulations.
McNabb et al. [12] presented MapReduce Particle Swarm Optimization(MRPSO). MRPSO is introduced to avoid problems arisen from large-scale parallelization. Khalid et al. [13] introduced the MRGA algorithm. They propose and evaluate the performance of GA using MapReduce (MR). They showed that their combined algorithm achieved acceptable performance while handling a huge amount of data.
Zhang et al. [14], proposed an approach that combines the PSO algorithm with Hadoop infrastructure for large-scale service composition over a distributed environment. Hossain et al. [15], proposed a method to handle the massive number of available services. They used the parallel PSO algorithm with the K-means clustering algorithm over the distributed platform, Hadoop. Unfortunately, it was inefficient in data reading. Yong et al. [16] presented a two-phase approach for service composition. The first phase applies k-means clustering for reducing the number of candidate services for selection. In the second phase, a mixed-integer programming approach is used for selecting the most appropriate service among the reduced set. Zhang et al. [17], introduced an improved version of FOA simulated with service composition to handle large-scale services available in the cloud. The disadvantage was that the execution time required was high, according to the services available.
Jin et al. [8], proposed an approach to prevent premature convergence issues. Their approach computes the fitness of individuals using a ranking method with a genetic algorithm where the services' correlation is regarded. Ma et al. [18], presented a method for predicting the QoS values as a result of the dynamic environment of the services and network issues. They use the information of the users of services, historical information, besides the information about services itself. They aimed to find the similarity between services and service users and to obtain the similarities between users and services into a model. This model best provides predicted values for missing QoS properties based on the linear regression of different predicted values. Most of the heuristic methods are semi optimal solutions with low convergence speed and low time in large-scale data. Li [19] proposed a QoS-optimized cuckoo algorithm to improve security and service efficiency. Additionally, a multi-strategy semantic matching algorithm for improving accuracy is proposed. Farsi et al. [1] proposed a Performance-Oriented Integration Design (POID) framework that guides the software architects and designers to build better designs for the integration in SOAs. The objective of the POID framework is proved through a case study to verify the capability of the framework to achieve better accuracy in calculating the response time for the different service compositions' approaches.
Hasnain et al. [20] proposed web ranking system that is based on quality metrics of reliability and response time. Five key modules are included and 96.17% is achieved by the proposed method. TWSCO is the system proposed by Chunling et al. [21] for composing web services efficiently. The TWSCO system is designed for the trust grantee of the composition processes with a service filter, and an interface-based service. Shunshun et al. [22] implemented an algorithm to maximize adaptive brain storm (MCaBSO). MCaBSO used a dual adaptive approach to better integrate web services. A short time, good speed of convergence and scalability is achieved with the MCaBSO algorithm. Huang et al. [23] have proposed the composite stochastic Petri network model of edge servers and clusters to solve problems in large-scale networks and/or complex service processes. The main goal was to allow reliability to be achieved.
In this study, a hybrid algorithm that combines heuristic approaches particle swarm optimization (PSO) [24] and fruit fly (FOA) algorithm [25] is proposed. Also, it suggests applying this for the distributed computing platform, Hadoop. The proposed HBIQP technique aims to obtain acceptable time cost and as nearoptimal to the end-to-end constraints with minimum error rate.

COMPOSITION MODEL IN THE CLOUD PLATFORM
Machines to Machines (M2M) communication manage our everyday lives. The "intelligent agents" people have touted for ages will finally materialize. Intelligent Systems' intelligent mobility initiatives provide new possibilities for optimizing the use of real-time data from various sensory resources. This has resulted in the new production factor being efficiently disseminated, particularly in improving the use of network resources, mitigating traffic congestion, and reducing the power consumption of transmission over the network. To prove a large number of tackled solutions, cloud computing is a relatively stable environment. However, the limits to network bandwidth as well as the extension into the data transfer process are the more drawbacks. Cloud computing and IoT convergence cannot meet the diverse requirements of users.
Even at the basic level, many issues and challenges such as scale, heterogeneity, physical knowledge, high complexity, privacy and security, and QoS support must be solved for successful IoT implementation. In recent years, both academia and industry have gained considerable attention for the amalgamation between the cloud and the service composition. The future of Internet services has been based on its ability to build a system or framework for a substantial number of users to efficiently deliver the service." In this vein, seven layers are as shown in figure 1 in the amalgamation model: 1) Sensors and smart devices for the collection of IoT information comprises the perception layer. 2) The network layer makes links, sensors, and network devices to other servers. The second important aspect of this layer is that the data obtained from the perception layer is processed and transferred.
3) The Fog layer provides remote IoT scalability for a huge number of IoT devices and large volumes for lowspeed applications in real-time. 4) Various sub-services to various private or public clouds may be supplied in the cloud layer. 5) The service composition layer is responsible for composing several sub-services together regarding the user's functional and non-functional requirements. 6) The application layer shall conduct end-users according to their requests with unique composite services. The process of forming a composite service over the cloud platform is very similar to such a traditional one. The user request represents a complex task that consists of N small tasks,where N = {t 1 , t 2 , ......, t N }. For each abstract task T i , a set of M concrete/candidate set of services is available that must satisfy the user requirement. Concrete service represented by C ij = {CC i1 , CC i2 , ......, CC iM }, where CC ij refers to the j th concrete service candidate for the i th abstract task. Based on the QoS constraints, a concrete service that best fits in the global solution is selected, as shown in Figure 2.

QoS-BASED MODEL
Assume S service classes are available where each class contains a set of services with the same functionality, but they are with different QoS attributes according to the service provider, such as they may differ in price, response times ,and other factors.
The goal is to find the best combination of services from different classes that meet the global end-to-end user preferences. There are many important used QoS non-functional attributes for describing web services over the distributed environment such as cost, response time, reliability, availability . . . etc. Table 1 reviews the brief description of important QoS parameters [26] [27].

WORKFLOW MODELS AND QOS VALUES AGGREGATION
Many workflows are available for modeling the cloud service composition task. The different composition models are based on the complexity of the user request and The time server needs to process a specific request.

Price
How much the user that request specific services has to pay to peruse the service.
the posed constraints [27]. The composition models are sequential, parallel ,and conditional. An aggregated function is used for a complex task that consists of many subtasks. This to compute each QoS attribute for the abstract task per an appropriate workflow based on various types of QoS attributes and used workflow. For each attribute, the aggregate QoS value is calculated according to the aggregated function of the most common models in Table 2 [10]. Where j is the index value of the selected concrete cloud service from the related candidate set S i .Also, P ri is the probability of i.

The Proposed CQPC Framework
The proposed CQPC framework consists of five modules as shown in Figure 3. First, the normalizer module is responsible for giving a certain range for all QoS attributes and historical user orders. It converts QoS values for the candidate services to normalized values ranged between 0 and 1. Normalization is essential to grantee the performance of the other CQPCs' modules.
Second, the Clusters Generator (CG) module, which is responsible for clustering data according to QoS-based services, users, and historical user orders. The CG module consists mainly of three sub-modules. These submodules are the service clustering (SC), the user clustering (UC), and the requests clustering (RC). The goal of the SC sub-module is to reduce the solution space of candidate services to reduce execution time. It improves the prediction of QoS values done by the predictor module. The UC and RC sub-modules clusters historical requests of users to determine the route of clusters to follow for any new user request.
Third, the QoS predictor (QP) module, which is responsible for predicting the QoS values that may be changed or missed because of the dynamic changes of the network. Accurate or near accurate predicted values for QoS would result in efficient composition. Many algorithms can implement this module. These algorithms include collaborative filtering, time series forecasting and . . . .etc. Most of these algorithms Fourth, the Service reduction (SR) module which is responsible for reducing the available candidate services to improve the composition process. This module mainly consists of two sub-modules. These sub-modules are the Statistical-based Reduction (SR) and the Domination-based Reduction (DR). The SR is responsible for removing unstable services that may affect the quality of the composition process. It employs some statistical concepts to keep only reliable and stable services. On the other hand, the DR removes redundant services for further solution space reduction.
Finally, the service composer and evaluator (SCE) module is responsible for the composition process. It generates a list of composite services. The SCE can be implemented through several algorithms. MRNSGA-II and the proposed MR-FPSO algorithm are practical examples to implement this module. The SCE is also responsible for the execution and monitoring of the service composition process. In case of failure of any composite service, it selects an appropriate alternative composite service from the generated list.

The Hybrid Bio-Inspired QoS-based (HBIQP) Technique
The HBIQP technique, as shown in figure 4, implements the various modules of the proposed CQPC framework. This includes the Normalizer module, CG module, QP module, SR module, and SC module. The fuzzy clustering phase implements the CG module. The user clustering sub-module is used to cluster the service users into a set of clusters to the fuzzy cluster. Also, the QoS value prediction phase implements the QP module to predict the value of QoS. Moreover, the Stable Service Selector (SS) and Skyline Service Selector phases implement the SR and DR sub-modules. Finally, The SCE module is implemented through the proposed MR-FPSO algorithm to perform the composition process. In the next subsection, the paper discusses more details about each module.  (1) and (2) are used to set all values in a range from 0 to 1 [28]. The normalization module uses the available maximum and minimum values for QoS parameters.
Where q refers to the value of QoS that should be normalized, q min andq max refer to the minimum and maximum values for specific QoS property respectively. q nrm refers to the value after normalization. Normalized QoS values for the candidate services are ready for the next following phases.

CLUSTERS GENERATOR MODULE
To implement the CG module, fuzzy clustering is used to cluster the service into a set of clusters. This gives several advantages. First, it reduces the search space of the candidate services. Second, it helps to get a set of QoS prediction values required for the next phase.

Fuzzy C-Means clustering (FCM) Algorithm
The Fuzzy C-Means clustering (FCM) algorithm implements the UC sub-module. FCM is used because QoS attributes are not good enough to be divided into sharp clusters since the values may be overlapped. To determine the input and the output of the fuzzy C-means algorithm, assume there are T of users defined as T = {us 1 , us 2 , us 3 , ......, us T } where us j refers to the j th user.
In addition, there are W of services of similar function W = {se 1 , se 2 , ......, se r } where se i refers to the j th web service. For each web service, there is a vector of QoS properties describing the web service se i as an evaluation after being invoked by user us i referred by q ij . Then, for all services invoked by user us i , the overall vector describing the QoS of services is denoted by Q j = {q j1 , q j2 , q j3 , ......, q jw }.
Hence, the input to the clustering algorithm is the service users and some parameters related to the clustering process. Each user is defined by a vector of QoS attributes represent the set of historical values of services invoked by the user. The output of the fuzzy C-Means algorithm is a set of centers of the clusters and the memberships of objects belong to those clusters. In the basic fuzzy C-means algorithm, the used similarity is measured through Euclidean distance. It gets distance, the similarity between the objects to be grouped and the center of that group. To get more accuracy in the clustering process, the Pearson Correlation Coefficient (PCC) algorithm is used with fuzzy C-Means instead of Euclidean distance in computations. PCC has a better accuracy since it indicates to what degree service is related and more correlated to another service. For this reason, PCC is more appropriate when dealing with quality values observed by users in the real world. In the context of PCC-based FCM, suppose that there are two service users denoted as a and b. To find the correlation and similarity between them, PCC provides the following equation: Where q a and q b denote the average values that both users a and b observed for QoS values. I(a) and I(b) refer to the set of services invoked by users a and b respectively. High similarity means there is a small distance between users and vice versa. According to this, the final modified form of the objective function of FCM that should minimize clustering iteration is described as: FCM iterates until a termination condition is reached which is |N DM − CDM | ≤ ε Where, NDM is the new degree of membership and CDM is the current degree of membership.

QoS PREDICTOR MODULE
In this module, a modified highly accurate prediction (MHAP) algorithm is proposed to get an accurately predicted value of the QoS values. The proposed MHAP algorithm uses the pre-phase of fuzzy clustering. Fuzzy clustering is used to cluster similar users in the same group and use similar users to predict the values of QoS. Combining Fuzzy clustering with the HAPA algorithm leads to having a more accurate predicted value rather than values achieved by the HAPA algorithm. An evaluation of the proposed MHAP algorithm compared to HAPA is also discussed in a later section.

Modified HAPA (MHAP) Algorithm
As mentioned before, HAPA [8] is an algorithm designed to accurately find predicted values from QoS values. The HAP Algorithm consists of three main steps: prediction based on similar users, on similar services ,and the overall predicted value combined from previous steps. To enhance the prediction values, the papers suggest combining fuzzy clusters which are implemented in the previous to the HAP algorithm. Services that are invoked by a user may belong to many clusters. Therefore, a user may participate in the prediction values in many clusters. A cluster is characterized by a membership matrix and cluster center matrix. To make the prediction values more accurate, the membership of each user multiplied by the centers of each cluster is considered. The predicted value for user u is represented by equation (5) as follows: Where the predicted value of r u,i denoted by pre su r u,i for a cluster of d-dimensions. Then, it is generalized for all similar users in KU . Similarly, the predicted value based on the service i is: The proposed MHAP algorithm guarantees considering the participation of a similar user and service in all clusters with different degrees of memberships. This leads to having a more accurate predicted value. The pseudo-code of the MHAP algorithm is shown in Algorithm 1.

SERVICE REDUCTION MODULE
This module is responsible for reducing the search space of candidate services to decrease composition time. Removing redundant and unstable services reduces the available services for the composition process. This module is implemented via two phases. First, stable service selector (SSS) represents the uncertainty filtering of services. Second, Skyline Services selector (SKSS) that applies the skyline operator to find the most interesting services.

Algorithm 1 MHAP Algorithm
Input User-Clusters Output predicted-Value 1: procedure User Prediction Phase 2: Find the Similarity between u and similar users using P CC.

3:
Apply linear regression to reduce solutions into one Rough prediction r u,i . 4: Find the predicted value based on roots x 1 and x 2 and incorporating the fuzzy cluster centers using equation 5.

5:
Use the ku users to predict the overall value. 6: end procedure 7: procedure Service Prediction Phase 8: Find the Similarity between service i and similar service using P CC. 9: Apply linear regression to reduce solutions into one Rough prediction r u,i . 10: Find the predicted value based on roots x 1 and x 2 and incorporating the fuzzy cluster centers using equation 6. 11: Use the KS users to predict the overall value. 12: end procedure 13: procedure Overall Prediction 14: Use the decomposition of the three models to find the overall predicted value. 15: end procedure

STABLE SERVICE SELECTOR (SSS)
The SSS is a part of the service reduction module in the proposed CQPC framework. The process of selecting candidate services needs a lot of time depending on the size of the solution space. Hence, there is a vital need to reduce the solution space. The reduction in solution space affects the selection process then the composition process according to the execution time factor. The SSS employs some statistical concepts to keep only reliable and stable services. Table 3 presents an example of three web services with QoS response time. Considering average values mean that WS3 is the best service as it has less response time. By a closer look at the values, we note that the range of values for WS3 is distributed over a large range. It means that the values for these services are unstable. Whereas WS2 has an average response time larger than WS1 and WS3. But the range of values is closed to each other than the other two services. We conclude that WS2 is more stable in its values rather than the two other services. A vector of Entropy and correlation variation determines the degree of disordering and deviations of the values.
Correlation variation (CV ) and entropy would help in filtering the services according to their uncertainty. Hence, less reliable services would be filtered. Entropy refers to the level of disordering and average uncertainty that exists in the ser-vice. As the entropy measure increases as the uncertainty that exists in a service decreases.
To employ and calculate entropy theory, we consider the historical values of services for the real-world data to be the values to calculate the entropy using the following equation: Where X is a random variable with the range {X 1 , X 2 , X 3 , ......, X n }. Whereas the correlation variation refers to the degree of variations that occurred which is considered as a measure of the stability of service. Having a large correlation variation value refers to more uncertainty and instability in services. To apply the coefficient variation measure, the following equations are used: Such thatX represents the mathematical average for random X variable within the range X 1 , X 2 , X 3 , X n . Both Correlation variation and entropy measures are used to filter and eliminate the less reliable services that would slow down the execution time. Besides, using both measures (not only one of them) is essential to get accurate results. Since there are some situations in which services have an equal entropy. So, CV is used to differentiate which one is more reliable than the other. As a conclusion, the process of filtering services is described as follows: firstly, entropy is applied for the service set of candidate services, R. The result of entropy is the smallest list in entropy R1 contained in R. Then, CV is used to filter list R1 further into a more reduced and reliable, list R2. This refers to the most reliable service available among candidate services. So, the result obtained by this module would reduce the search space to be used for the next module.

Skyline Services selector (SKSS)
After filtering solution space with entropy and CV based on uncertainty, solution space is still large and has a slow execution time. By using the skyline service submodule, solution space can be reduced to have faster execution time by keeping only reliable and most interesting services. It selects services that are not dominated by other services in terms of QoS aspects. These aspects such as response time, price, reliability, and . . . etc. Applying skyline operator separates these services that are potential from those that are not potential for the composition process. In other words, the best combination for service composition to fulfill global constraints is the one that depends on service S i that is considered from the skyline services. Hence, it reduces a large number of redundant services and speeds up the process of service selection. This leads to improving the final optimal solution in the composite service.
The paper suggests using the Block Nested Loop skyline (BN L) algorithm for calculating skyline services [29]. To extract skyline points, BN L compares all services in terms of their QoS with each other (two-by-two). Then, it returns the skyline services that are not dominated by any other services. BN L algorithm is applied to clustered data in each map function. So, BN L is implemented for each Map function to extract skyline services in it.
Additionally, a method for determining a set of the most interesting services indicated as interesting services that best define the skyline services is investigated. The extracted skyline services might be large and consume more computations and time. Therefore, we need to identify the top-K best interesting skyline services among available skyline services. The challenge is, how to select the best interesting skyline services that provide the trade-off between different QoS available while satisfying the user-defined constraints. Also, interesting services should be small enough for computations to satisfy user requirements. To obtain top-K interesting skyline services, The cluster center is determined by the average of points or services contained. After determining a center for each group of skyline services, the top K of skyline services is chosen based on PCC distance measure. For each skyline service S in a group with a center CL, top K is chosen such that minimizing P CC(S i , CL i ).

SERVICE COMPOSER and EVALUATOR MODULE
This module is responsible for finding the best service composition that fulfills the user needs by selecting the most appropriate services from selected clusters. The composition process is performed in such a way to obtain accurate near-optimal solutions. To implement the service composer module, we propose a new hybrid bio-inspired algorithm MR-FPSO. It merges between PSO and FOA. Additionally, it is integrated with the MapReduce framework. The standard PSO algorithm suffers from falling into local optimum solutions, which results in low-quality results far from the optimal solution. To address this problem, a proposed method with modified PSO is introduced to enhance the features of PSO. The proposed algorithm MR-FPSO is implemented in the MapReduce environment to handle the huge amount of data in the cloud environment.
To improve the PSO algorithm, the fruit fly algorithm and newly updated velocity mechanism are used and integrated into PSO. This improves the position of the particle, solution, toward the location of the new solution. We utilize the advantages of FOA such as simplicity and speed. Also, the smell operator of FOA is used to update the position based on the available information for fruit flies for smell sensation.

FITNESS FUNCTION
Each solution is represented as a particle. We use a fitness function to evaluate the performance of each solution that represents a composite service. The main objective is to select the services that best match the user requirements among concrete services. In other words, maximizing the overall aggregated QoS value of the composite service is achieved by maximizing the fitness function provided by equation 11.

Composition strategy for solution encoding
The paper assumes a strategy for coding using a fixed integer array. Each task is represented by a set of abstract services represented by an array of integers. Each abstract service is represented by a set of concrete services. Each array item represents the candidate service number. As shown in figure 5. The fitness function is represented by the weighted sum of the QoS parameters of the newly added service. In equation 11, the variables W r indicates the weights of the r th QoS attributes of the required composite service C n and q nrm refers to the normalized value of the r th QoS attributes of the composite service. The main objective of the proposed algorithm is handling large-scale data of services to create a composite service with high accuracy in an acceptable amount of time.
It provides a way of parallelism. MapReduce structure is used with its two main functions Map() and Reduce(). The proposed algorithm consists of two main steps: the preparation step, and the parallelism step. Each step is described in more detail in the following sub-sections. A summary of the overall algorithm is presented in Algorithm 2 and 3 .

Preparation step
In the first step, an initial population, i.e. swarm, is created with N particles represent some possible solutions from the solution space. For each particle in the initial population, an initial random position x i is initialized. Then, the fitness function is calculated for each particle to evaluate such particles. The global best position is initially generated. For each particle in the initial population, a record of < key, value > pair is generated for further implementation in the MapReduce structure. For every particle, the key is denoted by a constant integer number ,and the value represents information about the particle v i , x i , P b est, G b est, and f itness.
. . .., V i,n , and F ( * ) refers to the fitness function evaluation. The particles are then stored in files as initial input for MapReduce.

Parallelism
Step Service data are partitioned into data blocks of disjoint subsets according to partitioning criteria. Each data block is dispatched for a Map node for parallel processing. The goal of the process of partitioning the data into blocks is efficiently fitting the partitions into memory. The workload can be distributed over many servers to increase performance. Therefore, any query is executed on each partition in a parallel way. Finally, the results are merged to get the global result. Data partitioning is employed by many schemes of partitioning such as grid-portioning [30], angular partitioning [31] and . . . etc. The angular partitioning approach is more effective compared to the other approaches. It reduces the required time and fairly shares the offered workload [32]. The second step is an iterative step executed as a job in the MapReduce. At the end of each MapReduce iteration, a new swarm of updated particles is created. The new swarm replaces the previous swarm of solutions. The modifications done in each swarm is based on the functions implemented in both Map and Reduce functions.
The initial population of PSO is divided into SN subpopulations where each subpopulation is an input to the Map function. The output of each MapReduce iteration is an input to the next MapReduce iteration.
Map( ) function: After the master divides the population into subpopulations, each subpopulation is processed independently on each Map node. Each subpopulation information is updated on each Map. Algorithm 2 describes the applied steps at the Map function. There are two proposed steps to improve the PSO performance and avoid falling into the local optimum. First, newly updated velocity calculation formulas are used [33]. Second, the distance and smell operator from the FOA optimization algorithm is used to enhance the position of the particle to avoid the local optimum [34]. For the first step of newly updated formulas, it is based mainly on two cases. Firstly, if the particle is not the one with the best position, the position of the particle is updated by an experienced best position, Pbest, and other better particles instead of Gbest. This can be achieved by comparing the fitness value of X i with other particles. The particles with a fitness value better than x i are chosen in a setBest. One of the particles in setBest is chosen as the guide for updating X i based on a random number T . The particle X i with a fitness value greater than T is chosen as a parameter for updating the particle x i using the following equations: ,else The second case for updating xi when the particle refers to the best position. In this case, opposition learning is applied to more widely search the solution space. To guarantee more diversity of the population a convex combination between x i and another random particle in the population x k . Convex combination searches the space as much as possible to let the optimal solution be found. The process of convex combination and opposition learning is provided using the following equations: x new The position of the particle is updated in both cases. Afterward, distance and smell operators of the FOA algorithm are applied to improve the position toward the optimal solution in short movements. The resulting position is updated using the following equations: Then, Pbest is compared with the updated value of xi to determine whether the Pbest is to be replaced. For each subpopulation at each Map task, there is a local best position that represents the global best position for the subpopulation. The local best position for all subpopulations are merged further in the Reduce function to find the global best position for the population. The integer key and updated value of particle information are submitted at the end of the Map task.

Reduce() function:
In the Reduce function, all local best position particles are merged to determine the global best particle, which represents the best solution at the end of the iteration. At the end of the Reduce function, a new swarm with updated particles is emitted. Afterward, it is used as an input to the next iteration of the MapReduce job. The steps applied at the Reduce function is shown in Algorithm 3.

Algorithm 2 MR-FPSO(MAP function)
Input candidate services. Output (int,LP bestList) 1: The initial population of N particles. 2: for each particle in N do 3: Initialize random position x i . 4: Calculate fitness value fi for x i . 5: end for 6: Initial Gbest = max(f i ) 7: // Split population into SN subpopulations 8: // Map function 9: Calculate fitness value for all candidates in subpopulation using Eq.10. 10: for i =1 to SN do 11: Determine setBest.

12:
Calculate fitness value fi for x i . 13: if the best solution is NOT x i then 14: Calculate velocity v i . 15: Calculate inertial weight. 16 if fitness(LP best)¿fitness(Gbest) then 4: Gbest=LP best.

5:
end if 6: end for 7: for each particle in LP bestList do do 8: Update particle (key, value) information; x i , v i , fitness. 9: end for

Experimental Results Discussion and Analysis
This section presents the experiments employed for proving the validity of the proposed CQPC framework, the evaluation of the results, and the different evaluation metrics such as fitness and execution time. The related discussion will be presented in the next subsections.

REQUIREMENTS FOR EXPERIMENTS AND PARAMETER CONFIGURATION
The experiments are performed using the parallel principle of Hadoop 2.8.5 on a Hadoop cluster. There are five nodes , one for the master and the other nodes for the data. To implement the MapReduce concept, a reduction function with four maps is used. Windows 7 with Intel Core I 7, 2,40 GHz and 8 GB is used . The effectiveness and feasibility of the proposed approach is compared to other techniques, such as MR-GA [13], MR-IDPSO [14], MRNSGA-II [9], and MRPSO [12] for the ability to handle huge amounts of data. The WSDream [35] dataset is used for MR-FPSO simulations. It comprises 5825 concrete services invoked by approximately 339 users, with a response time and throughput of two QoS parameters reflecting services invoked by different users. Five abstract services are used to manage the vast amount of data. Every one has approximately 10,000 concrete services. The mutation and the crossover are set to 0.7 and 0.21 respectively for MR-GA.

Impact of The Number oF Iterations
As shown in figure 6, the average fitness value increases as the number of iterations increase with an acceptable rate rather than other techniques. This proves the feasibility of MR-FPSO to find the optimal solutions. The reason for that is using skyline operators and reliable measures such as entropy and correlation variation to reduce the solution space to have more reliable services. Another reason is the improvements in the position of particles in MapReduce jobs.   Figure 9 show that MR-FPSO outperforms other state-of-the-art techniques. The fitness value increases with the increase in the number of concrete services. Since using the service reduction module helps in reducing the candidate services hence achieving less execution time. The increase in the number of concrete services refers to increasing the number of available services to participate in the initial population. Therefore, the probability of increasing fitness values increases.  Figure 10 shows that the impact of increasing the used population size over the fitness value. The more increase in population size leads to more concrete services to participate in the solution. As a result, the number of reliable services toward the optimal solution is increased. More population size results in more reliable services with good quality to be used in the context of the composition process. Where Q ij denotes the observed QoS attribute value by the user.WhileQ ij denotes the predicted value of QoS and the number of values already predicted referred by N . with a threshold value of F CM = 0.01. The prediction error decreases when the number of clusters increases until a certain point, the prediction errors increase to an adequate level. This means that the number of clusters is determined in the experiments according to the nature of patterns to be clustered.

The Performance of Proposed MHAP Algorithm Against HAPA
The comparison between the MHAP algorithm and HAPA is shown in figure 12.
The results show that with the increasing number of top similar users, the error rate MAE decreased. On the other hand, the increasing number of similar users over a certain threshold due to the excessive increase in the number of used similar users would produce high error rates. In all cases, the MHAP algorithm provides a less error rate rather than HAPA. The reason for that is combining the fuzzy clustering with standard HAPA. This allows similar users to be located with each other in the same cluster. This provides more accuracy rather than the HAPA algorithm.

Conclusion and Future Work
With the rapid proliferation of web services over the cloud environment, the service composition process involves several issues and challenges. This paper proposed a novel Cloud-based QoS-Provisioning Service Composition (CQPC) framework to address service composition. Besides, the HBIQP technique is presented to prove the concept of the proposed framework and its applicability. Also, the MHAP algorithm that integrates the HAPA algorithm over the fuzzy clustering for accurate predictions is presented. Furthermore, Skyline operator is used over the MapReduce platform and information theory concepts to reduce the solution domain. Moreover, a proposed MR-FPSO algorithm is introduced to improve the global end-to-end solution. It provides parallel service composition efficiently and uses subjective trimming operators to provide diversity for the next generations of solutions. Simulation results showed that the proposed technique efficiently addresses the challenges of In the future, further enhancements for the efficiency of the proposed technique will be conducted. More efficient optimization for the service composition will be introduced considering the dynamic environment of service composition in largescale data.