A HRGO approach for resilience enhancement service composition and optimal selection in cloud manufacturing

: Cloud manufacturing (CM) establishes a collaborative manufacturing services chain among dispersed producers, which enables the efficient satisfaction of personalized manufacturing requirements. To further strengthen this effect, the manufacturing service composition and optimal selection (SCOS) in CM, as a NP-hard combinatorial problem, is a crucial issue. Quality of service (QoS) attributes of manufacturing services, as the basic criterion of functions and capabilities, are decisive criterions of SCOS. However, most traditional QoS attributes of CM ignore the dynamic equilibrium of manufacturing services and only rely on initial static characterizations such as reliability and availability. In a high uncertainty and dynamicity environment, a major concern is the equilibrium of manufacturing services for recovering their functions after dysfunctional damage. Therefore, this paper proposes a hybrid resilience-aware global optimization (HRGO) approach to address the SCOS problem in CM. This approach helps manufacturing demanders to acquire efficient, resilient, and satisfying manufacturing services. First, the problem description and resilience measurement method on resilience-aware SCOS is modeled. Then, a services filter strategy, based on the fuzzy similarity degree, is introduced to filter redundant and unqualified candidate services. Finally, a modified non-dominated sorting genetic algorithm (MNSGA-III) is proposed, based on diversity judgment and dual-track parallelism, to address combination optimization processing in SCOS. A series of experiments were conducted, the results show the proposed method is more preferable in optimal services searching and more efficient in scalability


Introduction
The current manufacturing global operations asks for more stringent requirements than ever before, such as customized demand, personalized products, strict deadlines, standardization of manufacturing processes, quick response and continuous service, as well as disruptive innovations [1].To address these challenges, Cloud manufacturing (CM) leverages distributed and heterogeneous manufacturing services to meet the customized manufacturing service requirements from manufacturing enterprises and customers via management and integration.In other words, CM utilizes existing information technology to improve control and collaboration of the manufacturing process, which benefits both the enterprises and customers.Specifically, advances in sensors and communications technology can provide the necessary technology to upload operation data and resources information from the physical facility (i.e., the physical world) to the cyber world of internet applications and software.Then, Cloud manufacturing system (CMS) uses advanced data processing and intelligent management to monitor them in the cyber world.At the same time, the actual operations in the physical world are handled at both the manufacturing process level and system operational level [2].Manufacturers registered on the CMS collaborate to improve their design, manufacturing, and marketing capabilities by sharing their material resources and technical capabilities, which resulting in high customer satisfaction.
Because of the ubiquitous diversity, heterogeneity, and complexity of manufacturing environment, if a single manufacturer cannot meet the requirements of the manufacturing service demanders (MSD), service composition and optimal selection (SCOS) is considered as one of the pivotal technologies that need to implemented in CM.Moreover, SCOS helps MSD to sort and select each candidate service according to specified criteria, such as business goals, strategic value, and resource constraints [3].The manufacturing resources and capabilities are virtualized by CMS and encapsulated as candidate services, which are then stored in a resource pool.When the manufacturing task submitted by MSD necessitates the cooperation of multiple manufacturers, CMS divides the manufacturing task into multiple sub-tasks and matches relevant candidate services for each sub-task from the resource pool and optimally combine these to achieve the highest overall quality of service (QoS).
The concept of resilience has been widely studied in manufacturing supply chain, service-oriented computing (SOC), and cloud computing (CC), but not in SCOS.For instance, Tapolcai et al. [4] introduced quality of resilience (QoR) as a new class of QoS attribute to describe both service availability and recovery performance.In addition to the expected events that drive manufacturing operations, at unknown points during manufacturing life cycles, unexpected detrimental simple events, or aggregation of events which detrimental effect QoR of manufacturing, may also occur.The uncertainty and dynamicity can be classified as one of the most important causes of operational disruption [5].These uncertainties, which are related to services, tasks and service correlations, may disrupt the execution of the composite services and may consequently harm the dynamic equilibrium of the submitted manufacturing task.Therefore, a new and measurable QoS attribute needs to be defined to describe the dynamic equilibrium of manufacturing services.
Although previous research identified preliminary motivations for resilience-aware manufacturing, a systematical SCOS approach that considers resilience awareness in CM still needs
to be established.The definition of resilience in SCOS should contain a statement that defines the concept of resilience of a service and a dynamic set.The dynamic set describes the equilibriums of a service throughout its execution.This set includes the abilities of a service to resist various detrimental factors that unfavorably interfere with the regular workflow.The resilience measurement of service should, by definition, consider various equilibriums.The resilience of a service, calculated according to this definition, should be identical to other attributes of QoS, and reflect the QoS.Thereafter, the resilience-aware SCOS process should select appropriate services to compile a list of candidates that are satisfied with the QoS expectation of the MSD.Then, the selected candidates need to be composited to obtain a service solution that realizes the power of resilience in manufacturing.
In this paper, to cope with the resilience-ware service composition in CM, we focus on developing the resilience-enhanced SCOS framework, and a novel hybrid resilience-aware global optimization (HRGO) approach consists of service filter strategy and modified non-dominated sorting genetic algorithm (MNSGA-III) algorithm is proposed to address the combination optimization problem in SCOS.The experiments of optimizing the cooperation partners selection strategy for automobile manufacturers is illustrated to demonstrate the efficiency of the proposed method.
There are three major contributions in this work.First, by evaluating the equilibrium values of manufacturing services, a resilience measurement method is established which can convert the service resilience to QoS indicator.Second, a service filter strategy based on fuzzy QoS model and fuzzy similarity degree is put forward to acquire the resilience satisfied candidates.Third, the MNSGA-III algorithm which can determine the diversity of population in each generation and choose the corresponding environmental selection mechanism is developed to solve the proposed optimization problem.The numerical results show that our method has better performance and scalability.
The remainder of this paper is organized as follows: The most significant research works relevant to SCOS and resilience-aware CM contexts are reviewed in section 2. The resilience measurement method, fuzzy QoS representation, and the definition of resilience are proposed and then mathematically formulated in section 3. The HRGO approach, including a local filter strategy and a multi-objective optimization algorithm, is introduced in section 4. A case study on the supply chain of automobile company is presented in section 5, the results of which verify the effectiveness of the proposed approach.Finally, section 6 concludes this paper and discusses the future research plans.

Related work
With the growing need of manufacturing for customized and personalized designs, the singlefunction service approach no longer has manufacturing significance because of its associated rigidity and lack of flexibility.In contrast, focusing on providing a more robust combination mechanism can lead to more scalability and flexibility through collaboration across various service providers.Considerable research has addressed the SCOS problem from different methodological perspectives.Huang et al. [6] proposed a genetic algorithm based on uncertainty to optimize the choice of services, and introduced comprehensive performance evaluation indicators to analyze the local and global performance of the algorithm.Zhang et al. [7]summarized the latest research on the new paradigm of manufacturing and service.Yi et al. [8] introduced the basic concepts of harmony search algorithm and presented a survey of the application of harmony search algorithm in the field of intelligent manufacturing.Suresh et al. [9] proposed the BF2VHDR optimization algorithm based on fluid dynamics to solve the optimization selection problem.Rahman et al. [10] proposed a privacy protection service selection framework for cloud-based service system.Cao et al. [11] established a service selection scheduling model that considers time, quality, cost, service and other factors, and combines fuzzy decision theory with ant colony optimization to solve the model.Huang et al. [12] studied the optimization of service combination selection, and proposed an optimization algorithm based on chaos control theory.This algorithm is less time consumption while obtaining the optimal solution.Lin and Chong [13] proposed a genetic algorithm based on resource constraints to complete project scheduling aiming at the problem of resource allocation in CM paradigm.Laili et al. [14] suggested that with the increasing number of alternative services with similar functions, QoS is considered to be the key criterion for selecting the most suitable service.Moreover, a hybrid resource allocation model and an improved niche immune algorithm are proposed.Liu et al. [15] proposed a service combination of 'multi-composition for each task' based on the global method to solve the incompetent composite services.Alrifai and Risse [16] proposed a solution that satisfies both global optimization and local selection to meet the QoS requirements and preferences of end-to-end users.Table 1 lists the main subjects of the studies published during period of 2015-2018 on SCOS.Most of the studies have focused on mathematical modelling or optimization of QoS-aware service composition, and were mainly related to time, cost, reliability, availability.However, the resilience of composite services has not been investigated to date.A resilient manufacturing system model designed to sustain operations under disturbances was proposed by Gu et al. [30].Furthermore, Zhang and Luttervelt [31] proposed the concept of resilient manufacturing systems as well as the guidelines for the design and management of such systems.Zhang et al. [32] proposed a grid manufacturing service scheduling process and proposed an architectural view of failure management.Tao et al. [33]confirm the identified research gaps and presented a comprehensive literature review on manufacturing service management.Shah and Babiceanu [34] proposed the concept of resilience of interdependent infrastructure systems, and compared the final system-level resilience with component infrastructure system resilience.Bouzary [35] promotes the development of methods for the study of deep uncertainties for resilient assessment, while retaining the same probability to express uncertainties about highly uncertain, unforeseen or unpredictable dangers in design and management activities.All these studies emphasized the lack of research in composite service resilience and fault tolerance.A formal definition of resilience, which addresses the ability to prepare for, absorb, and recover from actually or potentially adverse events, is shown in Table 2.Although several studied in the CC and SOC literature addressed the architectures, frameworks and adaptive approaches for improving fault tolerance of web service, these approaches cannot be directly implemented in manufacturing system because of the special characteristics of the composition.

Table 2. Manufacturing-based definitions of resilience.
Definition of resilience by the reference Reference Ability to prepare for actual or potential adverse events, Ability to absorb and recover from actual or potential adverse events [34] Robustness, Flexibility and Ability to recover [35] Ability to anticipate and Ability to adapt [31] Ability to anticipate, Recover rapidly and Ability to efficiently adjust [30] Identification of frequent and significant failure patterns, Quantification of interdependency related indicators and Empirically risk analyses [36] Ability to return to its equilibrium state and Ability to reach its original state [37] As mentioned above, SCOS is one of the core technologies for implementing CM.For complex mechanical product such as automobile, unlike the resilience problem in SOC, CC, and supply chain, resilience mainly considers the reasonable and robust scheduling between local manufacturing, interactive logistics and cloud trading.Specifically, despite the significance of resilience composite services, few studies have been based on conceptual or mathematical definition of resilience.The candidate services and research subjects that needed to be considered are massive, and thus, an efficient selection approach considering that simultaneously considers system resilience and QoS is still worthy of further research.
Based on the above analysis, this paper proposes a HRGO approach to assist CM to obtain excellent manufacturing services.Firstly, the problem description on resilience-aware SCOS is
summarized.Second, based on the fuzzy similarity degree, a services pool filter strategy is presented in detail to identify appropriate candidate services.Third, a global Optimization Algorithms, modified by NSGA-III, which performs in high dimension objects, is used to conduct the combinatorial optimization process.

System overview
As a service-oriented manufacturing model, CM originated from pre-existing current manufacturing models and corporation information technologies, supported by manufacturing ability virtualization, CC, Internet of Things (IoT), service-oriented technologies, and advanced computing technologies.To accomplish a complex manufacturing task, the requirements by MSD are decomposed into several sub-tasks, each of which can be completed by a candidate set.More specifically, this technology needs to sequentially solve the three following sub-problems, as show in Figure 1.Table 3 summarizes the notations used in problem representation.(3) SCOS: Finally, orchestrating the selected manufacturing candidate service from services pools to form a SCOS plan [38].Here, a candidate service is selected from each corresponding candidate set to composite services while ensuring that the overall service QoS is optimal.


The number of relevant variables to study reliability , , i j m en Endogenous attributes, 1, 2, , The mth endogenous attribute l The lth exogenous attribute ,, i j m


The weight of each exogenous attributes, 1, 2, , The expected cost from service demander The expected resilience from service demander

The definition of resilience in CM and the measurement of equilibriums
Resilience has often been defined as the ability of a partially disrupted manufacturing services to recover its key functions and components after, and even in the event of, a destructive event [5].Industry experts evaluate the resilience of a service by measuring their equilibriums, which reflect their abilities to resist different, unexpected, and detrimental factors.When the manufacturing service is in operation, unexpected and detrimental factors, that can cause breakdown of manufacturing services in the manufacturing life cycle, include task failure rate, workflow failure, resource failure rate, cybersecurity penetration, partner credit-default, and logistics delay.Intrinsically, resilience and two types of resilience attributes in SCOS were identified as follows: Definition: In CM, resilience-aware manufacturing service refers to a service process in which the service leverages the exogenous and/or endogenous equilibriums to return from the disruptive perturbation deviating state to the state of equilibrium.
Exogenous attributes:  The ability to withstand communication blocking and cybersecurity penetration;  The ability to withstand behaviors of breaching reputation and trust;  The ability of composite services to anticipate irresistible factors such as logistics delays Endogenous attributes:  The ability to successfully predict work flow level failure;  The ability of the composite services to operate within pre-defined parameters and condition;  The ability of composite services to reconfigure and adapt to changes at runtime According to the above, the resilience measurement of a service is composed of its equilibriums to resist various detrimental factors (manner of composition in section 3.4).The equilibrium of manufacturing services to resist certain detrimental factors is measured by the fluctuation of the corresponding measured value during the observation period.For example, calculating the equilibrium to withstand communication blocking and cybersecurity penetration requires the data transmission speed as input.Calculating the equilibrium to withstand breaching reputation is to measure the delay in delivery time when partners break faith.The equilibrium to resist a certain detrimental factors is equal to the integral of the absolute value of the disturbance value, divided by the maximum amount of change of measured value (that is, the maximum performance degradation due to the detrimental factors).

Mathematical definition: fuzzy QoS representation
QoS attributes provide a standard for measuring the QoS stem from CC and was later applied to CM. QoS is usually used to represent the utility value of web services based on customers' QoS demands.However, QoS is not suitable for estimating manufacturing services because it is not suitable for indicating manufacturing capabilities such as precision, matching relationship, and working life.These receive more attention in the manufacturing industry.Moreover, because of uncertain manufacturing environment, manufacturing industries tend to use fuzzy number to describe the capabilities of manufacturing companies.As a result, this paper uses fuzzy theory to extend the QoS model of CC into a fuzzy QoS model to describe the QoS.
The following are a number of the representations of manufacturing QoS attributes.
Where MS refers to manufacturing service, cost q is used to describe the service cost such as processing cost and logistics cost, time q represents the total service time from demand response to the final end, availbility q represents available time ratio of composite service, reliability q represents the satisfaction of the manufacturing service, reputation q is applied to describe the authenticity of the service.
To represent the fuzzy attributes of QoS, triangular fuzzy numbers and interval numbers are used to express the linguistic variables and value constraints respectively.The interval number can be defined as . The triangular fuzzy number can be expressed as

Mathematical definition: representation of optimization objectives
The optimization objectives of SCOS are always high-dimensional and correlated.Depending on the manufacturing background, several of the multiple-objective optimization can be converted to single-objective optimization using simple additive weighting (SAW).This paper adopts multiobjective optimization algorithms and mainly takes cost, time, reliability and resilience into account for the following optimization process.Since the QoS representations of different composition models are different (e.g., sequence model, parallel model, selective model, and circular model), and considering the case application of this paper, the sequence composition model was used.Considering the dynamic operating range of manufacturing decentralized network architecture leads to considering the QoS as a dynamic function, depending on node-in and node-out, respectively.The requested sets of inputs and outputs from a given service such (i.e., , Nin Nout

QoS
), Nin and Nout represent node-in and node-out, respectively.A resilience-aware basis for the SCOS scenario including the following QoS attributes. Cost: attribute includes all the inherent costs for the service execution as well as the associated preparations and adjustments.The price is function of the service implied and the manufacturing demand (i.e., inputs and outputs) and defined beforehand by its related service provider.
For the reach of the optimal solution, cost must be minimized and can be calculated from the maximum price that the service demander is willing to invest independently of the currency.
, , , . .( ) The duration is a function of the precision and quality of the equipment and a feature of the request.Identical to the price, duration is a parameter that is minimized and evaluated from the requested deadline.
, , , . .( ) attribute of a service reflects the ability to successfully complete manufacturing tasks at a given time and condition.The reliability is primarily driven by the global reliability of the service provider.However, the need for specifications and the use of a particular resource can also impact the ultimate reliability.The reliability is expressed as a percentage value and is measured against the minimum level of reliability accepted by the service demander., , attribute of this paper mainly consider the exogenous equilibriums of withstanding communication blocking, partner credit-default, and logistics delay, and the endogenous equilibriums of withstand facility failure prognostication error, equipment life-span prediction error, and employee work failure.The difference between reliability and resilience is that reliability is the property the service uses to measure the life of the system (i.e., the continued success of the system), while resilience is an attribute of the service, which is used to measure the insensitivity of the system to disturbances [31].Furthermore, the proposed resilience QoS mathematical model can also be used to solve any other combination problem.Only slight adjustment is needed according to the user's own application environment, and the proposed model is expected to obtain more competitive or superior results than existing QoS structures.The mathematical morphology of resilience is expressed in percentage as follows:

. ( )
. , , , . .( ) Based on the principles discussed above, the mathematical formulations of the resilience-aware optimization problem in CM can be represented as follows.

Hybrid resilience-aware global optimization approach
Virtual service pools represent the primary causal and most correlated attribute of resilienceaware SCOS [39].To increase the resilience of a manufacturing system and enhance the operating efficiency of manufacturing SCOS, the service filter strategy is proposed prior to the composition process to filter the redundant candidate services and prevent uncertain and dynamic hazards.This step of filter can also improve the efficiency of the subsequent optimization process.
The resilience-aware SCOS is a typical multi-objective optimization problem.The focus of this process is to acquire optimally composited services, which provide the resilience-efficient advantage and better satisfy MSD.Therefore, in this section, based on services filter strategy and MNSGA-III, a HRGO algorithm is proposed.The framework of HRGO approach is shown in Figure 2.

Services filter strategy based on fuzzy similarity degree
The CM needs to select the manufacturing services with matching functions and constraints from numerous manufacturing services according to the requirements specified by the service demander.Therefore, a fuzzy similarity degree theory is introduced to filter out redundant and inappropriate service.The literature [40] summarizes important concepts of fuzzy numbers, among which, "dominant" and "no difference" are two important concepts of fuzzy numbers.These concepts constitute a new method which is introduced as follows: Let a and b be two fuzzy numbers, originating from different QoS values.If there is overlap between a and b, these values can be considered as no different in the overlapping part; if there is a non-overlapping part between a and b, then, for each overlapping part, not a control b or b control a.The direction of domination is defined as Table 5.

Table 5. The direction of fuzzy numbers domination.
Case 1 In case of nonoverlap between fuzzy number a and b are the following 1.The area representing that a dominates b is defined by  Fuzzy number a is in the right-hand side of the fuzzy number b,  Fuzzy number b is in the left-hand side of the fuzzy number a.   Figures 3 and 4 illustrate the notions of the dominance and the no difference in the nonoverlap and the overlap cases, respectively.
Suppose there are the kth QoS parameter of ith candidate service of the jth sub-task in the form of triangular fuzzy number, , jk a and , jk b in the Eq (8) refer to a same QoS attributes of different candidate services in the same sub-task.The similarity degree between two candidate services a and b can be calculated by: , ,  depend on the QoS parameter preference of decision makers and can be computed by analytic hierarchy process (AHP).The fuzzy similarity degree is calculated on the kth QoS parameter value of the ith candidate service of the jth sub-task.The candidate service with outlier is filtered out, and the remaining candidate services participate in global optimization.

A modified NSGA-III based on diversity judgment and dual-track parallelism
In this section, the details of MNSGA-III are introduced to address the optimization problem as mentioned is section 3.4.The candidate service, filtered by the service filtering strategy, is used as input of MNSGA-III.Then, the optimal service composition is obtained.NSGA-III substitutes the reference-point-based environmental selection mechanism for the crowding distance operator in NSGA-II.This paper adds an environment selection mechanism based on clustering, while modifying the environmental selection mechanism based on reference points.By assessing the diversity of the offspring, one of the above two environmental selection mechanisms are applied for the selection of individuals from the offspring.The above steps can better retain the distribution characteristics of the population, and are more suitable to solve the problem of how QoS attributes interactively affect the final composition in SCOS.The following briefly presents the framework of MNSGA-III, which is followed by a description of the main components of the proposed algorithm.The framework of the existing NSGA-III and proposed MNSGA-III is shown in Figure 5.

Existing Algorithms
Each generation t of the NSGA-III procedure is briefly described as follows: first, the offspring population Qt is generated by crossover and mutation of the parent population Pt; then, the parent population and offspring population are merged into St.The population St is classified into nondominated levels Fi and the next generation is formed as . NSGA-III guarantees the diversity of optimal solutions through predefined reference points on a hyper-plane as shown in Figure 6.To achieve this, population members St is normalized by identifying the minimum value so that the translated ideal point becomes a zero vector.The definition of reference lines w corresponds to each reference point on the hyper-plane; then, each population member is associated

Framework of the proposed algorithm
For completeness, a brief description of the MNSGA-III is first presented in algorithm 1.The initial population P0 with N members is coded by filtered candidate services.The offspring population Qt is obtained by using a recombination and mutation operator.Then, Pt and Qt are combined to form St with a size of 2N.Non-dominant sorting is used to sort St into different non-domination levels, denoted by (F1, F2, …), thus forming St in the order of non-dominant levels, starting from F1 and ending with Fl-1.The following steps present the implemented improvements to NSGA-III.If |St| = N, St is directly used as Pt+1.If not, whether Fl is diverse needs to be calculated, and the corresponding environmental selection mechanism to select the next generation is selected.To achieve this, fuzzy cmeans (FCM) was used to divide Fl into K clusters.Then, the diversity of Fl was evaluated using the cluster quality index [41].If Fl has a poor diversity (i.e., R < Rflr), by calculating the perpendicular distance from a reference point to cluster center, a stray reference point can be found.Then, based on stray reference points, an environmental selection mechanism was applied to construct Pt+1.If Fl already has good diversity (i.e., R ≥ Rflr), an environmental selection mechanism [42] based on cluster centers was applied to construct Pt+1.

Two-layered reference points structure
To increase the diversity of the obtained solutions, NSGA-III generates a set of N reference points to help create offspring populations.Here, a two-layered reference points structure is adopted as previously suggested [43].The two-layers refer to boundary layer H1 and internal layer H2, respectively.The population size can be calculated as: For instance, if in a three-objective problem (m = 3), two divisions (H1 = 2) are chosen for each objective axis in the boundary layers and one division (H2 = 1) are chosen for each objective axis in the inside layers, then, a total of reference points are created.For clarity, these reference points are shown in Figure 7. Since these reference points are widely distributed over the whole normalized hyperplane, the obtained solutions may also be widely distributed in or near the Pareto-optimal frontier (POF), i.e., the corresponding approximate Pareto-optimal solution, thus, the method avoids falling into the local optimal solution.

Crossover, mutation, and elite-preserving mechanism
Crossover and mutation are conducted with two selected paternal individuals Pt to obtain offspring individual Qt.Crossover implies that two individuals each contribute a portion of their chromosomes to generate two new individuals.Mutation refers to a random change in a number of genes in a new individual.Specifically, the simulated binary crossover (SBX) and polynomial mutation are used for inheritance in this paper.Note that the use of differential evolution (DE) instead of SBX and polynomial mutation operators, as used in a previous study [44], has also achieved good performance.However, this study follows the advice presented with the original algorithm and uses
larger values of distribution index for the SBX operator.
To avoid the loss of well-behaved chromosomes within the population, the elite-preserving mechanism merges the parent population Pt and the offspring population Qt into a mixed population (Rt ← Pt∪Qt).With this mechanism, all paternal individuals compete with offspring individuals, and elite chromosomes will be preserved.Specifically, the dominant relationship between two individuals in the population is based on their performance with regard to specified objectives.The dominating principles are defined as follows: if

Diversity discrimination based on clustering
After normalization, at the last non-dominated level Fl, NSGA-III uses reference points based on environmental selection mechanism to enhance offspring diversity.This paper considers the distribution characteristics of the population in each generation, and then applies the corresponding environmental selection mechanism to select the offspring.Therefore, populations of the nondominant level Fl are clustered.It is worth noting that the number of clusters after clustering must neither be too large nor too small.This is because the environmental selection mechanism is affected by the number of clusters and will ultimately affect the distribution of Pareto-optimal solutions.When the number of clusters is too small, the individuals in each cluster occupy a wide solution space; thus, Pareto-optimal solutions cannot be subdivided or traversed along the entire POF.When the number of clusters is too large, although the solution can be subdivided and traverse the entire POF, it will cause high computational complexity.
Therefore, this study adopts the FCM algorithm, which continuously updates the membership matrix U and the clustering center V through an iterative scheme to achieve convergence of the objective function For The cluster center is updated as follows: Repeat Eqs (11) and (12) until to determine the diversity of the population.If the value of the clustering quality index Q is small, the individuals in Fl are concentrated and form a single cluster.When the value of Q is large, this suggests that population Fl has a widespread distribution.Only when the population Fl has a widespread distribution, a widely distributed Pareto-optimal solution can be obtained that represents the POF.The obtained Pareto-optimal solution can represent the POF of different regions.
Here,  is defined as the criterion to determine the diversity of the current population: Where, t Q represents the Q value of the population in the tth generation,  is a penalty factor that prevents undue oscillation of t Q as the population approaches POF, t   means that  is calculated at the tth generation and is used as threshold for the next  generations.If t Q is lower than t   , from 1  t  to t , the population is considered to have poor diversity; otherwise, the population is considered to have good diversity.

Environment selection mechanism
In this section, different environmental selection mechanisms are chosen according to whether the population is diverse.The difference is that if the current generation is diverse, the selection method of the offspring will be based on the distribution characteristics (clusters) of the population; if the current generation is not diverse, the selection method of the offspring will likely be based on the reference point.
If t t Q    (i.e., the population in the current generation has poor diversity), first, the distance from the nth reference point to the kth cluster center is calculated and denoted by matrix Φkn.Then, the sum of the distances from the nth reference point to all center points is calculated and denoted by Slot Slot Slot .
In the event of tt Q    (i.e., that the population in the current generation has good diversity), first, all individuals in Fl are assigned to each cluster center based on their Euclidean distance.Then, for all cluster centers, if the number of assigned individuals is greater than or equal to 3, the individual closest to the cluster center is placed in T1, and the other individuals are placed in T2.If the number of assigned individuals is less than 3, the individual closest to the cluster center is placed in T1, and the other individuals are placed in T3.Similarly, K individuals are selected to fill St according to their order of priority

Case study
A series of experiments were designed to test the performance of the proposed algorithm in section 5.1, experiment background is briefly described.Then, in section 5.2, the parameter setting of seven algorithms are listed.Finally, in section 5.3, three experimental methods which are metric of reference points, metric of Pareto-optimal points, and impact of number of candidate service were used to state the performance of comparison results.Among them, the aim of section 5.3.1 and section 5.3.2 is to evaluate the effectiveness of improvement strategies of multi-objective optimization algorithm for MNSGA-III.The aim of section 5.3.3 is to evaluate the effectiveness of the performance of SCOS capability for HRGO.

Order task chain model
To demonstrate the effectiveness of the proposed method, this paper investigates the SCOS in a group manufacturing enterprise (http://www.cheryinternational.com/) named Chery Automobile Co., Ltd.(Chery).The products manufactured by Chery are exported to more than 80 countries and regions across the world.The products of Chery are mainly composed of systems for automotive engine parts, automotive chassis parts, automotive body and accessory parts, and automotive electronic and electrical parts.Each component system consists of many sub-components that need to be selected for the best manufacturing service.
To meet the production requirements, this paper considers SCOS of Chery for the abstract network structure (NS) of automotive electronic and electrical parts systems.These include the manufacturing process of automotive body electronic control components and on-board electronics such as engine storage battery and control system (ECS).The information not only includes QoS attributes, but also processing data and after-sales data.At the initial stage of service optimal selection, information about customer demand should be analyzed to identify QoS constraints.Table 6 shows an example, with the collected QoS constraints of auto parts service of Chery's demand.Applying the services filter strategy as proposed in Section 4, the candidate services can be identified, as shown in Table 7.
Where ( , )   d v A represents the minimum Euclidean distance between v and the points in A.  Set coverage (C-metric) [50]: Let A and B be the two optimal solutions obtained by two algorithms. ( , )  C A B refers to the percentage of optimal solution in B that are dominated by at least one optimal solution in A.

 
|: ( , )  Optimality of the composition [51]: The metric represents the ratio between the value of optimization objective of the algorithm  Elapsed time: The metric refers to the convergence time of the algorithm or the time when the number of iterations to reach 400.The unit is seconds.

Performance metric of reference points
Recently, IGD is a way to compute multi-objective evolutionary algorithms (MOEAs), in which the reference points or reference directions are supplied.This paper uses IGD to perform a comparative experiment with algorithms containing reference points (MOEA and NSGA-III).With regard to the four optimization objectives detailed in section 3.4, Q1 and Q2 are the basic subjects of the SCOS, and Q3 and Q4 are advanced subjects, which need to be considered in conjunction with Q1 and Q2.Therefore, Q3 and Q4 are not only studied separately in the experiments.
The experiments are conducted on two-objective optimization problems: Q1&Q2, Q1&Q3, Q1&Q4, Q2&Q3, and Q2&Q4; three-objective optimization problems: Q1&Q2&Q3, Q1&Q2&Q4, Q1&Q3&Q4, and Q2&Q3&Q4; and the four-objective optimization problem: Q1&Q2&Q3&Q4.Twenty independent runs were conducted with each of the compared algorithms.Each run produced a set of non-dominant solutions.
Table 9 gives the comparisons between MOEA/D, NSGA III and MNSGA-III terms of IGDmetric.The better performance is marked in bold.Table 9 shows that for the NS problem MNSGA-III and NSGA-III are tied in terms of the IGD metric, followed by MOEA/D.For the two-objective problem, MNSGA-III slightly performs better than NSGA-III.However, in three-objective problems, NSGA-III performs better, we suspect that the reason is that MNSGA-III sometimes fails to capture the associate reference point in 3-dimensional objective space.MOEA/D consistently does not perform well in all dimensional versions of the problem.This observation is similar to that concluded in the original MOEA/D study [50] based on two-objective and three-objective problems.

Performance metric of Pareto-optimal points
However, such an IGD is not applicable to MOEAs without using reference points or directions, e.g., NSGA-II.For these algorithms, the many-objective optimization task is to identify sparsely
Table 12 shows that for the test instances Q2&Q3, Q2&Q4, Q1&Q2&Q3, and Q1&Q3&Q4, MNSGA-III has slightly worse C-metric than NSGA-III.Similarly, the reason is likely that the centroid of the cluster in MNSGA-III sometimes fails to capture the reference point.However, the final result obtained by MNSGA-III is slightly better than that obtained by NSGA-III with regard to the C-metric.Taking Q1&Q2 as example, an average of 63.1% of the final solutions generated by NSGA-III are dominated by those generated by MNSGA-III, while in only 21.3%, it is the other way around.To evaluate the resilience related to redundant candidate services and operating efficiency of this proposed algorithm, two scenarios are considered.In the first scenarios, the numbers of sub tasks are set to 10, 15, and 20, and the number of related candidate services varies from 30 to 120 with an increment of 30.In the second scenarios, the number of sub tasks is set to 10 and the number of related candidate services varies from 30 to 180 with an increment of 15.The stopping criterion for all compared algorithms is that the best fitness value has to remain unchanged over the last 20 iterations, or the maximum number of iterations is reached.
In the first scenarios, the box plots of optimality of the composition obtained with CGA, HGAFOA, FPSO and HRGO are plotted and compared by varying the number of candidate services.As observed in Figure 9, the sub-task is set to 10 and the candidate services is varies from 30 to 120 with an increment of 30, when the number of candidate services is low, the optimality of the composition obtained with HRGO is slightly lower than the optimal value obtained in the case of the HFAFOA algorithm, but with the increase of candidate services the HRGO remains more efficient than the HFAFOA and CGA algorithms and especially more efficient than FPSO algorithms.Indeed, the HFAFOA, CGA and HRGO uses a selection phase that is based on the QoS level to filter out candidate services that have a positive effect on the optimality of the composition.In Figure 10 with the sub-task is set to 15 and the candidate services is still varies from 30 to 120 and in Figure 11 with the sub-task is change to 20 and the candidate services is still varies from 30 to 120, we observe that the optimality of the composition for the HRGO and HGAFOA algorithms still close to 90%, when the number of concrete services increases.This discovery can be interpreted as that as the number of candidate services increases, the probability of high-quality candidate services being discovered also increases, and the probability of obtaining the close-to-optimal solution also increases.In addition, an advantage of the HRGO is that it provides a composition that is very close to the optimum, with a much more reduced elapsed time in comparison to the HFAFOA, CGA and FPSO algorithms.
In the second scenarios, the elapsed time of above four algorithms are evaluated.Figure 12 shows that the elapsed time of almost all four algorithms increases with the number of candidate services.This result can be interpreted to the time needed to find the suitable candidate services which belong to a feasible composition increases with the number of candidate services.Figure 8 also shows that the elapsed time of the CGA, HGAFOA and HRGO increases slightly with the increase of number of candidate services.The reason of reduction in execution time is the candidate services filter based on the fuzzy similarity degree used in the HRGO drastically reduces the number of candidate services before optimization process, namely, reduce the search space of compositions.The filtering process used in the HRGO is very efficient when the number of candidate services is abundance because of the significant reduction of the search space of the compositions.

Conclusions
SCOS is one of the most pivotal problems in the field of CM.To increase the ability of manufacturing enterprises to resist the interference of uncertainty, a hybrid resilience-aware SCOS is introduced.In general, three strategies are proposed to ensure consistency in the manufacturing process: The QoS attributes that refer to a system with the capacity to return to its state of equilibrium after disruptive perturbation are described and modeled mathematically.A candidate services filtering strategy, based on the fuzzy similarity degree, is proposed to filter services and enhance the resilience of CM.Based on diversity judgment and dual-track parallelism, the MNSGA-III is proposed to solve the combinatorial optimization problem.To validate the efficiency of the proposed HRGO algorithm for solving SCOS problems, a series of experiments were designed and conducted.The results showed that compared with MOEA/D and NSGA-III, in general, MNSGA-III performs well in terms of the diversity of the non-dominated solution.Compared with MOEA/D, NSGA-II, and NSGA-III, MNSGA-III performs well in the majority of experimental cases in terms of the quality of POF.Compared with CGA, FPSO, and HGAFOA, HRGO remains superior in terms of the optimality of the composition, at the expense of a negligible increase of the average elapsed time.
In the structure of the MNSGA-III, by judging the diversity of a population, the two-track parallel environment selection mechanism is used to select individuals from the offspring.When the population has good diversity, selecting individuals from different clusters can obtain the Pareto-optimal solutions in the distributed regions of POF.In contrast, when the population has poor diversity, it tends to use the reference point to select individuals, and Pareto-optimal solutions in distributed regions of POF can also be obtained.The advantage of this proposed algorithm is that it makes the best use of the

Figure 1 .
Figure 1.System architecture of CM.

( 1 )
Task Decomposition: breaking down each tasks published by MSD into sub-tasks, T is the task and j st is the jth sub-task of T, and J represents the total number of sub-tasks.Each sub-task has certain QoS requirement determined by the MSD, and only those manufacturing services that meet the QoS requirements are eligible to be candidate services for the final optimization step.(2)Service Swarming: swarming of alternative manufacturing candidate services in services

WhereAR
refers to the measured value corresponding to equilibrium A, A V represents real-time measured value of equilibrium A, E A V represents measured value of equilibrium A in consistent operation, a t and b t represent the establishment time and final time of the inspection.The denominator indicates the value of the maximum change of measured value during the inspection period.The larger the change of this value, the worse the performance of the service when it is attacked by detrimental factors and the worse the resilience.
where 2, () ij Q  represents the time of the ith candidate of the jth sub-task, , ( , ) transport duration incurred by the ith candidate service, which is also related to the mode of transport , ij t , and exp T represents the deadline imposed by the service demander.Reliability: The reliability 3, () ij Q  service reliability of the ith candidate of the jth sub-task,


the resilience of manufacturing service of the ith candidate of the jth sub-task, by Eq (1).M and L represent the number of endogenous attributes and the number of exogenous attributes.respectively; ,, represent the weight of each endogenous attributes and the weight of each exogenous attributes, respectively; , ij  and , ij  represent the weight of total endogenous attributes and the weight of total exogenous attributes, respectively; exp RS represents the resilience, as expected from the service demander.

2 .Case 2
The area representing that b dominates a is defined by  Fuzzy number b is in the right-hand side of the fuzzy number a,  Fuzzy number a is in the left-hand side of the fuzzy number b In case of overlap between fuzzy number a and b are the following 1.The area representing that a dominates b is defined by  The area belongs to a and is in the right-hand side of the overlap area, or  The area belongs to b and is in the left-hand side of the overlap area 2. The area representing that b dominates a is defined by  The area belongs to a and is in the left-hand side of the overlap area, or  The area belongs to b and is in the right-hand side of the overlap area.

Figure 3 .
Figure 3. Two dominance situations in the nonoverlap case.

Figure 4 .
Figure 4. Two possible situations of dominance and no difference in overlap case.


represent weight parameters that can be determined by decision makers, and the values of k  and k with a reference point by calculating the minimum perpendicular distance member to each reference line.By applying a niche-preservation operation, an appropriate number of members of the population is connected to a reference point.At last, K members are chosen, one at a time, from Fl to construct Pt+1.

Figure 5 .
Figure 5. Framework of the existing NSGA-III and proposed MNSGA-III.

Figure 6 .
Figure 6.Environment selection mechanism of existing algorithms (three-objective problem with p = 4).
M represents the object dimensions.Non-dominant sorting was performed for individuals in the mixed population Rt.Individuals within the population who are not dominated by other individuals are classified as non-dominant F1.The level F1 is then removed to identify new non-dominant individuals in level F2.This process is repeated until all individuals in Rt are divided into several non-dominant levels F1, F2, …, Fn.Individuals at the former level performed better than individuals at the latter level with regard to each objective.Therefore, individuals at the former level are preferentially preserved in the next generation.
from the ith individual to the center of the kth cluster: Euclidean norm.During iteration, the membership function is updated as follows:

n.
Reference points are sorted according to n  and individuals near stray reference points are selected (i.e., the top  points with the largest n  ).Individuals closest to the stray reference point are placed in 1 Slot .The cluster whose center is closest to the stray reference point is called the stray cluster, and the individual closest to the center of this stray cluster is placed in 2 Slot .Finally, K  points are randomly selected in the stray cluster and placed in 3 Slot .Finally, K individuals are selected to fill St according to their order of priority 1 2 3

Figure 8 .
Figure 8.An example of encoding for SCOS problem.
of optimization objective of composite service obtained with an exhaustive search algorithm

Table 8 . 1 SN
Definitions of algorithm parameters, where n represents the number of variables. and the number of the selected positions 1 L  .FPSOThe inertia weight coefficient 0.8 w  , and the acceleration factor c1 varies from 1.5 to 2.5 linearly, while c2 varies from 1.5 to 2.5.HRGOProbability of crossover 0.5

Figure 9 .
Figure 9. Optimality of the composition versus the number of sub-task is 10 and candidate services varies from 30 to 120.

Figure 10 .
Figure 10.Optimality of the composition versus the number of sub-task is 15 and candidate services varies from 30 to 120.

Figure 11 .
Figure 11.Optimality of the composition versus the number of sub-task is 20 and candidate services varies from 30 to 120.

Figure 12 .
Figure 12.Elapsed time versus the number of candidate services.

Table 1 .
The main research subjects in CM SCOS since 2015.
cs represents the ith candidate services of S, and I is the total number of candidate services. i

Table 3 .
Summary of notations.

Table 4 .
Linguistic variables and fuzzy evaluation.
of the ith candidate of the jth sub-task,

Table 6 .
Example of collected QoS constraints.

Table 12 .
Average set coverage between MNSGA-III and NSGA-III.