Dynamic Voting Classifier for Risk Identification in Supply Chain 4.0

Abstract: Supply chain 4.0 refers to the fourth industrial revolution's supply chain management systems, which integrate the supply chain's manufacturing operations, information technology, and telecommunication processes. Although supply chain 4.0 aims to improve supply chains' production systems and profitability, it is subject to different operational and disruptive risks. Operational risks are a major challenge in the supply chain 4.0 cycle because they affect the control of the demand and supply operations that produce and deliver products across IT systems. This paper proposes a voting classifier to identify the operational risks in supply chain 4.0 based on a Sine Cosine Dynamic Group (SCDG) algorithm. The exploration and exploitation mechanisms of the basic Sine Cosine Algorithm (SCA) are adjusted and controlled by two groups of agents whose sizes change dynamically during the iterations. External and internal features were collected and analyzed from different data sources of service level agreements and transaction data from various KSA firms to validate the proposed algorithm's efficiency. A balanced accuracy of 0.989 and a Mean Square Error (MSE) of 0.0476 were achieved, outperforming other optimization-based classifier techniques. A one-way analysis of variance (ANOVA) and Wilcoxon rank-sum tests were performed to show the superiority of the proposed SCDG algorithm. Thus, the experimental results indicate the effectiveness of the proposed SCDG algorithm-based voting classifier.


Introduction
Given the array of uncertainties in supply chain management, supply chain performance is vulnerable to several risk factors. Supply chain risk is defined as the probability of a risk event that impacts the supply chain at the micro or macro level and leads to disruption at any stage of supply chain operations [1]. Risk management is the process of assessing and predicting risks to identify risk events to minimize or avoid their effects [2]. Supply risk can be categorized into disruption type or operation type [3][4][5]. The risk associated with natural disasters, such as earthquake or flooding, is difficult to control. Operational risk leads suboptimal or failed A precise analysis of managing risks through better methods is necessary for supply chain 4.0 risk management. Identification and mitigation processes are the main elements of controlling risk events and that includes the concept of understanding the reasons for risk probability and impacts. Management of risk is an essential component of risk analysis, and it can improve decision making for mitigating risks. A supply chain's profitability depends mainly on identifying and controlling external and internal factors through appropriate responsiveness, efficiency, and reliability [10]. To sustain firms' profitability levels, supply chains must respond rapidly to internal and external risk events to maintain their businesses effectively and dynamically [11][12][13]. Different researchers have applied several methods, including quantitative methods, for defining risk events in the supply chain's operational processes [14][15][16].
This paper focuses on managing risk in supply chain 4.0 by identifying, assessing, and mitigating external risk events. It proposes a voting classifier based on a Sine Cosine Dynamic Group (SCDG) optimization algorithm to identify and quantify external and internal risk events, which helps firms mitigate them. External and internal features were collected and analyzed from data sources such as service level agreements and Kingdom of Saudi Arabia (KSA) firms' transaction data to validate the proposed algorithm's efficiency. Experiments were designed to determine the effectiveness of the proposed SCDG algorithm-based voting classifier using the balanced accuracy and Mean Square Error (MSE) metrics. The proposed voting SCDG classifier was compared with other optimization-based voting classifiers, namely those based on Particle Swarm Optimization (PSO) [17,18], the Whale Optimization Algorithm (WOA) [19,20], the Grey Wolf Optimizer (GWO) [21,22], and the Genetic Algorithm (GA) [23,24]. One-way analysis of variance (ANOVA) and Wilcoxon rank-sum tests were performed to test the proposed SCDG algorithm's superiority. This paper is organized as follows. Section 2 discusses the background and related work of this research. Section 3 explains the artificial intelligence methods for managing risk in supply chain 4.0. Section 4 discusses the proposed Sine Cosine Dynamic Group Algorithm. Section 5 describes the experimental results. Finally, Section 6 presents the conclusions of the study.

Related Work
Different external risk events affect supply chain performance and cause harm to a supply chain's internal processes, which can lead to severe financial issues that drag down firms [14,15].
The authors in [8] defined risk management as helping firms describe risk before it happens and trying to mitigate it in any possible way. Most of the previous studies have categorized risk management into three steps: risk identification, risk assessment, and risk mitigation. There have been multiple descriptions of the risk management process from different authors. The authors in [8,25,26] described risk identification as the first step in risk management, which can help the decision maker manage the risk if defined well.
The authors in [27][28][29] explained risk assessment as the method or system that helps a firm assess and evaluate the impact of historical data of the firm. In [16,30,31], the authors discussed risk mitigation as the method that helps the decision makers quantify risk before it occurs, thus allowing the firm to prevent it. It is essential to study the external risk of supply chain 4.0 to find a better method for measuring its impact on the supply chain. For example, DHL provides Resilience 360, which helps firms map the supply chain end to end and build a system for identifying critical risks by alerting stakeholders on time, which helps mitigate the risk [32]. Many recent machine learning techniques, such as [33][34][35][36][37][38][39], can be applied to such problems. Fig. 1 shows the importance of industry 4.0 in the supply chain, which can help firms to attain competitive and sustainable advantages. This paper focuses on quantifying risk events for improving supply chain 4.0 firms' decision making.

Supply Chain 4.0 Operation
Industry 4.0 refers to the fourth industrial revolution and a new model of an intelligent system that helps enterprises in the production and manufacturing environment. It emphasizes global networks in a smart factory for controlling and exchanging information [40]. Supply chain 4.0 consists of independent activities that are geographically separated, combined in various ways, and linked through varied companies, resulting in a capability to respond to consumers' needs. As shown in Fig. 2, the dependencies in a supply chain 4.0 include customers, vendors, devices, manufacturing plants, and other physical source systems [41]. Supply chain 4.0 is a disruption that causes firms to rethink the components, processes, and designs of their supply chain. In response to client requirements and the need for speedy fulfillment, numerous strategies have arisen that have changed typical working techniques. Furthermore, the demand for naturalization and supply establishments can also be used to reach the next horizon of operational efficiency, establish the company as an electronic supply chain, and transform the service provider into a digital supply chain [42]. To take advantage of these trends and deal with changing requirements, supply chains need to become faster, more credible, and more accurate.
Different researchers have attempted to apply various techniques for defining risks, which is the first step in risk management [43][44][45]. The authors in [46] stated that artificial intelligence is the technique of the future, and it will help capture risk events automatically by finding the correlation between risk features and labels, as shown in Tab. 1. One of the significant challenges in previous studies was the lack of real data or visible data sources needed to define risk events accurately. Those studies categorized a firm's response when risks occur into three decisions: avoid, reduce, or accept the risks. The proposed framework instead quantifies the risk as low, medium, or high, which helps firms make better decisions with less uncertainty. Fig. 3 shows the internal and external risk events that impact the processes of supply chain 4.0.
Table 1 (excerpt). Interior/exterior: the local or international venue where firms receive or dispatch their orders.

Big Data Collection Process
Big data was collected from different KSA firms and pre-processed to make it more meaningful for the proposed technique, as depicted in Fig. 4. The first step is divided into two stages. The first stage involved gathering information from service level agreements (SLAs), including the products' features and supply chain 4.0 attributes. In the second stage, the firms' chief executive officers (CEOs) linked the labels to the data's features and attributes, as shown in Tab. 1. According to these features' values, the CEOs can manually identify the potential risks that the company faces. The next step logged the data in Structured Query Language (SQL) format so that the data could be read to identify the risk labels and then stored through Object Linking and Embedding Database (OLE DB). The tested dataset in this work comprised nine risk labels. To automatically define the firms' risks, the relationships between risk labels and their attributes are identified by the proposed voting classifier based on the SCDG optimization algorithm.

Sine Cosine Algorithm
The basic Sine Cosine Algorithm (SCA) was first proposed in [47] for optimization problems. The algorithm is based on the sine and cosine oscillation functions for updating the candidate solutions' positions. SCA uses a set of random variables to indicate the movement direction, to set how far the movement should be, to emphasize or deemphasize the effect of the destination, and to switch between the cosine and sine components. SCA uses the following mathematical form for updating the positions of different solutions:
$$X_i^{t+1} = \begin{cases} X_i^t + r_1 \sin(r_2)\,\lvert r_3 P_i^t - X_i^t \rvert, & r_4 < 0.5 \\ X_i^t + r_1 \cos(r_2)\,\lvert r_3 P_i^t - X_i^t \rvert, & r_4 \ge 0.5 \end{cases} \qquad (1)$$
where $X_i^t$ is the current solution position in the i-th dimension at iteration t, and $P_i^t$ represents the best solution in the i-th dimension. The parameters $r_2$, $r_3$, and $r_4$ are random values in [0,1]. Eq. 1 shows that the agents' positions are updated using the position of the best solution. To achieve a balance between the exploitation and exploration processes in the SCA algorithm, parameter $r_1$ can be updated during iterations as:
$$r_1 = a - \frac{a\,t}{t_{max}} \qquad (2)$$
where t represents the current iteration, a is a constant, and $t_{max}$ is the maximum number of iterations.
The initial positions of a population of n agents in the SCA algorithm are set randomly, as shown in Algorithm (1). The objective function is computed in Step 5 for all agents to find the best solution's position. P in Step 6 indicates the best solution. Parameter $r_1$ is updated according to Eq. (2). The original SCA algorithm shows high exploitation of the search space compared to a wide range of other meta-heuristics owing to its use of a single best solution to guide other candidate solutions. This makes the algorithm efficient in terms of memory usage and convergence speed. However, the algorithm may show slightly lower performance on problems with many locally optimal solutions. This drawback motivated the proposed Sine Cosine Dynamic Group algorithm.
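The position update and the linear schedule for r_1 described above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: it follows the standard SCA update of [47] (sine branch when r_4 < 0.5, cosine branch otherwise), and the function names, the value a = 2, the search bounds, and the greedy tracking of the best solution are all hypothetical choices. The ranges of r_2 and r_3 default to [0,1] as stated in the text and are exposed as parameters, because other SCA formulations draw r_2 from [0, 2π].

```python
import math
import random


def r1_schedule(t, t_max, a=2.0):
    """Linear decay of r1 from a to 0 (a = 2 is an assumed constant)."""
    return a - a * t / t_max


def sca_update(x, best, r1, rng, r2_max=1.0, r3_max=1.0):
    """Return a new position for one agent, updated dimension by dimension."""
    new_x = []
    for xi, pi in zip(x, best):
        r2 = rng.uniform(0.0, r2_max)
        r3 = rng.uniform(0.0, r3_max)
        r4 = rng.random()
        step = abs(r3 * pi - xi)  # distance to the (scaled) best solution
        osc = math.sin(r2) if r4 < 0.5 else math.cos(r2)
        new_x.append(xi + r1 * osc * step)
    return new_x


def sca_minimize(objective, dim, n_agents=10, t_max=100,
                 lower=-5.0, upper=5.0, seed=0):
    """Minimize `objective` with a basic SCA loop (illustrative only)."""
    rng = random.Random(seed)
    agents = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(n_agents)]
    best = min(agents, key=objective)[:]
    best_val = objective(best)
    for t in range(t_max):
        r1 = r1_schedule(t, t_max)
        for i, x in enumerate(agents):
            moved = sca_update(x, best, r1, rng)
            # Clamp to the search bounds (an assumed boundary policy).
            cand = [min(max(v, lower), upper) for v in moved]
            agents[i] = cand
            val = objective(cand)
            if val < best_val:
                best, best_val = cand[:], val
    return best, best_val
```

For example, `sca_minimize(lambda v: sum(c * c for c in v), dim=3)` searches for the minimum of a simple sphere function. Because every agent moves relative to the single best solution P, the loop stores only one guide position, which is the memory and convergence advantage noted above.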

Proposed Sine Cosine Dynamic Group Algorithm
The proposed optimization technique in this work is called the SCDG algorithm. The SCDG algorithm can be employed for risk identification in supply chain 4.0 based on an ensemble model. The SCDG algorithm starts by randomly generating several individuals, as shown in Algorithm (2). Each individual represents a candidate solution to the supply chain 4.0 problem. After calculating the objective function $F_n$ for each agent $X_i$, the best solution is selected and indicated as P.
The Dynamic Groups behavior of the SCDG algorithm divides all the individuals into an exploration group ($n_1$) and an exploitation group ($n_2$). The number of solutions in each group is managed dynamically in each iteration according to the best solution. The exploration group operates with $n_1$ agents and the exploitation group with $n_2$ agents, as shown in Fig. 5. SCDG initializes the groups with 50% of the agents in exploration and 50% in exploitation. Then, the number of agents in the exploration group ($n_1$) is decreased, and the number of agents in the exploitation group ($n_2$) is increased.

Algorithm 2: Proposed SCDG Algorithm
However, if the best solution's objective function value does not change for three consecutive iterations, the algorithm starts to increase the number of agents in the exploration group ($n_1$) to find another best solution and, ideally, avoid local optima. Fig. 6 shows the balancing between exploration and exploitation in the proposed SCDG algorithm during iterations. SCDG uses the Sine Cosine Eq. (1) for updating the positions of the exploration group ($n_1$) and the exploitation group ($n_2$). Parameter $r_1$ is updated during iterations as $r_1 = a - a\,t / t_{max}$, where t is the current iteration, a is a constant, and $t_{max}$ is the maximum number of iterations. At the end of each iteration, SCDG updates the agents in the search space, and the agents' order is randomly changed to exchange the agents' roles in the exploration and exploitation groups. In the final step, SCDG returns the best solution.
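The dynamic balancing of the two groups can be sketched as follows. The text specifies the 50/50 start, the gradual shift toward exploitation, the switch back toward exploration after three unchanged iterations, and the random reordering of agents. It does not specify how many agents move per iteration or any minimum group size, so the `step`, `min_size`, and `patience` parameters and all function names here are assumptions made for illustration.

```python
import random


def update_group_sizes(n_explore, n_total, stall_count,
                       step=1, min_size=1, patience=3):
    """Return the new (n1, n2) split between exploration and exploitation.

    stall_count is the number of consecutive iterations in which the best
    objective value did not change. step and min_size are assumed values.
    """
    if stall_count >= patience:
        # Best solution is stuck: move agents back to exploration.
        n_explore = min(n_total - min_size, n_explore + step)
    else:
        # Normal progress: shift agents toward exploitation.
        n_explore = max(min_size, n_explore - step)
    return n_explore, n_total - n_explore


def assign_roles(agents, n_explore, rng):
    """Shuffle the agents so that roles are exchanged between the groups."""
    shuffled = agents[:]
    rng.shuffle(shuffled)
    return shuffled[:n_explore], shuffled[n_explore:]


def track_groups(best_values, n_total):
    """Replay a history of best objective values and record group sizes."""
    n1 = n_total // 2  # start with a 50% / 50% split
    history = [(n1, n_total - n1)]
    stall = 0
    prev = best_values[0]
    for val in best_values[1:]:
        stall = stall + 1 if val == prev else 0
        prev = val
        n1, n2 = update_group_sizes(n1, n_total, stall)
        history.append((n1, n2))
    return history
```

Replaying a best-value history such as `[5.0, 4.0, 3.0, 3.0, 3.0, 3.0, 2.0]` with ten agents shows the exploration group shrinking while progress continues and growing again once the best value has stalled for three iterations.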

Experimental Results
This section details three different experiments and statistical tests that were conducted to verify the accuracy of the proposed algorithm. In the first experiment, Support Vector Machine (SVM) [48], Neural Network (NN) [49], k-Nearest Neighbor (KNN) [50], and Random Forest [51] classifiers were applied to identify the operational risks in the supply chain 4.0. The second experiment was designed to compare the proposed SCDG-based voting classifier with the bagging and majority voting ensemble techniques. The last experiment compared the proposed voting SCDG algorithm with Particle Swarm Optimization (PSO) [17], Whale Optimization Algorithm (WOA) [19], Grey Wolf Optimizer (GWO) [21], and the Genetic Algorithm (GA)-based [23] voting classifier algorithms to test the algorithm's effectiveness. The ANOVA and Wilcoxon's rank-sum statistical tests were performed to verify the efficacy of the proposed SCDG algorithm. Tab. 2 lists the configurations of the proposed SCDG algorithm and the other algorithms used in the experiments.
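To make the difference between majority voting and an optimized voting classifier concrete, the following Python sketch combines the labels predicted by several base classifiers. The text does not spell out how SCDG enters the voting step; the sketch assumes, purely for illustration, that an optimizer supplies one weight per base classifier. The function name `weighted_vote`, the example weights, and the tie-breaking rule are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights=None):
    """Combine one predicted label per base classifier into a single label.

    With weights=None every classifier counts equally (majority voting).
    Ties are broken by picking the smallest label in sorted order, which
    is an assumed convention.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    return max(sorted(scores), key=scores.get)


# Labels from four base classifiers (e.g. SVM, NN, KNN, RF) for one sample.
votes = ["low", "high", "high", "medium"]
majority = weighted_vote(votes)
# Hypothetical weights such as an optimizer might produce.
weighted = weighted_vote(votes, [0.9, 0.2, 0.2, 0.1])
```

With equal weights the two "high" votes win, whereas the assumed weights let the first classifier's "low" prediction dominate, which is the kind of behavior an optimization-based voting classifier can learn from validation error.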

Metrics of Performance Evaluation
The AUC (area under the ROC curve) and MSE (Mean Square Error) metrics were employed in this experiment as performance metrics. AUC, or balanced accuracy, indicates the classification performance independently of the class distribution [20]. For binary classification, when the classifier outputs binary predictions rather than scores, AUC can be calculated directly as the average of sensitivity and specificity. The balanced accuracy or AUC value is mathematically expressed as:
$$\mathrm{AUC} = \frac{\mathrm{Sensitivity} + \mathrm{Specificity}}{2} \qquad (3)$$
Mean Square Error or MSE indicates the performance of the classifiers. The MSE value is based on the difference between the actual and the required values of the classifier's output: for the h-th training instance, it accumulates the squared differences $(o_x^h - d_x^h)^2$ over the n outputs, where n is the number of outputs when the h-th training instance is applied, $d_x^h$ is the x-th optimal output of the input neuron, and $o_x^h$ is the actual output of the x-th input neuron when the h-th training instance appears in the input.
Figure 6: Balancing between exploitation and exploration in the proposed SCDG algorithm
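Both metrics can be computed in a few lines of Python. The sketch below is illustrative: the function names are hypothetical, sensitivity and specificity are computed from the usual confusion-matrix counts, and the MSE is averaged over the n outputs, which is the common definition and an assumption about the exact normalization used in the experiments.

```python
def balanced_accuracy(y_true, y_pred, positive=1):
    """Average of sensitivity and specificity for binary labels."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return 0.5 * (sensitivity + specificity)


def mean_square_error(desired, actual):
    """Mean of squared differences between actual and desired outputs."""
    return sum((o - d) ** 2 for d, o in zip(desired, actual)) / len(desired)


# Four positive and two negative samples: sensitivity 3/4, specificity 1/2.
score = balanced_accuracy([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1])
error = mean_square_error([1, 0, 1, 0], [0.9, 0.2, 0.8, 0.1])
```

Because sensitivity and specificity are each normalized within their own class, the score is not inflated when one risk class is much more frequent than the other.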

Results
In the first experiment, the output results for the single classifiers SVM, NN, KNN, and Random Forest (RF) are shown in Tab. Fig. 7 shows the algorithms' convergence curves. As seen from the results, the proposed algorithm obtains a better solution in the least time. To compare the proposed SCDG voting classifier visually with the voting classifiers based on the PSO, WOA, GWO, and GA algorithms, Fig. 8 shows the respective ROC curves, and Tab. 6 lists the corresponding results. As shown in Tab. 6, the proposed SCDG classifier achieved an area under the curve of about 1.0. Therefore, the proposed classifier can distinguish the classes in the supply chain 4.0 data with a high AUC.

Statistical Analysis
The ANOVA test was first applied to identify the statistical difference between the MSE of the proposed SCDG voting classifier and that of the other compared classifiers. Two hypotheses, the null hypothesis and the alternate hypothesis, were formulated. The null hypothesis was ($H_0$: equal means), and the alternate hypothesis was ($H_1$: non-equal means). Tab. 7 shows the descriptive statistics of the data. The results of the ANOVA test are provided in Tab. 8. Fig. 8 also shows the ANOVA test results based on the objective function of the proposed voting SCDG classifier and the compared classifiers. The results show that the alternate hypothesis $H_1$ was accepted.
Wilcoxon's rank-sum test was then employed to obtain the p-values between the proposed SCDG voting classifier and the other classifiers. The main aim of this test was to determine whether the results of the proposed SCDG voting classifier differed significantly from those of the other classifiers. A p-value below 0.05 indicates a significant difference in favor of the SCDG classifier, whereas a p-value above 0.05 indicates no significant difference. Two hypotheses, the null hypothesis and the alternate hypothesis, were also formulated for this test. The null hypothesis was ($H_0$: equal means), and the alternate hypothesis was ($H_1$: non-equal means). The p-value results are presented in Tab. 9. The p-values between the proposed SCDG classifier and each of the other classifiers were less than 0.05. The results therefore showed the superiority of the proposed classifier and its statistical significance, and the alternate hypothesis $H_1$ was accepted.
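The rank-sum comparison can be illustrated with a small, self-contained Python sketch. It uses the large-sample normal approximation without a tie correction for the variance, which is an assumption; the MSE values below are invented solely to exercise the code and are not results from the experiments. The function names are hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(sample_a, sample_b):
    """Two-sided Wilcoxon rank-sum test via the normal approximation."""
    n1, n2 = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (w - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p


# Made-up MSE values from ten runs of two hypothetical classifiers.
mse_a = [0.041, 0.043, 0.044, 0.046, 0.047, 0.048, 0.049, 0.050, 0.051, 0.052]
mse_b = [0.081, 0.083, 0.085, 0.088, 0.090, 0.092, 0.094, 0.095, 0.097, 0.099]
w, z, p = rank_sum_test(mse_a, mse_b)
significant = p < 0.05
```

In practice, library routines such as `scipy.stats.ranksums` and `scipy.stats.f_oneway` provide the rank-sum and one-way ANOVA tests directly. The hand-written version is shown only to make the ranking step and the 0.05 decision rule explicit.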

Residuals vs. Fits Plot
Possible issues are easier to observe from the residual values and the residual plots than from a plot of the original dataset. Some datasets are not well suited to classification. The ideal situation is attained if the residual values are evenly and randomly spaced around the horizontal axis. The residual value is calculated as (actual value - predicted value), with the mean and sum of the residuals equal to zero. Fig. 9 shows the residual plot. The heteroscedasticity plot, also shown in Fig. 9, can help discover violations of assumptions, thus boosting the credibility of the study's findings.
Homoscedasticity describes a situation in which the error term (the random disturbance, or noise, in the relationship between the dependent variable and the independent variables) is the same across the independent variables' values. The quantile-quantile (QQ) plot, shown in Fig. 9, is also known as a probability plot. It compares two probability distributions by plotting their quantiles against each other. As the figure shows, the points in the QQ plot approximately fit the line. Therefore, the actual and the predicted residuals were linearly related, thus validating the efficiency of the proposed SCDG voting classifier in identifying operational risks in supply chain 4.0.
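The quantities behind the residual and QQ plots can be computed as in the following Python sketch. It is a minimal illustration with made-up numbers: the function names are hypothetical, and the plotting positions (i + 0.5) / n used for the theoretical normal quantiles are one common convention, chosen here as an assumption.

```python
from statistics import NormalDist


def residuals(actual, predicted):
    """Residual = actual value minus predicted value, per sample."""
    return [a - p for a, p in zip(actual, predicted)]


def qq_points(values):
    """Pair each sorted value with the matching standard-normal quantile."""
    n = len(values)
    std = NormalDist()
    theoretical = [std.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(values)))


# Made-up actual and predicted outputs, used only to exercise the code.
res = residuals([3.0, 5.0, 7.0], [2.5, 5.0, 8.0])
points = qq_points(res)
```

Plotting the pairs returned by `qq_points` gives the QQ plot: points lying close to a straight line suggest that the residuals are approximately normally distributed.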

Conclusion
The fourth revolution of supply chain management systems, called supply chain 4.0, integrates the manufacturing operations of the supply chain with telecommunication and information technology processes. Supply chain 4.0 aims to improve supply chains' production systems and profitability; however, it suffers from different operational and disruptive risks. This paper proposes a voting classifier based on a new optimization algorithm, the Sine Cosine Dynamic Group (SCDG) algorithm, to identify the operational risks in supply chain 4.0. The exploitation and exploration mechanisms of the original Sine Cosine Algorithm (SCA) are adjusted by dynamic groups whose sizes are updated during the iterations according to the progress of the best solution. External and internal features were collected and analyzed from different data sources of service level agreements (SLAs) and various KSA firms' transaction data to validate the proposed algorithm's efficiency. A high balanced accuracy (AUC) and a low Mean Square Error (MSE) were achieved compared with other optimization-based classifiers. The ANOVA and Wilcoxon rank-sum tests were performed and showed the superiority of the proposed SCDG voting classifier. Thus, the experimental results indicate the effectiveness of the proposed SCDG algorithm-based voting classifier.