Reliability analysis and resilience measure of complex systems in shock events

Abstract: The working environment of complex systems is variable, and their performance is often affected by shock events during the service phase. In this paper, first, considering that system performance may be affected by shocks again during maintenance, the reliability changes and fault process of complex systems are discussed. Second, the performance change processes of complex systems are analyzed under multiple shocks and maintenance. Then, based on performance loss and recovery, the reliability and resilience of complex systems are analyzed under the intersecting process of multiple shocks and maintenance. Considering the direct and indirect losses caused by shocks, as well as maintenance costs, the changes in total costs are analyzed. Finally, the practicability of the proposed model is checked on a welding robot system.


Introduction
With the continuous advancement of industrial production systems toward greater complexity, scale, and intelligence, their functions are highly integrated, performance parameters are continuously improved, environmental conditions are more severe, and working loads are complex and changeable. Accurately assessing the reliability and resilience of complex systems is fundamental to their reliable operation and safe service.
Many scholars have studied performance change and reliability in shock environments. Ma and Trivedi [1] introduced a process to analyze the changes in performance and reliability of wireless broadcasting networks under different network parameters. Zhao et al. [2] analyzed the state changes of multi-state systems after various types of shocks, and proposed various replacement and maintenance strategies after failure. Dhulipala et al. [3] proposed a post hazard-event recovery model to forecast performance recovery trajectories from a single hazard to multiple hazards. Dehghani et al. [4] discussed the performance recovery of industrial systems subjected to stochastic hazards. A novel reliability model for composite insulators was presented in [5], which considers the amplitude and duration of detrimental shocks, as well as the impact of self-recovery processes on the performance changes of composite insulators. Wan et al. [6] used a model to describe dynamic performance degradation. Wang et al. [7,8] proposed more universal continuous and discrete degradation and shock models based on time and system state. Wang et al. [9] derived the reliability of dynamic equilibrium systems and static equilibrium systems with multi-state protection equipment affected by external shock and internal degradation. Dong et al. [10] classified faults into hard fault processes and soft fault processes, and established a more generalized system reliability model under shock conditions. Zhang et al. [11] modeled the actuator degradation process through the Wiener process, and derived the corresponding actuator reliability. Zhao et al. [12] obtained reliability-related indexes for multi-state equilibrium systems subjected to external shocks. Wang et al. [13] assessed the failure probability and reliability of a k-out-of-n: F capacity-balancing system under shock environments with multiple sources. Ranjkesh et al. [14] assessed the system reliability considering the correlation between the arrival time of random shocks and shock strength. Anwar et al. [15] evaluated the stress-strength reliability of complex systems under the unified progressive mixed truncation scheme. Song et al. [16] analyzed the fault process and reliability of a multi-state k: F system with performance sharing, balanced requirements, and protection strategy operation mechanisms.
Some scholars have studied the resilience and costs of complex systems. In [17], a resilience model integrating the effects of system recovery behavior, system hardening, and the severity of shock events was established to quantify the ability of microgrids to withstand shocks. Zeng et al. [18] developed time-static and time-dependent resilience models for energy systems under extreme shock events. Liu et al. [19] used a modified projection tracing model to assess the resilience of metro stations to flood hazards from three aspects: defense, restoration, and adaptation. Somayajulu et al. [20] modeled changes in infrastructure performance under multiple disaster events. Tang et al. [21] extended a linear programming optimization model to model the resilience of urban railway transport systems under multiple shocks. Levitin et al. [22] evaluated and optimized the expected costs and task reliability of a warm backup system under external shocks. Dui et al. [23] discussed the relationship between resilience, importance measure, and maintenance. In [24], maintenance costs related to component and system failures were combined with importance measure theory. Andrzejczak et al. [25] modeled the corrective maintenance costs and replacement costs of vehicles and developed a simple model of vehicle faults. In [26], the performance changes under the simultaneous influence of maintenance costs and repair time were analyzed. Teixeira et al. [27] used generalized stochastic Petri nets to model offshore wind turbine systems and analyzed the changes in system performance and cost under various maintenance strategies. An optimal model for aircraft scheduling on air corridor ramps based on time and cost was presented in [28]. Dui et al. [29] analyzed fault and maintenance cost management based on importance measures. Given the costs related to energy and time, Chang et al. [30] proposed effective protocols for complex networks with pinning control. In [31], the costs of transforming system states were referred to as response costs and classified into operating costs incurred by executing actions and effect costs that measure the consequences of shocks. Yan et al. [32] proposed three optimization models aimed at balancing the benefit of discovering defects against the costs related to ship survey.
However, existing studies have mainly focused on the performance degradation process and the maintenance process under one shock, or on the performance degradation process after multiple shocks, and do not consider the intersecting process of multiple shocks and maintenance activities. In this paper, considering that system performance may be affected by shocks again during maintenance, the performance change processes of the system are analyzed. Then, system reliability and resilience are studied based on performance loss and recovery under the intersecting process of multiple shocks and maintenance. Finally, the direct and indirect economic losses caused by multiple shocks, as well as maintenance costs, are analyzed.
The remainder of this paper is structured as follows. Section 2 discusses the performance change and reliability of complex systems under the intersecting process of multiple shocks and maintenance activities. In Section 3, we analyze the performance change process of complex systems under performance degradation and recovery processes. Then, the resilience and total costs models are analyzed. Section 4 illustrates the above models with a specific example of a welding robot system, and Section 5 concludes this paper.

Reliability analysis in shock events
External factors such as pressure, load changes, sudden changes in temperature and humidity, electromagnetic fluctuations, and vibration are regarded as the shocks that complex systems are subjected to during operation. Complex systems may be subjected to multiple shocks, and each shock affects system performance. Performance degradation has two causes: the shocks themselves, and the natural degradation of the system, such as fatigue and aging. The natural degradation process is also influenced by external factors. The fault process of complex systems is shown in Figure 1. Assume that the complex system has a total of $(k + 1)$ states, labeled with the integers from 0 to $k$. State 0 represents the optimal state, state $k$ represents that the complex system is unable to run at all, and the other states represent performance degradation states. The performance set corresponding to the states is $\{G_0, G_1, \cdots, G_k\}$. $G(t)$ represents the performance of the complex system at time $t$, with $G(t) \in \{G_0, G_1, \cdots, G_k\}$. The probability that the complex system is in state $m$ at time $t$ is denoted by $P_m(t)$. The probability set is $P(t) = [P_0(t), P_1(t), \cdots, P_k(t)]$.
Even if the complex system receives no subsequent shock, its performance continues to deteriorate, and the time between state transitions shortens, owing to the lasting influence of earlier shocks on performance. Once the performance of the complex system has dropped to the fault condition, the system must be repaired to restore its reliability. However, the system may experience shocks again before its performance is restored to its peak level. Maintenance stops when a fresh shock occurs, and the system can be repaired again only after its state drops to the failed state. The shock and maintenance processes of the complex system intersect in this way, and the resulting performance change is shown in Figure 2. In Figure 2, $G_0$ represents the optimal performance and $G_k$ represents the performance in the fault state. The complex system initially performs at its optimal state. As shocks occur, the system state transitions from state $v$ to state $z$ ($0 \le v < z \le k$), and the system performance is reduced from $G_v$ to $G_z$. Multiple shocks shorten the transition time between states in a multi-state complex system. After multiple shocks, the performance drops to the fault-state value $G_k$. The system has to be repaired at this point. After maintenance, the system performance is improved from $G_z$ to $G_v$ ($0 \le v < z \le k$). If a fourth shock arrives before the performance recovers to its optimal level, maintenance is interrupted and the performance begins to decrease again. Otherwise, the system continues to be repaired until its performance recovers to the optimal performance $G_0$.
The effect of shocks and natural degradation on the performance of complex systems is seen chiefly in the variation of the state degradation rate. Let $\lambda_{m,n}^{x}$ be the state degradation rate of the complex system from state $m$ to state $n$ after the arrival of the $x$-th shock event. Because the effects on performance accumulate, the value of $\lambda_{m,n}^{x}$ is defined in terms of the initial state degradation rate $\lambda_{m,n}$. The expression for $\lambda_{m,n}^{x}$ is given by Eq (1), which contains a state parameter and a strength parameter. The state parameter is related to the system state just prior to the occurrence of the $x$-th shock event; it does not depend on the number or strength of shocks, but is a constant that changes with state $m$. The strength parameter reflects the strength of the $x$-th shock. The initial complex system is a system whose performance is in its initial state without degradation, or whose performance has recovered to its best state after the shocks.
Even if a complex system has not failed, running at very low performance can affect normal operation. To ensure that work is completed efficiently, a performance threshold is defined. When the performance is less than or equal to the threshold, the complex system must be repaired.
Reliability is understood here as the probability that the system performance is not lower than the performance threshold; its expression is given by Eq (2).
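The verbal definition above can be illustrated with a minimal Python sketch. It is not the paper's Eq (2): the function name `reliability`, the `inclusive` flag, and the probability vector are hypothetical, and the flag only makes explicit the choice between counting the threshold state itself ("not lower than") and excluding it. The performance values are those used later in the welding robot case study.

```python
from typing import Sequence


def reliability(
    state_probs: Sequence[float],
    performance: Sequence[float],
    threshold: float,
    inclusive: bool = True,
) -> float:
    """Sum the probabilities of the states whose performance meets the threshold.

    inclusive=True counts states with performance >= threshold ("not lower
    than the threshold"); inclusive=False counts only performance > threshold.
    """
    if len(state_probs) != len(performance):
        raise ValueError("state_probs and performance must have equal length")
    total = 0.0
    for prob, perf in zip(state_probs, performance):
        meets = perf >= threshold if inclusive else perf > threshold
        if meets:
            total += prob
    return total


# Performance values of states 0..4 as in the case study; the probability
# vector is a made-up snapshot used only for illustration.
performance = [1.0, 0.8, 0.6, 0.3, 0.0]
state_probs = [0.50, 0.20, 0.15, 0.10, 0.05]

r_inclusive = reliability(state_probs, performance, threshold=0.3)
r_strict = reliability(state_probs, performance, threshold=0.3, inclusive=False)
print(r_inclusive, r_strict)
```

With the threshold state included, only the failed state is excluded from the sum; with the strict reading, the threshold state is excluded as well, which matches the rule that the system is repaired once it reaches the threshold.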

Resilience measure of complex systems
Each shock affects the performance degradation rate and the state transition process of the system. Assume that both performance degradation and maintenance change the state one step at a time and cannot skip intermediate states.
$\lambda_{m,n}(t)$ represents the performance degradation rate from state $m$ to state $n$ ($m < n$).
$\mu_{m,n}(t)$ represents the performance recovery rate from state $m$ to state $n$ ($m > n$). The state transition process is shown in Figure 3. Assuming that performance recovery does not depend on the number and strength of shocks, the performance recovery rate matrix remains the same regardless of how many maintenance actions have been performed. The performance recovery rate matrix of complex systems applies from the starting time of the $f$-th maintenance until either the arrival time of the next shock event or the time when the system is repaired to its best performance state. Because the system may be repaired multiple times under various shocks, there may be multiple such times. A zero element in the performance recovery rate matrix indicates that the system cannot move from a higher-performance state to a lower-performance state during maintenance; that is, system performance cannot be reduced but can only be improved.
When $x = 1$, the initial performance degradation rate matrix applies after the arrival of the first shock and before the arrival of the second shock or before repair. A zero element in this matrix indicates that the complex system can only undergo state degradation after being subjected to shocks; that is, the performance cannot be improved but can only be reduced.
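The structure of the two rate matrices can be sketched in Python as follows. This is a minimal illustration that assumes both matrices are standard continuous-time Markov chain generator matrices: off-diagonal entries hold the one-step transition rates and each diagonal entry is the negative of its row sum. The diagonal convention, the function names, and the numerical rates are assumptions, not taken from the paper.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def degradation_matrix(rates: Sequence[float]) -> Matrix:
    """Build a degradation generator; rates[m] is the rate of m -> m + 1."""
    n = len(rates) + 1
    q = [[0.0] * n for _ in range(n)]
    for m, rate in enumerate(rates):
        q[m][m + 1] = rate  # one-step move to the next worse state
        q[m][m] = -rate     # assumed diagonal so that each row sums to zero
    return q


def recovery_matrix(rates: Sequence[float]) -> Matrix:
    """Build a recovery generator; rates[m] is the rate of m + 1 -> m."""
    n = len(rates) + 1
    q = [[0.0] * n for _ in range(n)]
    for m, rate in enumerate(rates):
        q[m + 1][m] = rate       # one-step move to the next better state
        q[m + 1][m + 1] = -rate  # assumed diagonal so that each row sums to zero
    return q


# Hypothetical rates for a five-state system (states 0..4).
lam = degradation_matrix([0.2, 0.3, 0.3, 0.2])
mu = recovery_matrix([0.5, 0.5, 0.6, 0.4])

for row in lam:
    print(row)
```

In this sketch every entry below the diagonal of `lam` is zero (no improvement during degradation) and every entry above the diagonal of `mu` is zero (no degradation during maintenance), which mirrors the meaning of the zero elements described above.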
Under multiple shocks, the performance degradation rate matrix that applies from the arrival of the $x$-th shock to the arrival of the $(x + 1)$-th shock or to the start of repair follows from Eq (1). Under the given initial condition $P(0) = [1, 0, \cdots, 0]$, the probability of the complex system being in each state can be obtained by Eq (4), where $M$ represents the performance recovery rate matrix or the performance degradation rate matrix after the $x$-th shock ($x \ge 1$). At time $t$, the average performance of the complex system is denoted by $\bar{G}(t)$. The time at which the performance first drops to the level at which repair is required after multiple shocks is also defined.
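A minimal Python sketch of how the state probabilities and the average performance might be computed is given below. It assumes that Eq (4) is the standard Kolmogorov forward equation $dP(t)/dt = P(t)M$ and that the average performance is the expectation $\sum_m P_m(t) G_m$; neither form is reproduced in this section, so both are assumptions. The integrator is a plain fourth-order Runge-Kutta scheme chosen for self-containment, and the rates and performance values are the initial case-study numbers, assumed to be listed in state order.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def vec_mat(p: Sequence[float], q: Matrix) -> List[float]:
    """Row vector times matrix."""
    n = len(p)
    return [sum(p[i] * q[i][j] for i in range(n)) for j in range(n)]


def state_probabilities(p0: Sequence[float], q: Matrix, t: float,
                        steps: int = 2000) -> List[float]:
    """Integrate dP/dt = P Q from 0 to t with classical RK4."""
    h = t / steps
    p = list(p0)
    for _ in range(steps):
        k1 = vec_mat(p, q)
        k2 = vec_mat([a + 0.5 * h * b for a, b in zip(p, k1)], q)
        k3 = vec_mat([a + 0.5 * h * b for a, b in zip(p, k2)], q)
        k4 = vec_mat([a + h * b for a, b in zip(p, k3)], q)
        p = [a + h / 6.0 * (b + 2 * c + 2 * d + e)
             for a, b, c, d, e in zip(p, k1, k2, k3, k4)]
    return p


def average_performance(p: Sequence[float], perf: Sequence[float]) -> float:
    """Expected performance under the state distribution p (assumed form)."""
    return sum(pi * gi for pi, gi in zip(p, perf))


# Degradation generator for states 0..4 (assumed rates for 0->1, ..., 3->4).
rates = [0.25, 0.34, 0.30, 0.22]
q = [[0.0] * 5 for _ in range(5)]
for m, r in enumerate(rates):
    q[m][m + 1] = r
    q[m][m] = -r

p4 = state_probabilities([1.0, 0.0, 0.0, 0.0, 0.0], q, t=4.0)
g4 = average_performance(p4, [1.0, 0.8, 0.6, 0.3, 0.0])
print(p4, g4)
```

For a pure degradation chain the probability of remaining in state 0 has the closed form $e^{-0.25 t}$, which gives a simple check on the numerical integration. In practice a matrix exponential routine could replace the hand-written integrator.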

Case study
A welding robot, as a complex system, is mainly used in the production of various automotive parts and can replace manual work in welding, cutting, or thermal spraying. Positioning accuracy is commonly used to measure the working performance of a welding robot. In robot arc welding operations, due to changes in the actual welding environment, such as tooling errors and welding thermal deformation, the welding gun moving according to the original trajectory can no longer ensure accurate alignment of the weld seam, which can lead to a decrease in positioning accuracy and even the inability to maintain the normal welding process.
Taking a six-axis welding robot as an example, suppose that the welding robot system is divided into five states: 0, 1, 2, 3, and 4. State 0 represents the optimal positioning accuracy of the system. At the initial running time, the welding robot is in the optimal performance state. State 4 represents that the system has failed, indicating that the positioning error of the welding robot system is too large to finish tasks normally. The states between 0 and 4 represent a gradual decrease in performance, meaning that the positioning error gradually increases. To enable the welding robot to operate efficiently, state 3 is defined as the threshold state of the welding robot system. When the system is in state 3 or state 4, the welding robot needs to be repaired. The positioning accuracy values corresponding to states 0 to 4 are 0.01, 0.02, 0.04, 0.06, and 0.08, respectively. For convenience of calculation, the positioning accuracy values are simplified to the performance values $G_0 = 1.0$, $G_1 = 0.8$, $G_2 = 0.6$, $G_3 = 0.3$, and $G_4 = 0$. So, the performance threshold is 0.3. After the welding robot system is subjected to shocks or maintenance, the system state deteriorates or recovers one state at a time among the five states. Figure 4 shows the state degradation and recovery process. Assume that the state-by-state performance degradation rates of the initial welding robot system are 0.25, 0.34, 0.30, and 0.22, and that the state-by-state performance recovery rates are 0.51, 0.48, 0.62, and 0.46, respectively. These rates define the initial performance degradation rate matrix and the performance recovery rate matrix. The state parameter is related only to the state of the welding robot, and its value varies with the state. Table 1 gives the values of the state parameter for states 0, 1, 2, 3, and 4. Assume that the welding robot system is subjected to the first shock at the initial time, the second shock in year 4, and the third shock in year 25.
The values of the strength parameter for the second and third shocks depend on the strength of those shocks and are 2.3 and 2, respectively. According to Eq (1), the state-by-state degradation rates become 0.69, 0.94, 0.83, and 0.61 after the second shock, and 0.53, 0.71, 0.63, and 0.46 after the third shock. These rates define the performance degradation rate matrices after the second and third shocks. At first, the welding robot system is operating at its best performance, so $P(0) = [1, 0, 0, 0, 0]$. Then, according to Eq (4), the probability of the welding robot being in each state during the three shocks and two maintenance processes can be obtained. Figure 5 shows the probability changes. Figure 5(a) shows the probability change of each state of the welding robot system after the first random shock. In year 4, the system can still complete its work normally and does not require maintenance. Figure 5(b) shows that the system has degraded to the state that requires maintenance in year 18.15. Figure 5(c) shows that the system begins to be repaired in year 18.15, but has not yet returned to its optimal performance state by year 25. Figure 5(d) shows that the system is subjected to the third shock in year 25 and degrades again to the state that requires maintenance in year 42.25. Figure 5(e) shows that the system is repaired again in year 42.25, after which the performance of the welding robot system is no longer affected by the shock environments. Figure 5(f) shows the probability changes in each state during the entire process described above.
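As a quick consistency check on the reported rates, the short Python sketch below divides each post-shock degradation rate by the corresponding initial rate. It only uses the numbers quoted in the case study and assumes the four rates are listed in the same state order each time; it does not reproduce Eq (1) and does not determine how the state and strength parameters are combined.

```python
from statistics import mean

# Degradation rates as reported in the case study (same state order assumed).
initial = [0.25, 0.34, 0.30, 0.22]
after_second = [0.69, 0.94, 0.83, 0.61]
after_third = [0.53, 0.71, 0.63, 0.46]


def ratios(after, base):
    """Element-wise ratio of post-shock rates to initial rates."""
    return [a / b for a, b in zip(after, base)]


r2 = ratios(after_second, initial)
r3 = ratios(after_third, initial)

print("second shock:", [round(r, 3) for r in r2], "mean", round(mean(r2), 3))
print("third shock: ", [round(r, 3) for r in r3], "mean", round(mean(r3), 3))
```

Within the rounding of the published two-decimal values, each shock scales all four initial rates by a single common factor, roughly 2.77 after the second shock and roughly 2.10 after the third.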
According to Eq (2), the reliability of the welding robot system during the three random shocks and two maintenance processes can be obtained, as shown in Figure 6. The welding robot system has the highest reliability at the initial time. After the first and second shocks, the system reliability decreases until it reaches its lowest value in year 18.15. Then, the system is repaired, the system performance begins to recover, and the reliability gradually increases. In year 25, before the system reliability returns to its maximum value, the system is subjected to the third shock, and the reliability gradually decreases again. After the reliability is reduced to its minimum value in year 42.25, the welding robot system is repaired again. Afterwards, the system performance is no longer affected by shocks.
After obtaining the probability of the welding robot being in each state, the average performance of the system can be obtained. According to Eq (8), the resilience curve of the welding robot system during the two maintenance processes can be obtained, as shown in Figure 7. Before year 18.15, the system suffers two shocks without any maintenance, so resilience is not defined. Figure 7 shows that the resilience of the welding robot system increases continuously after maintenance begins. Figure 7(a) shows the resilience change of the system during the first maintenance period. In year 25, the first maintenance is terminated before the system can return to its optimal state, resulting in a low resilience value. After the third shock, the system performance continues to decline. In year 42.25, the average performance $\bar{G}(t)$ has dropped to the level at which maintenance is required, so the resilience is 0. Afterward, the system is no longer shocked and its performance can continue to recover, so, as shown in Figure 7(b), the resilience continues to increase.
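The qualitative behavior described above can be illustrated with a minimal Python sketch. Eq (8) is not reproduced in this section, so the sketch assumes one common ratio form: performance recovered since the start of maintenance divided by the performance lost relative to the optimal level. The function name `resilience` and its arguments are hypothetical, and the numbers are illustrative only.

```python
def resilience(avg_perf_now: float,
               avg_perf_at_repair_start: float,
               optimal_perf: float = 1.0) -> float:
    """Recovered performance divided by lost performance (assumed form).

    Returns 0 at the moment maintenance starts and 1 once the average
    performance is back at the optimal level.
    """
    loss = optimal_perf - avg_perf_at_repair_start
    if loss <= 0:
        raise ValueError("no performance loss to recover from")
    return (avg_perf_now - avg_perf_at_repair_start) / loss


# Illustrative values: maintenance starts at an average performance of 0.3.
at_start = resilience(0.3, 0.3)
interrupted = resilience(0.65, 0.3)   # maintenance interrupted part-way
fully_recovered = resilience(1.0, 0.3)
print(at_start, interrupted, fully_recovered)
```

Under this assumed form, an interrupted maintenance period ends with a resilience value well below 1, and the value resets to 0 when the next maintenance period starts, which matches the pattern described for Figure 7.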
Table 2 shows the parameter values related to the total costs. According to Eq (9), the total costs of the welding robot system during the three shocks and two maintenance processes can be obtained, as shown in Figure 8. In Figure 8, the first and third identification points mark the times of the second and third shocks, respectively. The second and fourth identification points mark the start of maintenance. After being subjected to shocks, the system operates in a lower state and the losses of benefits increase. Therefore, after multiple shocks, the growth rate of the total costs increases. After maintenance, the system performance improves and the losses of system benefits are reduced, so the growth rate of the total costs decreases.
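The cost behavior described above can be sketched in Python under stated assumptions. Eq (9) and Table 2 are not reproduced in this section, so the sketch assumes that the total cost is the sum of a fixed direct loss per shock, an indirect loss proportional to the gap between optimal and average performance accumulated over time, and a maintenance cost proportional to time spent in repair. All names and parameter values are hypothetical.

```python
from typing import List, Sequence, Tuple


def total_cost(times: Sequence[float],
               avg_perf: Sequence[float],
               shock_times: Sequence[float],
               repair_intervals: Sequence[Tuple[float, float]],
               optimal_perf: float = 1.0,
               direct_loss: float = 10.0,
               benefit_rate: float = 5.0,
               repair_rate: float = 2.0) -> List[float]:
    """Cumulative total cost at each time point (assumed cost structure)."""
    costs = []
    indirect = 0.0
    for i, t in enumerate(times):
        if i > 0:
            # Trapezoidal accumulation of the lost-benefit rate.
            dt = t - times[i - 1]
            gap_prev = optimal_perf - avg_perf[i - 1]
            gap_now = optimal_perf - avg_perf[i]
            indirect += benefit_rate * 0.5 * (gap_prev + gap_now) * dt
        direct = direct_loss * sum(1 for s in shock_times if s <= t)
        repair_time = sum(max(0.0, min(t, end) - start)
                          for start, end in repair_intervals)
        costs.append(direct + indirect + repair_rate * repair_time)
    return costs


# Illustrative trajectory: one shock at t = 0, repair from t = 2 to t = 4.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
avg_perf = [1.0, 0.9, 0.8, 0.85, 0.9]
costs = total_cost(times, avg_perf, shock_times=[0.0],
                   repair_intervals=[(2.0, 4.0)])
print(costs)
```

In this sketch the indirect term grows faster while the average performance is low and slower once maintenance raises it, which is the mechanism the text gives for the changing growth rate of the total costs.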

Conclusions and future work
We first analyze the performance change of complex systems under shocks and maintenance. Considering the effect of system state and shock strength on system performance, different state transition rates are obtained. A resilience model is established based on the performance loss and performance recovery of the system under the intersecting process of multiple shocks and maintenance. After that, the economic loss costs and maintenance costs are analyzed. Finally, a case study of a welding robot system verifies the feasibility and effectiveness of the model, which provides a reference for the resilience and cost analysis of complex systems. This paper assumes that the state transition rate is related to the strength of shocks and the system state. In the future, we can conduct sensitivity analysis on the strength parameter and state parameter, and analyze the effect of parameter values on the performance and costs of complex systems. Other processes can also be considered to describe the state transition process of complex systems.

Figure 2. Performance change under shock events.

Figure 4. The state degradation and recovery process of the welding robot system.

Figure 5. The probability changes in each state.

Figure 6. The reliability of the welding robot system.

Figure 7. The resilience of the welding robot system.

Figure 8. The total costs of the welding robot system.

Table 1. The values of the state parameter.

Table 2. Parameter values related to the total costs.