A Novel Perspective on Reliable System Design With Erlang Failures and Realistic Constraints for Incomplete Switching Mechanisms

This study focuses on the effective design of reliable systems, i.e., systems capable of withstanding failure events by applying multiple realistic workarounds. These workarounds include redundancy allocation, component reliability, and backup strategy, which are considered concurrently as decision variables. In this novel view, the strategy and reliability of the components are determined freely and independently to achieve optimality. The research problem is formulated for a general case to evaluate the capabilities of the proposed approach in realistic situations. This case uses an Erlang time-to-failure probability density function together with incomplete switching. With its improved reliability and resource functions, the proposed model challenges the existing presumption regarding the superiority of the cold-standby approach in this field and provides a realistic trade-off between different redundancy strategies. This practical view simultaneously accounts for the reliability, cost, weight, and volume of the switch. The findings reveal that the proposed joint reliability-redundancy allocation problem, with the added freedom of strategy choice, outperforms its pure cold-standby counterpart. Owing to the NP-hard nature of the problem, a simplified particle swarm optimization algorithm is proposed and used as the solution method. The performance of the novel view is assessed using multiple benchmark instances, including typical problems from the literature. Our numerical analysis demonstrates the superiority of this approach, with the maximum possible index reaching up to 96% relative to the existing results of past works. Furthermore, the selected solution algorithm is compared with a differential evolution algorithm, and we show that the simplified particle swarm optimization algorithm performs considerably better in all tested scenarios.


I. INTRODUCTION
Companies today compete intensely on an international scale. In this context, system reliability and availability are critical criteria in decision-making. To address this game-changing challenge mathematically, many studies in recent years have concentrated on reliability optimization problems that offer a trade-off between multiple characteristics of the system and its overall reliability [1], [2], [3], [4], [5]. A review of the literature shows that three major perspectives are recognized for optimizing the reliability/availability of systems: (i) Reliability Allocation, (ii) the Redundancy Allocation Problem (RAP), and the more recent (iii) Joint Reliability-Redundancy Allocation Problem (JRRAP) [2], [3], [6].
The associate editor coordinating the review of this manuscript and approving it for publication was Wei Liu.
Despite all the advantages of the first approach, it has a known weakness that makes redundancy allocation more attractive to both researchers and system designers: reliability allocation relies on components with relatively high levels of reliability, and such components are either rare in the market or extremely expensive to acquire. The third approach, which has received attention in the past few years, balances the benefits of the former two and has been shown to provide more efficient solutions. For a thorough review of the JRRAP approach, see the detailed literature review section in [2].
If the space and weight limits of a system allow, we may increase its reliability by adding redundant components to critical or sensitive parts/subsystems. Alternatively, if the budget is sufficient, we may choose more durable components. This is where an appropriate trade-off between the two workarounds becomes important. In general, this practice is reasonable when the system cannot be maintained on time due to the difficulty or complexity of maintenance operations. These difficulties may also include the cost of repair operations, which might be significant compared to replacing the system as a whole. Furthermore, the JRRAP is a practical approach that may help meet the minimum required system reliability in other situations, e.g., when replacing the faulty part is not possible at all owing to access barriers at the installation/operation location of the system (e.g., deep in the sea, in the air, or in space). The JRRAP approach is applicable to a wide range of real systems, from marine systems to satellites and aerial vehicles, and may include other expensive devices that are required to be reliable for a specific period of time. At the same time, these duties must be fulfilled without violating the available resources, such as space, weight, or budget. This reveals the necessity of studying JRRAP optimization with a specific focus on realistic modeling of the resource constraints.
From the redundancy point of view, researchers have mainly considered two strategies: active standby and cold standby. In the cold strategy, the redundant component is preserved in healthy condition while not in use, whereas an active redundant component follows the same failure pattern as the one in use [1], [2], [3], [4], [7], [8], [9], [10], [11]. Most past studies in both the RAP and JRRAP contexts have chosen one predetermined strategy for all subsystems, and the active strategy has always been more popular among researchers due to its simpler formulations through Markov chains.
The first study to consider the cold-standby strategy in the RAP was published by Coit [9] in 2001, in which he provided a tractable formulation and a lower bound for the overall system reliability [11]. Following this study, in 2003, Coit gave the optimization model the freedom to choose the optimal redundancy strategy for each subsystem [10]. Comparing the results of the new model with those of its predecessors revealed significant improvements, and this sparked a new stream in RAP/JRRAP (xRAP for short) modeling. Table 1 summarizes the significant studies in the xRAP field with a choice of redundancy strategy and compares their characteristics from multiple aspects.
Tavakkoli-Moghaddam et al. [12] was the first study after the original studies [9] and [10] to investigate the effect of the choice of strategy on the RAP. They further highlighted the importance of this new stream for system designers and explained how, in real systems, different subsystems can be distinct from each other and may consequently be assigned different standby strategies. Their model was a single-objective integer program in which the redundancy of each subsystem was also included as an additional decision variable. Citing the inherent NP-hardness of the new model, they solved it with a robust genetic algorithm (GA). Safari and Tavakkoli-Moghaddam [13] solved the same problem with a memetic algorithm, and the same model was later reconsidered by Chambari et al. [7] in a multi-objective form, which was solved by the well-known non-dominated sorting genetic algorithm (NSGA-II) and multi-objective particle swarm optimization (MOPSO). Similar studies were also published by Safari [14] and Soltani et al. [15], using NSGA-II and entropy-based algorithms, respectively. The latter, however, considered a limited version of the model that covered only the series-parallel structure.
VOLUME 11, 2023
An interesting extension to [12] was recently proposed by Sharifi et al. [16] where technical and organizational actions were added to the original model to give the model the freedom to alter the reliability of the components. The integer programming model of [16] included component cost and weight as constraints and neglected the volume. Technical and organizational actions were adopted earlier by Attar et al. [2] and Hamadani and Khorshidi [17] for multistate repairable systems. However, for the non-repairable binary-state systems, this indirect method of determining the component reliability is significantly restrictive, and for this reason, we embrace a direct approach (from [18]) with a set of real variables to represent component reliability.
In general, the studies in [7], [12], [13], [14], [15], [16], and [19] have all provided numerical results that further support the benefits of the extra freedom given to the model (i.e., the choice of redundancy strategy). Another extension of the original study [10] was provided by Ardakan and Hamadani [18], in which the JRRAP was introduced with the cold strategy. They adopted the formulations from [10], considered exponential failure distributions, and optimized the model to achieve maximum reliability. Subsequently, other studies suggested innovative methods for the xRAP with arbitrarily distributed failure times based on available simulation modeling techniques, among which the customized optimization methods of [2] and [8] have achieved good results. The former explicitly focused on the RAP, while the latter studied a more general JRRAP assuming that repairs are also possible. Recently, these simulation-based methods were further developed by Chambari et al. [20] to include mixed and k-mixed strategies for the series-parallel topology. Yeh and Wei [21] presented another simulation optimization approach for allocating resources under reliability concerns and discussed the financial aspects of system unreliability. The optimization part of their work included two algorithms, namely a revised particle swarm optimization (PSO) and a revised GA.
Since the cold-standby strategy was introduced for the JRRAP, many papers, including [1], [18], [22], [23], [24], [25], have shown it to be superior to the active strategy. Among these, Ardakan and Hamadani [18], Juybari et al. [22], and Yeh [25] used the common approximation method from [10], whereas Kim and Kim [23] and Ardakan and Rezvan [1] adopted a more complex approach based on continuous-time Markov chains (CTMC). Although the CTMC approach succeeded in calculating the reliability values, the improvements were not attractive enough for other researchers, such as Juybari et al. [22], to replace the simple (yet efficient) lower-bound equation. This seems to be mainly because of the significant complexity of CTMC concepts compared with the well-known and accurate lower bound of [10]. It is worth noting that all the studies mentioned have considered only the reliability of the switch, and the other characteristics of this important element have mostly been overlooked.
Aside from the new reliability calculation method, Kim and Kim [23] made an interesting attempt that has been neglected by all recent studies. For the first time, they reported numerical results on a new restricted policy for choosing the redundancy strategy in the JRRAP. In their policy, the strategy for each subsystem was a dependent variable determined by (i) the chosen redundancy level and (ii) the relative reliability of the components within the subsystem. Their numerical results, however, did not compare favorably with those of the pure cold-standby model of [18], and the superiority remained with the traditional cold-standby competitor. This raises a question about the reason behind this observation: why did increasing the freedom of the model to choose the appropriate strategy not improve the overall reliability in the JRRAP, while it was entirely successful in the traditional RAP context (given that the RAP is clearly a special case of the JRRAP)? Hereinafter, this phenomenon in the JRRAP is referred to as the freedom issue.
To the best of our knowledge, no study in the literature has addressed this question. The system studied here is composed of multiple subsystems with a simple or complex structure/topology. The goal is to determine the system characteristics (i.e., the number of redundant components, the reliability level of the components, and the optimal redundancy strategy) while considering realistic characteristics of the switching components. The main contributions of this paper can be summarized as follows:
I. Introducing the joint reliability-redundancy-strategy allocation problem (JRRS-AP).
II. Providing a general mathematical model for the JRRS-AP with series, series-parallel, and bridge system structures under incomplete switching mechanisms and Erlang time to failure.
III. Utilizing a set of real decision variables for component reliability and imposing no direct dependency between the reliability of the components in a subsystem and its redundancy strategy.
IV. Discussing the possible underlying reasons for the above-mentioned freedom issue and providing improved definitions for the cost and other resource functions to resolve it.
V. Embracing all specifications of the switch units (reliability, weight, volume, and cost) in the decision-making process.
VI. Since the problem under study is an extension of the original RAP, it is already known to be NP-hard; thus, a customized version of the simplified particle swarm optimization (SPSO) is utilized for solving the mathematical model.
The rest of the paper is organized as follows. The mathematical formulation of the reliability functions is presented and discussed in Section II, which also includes discussions related to the freedom issue and our proposed fixes. In Section III, we explain the SPSO-based algorithm for solving the resulting mixed-integer nonlinear programming problem.
In Section IV, the overall performance of the proposed model and algorithm is examined numerically using new and existing benchmark instances, and another algorithm is applied for performance comparison. Finally, concluding remarks and future research directions are provided in Section V.

II. MATHEMATICAL MODEL
We consider a system with M subsystems arranged in one of the following three topologies: (i) series, (ii) series-parallel, and (iii) complex/bridge. Figure 1 shows a schematic overview of three typical systems, each with five subsystems, under these topologies.
Before modeling the problem for the above topologies, we set out the assumptions of the current study as follows:
• All subsystems are mutually independent.
• The components of each subsystem are completely identical.
• At the start epoch, all components are new.
• Failure of each component in the system does not damage other elements of the system/subsystem.
• The statistical distribution of the Time-to-Failure (TTF) of the components is a k-Erlang with the k values predetermined and fixed.
• No repairs are possible for the components in the mission period.
• The process of switching from active to backup components is imperfect/incomplete.

The general JRRAP model has been widely considered in the past [2], [18], [23], [25]. Here, we present a variant of the JRRAP with an extra set of independent variables, namely the chosen strategy for each subsystem.

$$
\begin{aligned}
\max \quad & R(t;\mathbf{Er},\mathbf{n},\mathbf{s}) \\
\text{subject to} \quad & V(\cdot) \le V_{\max}, \\
& C(\cdot) \le C_{\max}, \\
& W(\cdot) \le W_{\max}
\end{aligned}
\tag{1}
$$
The objective R(·) is the overall reliability function of the system, which depends on the mission time t; V(·), C(·), and W(·) are the volume, cost, and weight functions of the system under each topology, and they are bounded by the available resources (i.e., V_max, C_max, and W_max); n = (n_1, n_2, ..., n_M) is the redundancy vector for the M subsystems; Er = (Er_1, Er_2, ..., Er_M) is the vector of failure-distribution parameters; and s = (s_1, s_2, ..., s_M) is the strategy vector. In the rest of this section, we concentrate on defining the four new functions of the current model.

A. RELIABILITY FUNCTION OF JRRS-AP
First, we consider the reliability of each subsystem as a black-box and represent it with R i (t; Er i , n i , s i ) that takes the mission time, the chosen distribution, level of redundancy, and strategy of the subsystem as input and returns its reliability value. With this notation, calculation of the overall reliability of the system is very straightforward for each of the topologies under study. These calculations are summarized in Table 3 [7], [12], [13], [14], [15], [16], [18], [19], [23].
Note that, in the equations of Table 3, R(t) and R_i are short forms of R(t; Er, n, s) and R_i(t; Er_i, n_i, s_i), respectively. As seen in this table, the reliability of a system with a simple series topology (Figure 1-a) is the product of the reliabilities of its subsystems. In a series-parallel system (Figure 1-b), parallel parts act as alternative routes of flow: a parallel block fails only if all of its branches fail, so its unreliability is the product of the branch unreliabilities, and its reliability is one minus that product. The subsystems connected in series are then combined with the same product rule used for pure series systems. However, the reliability formula for complex topologies such as the bridge (Figure 1-c) is not straightforward; for such systems, other techniques such as reliability block diagrams (RBD) and fault tree analysis (FTA) may be needed (see, for instance, [26]). For more details on the reliability formulas in Table 3, see [27] and [28].
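The series and parallel rules above, together with the closed form commonly used for a five-unit bridge, can be sketched in Python as follows. This is a minimal illustration rather than the exact contents of Table 3: the bridge labelling (minimal paths {1,2}, {3,4}, {1,5,4}, {3,5,2}) and the series-parallel layout in the usage example are assumptions, and `bridge_enumerated` is a hypothetical helper that cross-checks the closed form by enumerating all 2^5 subsystem states.

```python
from itertools import product
from math import prod


def series(rs):
    """Series block: every subsystem must work, so reliabilities multiply."""
    return prod(rs)


def parallel(rs):
    """Parallel block: it fails only if every branch fails."""
    return 1.0 - prod(1.0 - r for r in rs)


def bridge(r1, r2, r3, r4, r5):
    """Five-unit bridge (assumed labelling: minimal paths {1,2}, {3,4},
    {1,5,4}, {3,5,2}); closed form obtained by inclusion-exclusion."""
    return (r1 * r2 + r3 * r4 + r1 * r4 * r5 + r2 * r3 * r5
            - r1 * r2 * r3 * r4 - r1 * r2 * r3 * r5 - r1 * r2 * r4 * r5
            - r1 * r3 * r4 * r5 - r2 * r3 * r4 * r5
            + 2.0 * r1 * r2 * r3 * r4 * r5)


def bridge_enumerated(rs):
    """Brute-force check: add up the probabilities of all working states."""
    paths = [(0, 1), (2, 3), (0, 4, 3), (2, 4, 1)]  # zero-based minimal paths
    total = 0.0
    for state in product((0, 1), repeat=5):
        if any(all(state[k] for k in path) for path in paths):
            total += prod(r if up else 1.0 - r for r, up in zip(rs, state))
    return total


if __name__ == "__main__":
    rs = [0.90, 0.85, 0.80, 0.95, 0.70]  # illustrative subsystem reliabilities
    print("series:", series(rs))
    # Hypothetical series-parallel layout: (1-2 in series) in parallel with
    # ((3 || 4) in series with 5); not necessarily the layout of Figure 1-b.
    print("series-parallel:", parallel([series(rs[:2]),
                                        series([parallel(rs[2:4]), rs[4]])]))
    print("bridge:", bridge(*rs), bridge_enumerated(rs))
```

Composing `series` and `parallel` covers any series-parallel layout, while the bridge needs its own expression because no such nesting exists for it.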
What remains to complete the reliability function is defining the above-noted black box for each subsystem. Since the failure pattern of the active component in each subsystem is independent of the backup strategy and the number of redundant components, we can define its time-dependent function beforehand. Based on the assumed TTF distribution, the reliability function of a single binary-state non-repairable component is [7], [9], [12], [13], [14], [15], [16], [19]:
$$
r_i(t) = e^{-\lambda_i t} \sum_{l=0}^{k_i-1} \frac{(\lambda_i t)^{l}}{l!}
$$
In this equation, Er_i = (λ_i, k_i) represents the pair of rate and shape parameters of the k-Erlang TTF distribution of the components in subsystem i. For each subsystem in this paper, we choose one of the following strategies: cold standby (C), active (A), or no redundancy (N). The reliability function of each strategy is defined as follows:
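To evaluate the single-component reliability numerically, the Erlang survival function can be sketched as below. The sketch assumes that λ_i acts as a rate parameter (so the mean TTF is k_i/λ_i); the function name `erlang_reliability` is illustrative.

```python
from math import exp, factorial


def erlang_reliability(t, lam, k):
    """Survival probability of a k-Erlang time to failure with rate lam:
    r(t) = exp(-lam*t) * sum_{l=0}^{k-1} (lam*t)**l / l!  (k >= 1)."""
    if k < 1:
        raise ValueError("shape k must be a positive integer")
    x = lam * t
    return exp(-x) * sum(x ** l / factorial(l) for l in range(k))


if __name__ == "__main__":
    # k = 1 reduces to the exponential case used in most benchmarks.
    for k in (1, 2, 3):
        print(k, erlang_reliability(1000.0, 0.001, k))
```

Each extra Erlang phase adds one Poisson term, so for a fixed rate the reliability grows with k_i.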

1) STRATEGY (1), NO-REDUNDANCY
A subsystem with no redundancy (i.e., s_i = N) consists of only one component; thus, its reliability function equals the single-component function r_i(t) defined earlier:
$$
R_i(t; Er_i, 1, N) = r_i(t)
$$

2) STRATEGY (2), ACTIVE REDUNDANCY
The reliability function of a subsystem with homogeneous active components has been widely used in the literature and is given by [12] and [25]:
$$
R_i(t; Er_i, n_i, A) = 1 - \left(1 - r_i(t)\right)^{n_i}
$$
where n_i is the redundancy level of subsystem i.
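The two simple strategies can be checked numerically with the sketch below, which repeats the Erlang survival function so that it is self-contained (λ_i is again assumed to be a rate; all names are illustrative). With n_i = 1, the active formula collapses to the no-redundancy case, which is why strategy N can be inferred from the redundancy level alone.

```python
from math import exp, factorial


def erlang_reliability(t, lam, k):
    """Single-component k-Erlang survival function (lam assumed to be a rate)."""
    x = lam * t
    return exp(-x) * sum(x ** l / factorial(l) for l in range(k))


def no_redundancy(t, lam, k):
    """Strategy N: the subsystem is a single component."""
    return erlang_reliability(t, lam, k)


def active_redundancy(t, lam, k, n):
    """Strategy A: n identical loaded units; the subsystem fails only when
    all n units have failed, i.e. 1 - (1 - r)^n."""
    r = erlang_reliability(t, lam, k)
    return 1.0 - (1.0 - r) ** n


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, active_redundancy(100.0, 0.001, 1, n))
```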

3) STRATEGY (3), COLD-STANDBY REDUNDANCY
The reliability function of a subsystem with cold-standby components connected in parallel was originally derived and described in [9]. In general, when switch failure is possible (as assumed in this paper), two scenarios are anticipated for the switching process: 1) continuous failure detection and switching, and 2) switch activation only in response to a failure. For these scenarios, the respective reliability function of such a subsystem is given by (8) in [7], [12], [13], [14], [15], [16], and [19], where ρ_i(t) and ρ_i are the failure-detection probabilities of the switch unit of subsystem i at time t under scenarios 1 and 2, respectively. The probability density function (pdf) of the j-th failure arrival in subsystem i is denoted by f_i; it is the pdf of the sum of j i.i.d. component failure times. When the failure-detection probability is close to 1 (which is usually the case in real systems), it has been shown that (8) can be efficiently approximated by (9) [7], [10], [16], [25]. Since the component TTF follows an Erlang distribution, (9) can be further simplified to (10) [7], [10], [16], [25]. To summarize the subsystem reliability function, we introduce the following comprehensive equation:
$$
R_i(t; Er_i, n_i, s_i) = \mathbf{1}(s_i = N)\, r_i(t) + \mathbf{1}(s_i = A)\left[1 - \left(1 - r_i(t)\right)^{n_i}\right] + \mathbf{1}(s_i = C)\, R_i^{C}(t)
\tag{11}
$$
Here, the function 1(x) equals 1 if the condition x is true and 0 otherwise, r_i(t) is the single-component reliability, and R_i^C(t) denotes the cold-standby subsystem reliability given by (10). Finally, substituting the subsystem reliability function (11) into (4) yields the overall reliability of the system under study with the desired topology.
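To make the cold-standby case concrete, the sketch below evaluates a cold-standby subsystem with Erlang components under two assumptions of ours; it does not reproduce (8)-(10) verbatim. In scenario 1, ρ_i(t) evaluated at the mission time multiplies the whole backup contribution (the lower-bound form associated with [9], [10]). In scenario 2, each of the j switchings succeeds independently with probability ρ_i, giving a factor ρ_i^j. Both rely on the fact that the sum of j i.i.d. k-Erlang times is Erlang(jk), so the contribution of the j-th backup is the Poisson probability mass on jk, ..., (j+1)k − 1. The hypothetical dispatcher `subsystem_reliability` mirrors the indicator structure of (11).

```python
from math import exp, factorial


def poisson_pmf(l, x):
    """P(N = l) for a Poisson count with mean x."""
    return exp(-x) * x ** l / factorial(l)


def erlang_reliability(t, lam, k):
    """Single-component k-Erlang survival function (lam assumed to be a rate)."""
    return sum(poisson_pmf(l, lam * t) for l in range(k))


def cold_standby(t, lam, k, n, rho, scenario=1):
    """Cold-standby subsystem with n identical k-Erlang units (sketch).

    The j-th backup contributes P(S_j <= t < S_{j+1}), where S_j is the sum
    of j failure times, i.e. Erlang(j*k); this equals the Poisson mass on
    l = j*k, ..., (j+1)*k - 1.
    scenario 1: rho = rho(t) multiplies every backup term (lower-bound form).
    scenario 2: rho is a per-switching success probability, so term j gets rho**j.
    """
    x = lam * t
    total = erlang_reliability(t, lam, k)
    for j in range(1, n):
        term = sum(poisson_pmf(l, x) for l in range(j * k, (j + 1) * k))
        total += (rho if scenario == 1 else rho ** j) * term
    return total


def subsystem_reliability(t, lam, k, n, s, rho, scenario=1):
    """Indicator-style dispatcher in the spirit of (11); s in {'N', 'A', 'C'}."""
    r = erlang_reliability(t, lam, k)
    if s == "N" or n == 1:
        return r
    if s == "A":
        return 1.0 - (1.0 - r) ** n
    if s == "C":
        return cold_standby(t, lam, k, n, rho, scenario)
    raise ValueError(f"unknown strategy {s!r}")


if __name__ == "__main__":
    for s in ("A", "C"):
        print(s, subsystem_reliability(1000.0, 0.001, 1, 3, s, rho=0.99))
```

With perfect switching (ρ = 1), the cold-standby expression reduces to the survival function of an Erlang(n_i k_i) variable, which offers a simple sanity check for any implementation of (10).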

B. RESOURCE CONSTRAINTS
In this subsection, we introduce the resource constraints of the JRRS-AP with respect to the freedom issue. Reference [11] elaborated on an issue that designers may face when cold-standby redundant components are used in a system: in real systems, such components gradually degrade, which eventually leads to component failure even without any load being placed on them. From the results of this study, we may conclude that a pure cold-standby strategy like the one assumed in [1], [9], [18], [23], and [25] cannot be achieved in practice unless either very high-quality components are purchased, or extra maintenance procedures are put into practice to preserve the condition of the backup components while they are not in use and to prevent their deterioration.
The first option requires components that are significantly more expensive than simple components with a normal deterioration pattern, which can only deliver the functionality of the active strategy. The second option, on the other hand, implies that the cold strategy costs considerably more than its active counterpart. In either case, the extra cost must be reflected in the cost function of the problem; otherwise, the results of the trade-off between the active and cold strategies will be biased in favor of the cold strategy.
The selection of a strategy may affect not only the cost function but also the other resource functions. Studies such as [29] covered actuator faults or cyber-attacks that may lead to failures in remote unmanned aerial vehicles or remote robotic sites, and Zhang and Dong [29] showed the necessity of taking these failures into account when designing such systems. Unless the switch component is eliminated entirely (as in the prescheduled redundancy strategy [30]), this unit occupies part of the available space, and its weight may not be negligible. Moreover, the performance of the switching system relies on the fault-detection sensors [31]; ignoring the sensors' characteristics therefore keeps the model far from realistic applications. A redundancy allocation problem in a satellite unit and the allocation challenges in airborne integrated avionics systems are two instances of such constrained systems [32]. In a satellite or an unmanned aerial vehicle, every added circuit counts toward the total weight and volume. Consequently, assuming the same weight function for the active and cold-standby strategies makes the model deviate greatly from reality and may render the final design unusable. All things considered, we propose constraints (12)-(14) for the volume, cost, and weight of the system, which mitigate the above-mentioned issues in our model. The first part of each summation in the new constraints is the typical expression for the respective constraint used by a number of active and cold-standby studies in the literature [1], [9], [10], [18], [23], [24], [25], [33]. However, the second part of the summation in (12) represents the weighted volume of the switch, where a one-function ensures that this part is included only when the chosen strategy requires a switch, i.e., the cold-standby strategy.
Compared with the typical formulation in the literature, the weight function in constraint (14) has undergone a similar modification to incorporate the choice of redundancy strategy. In the proposed cost function, on the other hand, we followed a scheme close to the one originally defined in [9] for the main components, but applied it to the cost of the switch components. Here, γ_i and θ_i are non-negative real coefficients, with 1 ≤ θ_i ≤ 1.5 for all subsystems. Similar to the original cost function in the literature, the added part is correlated with the reliability of the switch unit of each subsystem, since η_i(t) is assumed to equal ρ_i(t) and ρ_i under scenarios 1 and 2, respectively.
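The role of the one-function in the new constraints can be illustrated with the following sketch. Because the exact expressions of (12)-(14) are not reproduced here, the per-subsystem base terms are passed in as plain numbers. The switch-cost form γ_i(−t/ln η_i)^{θ_i} is a hypothetical choice that only mimics the description above (a cost that grows as the switch reliability η_i approaches 1). All numbers in the usage example are illustrative.

```python
from math import log


def one(condition):
    """One-function 1(x): 1 if the condition holds, 0 otherwise."""
    return 1 if condition else 0


def switch_cost(t, eta, gamma, theta):
    """Hypothetical switch-cost term gamma * (-t / ln eta) ** theta.
    It grows without bound as the switch reliability eta approaches 1."""
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie strictly between 0 and 1")
    return gamma * (-t / log(eta)) ** theta


def resource_usage(base_terms, switch_terms, strategies):
    """Total usage of one resource (cost, volume, or weight): the usual
    per-subsystem term plus a switch term only for cold standby ('C')."""
    return sum(base + one(s == "C") * sw
               for base, sw, s in zip(base_terms, switch_terms, strategies))


def feasible(usage, limits):
    """True when every resource stays within its limit (e.g. C_max)."""
    return all(usage[key] <= limits[key] for key in limits)


if __name__ == "__main__":
    strategies = ["A", "C", "N"]
    cost = resource_usage([10.0, 12.0, 8.0],
                          [switch_cost(1000.0, 0.99, 1e-5, 1.2)] * 3,
                          strategies)
    print("cost:", cost)
```

Only the cold-standby subsystem pays for its switch, so an active or no-redundancy design is no longer charged for hardware it does not use.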

III. SOLUTION METHODOLOGY: PARTICLE SWARM OPTIMIZATION
The basic RAP is clearly a special case of both the JRRAP and the JRRS-AP. As a result, both extensions inherit the NP-hardness of the original simple RAP [1], [4], [12], [18], [34], [35]. In addition, the nonlinear objective function of the current model is considerably more complex than that of the basic RAP, which makes the problem practically intractable for traditional exact methods.
Consequently, we have chosen a metaheuristic approach for this part of our study. The xRAP has been solved by a wide range of evolutionary algorithms, among which the GA and its variants are by far the most used. However, [25] and [34] showed that swarm-based algorithms can compete with, and even outperform, GA-based competitors in the RAP and JRRAP contexts, respectively. In this section, we adopt and customize the PSO variant introduced in [34] to handle the proposed model.
PSO is one of the most well-known nature-inspired, population-based optimization algorithms, with a broad range of applications [36]. Since its initial introduction in [37], PSO has undergone many rounds of modification to reduce its complexity. One of the early attempts was the fully informed version introduced in [38], which its authors tagged as ''simpler, maybe better''. The general idea behind the algorithm is to simulate the movement of intelligent particles or entities (e.g., a flock of birds) searching for the best position in the surrounding environment, assuming that each particle is aware of its own history of positions and of the quality of the positions of the other members of the population. In the traditional PSO, a swarm is a population with a fixed number of particles (N_p), each with two main properties, velocity and position, which are updated in two phases. First, given the best personal position (P_Best) and the best position found globally by the swarm (G_Best), the velocities of all particles are updated using the following equation [34], [37], [39]:
$$
v_{ij}(t+1) = w\, v_{ij}(t) + c_1\, \mathrm{rand}\, \big(P_{Best,ij} - x_{ij}(t)\big) + c_2\, \mathrm{rand}'\, \big(G_{Best,j} - x_{ij}(t)\big)
\tag{15}
$$
where v_ij(t) and x_ij(t) are the velocity and position of the j-th dimension of particle i, respectively; rand and rand′ are two independent random variables uniformly distributed between 0 and 1; and c_1, c_2, and w are fixed parameters of the algorithm. To avoid large jumps away from the current position, a lower/upper limit (i.e., ±V_max,j) is also imposed on the velocity of each dimension. Afterwards, the new position of each particle is determined from its new velocity [34], [39]:
$$
x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1)
\tag{16}
$$
A few points clarify the differences between this classic version of PSO and the new version in [34]. The two approaches diverge most in the process of updating particle locations, which [34] performs in a single step.
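Equations (15) and (16), together with the velocity clamp, can be sketched for a single particle as follows. The default values of w, c_1, and c_2 are placeholders chosen for illustration; they are not the settings used in [34] or in this study.

```python
import random


def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, vmax=None, rng=random):
    """Classic PSO update of one particle.

    Velocity (15): w*v + c1*rand*(pbest - x) + c2*rand'*(gbest - x),
    optionally clamped to [-vmax_j, vmax_j]; position (16): x + v_new.
    """
    new_x, new_v = [], []
    for j in range(len(x)):
        vj = (w * v[j]
              + c1 * rng.random() * (pbest[j] - x[j])
              + c2 * rng.random() * (gbest[j] - x[j]))
        if vmax is not None:
            vj = max(-vmax[j], min(vmax[j], vj))
        new_v.append(vj)
        new_x.append(x[j] + vj)
    return new_x, new_v


if __name__ == "__main__":
    rng = random.Random(42)
    x, v = [0.0, 0.0], [0.0, 0.0]
    for _ in range(3):
        x, v = pso_step(x, v, pbest=[1.0, 1.0], gbest=[2.0, 2.0],
                        vmax=[0.5, 0.5], rng=rng)
        print(x, v)
```

Note that each particle must store both a velocity and a personal best, which is exactly the memory that the simplified variant discards.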
According to (15), in the classic version, each particle is aware of the global best and hence picks its new velocity such that all particles' new positions tend toward the G_Best of the previous iteration. In contrast, the version in [34] employs a new dynamic pattern for particle movements and distributes information across the particles differently. In this pattern, sub-populations of size k_p N_p (0 < k_p < 1) are randomly selected, and the new positions for the particles of each sub-population are generated using the best position found within that sub-population. In the extreme case of k_p → 1, the new pattern mirrors the classic one explained earlier and suffers from the same issue of propagating local optima through the swarm. For this reason, Kong et al. [34] advocate a relatively small sub-population size parameter; they experimentally showed that k_p ∈ [0.3, 0.5] provides the best balance between the exploration and exploitation of the algorithm. As mentioned earlier, the PSO version adopted in this study from [34] generates solutions in a single step. Just as in the original algorithm, each particle knows its current position, and after the grouping stage in each iteration, it is also informed of the best position discovered by its peers in the group (i.e., K_Best). Another difference between the current pattern and the classic PSO lies in the movement history: the velocity and the personal best values are abandoned and thus take up no memory. Equation (17) mathematically expresses this new solution-updating method, where x_j^L and x_j^U are the lower and upper bounds of the j-th dimension of the search space, respectively, and K_Best_i = (K_Best_i,1, K_Best_i,2, ..., K_Best_i,J) is the best position visited by the particles in the sub-population selected for particle i.
To ensure that even the best particle in a group explores a position different from its own, K_Best_i is replaced by a random particle when calculating the new position of the best particle itself. The first case in (17) is added to the new pattern to improve the diversity of the swarm: p_d is a very small disturbance probability (p_d ∈ [0.02, 0.1]) with which the particle visits a random position in the solution space that is completely unrelated to the positions explored so far. By keeping an acceptable level of diversity among the particles, this solution-generation method largely avoids the traditional propagation of local optima in PSO. Figure 2 schematically compares the movement of particles in the traditional PSO and the presented SPSO. The following subsections describe the problem-specific aspects of the algorithm proposed in this paper for the JRRS-AP, namely the particle definition, the fitness function, and the constraint-handling mechanism.
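The grouping idea of [34] can be sketched as below. Because (17) is not reproduced in full here, several details are assumptions: the attraction rule x_ij ← x_ij + rand·(K_Best_i,j − x_ij), the choice to draw a fresh random group that contains particle i, and the acceptance of every move (the best position is tracked separately). The disturbance case (probability p_d, uniform position within [x_j^L, x_j^U]) and the random-peer rule for the group's best particle follow the description above. Function names are illustrative.

```python
import random


def spso_iteration(swarm, fitness, lower, upper, kp=0.4, pd=0.05, rng=random):
    """One SPSO-style iteration (sketch; maximizes `fitness`).

    Each particle is grouped with randomly chosen peers (group size about
    kp * Np), moves toward the group's best position K_Best, and with
    probability pd jumps to a uniformly random position instead.
    """
    n = len(swarm)
    size = min(n, max(2, round(kp * n)))
    scores = [fitness(p) for p in swarm]
    new_swarm = []
    for i, x in enumerate(swarm):
        if rng.random() < pd:  # disturbance: unrelated random position
            new_swarm.append([lo + rng.random() * (hi - lo)
                              for lo, hi in zip(lower, upper)])
            continue
        group = rng.sample(range(n), size)
        if i not in group:
            group[0] = i  # the group is the one selected for particle i
        k_best = max(group, key=lambda m: scores[m])
        if k_best == i:  # the group's best follows a random peer instead
            k_best = rng.choice([m for m in group if m != i])
        target = swarm[k_best]
        new_swarm.append([min(hi, max(lo, xj + rng.random() * (tj - xj)))
                          for xj, tj, lo, hi in zip(x, target, lower, upper)])
    return new_swarm


def optimize(fitness, lower, upper, n_particles=30, iters=200, seed=0, **kw):
    """Run the sketch and keep track of the best position ever visited."""
    rng = random.Random(seed)
    swarm = [[lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
             for _ in range(n_particles)]
    best = max(swarm, key=fitness)
    for _ in range(iters):
        swarm = spso_iteration(swarm, fitness, lower, upper, rng=rng, **kw)
        candidate = max(swarm, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best


if __name__ == "__main__":
    f = lambda p: -(p[0] - 3.0) ** 2 - (p[1] + 1.0) ** 2
    print(optimize(f, [-5.0, -5.0], [5.0, 5.0]))
```

No velocity or personal-best memory is kept. Each particle only needs its current position and the best position in its group for the current iteration.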

A. PARTICLE DEFINITION
In the current problem, each particle has three properties (i.e., allocated redundancies, chosen strategies, and component reliability distribution specifications), each with M dimensions. Accordingly, a 3 × M matrix is used to represent each particle. Note that the first two properties are integers, whereas the PSO movement equations ((16) and (17)) are generally defined for continuous variables. To overcome this limitation, we use the workaround proposed in the original paper [34] and replace the integer variables with bounded real variables; when the outcome of (17) is stored in these variables, a rounding function ensures that the stored values are integers. For the strategy variable, x^L and x^U are set to 1 and 2, where the former denotes the active strategy and the latter denotes cold standby. The third strategy (i.e., no redundancy) is assigned to subsystem i if and only if n_i = 1; thus, it is implied by the redundancy level and does not need to be included in this range.
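The decoding step described above can be sketched as follows. The sketch assumes the rows are ordered as [redundancy, strategy, reliability variable] and that redundancy is bounded by a hypothetical upper limit `n_max`. Python's built-in `round` rounds exact halves to the nearest even integer, which is one acceptable choice of rounding function.

```python
def decode_particle(particle, n_max):
    """Decode a 3 x M real-valued particle into (n, s, reliability variables).

    Row 0: redundancy levels in [1, n_max], rounded to integers.
    Row 1: strategy in [1, 2], rounded; 1 -> 'A' (active), 2 -> 'C' (cold).
    Row 2: real-valued component reliability variables, kept as they are.
    'N' (no redundancy) is implied whenever n_i == 1.
    """
    red_row, strat_row, rel_row = particle
    n = [min(n_max, max(1, round(v))) for v in red_row]
    s = []
    for n_i, v in zip(n, strat_row):
        if n_i == 1:
            s.append("N")
        else:
            s.append("A" if min(2, max(1, round(v))) == 1 else "C")
    return n, s, list(rel_row)


if __name__ == "__main__":
    particle = [[1.2, 2.6, 3.4],     # redundancy
                [1.9, 1.2, 1.7],     # strategy
                [0.90, 0.80, 0.85]]  # reliability variables
    print(decode_particle(particle, n_max=5))
```

Here the first subsystem decodes to n_1 = 1 and therefore to strategy N, regardless of its strategy entry.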

B. FITNESS FUNCTION AND CONSTRAINT HANDLING MECHANISM
The particles defined in the last subsection may lie outside the feasible region of the problem. The most popular method for guiding the algorithm to the feasible region is to impose penalties on constraint violations [18], [34], [40]. This is mainly done by replacing the objective function with a fitness function that accounts for both the objective function and the penalty functions. The fitness function of the proposed JRRS-AP is defined in (18) using a relatively large penalty coefficient of 10^11, where g(·) is the slack of the cost, volume, and weight constraints, calculated as g(c) = C_max − C(t, Er, n, s), g(v) = V_max − V(·), and g(w) = W_max − W(·), respectively. Since the slack is the unused amount of each resource, a penalty is imposed only when g is negative, i.e., when that resource is used beyond its maximum limit and the constraint is thus violated.
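A minimal sketch of a penalized fitness is given below. Since (18) itself is not reproduced here, the linear penalty on the total constraint violation is an assumption (one common choice). The coefficient 10^11 and the slack definitions follow the text, while the dictionary keys and the limit values in the example are illustrative and not taken from Table 4.

```python
PENALTY = 1e11  # penalty coefficient stated in the text


def slack(limit, used):
    """g = limit - used: the unused amount of a resource."""
    return limit - used


def penalized_fitness(reliability, usage, limits, penalty=PENALTY):
    """Fitness = R - penalty * (total violation); only negative slacks
    (resources used beyond their limits) contribute to the violation."""
    violation = sum(max(0.0, -slack(limits[key], usage[key])) for key in limits)
    return reliability - penalty * violation


if __name__ == "__main__":
    limits = {"cost": 175.0, "volume": 110.0, "weight": 200.0}
    print(penalized_fitness(0.99, {"cost": 170.0, "volume": 100.0, "weight": 190.0}, limits))
    print(penalized_fitness(0.999, {"cost": 176.0, "volume": 100.0, "weight": 190.0}, limits))
```

With such a large coefficient, any infeasible particle ranks below every feasible one, because reliability never exceeds 1.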

IV. NUMERICAL EXPERIMENTS
In this section, we evaluate the performance of the proposed model and SPSO algorithm on some benchmark problems from the literature. The experiments include one instance for each of the topologies defined in Section II. The specifications of these problems are given in Table 4.
Note that a scale factor σ was added to the original instances in this table. This ensures that the results of [1], [18], [23], and [25] remain feasible under the new resource constraints and can therefore be used for comparison. This factor was experimentally set to σ = 1.62 and σ = 1.41 for the series and complex problems, respectively, and to σ = 1.49 for the series-parallel topology. Moreover, to be consistent with the experimental results reported in the references, the parameter k is set to 1 for all subsystems. Most studies in this field have assumed a constant switch reliability equal to 0.99 for all values of t. In this study, we propose a more realistic time-dependent function for ρ(t), which increases the generality of the experiments. Here, the failure-detection probability of the switch follows an exponential distribution with $\lambda_s = 9 \times 10^{-6}$. For the typical mission time of t = 1000, ρ(t) ≃ 0.99, which keeps our numerical results comparable with the literature. The algorithm was coded in C++ and compiled and run using Microsoft Visual Studio 2017 on a Core i7 (1.6 GHz) Windows 7 machine with 4 GB of RAM. The algorithm parameters $N_p$, $K_p$, and $p_d$ were experimentally set to 200, 0.4, and 0.05, respectively.
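A quick numerical check shows why $\lambda_s = 9 \times 10^{-6}$ keeps the experiments comparable with the constant-0.99 switch used in earlier studies. The sketch assumes the switch reliability is the exponential survival function $\rho(t) = e^{-\lambda_s t}$; the helper name `switch_reliability` is hypothetical.

```python
import math

LAMBDA_S = 9e-6  # switch failure-detection rate from the text


def switch_reliability(t, lam=LAMBDA_S):
    """Time-dependent switch reliability, assuming rho(t) = exp(-lam * t)."""
    return math.exp(-lam * t)


if __name__ == "__main__":
    for t in (100, 1000, 5000):
        print(t, round(switch_reliability(t), 4))
```

At t = 1000 this gives $e^{-0.009} \approx 0.991$, close to the constant 0.99, while shorter or longer missions yield correspondingly higher or lower switch reliability.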
Although [34] reported results for 1000 generations, our experiments showed that, with the current swarm size, the proposed algorithm approaches the global optimum fairly closely within 500 iterations. With these settings, Table 5 reports the reliability of the final best solution obtained by our model and the SPSO algorithm in each of 30 independent runs for every benchmark problem defined in Table 4.
For each benchmark problem, Tables 6-8 compare the best and worst solutions found in these 30 runs with the best solutions reported in the references. Here, the maximum possible index (MPI) is used to quantify the superiority of the proposed model over the best of its existing counterparts. This index is calculated by:
$$\mathrm{MPI} = \frac{R_s^{\text{proposed}} - R_s^{\text{ref}}}{1 - R_s^{\text{ref}}}$$
where $R_s^{\text{proposed}}$ is the system reliability obtained by the proposed model and $R_s^{\text{ref}}$ is the best system reliability reported in the reference works. The parameter settings of the optimization algorithms used in each reference paper are summarized in Table 9. For more details about each algorithm in this table, one may refer to [1], [18], [23], and [25]. Furthermore, to better assess the performance of SPSO on the defined problem, we also report the results of solving the proposed model with another algorithm.
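The MPI measures the fraction of the reference solution's remaining unreliability, $1 - R_s^{\text{ref}}$, that the new solution removes. The sketch below computes it for hypothetical reliability values (not taken from Tables 6-8); the function name `mpi` is an assumption.

```python
def mpi(r_new, r_ref):
    """Maximum possible index: share of the reference unreliability removed.

    r_new: system reliability of the proposed solution
    r_ref: best system reliability reported in the reference work, in [0, 1)
    """
    if not 0.0 <= r_ref < 1.0:
        raise ValueError("reference reliability must lie in [0, 1)")
    return (r_new - r_ref) / (1.0 - r_ref)


if __name__ == "__main__":
    # Hypothetical values: raising R_s from 0.99 to 0.9996 removes 96% of the
    # remaining unreliability (0.01 -> 0.0004).
    print(f"{mpi(0.9996, 0.99):.2%}")
```

Because the denominator shrinks as $R_s^{\text{ref}}$ approaches 1, the MPI distinguishes improvements at high reliability levels that a plain difference in $R_s$ would make look negligible.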
Differential Evolution (DE) is one of the best-known metaheuristic algorithms and has compared favorably with many of its counterparts [41]. For this problem, we chose the standard DE/rand/1/bin variant from [42] and used the same solution representation and fitness function defined for SPSO. Figure 3 illustrates the pseudo-code of this benchmark algorithm with its three steps: mutation, crossover, and selection. We empirically set the crossover rate (CR) and scale factor (F) to 0.2 and 0.85, respectively. We also observed comparable results for this algorithm with a population size of 400 and 500 iterations; these settings are therefore used for all DE results reported in Tables 6-8. For a more detailed explanation of this benchmark algorithm, one may refer to the comprehensive review in [42].
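The three DE steps can be sketched generically as below. This is a textbook DE/rand/1/bin loop for maximization over a box, not the authors' implementation: the function name, the clipping of trial vectors to the bounds, and the toy objective in the usage example are assumptions. In the paper's setting, the objective would be the penalized fitness of a decoded 3 × M particle; CR = 0.2 and F = 0.85 follow the text, while the smaller population and iteration count only keep the example fast.

```python
import random


def de_rand_1_bin(fitness, bounds, pop_size=40, iters=200, F=0.85, CR=0.2, seed=1):
    """Minimal DE/rand/1/bin maximizing `fitness` over the box `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [fitness(x) for x in pop]
    for _ in range(iters):
        for i in range(pop_size):
            # Mutation: three distinct random vectors, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][d] + F * (pop[b][d] - pop[c][d]) for d in range(dim)]
            # Binomial crossover; j_rand guarantees at least one mutant gene.
            j_rand = rng.randrange(dim)
            trial = []
            for d in range(dim):
                lo, hi = bounds[d]
                v = mutant[d] if (rng.random() < CR or d == j_rand) else pop[i][d]
                trial.append(min(max(v, lo), hi))  # clip to the search box
            # Selection: greedy one-to-one replacement of the target vector.
            f_trial = fitness(trial)
            if f_trial >= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = max(range(pop_size), key=lambda k: fit[k])
    return pop[best], fit[best]


if __name__ == "__main__":
    # Toy objective (hypothetical): maximize -sum(x^2), optimum 0 at the origin.
    x_best, f_best = de_rand_1_bin(lambda x: -sum(v * v for v in x), [(-5.0, 5.0)] * 3)
    print(x_best, f_best)
```

The greedy selection never lets a target vector get worse, so the best fitness in the population is non-decreasing across generations.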
The numerical results suggest that the proposed joint approach (together with the SPSO algorithm) is far more effective than its predecessors in enhancing system reliability. The MPI of our model ranges from about 63 to slightly more than 96 percent. Unlike the restricted policy in [23], our model neither required lowering the reliability of the switch unit nor imposed restrictions on subsystem strategies, and the algorithm freely chose a mixture of strategies for each problem. Comparing the slack values of this study with those reported in the literature shows that the most important resource for the JRRAP was the system budget, which is also the restricting bottleneck for the current JRRS-AP. The standard deviation (STD) of the results, on the other hand, is considerably low, which reflects the consistent performance of the presented SPSO algorithm in solving the mathematical model.
VOLUME 11, 2023
It is important to note that the three decision vectors (i.e., the redundancy vector n, the strategy vector s, and the reliability vector) should all be interpreted together. For instance, comparing the redundancy vector of SPSO with that of [25] in Table 8 shows a noticeable increase in the number of redundant components in the system. However, the reason why both solutions are feasible becomes evident only when the corresponding strategy vectors are included in the comparison. In fact, SPSO chose an active strategy for the first three subsystems, which eliminated the need for switch components. In other words, this observation reveals that, in some cases, simply adding more active redundant components may be more effective for increasing the overall reliability of the system.

FIGURE 3. Pseudo-code of the benchmark differential evolution algorithm [42].

To further examine the performance of the proposed approach, we define a new set of benchmark problems that consider the Erlang distribution (Table 10). Here, unlike the previous benchmarks, we assume a non-exponential k-Erlang failure distribution with shape parameter k = 2. For this set of benchmarks, $\alpha_i$, $v_i$, $w_i$, and $\beta_i$ are set to their corresponding values in Table 4. Additionally, except for the γ coefficient, the switch component specifications are as defined in the first set of benchmark problems (i.e., Table 4). The optimal solutions to these new benchmark problems obtained by the SPSO and DE algorithms are presented in Tables 11-13. To compare our model with the traditional cold-standby approach, we also solved the current model with no active strategy (i.e., Cold or No-redundancy only) and reported the corresponding results of this restricted problem in Tables 11-13.
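To make the Erlang benchmarks and the active-versus-cold trade-off more concrete, the sketch below evaluates the k-Erlang survival function $R(t) = \sum_{j=0}^{k-1} e^{-\lambda t} (\lambda t)^j / j!$ and two textbook subsystem formulas. These are simplifying assumptions, not the paper's model: the cold-standby formula assumes perfect switching and no standby failures (so the sum of n Erlang(k) lifetimes is Erlang(nk)), and the rate parameter λ, the function names, and the sample values are hypothetical.

```python
import math


def erlang_reliability(t, k, lam):
    """Survival function of a k-Erlang lifetime with rate lam (k = 1: exponential)."""
    x = lam * t
    return math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(k))


def active_subsystem(t, n, k, lam):
    """n identical components in active (hot) parallel: fails when all fail."""
    return 1.0 - (1.0 - erlang_reliability(t, k, lam)) ** n


def cold_subsystem_perfect_switch(t, n, k, lam):
    """n identical cold spares with PERFECT switching (simplifying assumption).

    The subsystem lifetime is the sum of n i.i.d. Erlang(k) lifetimes, i.e. Erlang(n * k).
    """
    return erlang_reliability(t, n * k, lam)


if __name__ == "__main__":
    t, k, lam = 1000.0, 2, 1e-3  # hypothetical mission time and component rate
    for n in (1, 2, 3):
        print(n, round(active_subsystem(t, n, k, lam), 4),
              round(cold_subsystem_perfect_switch(t, n, k, lam), 4))
```

Under perfect switching, cold standby always dominates active redundancy, because the sum of lifetimes is never shorter than their maximum. The advantage of active strategies reported above can therefore only appear once the imperfect switch reliability ρ(t) and the switch's cost, weight, and volume enter the model, which is exactly what the proposed resource functions capture.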
It is worth noting that, to keep these tables concise, the cold-standby column only includes the results obtained by SPSO because, in our experiments, they were significantly better than those of DE. These results strongly support the conclusions drawn from the initial set of benchmarks. The optimal solutions all combine Active, Cold, and No-redundancy strategies. This observation once again confirms the effectiveness of the proposed approach.
Moreover, the numerical comparison against the restricted (cold-standby-only) model shows a considerable MPI for all three topologies in the Erlang version of the benchmarks as well. Comparing the MPIs achieved by DE with those of SPSO reveals that the latter performs significantly better. In addition, the slack values of these algorithms indicate that SPSO utilizes the available resources more effectively than DE. Therefore, both sets of benchmark problems suggest that the novel approach is significantly more efficient than the traditional cold-standby approach under both exponential and Erlang failure distributions. Figure 4 graphically summarizes this study from multiple aspects, including model specification, novelty, solution methodology, and some potential use cases of the proposed approach.

V. CONCLUSION
This paper presents a new approach to reliability optimization, called joint reliability-redundancy-strategy allocation, in which the redundancy strategy is considered a decision variable alongside the traditional reliability and redundancy factors of the system. Another contribution of this study is its realistic assumptions regarding the reliability and strategy factors: reliability is represented by a set of real variables, while an independent set of variables is used for the strategy choice.
The proposed model also addresses some weaknesses of existing models in the literature that led to the inefficiency of JRRAP with a choice of redundancy strategy in previous studies. The new constraint definitions incorporate a comprehensive view of incomplete switching components. Here, we assume that, in addition to affecting the overall reliability of the system, the characteristics of the switch determine the required budget, total weight, and volume of the system. The presented model can handle three strategies (active, cold-standby, and no-redundancy) for each subsystem and was designed to support the typical series topology, the series-parallel topology, and a complex system structure. From the failure-distribution perspective, the model supports the general k-Erlang distribution rather than only the simple exponential distribution.
As the current model is NP-hard, a metaheuristic algorithm from the swarm intelligence category was chosen to solve it. This algorithm uses a simpler movement procedure than traditional PSO and attempts to prevent premature convergence to local optima. The experimental results demonstrated the superiority of the current model and the acceptable performance of the proposed algorithm compared with other studies in the literature. For comparison purposes, we also solved the proposed model using a DE algorithm and showed that SPSO is superior in terms of both solution quality and resource utilization. With SPSO, our results attained a Maximum Possible Index of up to 96 percent compared with the best pure cold-standby solutions from previous studies.
Additionally, we defined a new set of benchmarks covering broader use cases, for which the MPIs were even slightly higher. This study focused on maximizing the overall reliability of the system. As future research, one may consider multiple objectives simultaneously to examine the performance of the current approach and the new resource functions in other contexts, e.g., when the weight of the system should be minimized as well. Another possible extension of this research is the development of a new exact or heuristic optimization algorithm.