Virtual reference feedback tuning with robustness constraints: A swarm intelligence solution

The simplified modeling of a complex system allied with a low-order controller structure can lead to poor closed-loop performance and robustness. A feasible solution is to avoid the necessity of a model by using data for the controller design. The Virtual Reference Feedback Tuning (VRFT) is a data-driven design method that only requires a single batch of data and solves a reference tracking problem, although with no guarantee of robustness. In this work, the inclusion of an $\mathcal{H}_{\infty}$ robustness constraint to the VRFT cost function is addressed. The estimation of the $\mathcal{H}_{\infty}$ norm of the sensitivity transfer function is extended to maintain the one-shot characteristic of the VRFT. Swarm intelligence algorithms are used to solve the non-convex cost function. The proposed method is applied in two real-world inspired problems with four different swarm intelligence algorithms, which are compared with each other through a Monte Carlo experiment of 50 executions. The obtained results are satisfactory, achieving the desired robustness criteria.


Introduction
The complexity of certain processes usually requires simplification in the mathematical modeling to allow proper controller design [1], e.g., for power systems [2,3]. For dc-dc converters, for example, the majority of the control techniques assume the existence of an accurate model [4,5], which can be difficult to obtain since power converters have nonlinear dynamics. Low-order controllers are commonly used in the industry, such as Proportional Integral (PI) and Proportional Integral Derivative (PID) controllers [6,7,8,9], but a robust design can still be a challenge. Oversimplified modeling due to the process complexity, and/or a limited performance of the chosen controller structure are possible reasons for achieving poor performance and robustness of the controlled system [10].
On the other hand, data-driven control design techniques are used to overcome common problems related to system models, such as the dilemma on representativity and complexity, or even the unavailability of those [11,12,13]. Some of the data-driven approaches require several plant experiments and iterative acquisition of data, like Iterative Feedback Tuning (IFT) [14], Correlation-based Tuning (CbT) [15], and more recent methods such as the deep neural network-based implementation of data-driven Iterative Learning Control (ILC) [16], and data-driven optimization-based ILC [17]. While other methods like Virtual Reference Feedback Tuning (VRFT) [18], Direct Iterative Tuning (DIT) [19], Optimal Controller Identification (OCI) [20], Virtual Disturbance Feedback Tuning (VDFT) [21], Data-Driven Linear Quadratic Regulator (DD-LQR) [22], and data-driven predictive model control [23] only require a single batch of data (one-shot) in order to tune the controller parameters. If data acquisition is difficult, its repetition is prevented by a tight deadline, or memory constraints are an issue, one-shot solutions are usually preferred over iterative solutions. For more details, a recent and systematic review of data-driven control methods can be found at [24].
Robustness considering low-order controllers is a frequent topic of discussion [25,26] since specific processes can present uncertainties, as well as disturbances that might occur over time. Another point to be observed is that a poor choice of reference model or limited controller class in, e.g., the VRFT design, may result in poor performance or robustness [27]. Since robustness can be measured by the H ∞ norm of the sensitivity transfer function S(z) of a closed-loop system [28], its inclusion in the data-driven design of controllers can be considered, allowing for a more robust design when necessary. A recent methodology proposed in [29] has suggested the inclusion of a robustness criterion in the VRFT design, at the expense of: i) more experiments since the proposed design procedure iterates in a trial-and-error fashion until the desired robustness is achieved and, essentially removes one of the greatest advantages of the VRFT -being a one-shot method; and ii) this type of iterative procedure requiring more background knowledge from the designer for choosing reference models and specifying requirements.
Also, a data-driven one-shot approach for multivariable systems regarding robust solutions of H 2 and H ∞ criteria, and loop-shaping specifications, has been recently addressed [30]. But the method relies on an initial solution that influences the final one since it is convexified by linearization around the initial stabilizing controller, which is undesirable. In the case of datadriven one-shot approaches for Single-Input Single-Output (SISO) systems, considering an H ∞ robust performance criteria, a recent work [31] proposes a technique where the order of the controller is increased as the solution converges and only deals with noise-free data. Both methods [30,31] use only frequency-domain data. Other data-driven robust solutions are achieved in an online fashion and require higher computational processing than offline techniques, and also present the demand of measuring signals in real-time. Some of those techniques are the use of the modified Riccati equation with online data-driven learning [32] and the application of a data-driven Model Predictive Control method with robustness guarantees [33].
Among the aforementioned data-driven design techniques, the VRFT has been more broadly applied to several classes of problems, a fact that can be useful to attest to the feasibility of the proposed method of this paper, as it requires less computational cost than, e.g., OCI and online methods. Therefore, this work is based on the VRFT technique, as presented in [18] and [34], proposing the inclusion of a robustness constraint, in terms of the H ∞ norm of the sensitivity transfer function of the system, in the VRFT design, but maintaining one of its most attractive features: the necessity of only one batch of data. The constraint is included in the VRFT cost function as a penalty [35], which compromises its convex behavior. To deal with the non-convex optimization procedure, the proposed method is tackled in two main steps: i) the design, in a data-driven fashion, of a controller using the VRFT approach, if a previous controller is unexistent; and ii) the application of a metaheuristic optimization algorithm to minimize the cost function considering an ||S(z)|| ∞ norm constraint, using the controller from the previous step as an initial solution.
Notice that metaheuristics, more specifically the Particle Swarm Optimization (PSO) algorithm, have been used on the VRFT design for enhancing the performance of the controller in terms of reference tracking on hybrid systems [36]. Also, more closely related to this work, PSO has been used for solving the VRFT with a modified cost function which includes the optimization of the reference model based on a reference tracking metric [37]. The work focuses on reducing the mismatch between the reference model and the actual controlled system but does not mention robustness.
Since metaheuristic algorithms may work well for a certain class of problems, but might fail over other problems, according to the No Free Lunch (NFL) theorems [38], more than a single metaheuristic optimization algorithm should be evaluated. Looking over the available types of metaheuristics, three can be highlighted: evolutionary algorithms [39]; physics-based algorithms [40]; and swarm intelligence algorithms. Although some authors group evolutionary algorithms with multiple agents and swarm algorithms together [41], this paper considers the two classes separately, as done by other authors in the metaheuristic optimization subject [42]. This work focuses on swarm intelligence algorithms since they usually have fewer parameters to be tuned by the user or designer [41]. Four swarm intelligence algorithms are considered: Particle Swarm Optimization [43] and Artificial Bee Colony (ABC) [44] since those are two of the most used in literature; and the two swarm intelligence algorithms with the least number of hyperparameters, Grey Wolf Optimizer (GWO) [42], and its most recent version, the Improved Grey Wolf Optimizer (I-GWO) [45].
In summary, the main contributions of this work are: • in theoretical terms, the derivation of the expressions of the signals for the data-driven estimation of the impulse response of the sensitivity transfer function of a system, in one-shot, at Subsection 4.1.1; • the proposal of a modified cost function to the VRFT method regarding the inclusion of a robustness constraint, at Section 5, and the use of swarm-intelligence algorithms for solving the proposed cost function; • at Section 6, the use and comparison of four different swarm-intelligence algorithms -PSO, ABC, GWO, and I-GWO -in the proposed method for solving two real-world inspired problems based on the structure of widely used DC-DC converters.
This paper is structured as follows: Section 2 describes the system that is considered in this paper for the theoretical formulation; Section 3 presents the basis of data-driven controller design and details the basic VRFT design procedure; Section 4 explains the method for the H ∞ norm estimation of the sensitivity transfer function; Section 5 presents the proposed method and details the four considered swarm intelligence algorithms; Section 6 illustrates and validates the method by showing its application in two real-world inspired examples; and finally, Section 7 concludes this work.

Preliminaries: description of the system
The system considered in this paper for the theoretical formulation is a discrete-time, causal, linear time-invariant, and Single-Input Single-Output (SISO) system G(z). It is considered that z is the forward discrete time-shift operator such that zx(k) = x(k + 1). The output y(k) of the system can be described as where u(k) is the input signal and v(k) is the process noise -stochastic effects that are not represented by G(z), i.e., not captured by the input-output relation of u(k) and y(k). The closed-loop system taken into account in this work is regarded as a controller C(z) with the process G(z) and a unit gain feedback, as shown in Figure 1, where r(k) is the reference signal and e(k) is the error signal. The closed-loop control law is with being a controller with parameter ρ ∈ R p ,C(z) a vector of transfer functions belonging to the controller class C (e.g., PI or PID controller classes) and r(k) the reference signal. The output of the closed-loop system is given as where the reference signal r(k) is applied to the transfer function from the reference r(k) to the output y(k), T (z), with and S(z) is the sensitivity transfer function such that S(z) + T (z) = 1 and In the next section, the Model Reference Control (MRC) is introduced. The VRFT method is described for the cases where the process has minimum or non-minimum phase.

Virtual Reference Feedback Tuning (VRFT)
The Model Reference Control (MRC), which is the basis for the VRFT, more generally called model matching control [46], concerns the problem of reference tracking of the closed-loop system's response, disregarding the effects of noise at the output [27].
In order to obtain a controller, the MRC requires the designer to elaborate a target transfer function for the controlled closed-loop system, called a reference model (T d (z)), which generates the output y d (k) = T d (z)r(k). A reference tracking performance criterion evaluated by the two-norm tracking error is then obtained by solving the optimization problem which can be solved by considering (5), resulting in the solution controller for the MRC, called the ideal controller C d (z). The VRFT is a one-shot optimization data-driven controller design technique based on the MRC. It is defined as one-shot since only a single batch of input-output data is required to solve the model reference control problem (7), which can be done by the use of least squares when the controller is linearly parametrized as in (3), resulting in the parameter ρ of a controller with predefined class. The VRFT design depicted in this paper follows the procedures of [27,11].
Consider an experiment, in open-loop or closed-loop, that results in a batch of collected data {u, y} N k=1 . A virtual reference signalr(k) is defined such that T d (z)r(k) = y(k). A virtual error can be obtained asē(k) = r(k) − y(k) = (T −1 d (z) − 1)y(k). In summary, a controller C(z, ρ) = ρ ′C (z) is considered satisfactory if it generates u(k) when fed byē(k). The closedloop of such block diagram for the VRFT controller design is illustrated in Figure 2.
The VRFT solves the optimization problem having the same minimum as (7) if the ideal controller C d (z) in (5) belongs to the same controller class C = {C(z, ρ), ρ ∈ R p } as C(z, ρ). To compensate for the fact that the ideal controller rarely belongs to the chosen controller class, a filter L(z) is applied to the data to approximate the minimum of J V R to the minimum of J M R , where the amplitude should satisfy [27]: where x(e jΩ ), with x representing any signal or system, represents the Discrete Fourier Transform of x(k), Φ r (e jΩ ), Φ u (e jΩ ) are, respectively, the power spectra of the signals r(k), u(k).
Instrumental variables can be used in order to suppress estimation bias caused by the noise in data [47], requiring the use of a second data batch. In practice, the input signal can be formed by two identical sequences in the same experiment, if memory restrictions do not impose a problem. Then, the signals can be synced together afterward, resulting in two batches of data from one single experiment.
In the presence of a Non-Minimum Phase (NMP) zero at the process, a flexible reference model can be used, as presented in [27]. The optimization problem (8), then, becomes where η ∈ R m , and F (z) is a vector of transfer functions such that T d (z, η) = η ′ F (z). The step-by-step design for the VRFT with a flexible reference model, from data collection to the algorithm design, is detailed in [11].
To be able to include a robustness criterion at the VRFT cost function, a means of evaluating this parameter is required. Nonetheless, this is approached in the next section.

Robustness index estimation
Depending on the choice of T d (z) or the controller class C, as well as the response of the plant G(z), the VRFT-designed controller can result in poor robustness of the controlled process. For such cases, a robustness constraint can be included in the form of the H ∞ norm of S, also called maximum sensitivity (M S ), which can be used as a measure of robustness [28]. Typically, a system that presents M S > 2 is considered to have poor robustness [28]. In this context, and considering the use of data-driven design approaches, it is necessary to estimate the value of ||S(z)|| ∞ in a data-driven way since it is assumed that no plant model is available to the designer. By this reasoning, the estimation of M S is explained in the following subsection.

Estimation of M S
The H ∞ norm estimation procedure developed in this work is based on the Impulse Response (IR) of the system, as presented in [48], modified from [49] to a SISO impulse response identification procedure, which allows for a regularized estimation according to the existing literature [50]. Also, in order to maintain the one-shot characteristic of the VRFT, the estimation of the H ∞ norm of S based on impulse response is addressed as follows.
Consider the linear discrete-time causal and SISO system S, represented by its transfer function S(z, ρ), such that its output signal ψ(k) is given by the equation where ζ(k) is the input signal of S, whose impulse response is s(k).
Since (11) requires infinite data to be obtained, an order M is defined such that it is assumed that any IR term greater than M is negligible, which is valid for stable systems since lim k→∞ s(k) = 0. Nevertheless, equation (11) can be truncated to M terms, leading to: On the other hand, the definition of the H ∞ norm, when applied to the system S, can be written as [28] H ∞ : which requires the whole set of possible inputs {ζ(k) ̸ = 0}. Therefore, expression (13) cannot be directly calculated. An alternative strategy is to obtain a matrix relation for S, which allows for the use of induced norm properties. Expanding (12) to the M first terms gives and the following matrix relation truncated at M elements is obtained: From the assumption that the order M is sufficiently high, it can be said that matrix S M characterizes the IR s(k) of S. A useful matrix property is the induced norm [28, A.5], which can be applied to (15), such that where the subscript i stands for induced. In short, (16) is a matrix form of representing the system gain considering a set of possible input signals Z M . From the induced-2 norm, the following is obtained whereσ and λ max stands for largest singular value and largest eigenvalue, respectively, and comparing (13) with (16), it can be seen that Since the ||S(z, ρ)|| ∞ norm can be estimated based on its IR via (18), an expression for the input signal ζ(k) and the output signal ψ(k) of S(z, ρ) has to be derived in order to estimate its impulse response in the VRFT design context.

Input-output signals of the sensitivity transfer function
Considering the system presented in Figure 2, its sensitivity transfer function in (6) can be rewritten as Assuming that u(k) is sufficiently informative to capture all relevant characteristics of S(z, ρ), multiplying both sides of (19) by u(k) achieves It is known that G(z)u(k) = y(k). Substituting such relation in (20): From (21), the signals can be derived. Expressions (21) and (22) mean that when a signal ζ(k) formed by u(k) + C(z, ρ)y(k) is applied to S(z, ρ), an output ξ(k) = u(k) is obtained. Therefore, the impulse response of S(z, ρ) can be estimated considering the data set {ξ, ζ} N k=1 . In this work, the IR estimation is made through identification with regularization techniques since: i) the variance of the estimates increases with the order M , which is suppressed with the use of regularization [50]; and ii) knowing that IR is a sparse signal for sufficiently high M , the use of regularization is known to improve sparse signal estimates [51]. The algorithm for regularized estimation of the impulse response is described in [50,52], and is available in MATLAB® [53], Python [54], and R [55].
The inclusion of the H ∞ constraint in the cost function spoils the convexity characteristic of the VRFT and the solution can no longer be obtained through the least squares algorithm. A strategy to deal with local minima and other characteristics that may arise from a non-convex cost function is to use metaheuristic optimization [56]. Nevertheless, the VRFT with robustness constraints is formally proposed in the next section, which also addresses the use of swarm intelligence algorithms to achieve a feasible solution.

VRFT with robustness constraints
The proposed method regards a two-step procedure. The first step follows the design of a controller using the VRFT, as commented in Section 3. The same data from the first step is used in the second step. The need for a second experiment is avoided since the estimation of M S , represented in this context asM S (ρ), does not require new data, as derived in Subsection 4.1. In this case, the cost function J V R is modified by the addition of a robustness constraint, regarding the value of the estimated (M S (ρ)) and desired (M Sd ) ||S(z, ρ)|| ∞ norm, leading to a new optimization problem: which can be applied directly to the cost function as a penalty [35], resulting in the Swarm Intelligence optimization cost function: where c is a positive constant, usually with c ≫ 1, and the H ∞ penalty can be written as The controller C(z, ρ) in (24) can be chosen as belonging to any controller class C, linearly parameterized in ρ as C = {C(z, ρ), ρ ∈ R p }. The robustness indexM S (ρ) is estimated at each iteration of the swarm intelligence algorithm optimization following the procedure described in Subsection 4.1.
Notice that an important insight can be drawn from (24) regarding the choice of M Sd . If the choice is too ambitious, i.e., M Sd much lower than M S (ρ 0 ) estimated with an initial VRFT-obtained controller with parameter ρ 0 , the obtained parameter ρ could be found too distant from the VRFT solution. Such a difference could considerably increase the value of J V R (ρ) to a point where the performance of the swarm intelligence-obtained controller is drastically affected. On the other hand, if the constant c is chosen with an exorbitantly high value, only the cases where the optimization algorithm falls into a local minimum and is not able to minimize H(ρ) would be affected, resulting in a final cost much higher than with a lower value of c. If the penalty element H(ρ) is solved by the optimization algorithm, as seen in (25), its value becomes zero, indifferent of the value of c, not affecting the in order to accelerate the convergence of the metaheuristic algorithm, the initialization of the search agents can inherit the first step solution ρ 0 ∈ R p as a central point, as expressed in with R being the initial population spawn radius, and − → X (0) ∈ R p a random position vector such that − → X (0) ∼ U (0, 1). An inherent step of the method is to collect input-output data from the process, as suggested in [27,11]. Remember to take into account system identification theory [47] for data to be sufficiently informative. Then, the two proposed design steps can be applied: 1. Use the VRFT to design a controller for the process. Use a flexible reference model if the plant is NMP, as presented in [34]. Check the obtained robustness index and proceed to the second step if it does not satisfyM S ≤ M Sd , or else, take the controller from Step 1 as final. Such a main step can be divided into the following specific steps: • acquire a data set {u, y} N k=1 from the closed-loop system with an initial stabilizing controller; • use the data set to design a controller with the VRFT method, as detailed in Section 3. Controller parameters ρ are obtained after the minimization procedure of the VRFT method. In the NMP case, parameters for the reference modelη are also obtained; • estimate the robustness index according to the method in Subsection 4.1. IfM S > M Sd proceed to the second step, or else, use the VRFT-obtained controller with no further modification.
2. Apply a swarm intelligence algorithm, as presented next in Subsection 5.1, considering the optimization problem described in (24) according to the desired value of M S , with restriction applied in the form of a penalty as (25), with the initial spawn of agents following the recommendation of (26). The second step can be divided in: • implement the VRFT cost function with the penalty as in (25) regarding the desired maximum value of M S ; • change the initialization procedure of the chosen swarm intelligence algorithm to consider a center spawn ρ 0 , i.e., the VRFTobtained solution at the first step, and a spawn radius as suggested in (26) to accelerate convergence; • execute the algorithm and obtain controller parameters that satisfy the robustness restriction.
The swarm intelligence algorithms considered for obtaining the results of this work are described next.

Swarm intelligence algorithms
Swarm intelligence algorithms consist of stochastic optimization algorithms that are based on the collective intelligence of groups composed of simple agents, usually based on the behavior of animals in nature [57]. In order to cope with the NFL theorems [38], four algorithms are chosen to be used: Particle Swarm Optimization (PSO) [43] and Artificial Bee Colony (ABC) [58], which are well known and widely used [56,59]; Grey Wolf Optimizer (GWO) [42] and Improved Grey Wolf Optimizer (I-GWO) [45], more recent approaches with fewer hyperparameters than the aforementioned. In the following, the four algorithms are briefly presented.

Particle Swarm Optimization
Particle Swarm Optimization (PSO) involves populations (or swarms) in which each element is called a particle that represents a form of directed mutation [59,43]. The swarm is composed of ℓ particles searching in a Ddimensional space, which are initialized randomly within the search space with lower and upper bounds l b and u b . Each particle has its own position ( − → X i ) and velocity ( − → V i ) and is considered as a possible solution for the problem. The best solution found locally by a particle i is represented by .., G D } is the best solution found globally. As for the standard algorithm, each particle is initialized at a random location with random velocity.
The pseudo-code for PSO is given in Appendix A, in which max it is the maximum number of iterations and f is the cost function. The constant w 1 is an inertia weight, C 1 is the cognitive learning factor, and C 2 is the social learning factor. Details for the implementation are available in the literature [43].

Artificial Bee Colony
The Artificial Bee Colony (ABC) algorithm simulates the behavior of bees performed during their foraging process, conducting a local search in each iteration. Possible solutions are represented by food sources, while the quality of each solution is proportional to the nectar amount in each source [56,44,58]. There are three types of bees: scout, employed, and onlooker. At the initialization, the scout bees randomly find possible food sources (solutions). Each food source receives an employed bee. By roulette wheel selection, onlooker bees choose food sources to be exploited based on their quality, but both types perform local searches in their neighborhood.
The pseudo-code in Appendix B describes the ABC optimization procedure, where − → X i with D dimensions is the location of food sources. In this form, ℓ is the number of possible solutions (food sources) and the number of scouts, employed, and onlooker bees are taken as the same as the number of food sources. L is the abandonment criteria, defined by the designer. Implementation details are available in [44].

Grey Wolf Optimizer
The Grey Wolf Optimizer (GWO) is an algorithm based on the hunting behavior of grey wolves, which have a strict social dominant hierarchy. The leaders are the alphas (α), who are responsible for making decisions. At the second level are the betas (β), subordinates to the alphas who help in decision-making and other pack activities. The third level wolves are the deltas (δ), representing scouts, sentinels, elders, hunters, and caretakers. The rest of the pack is called omega, which must submit to the higher-ranking wolves [45]. In the GWO algorithm, all wolves follow the mean position of the α, β, and δs. At each iteration, the new three best wolves are re-defined, with the α position being the final solution. The pseudo-code for the GWO algorithm is presented in Appendix C. Details for the implementation can be checked at [42].

Improved Grey Wolf Optimizer
There are three main problems noted in literature around the GWO algorithm [45]: i) lack of population diversity; ii) imbalance between exploitation and exploration; iii) premature convergence. The Improved Grey Wolf Optimizer (I-GWO) tries to solve those issues by changing the search strategy of the GWO algorithm, including the Dimension Learning-based Hunting (DLH) strategy, which defines a new (possible) update position for each wolf based not just on the position of α, β, and δ wolves, but also on the position of its neighbors. Appendix D briefly describes the I-GWO. Details of the implementation are given at [45].

Validation results
In order to validate and illustrate the proposed method, two real-worldinspired examples are considered. The method is applied as suggested in Section 5 with all four swarm intelligence algorithms commented in Subsection 5.1. The results are compared in terms of: i) cost value obtained for best solution -best (lowest) cost; ii) ||S(z,ρ)|| ∞ value obtained for best solution; iii) convergence speed. Notice that the system model is only used to generate data in simulation. The knowledge of the model is neglected at any stage of the design, maintaining a pure data-driven fashion.

Example 1: a second-order non-minimum phase plant
The first system to be considered is with a time step of 1 second, which is similar to the discrete-time model of a Boost/Buck-Boost converter operating in continuous conduction mode, regarding the transfer function of the output voltage by the duty cycle [60]. The presence of a non-minimum phase zero makes it necessary to use the VRFT with a flexible criterion [27] at the first step of the proposed method. Assuming that the system model (27) is unknown, there is no previous knowledge about its zero being Non-Minimum Phase (NMP). In this sense, it is possible to analyze the estimated impulse response since the IR of NMP systems initially moves in the opposite direction (downwards) related to the steady-state one [51]. Therefore, a Pseudo-Random Binary Signal (PRBS), which is persistently exciting of high order [47], containing N = 2000 samples is applied to G 1 (z) in simulation, generating an output signal. Additive white Gaussian noise with a Signal-to-Noise Ratio (SNR) of 20 dB was added to the system at the output, representing measurement noise. With the inputoutput dataset, the IR of G 1 (z) can be identified with an IR identification algorithm available in the literature [52,54,53,55], resulting in the signal presented in Figure 3. Clearly, the IR initially goes downwards, indicating the presence of an NMP zero, justifying the VRFT with a flexible reference criterion [27].

Data collection
The data for estimation is obtained in a closed-loop with a proportional stabilizing controller [11] since its presence in the system avoids signal divergence. By the small gain theorem [28], a stabilizing controller can be obtained following Therefore, the stabilizing controller k p is chosen as In order to obtain the H ∞ norm of G 1 (z), its impulse response is estimated according to [54] and the norm is calculated as proposed in Subsection 4.1.
The excitation signal considered for the data acquisition is a PRBS with N = 2000 samples. The signal is applied to the control reference of the closedloop system formed by G 1 (z) with stabilizing controller k p . The control output signal u(k) and the system output signal y(k) are acquired, forming the input-output set {u, y} N k=1 .

Step 1 -VRFT with flexible criterion
Assume a situation where the control requirements are: i) null error in steady-state; ii) settling time approximately 2.5 times faster than in closedloop with the stabilizing controller k p ; iii) null overshoot for a step reference. A reference model that meets such requirements, following the guidelines of [11], is chosen as Notice that the initial zero of T d (z) is set as greater than 1, as suggested in [27], allowing for the VRFT with a flexible criterion to identify the plant's NMP zero. The chosen controller class chosen to be used is the PID class of controllers, which givesC After solving the cost function (10) according to the VRFT method with flexible criterion, the following solution pair for η, ρ is obtained: resulting in a new reference model T d (z,η), and in the controller C(z,ρ), respectively: The non-dominant pole of the reference model, now T (z,η), is updated together with the minimization of η and ρ, as suggested in [11]. By estimating the H ∞ norm of S(z,ρ) of the closed-loop system based on the solution (33), anM S = 2.1952 is obtained, which may be too high for applications that require lower robustness indexes since it is greater than 2 [28]. The next subsection presents the application of the second step of the proposed method to reduce M S for the obtained VRFT solution.

Step 2 -Swarm intelligence algorithm
The use of swarm intelligence algorithms for solving the proposed problem is straightforward. At their implementation -see Appendix A, Appendix B, Appendix C, and Appendix D for the pseudo-codes -the cost function f is the proposed J SI (ρ) (24). At each iteration, the input to f is the parameter ρ, obtained in the previous iteration for each search agent since it is required to parameterize the controller C(z, ρ) and to estimate the H ∞ normM S (ρ).
The computation of J SI (ρ) is done by directly calculating the cost (24). The normM S (ρ) can be obtained through (18), estimating the impulse response of S(z, ρ) with the signals proposed in (22) and an IR estimation algorithm available in the literature, such as [54].  The upper search bound for the swarm intelligence algorithms is defined as u b = 10, which should be sufficient considering that the maximum ρ value of the VRFT-obtained controller is 6.9713 and, taking into account a choice of M Sd that is not too ambitious, the resultant ρ should not be too distant from the initialized value. The lower search bound is chosen as l b = 0 to avoid negative controller gain, making the obtained controller passive [61].
The initialization of all agents is done randomly with a uniform distribution within the suggested spawn radius in (26), R = u b /2 = 5, with the central point equal to the VRFT solution at Step 1 (32b). The reference model for the cost function (24) is considered to be the VRFT with a flexible criterion reference model, T d (z,η), obtained in Step 1 as (33a). The desired M S to be achieved, M Sd , is set to 1.8, which is a sufficient value in terms of robustness, satisfies M Sd ≤ 2 and should not compromise substantially the performance of the system, which could happen if M Sd << 2.
To make the comparison between algorithms possible, the number of agents was fixed to 50 and the number of iterations was limited to 100. In order to obtain a satisfactory number of realizations for the analysis of results each algorithm was executed 50 times. The hyperparameters for the PSO and ABC algorithms, except for the number of agents and the maximum number of iterations, are presented in Table 1. The PSO parameters were chosen as the MATLAB® default parameters of the Global Optimization Toolbox [53], while the ABC parameters were used according to the algorithm implementation of [62]. GWO and I-GWO do not contain any hyperparameter aside from the number of agents and the maximum number of iterations. Figure 4 shows the average convergence curve of all algorithms for 50 runs, considering the system G 1 (z) as aforementioned. Table 2 presents the time that each iteration took and the number of iterations to converge, con-  sidering the average value for all 50 realizations and a convergence criterion of δ = 1 × 10 −3 from an iteration cost to the subsequent one. The results were obtained with an Intel Core i5 4670 3.40 GHz processor. The I-GWO algorithm took a longer time to converge, followed by ABC, PSO, and finally, GWO. Comparing similar algorithms, I-GWO obtained a duration for one iteration that is more than twice the same duration for the GWO. But since the optimization of the proposed cost function is executed offline, where the obtained parameters (ρ) are applied to the controller without any further modification, this is not considered an issue for the current application. Considering all 50 executions per swarm intelligence algorithm, Figure 5 presents the best (lower) cost statistics for all algorithms at example 1 in the form of a box plot. Clearly, I-GWO achieved the most desirable performance in terms of cost since it contains fewer outliers and a very low dispersion if compared to the other algorithms' solutions. PSO, ABC, and GWO, in general, result in higher cost values than I-GWO for the considered cost function. Table 3 shows the quantitative values related to the best cost of all algorithms at each run, confirming the conclusions taken from Figure 5. The obtained values for the parameter ρ = [k p k i k d ] ′ are presented in Figure 6 and show how the lower cost dispersion obtained with the I-GWO algorithm can influence the values of the controller parameters and their tendency. For the case of example 1, the I-GWO is the most robust algorithm in relation to the random initialization of the parameters. Such a characteristic is desired in a real-world scenario, reducing the number of trials in the design procedure to obtain a satisfactory solution. Also, notice that the median solution obtained with the I-GWO is not too far from the VRFT solution (32b) obtained at Step 1, which agrees with the previous hypothesis that a value of desired robustness M Sd chosen as not too ambitious when compared with the initial VRFT solution (32b) should result in a final solution that is close to the initial (VRFT) one. Figure 7 presents the box plot for the obtained ||S(z,ρ)|| ∞ by the best  solution of each algorithm at each run, in a closed-loop with G 1 (z). I-GWO obtained the most desired result in terms ofM S considering the lack of outliers and the low dispersion. PSO had one outlier withM S > 2, while ABC obtained three outliers of higherM S , and the performance by the GWO algorithm for this problem was not satisfactory since there were too many solutions that achieved anM S higher than 2.  Table 3: Quantitative results from the box plot in terms of best cost for example 1. tive data of the box plot presented in Figure 7, in agreement with what is commented over the results.
To briefly demonstrate that, although the robustness is increased with the proposed method, the reference tracking is not overly penalized, a step reference was applied to the controlled plant of example 1 using a controller designed by the proposed method with the I-GWO algorithm, with parameters ρ IGW O = [1.090 0.2194 5.4018], and the VRFT controller (32b) ob-   tained at step 1. As shown in Figure 8, the VRFT-controlled system achieves a settling time of 54 seconds, a step overshoot of 20 % and an undershoot of 42 %. For the proposed method, a settling time of 39 seconds is achieved, with 9 % of overshoot and 33 % of undershoot. Notice that, in this specific case, the proposed method could even enhance the reference tracking performance in terms of settling time, however, this is not expected in most of the systems since the inclusion of a robustness constraint will penalize the VRFT cost function. The reduced overshoot and undershoot are a consequence of the increased robustness, with anM S of 2.2079 for the VRFT and 1.8030 for the I-GWO-designed controller.

Example 2: fourth-order plant
The fourth-order plant consists of with a time step of 1 second, which has the same structure as the discrete-time transfer function of a SEPIC converter, from duty cycle to output voltage [63]. Since the plant's zeros have minimum phase, which can be evaluated with data as aforementioned in Subsection 6.1, the VRFT method is used without the flexible reference model [27].

Data collection
For plant G 2 (z), the data is obtained in the same way as described, for example 1, in Subsection 6.1, with a PRBS signal of N = 2000 samples applied to the closed-loop system with stabilizing controller considering additive white Gaussian noise with an SNR of 20 dB to represent measurement noise. The input-output set is formed by {u, y} N k=1 .

Step 1 -VRFT
After the data is acquired, the next step is to use the VRFT to design a controller, which solves the cost function (8). For this example, the following control requirements are assumed: i) null steady-state error; ii) settling time of approximately 6.5 times faster than the closed-loop settling time with stabilizing controller k p ; iii) null overshoot for a step reference. Considering such requirements, the choice of the reference model is made as suggested in [11], obtaining Supposing a limited situation where only a PI controller is available, e.g., because of hardware limitations on a certain product. Therefore, the controller class to be considered is the PI class of controllers, resulting in The obtained VRFT solution results in the controller parameter Via (3), the controller is obtained. Considering the VRFT-obtained solution (38), the robustness index of the system can be estimated according to Subsection 4, obtainingM S = 2.2767. As aforementioned, an M S ≤ 2 is desired to ensure sufficient robustness [28], which leads to the application of the second step of the proposed method.

Step 2 -Swarm intelligence algorithm
The swarm intelligence algorithms PSO, ABC, GWO, and I-GWO are applied to the problem (24) for the fourth-order plant case, according to the pseudo-codes presented in Appendix A, Appendix B, Appendix C, and Appendix D, where f is the cost function (24), as already commented at Subsection 6.1.3. The upper search bound is kept at u b = 10 and the lower bound at l b = 0, in order to increase the passivity of the controller Alg.  as mentioned in Subsection 6.1. An upper bound of 10 should be sufficient, considering that the maximum desired robustness is not too far from the estimated robustness index at the end of step 1. The initial population spawn radius follows (26), R = (|u b | + |l b |)/2 = 5, with the solution of the VRFT at Step 1 (38) as the central point. The desired ||S(z,ρ)|| ∞ is set to 1.5, which satisfies M Sd ≤ 2 and is not much lower than 2.
The number of agents of all algorithms is set to 50, with a maximum of 100 iterations per execution. Each algorithm is executed 50 times for different noise realizations, such that the results are sufficiently representative for further analysis. For PSO and ABC algorithms, parameters are set as presented in Table 1. Aside from the number of agents and the maximum number of iterations, no other parameter is set by the user with the proposed GWO and I-GWO algorithm.
The average convergence curve of all algorithms for this case is presented in Figure 9. Table 5 presents the time for one iteration and the number of iterations each algorithm took to converge, considering a convergence criterion δ = 1 × 10 −3 . The hardware configuration is the same as in example 1. The ABC algorithm was the slowest algorithm in this example, followed by I-GWO, PSO, and at last, GWO. In comparison to example 1, GWO is still the fastest algorithm to converge, while ABC took I-GWO place as the slowest convergence. Figure 10 shows the box plot regarding the best fitness value for each algorithm, considering all executions. PSO, ABC, and I-GWO did not present far outliers, as those seen in the case of GWO. The quantitative values of the box plot are shown in Table 6. In this case, the most desirable result is obtained by the PSO algorithm. Although, notice that the scale of the y-axis (best cost) only varies at the third decimal place, which means that PSO, ABC, and I-GWO solutions should perform very similarly in practice.
The ||S(z,ρ)|| ∞ norm obtained for the best solution at each run is shown in Figure 11, with its quantitative values presented in Table 7. PSO, ABC,  and I-GWO present very similar values for the obtained H ∞ norm, which shows that the performance of those solutions is close, as also noticed from the cost analysis. GWO is the only algorithm that presents outliers in this case, which, in comparison to I-GWO, indicates that the local minima problems pointed out in literature [45] Table 6: Quantitative results from the box plot in terms of best cost for the example with system G 2 (z). Additionally, similar conclusions can be drawn from the controller parameter's values obtained with each optimization algorithm for the proposed problem. In Figure 12, outliers are only obtained with the GWO algorithm, where the gain of the integral part of the controller, k i , achieves values close to zero, which has, as a consequence, a lower performance for the controlled system, even though the robustness criteria is met. The obtained ρ for PSO, ABC, and I-GWO are very similar for all executions, indicating that any of the three algorithms could be used for this example without significant loss of performance or robustness when compared to one another.
In the same way as for example 1, a step reference is applied to the controlled plant of example 2 using a VRFT-designed controller and a controller designed with the proposed method using I-GWO, as shown in Figure 13. In this case, as one would expect, the reference tracking was penalized in terms

Conclusion
This work proposed a data-driven one-shot technique to increase the robustness of a closed-loop discrete-time system by changing the controller parameters using swarm intelligence algorithms. The considered optimization problem (24) is the VRFT cost function with the addition of a penalty regarding the value of the ||S(z, ρ)|| ∞ norm, which can be directly used as a measure of robustness. Such a value is estimated via an impulse response at each iteration of the metaheuristic algorithm.
Four swarm intelligence algorithms -PSO, ABC, GWO, and I-GWOhave been considered to illustrate the proposed technique with two real-world inspired plants. For the example of a second-order non-minimum phase plant, I-GWO obtained the best results in terms of dispersion, outliers, and cost, with acceptable values of ||S(z, ρ)|| ∞ . PSO achieved similar results in this case, while ABC and GWO algorithms had a higher occurrence of outliers -in terms of cost andM S . In the case of the fourth-order minimum-phase plant, ABC, I-GWO, and PSO obtained very similar results, all satisfactory in terms of cost and H ∞ norm of S(z, ρ). GWO, on the other hand, obtained a few outliers that, although the criterion of robustness was met, those cases achieved a very low integral gain, which could drastically affect the performance of the system. In terms of speed of convergence, I-GWO and ABC were the slowest algorithms, while PSO converged faster and GWO was the fastest.
As for future works, it is suggested: the inclusion of other constraints (e.g., for control effort) simultaneously with the robustness constraints; the use of other types of metaheuristics, as evolutionary of physics-based algorithms; the inclusion of a robustness constraint to the OCI, VDFT, or DD-LQR methods; extension of the current work for MIMO systems.