Fast algorithms for constrained generalised predictive control with on-line optimisation

This paper proposes fast algorithms for constrained generalised predictive control based on two ﬁrst-order methods, namely accelerated dual gradient projection method and fast alternating minimisation algorithm. Theoretical bounds on the number of iterations, which play an important role in the context of real-time model predictive controllers, are provided for both algorithms. Also, some implementation issues in parallel architectures are discussed. The methods are ﬁrstly validated by simulation and their results are compared with the ones presented by commercial solvers. A three-phase grid-connected LCL-ﬁltered inverter was used as a case study. The algorithms were evaluated in an FPGA with the quadratic program computed in microseconds.


INTRODUCTION
Nowadays, model predictive control (MPC) is still the most relevant advanced control method in industry [1]. However, MPC is mostly used to control slow dynamics processes with large sampling times and sufficient computational resources [2]. This can be explained, in part, because the control signal computation of MPC, in its conventional formulation, requires a quadratic programming optimisation to be solved. This issue motivated researches to find alternatives to accelerate the computation time of MPC algorithms and implementations. More recently, with the increase in the availability of high-performance computational systems, the application of MPC algorithms in processes with fast dynamics is becoming a reality. Some researches on fast constrained MPC exploit the quadratic program (QP) particularities to compute it very efficiently. One approach is to solve a set of QP offline and store the optimal values in a look-up table, such as explicit MPC in [3]. However, this kind of solution is limited to small problems due to memory limitations [4]. Another approach is to exploit the structure of the resulting problem. Among on-line QP solvers, the first-and second-order methods deserve to be highlighted. Some works propose second-order methods for fast MPC, such as the ones based on active-set (see [5,6]) and interior-point methods (see [2]). However, second-order methods require complex linear algebraic routines that result in complex algorithms, which make use of a large amount of code memory [7]. On the other hand, first-order methods are cheaper on the computational requirements and have become a good choice for embedded fast MPC. Also, for large problems, they can be used in distributed MPC applications [8]. The most widely used firstorder algorithms for MPC applications are the operator splitting method [9] and the gradient projection method [10]. In general, first-order methods must be used when a solution of low to medium accuracy is sufficient [11]. However, their use in practical closed-loop control systems is suitable due to the feedback action, and physical limitations such as the resolution of the sensors and the analog to digital converter. Another drawback of first-order methods is that accelerated versions, based on Nesterov's method [12], can become unstable because of the error accumulation if fixed-point arithmetic is used. To overcome this limitation, the number of integer and fractional bits must be carefully chosen. Some guidelines to choose the fixed-point word size, for a dual gradient method, are presented in [13].
The research in fast MPC is mostly focused on state-space formulations. However, generalised predictive control (GPC) is one of the most used MPC formulations in industry and makes use of discrete-time transfer functions to calculate the predictions [14]. Transfer-function formulations are widespread in industry, since concepts such as dead time, gains and time constants are more familiar in industrial context than the ones associated with state-space formulations [14]. Another important issue is that dead time, common in industrial processes, can be easily represented in the transfer function form, which is not true for other approaches [15]. Some recent works for fast GPC applied to frequency inverters are presented in [16] and [17], however only the unconstrained case was treated. Some examples of MPC used for three-phase power converter control are presented in [18], where a robust unconstrained MPC with resonant controllers are used to achieve harmonic rejection, and in [19], where a reduced model unconstrained MPC with an embedded integrator and a feed-forward term is introduced to minimise the harmonic content of the grid currents.
The main objective of this paper is to propose a GPC algorithm integrated with first-order methods to solve the resulting QP at each sampling instant. The first-order methods used were the accelerated dual gradient projection (GPAD) method and the fast alternating minimisation algorithm (FAMA). The resulting algorithms are called GPCGPAD and GPCFAMA, and they can be used to control systems with fast dynamics. In addition, they are tailored for embedded applications with characteristics such as simplicity to generate a software able to be verified, validated and certified; limitation on the memory space to keep data that define the code which implements the solution; and determinism with a predictable worst case execution time (WCET), in order to meet hard real time requirements. An advantage of the algorithms in comparison with state-space based solutions with first-order methods, such as [7][8][9][10], is the smaller number of sensors and variables required by the prediction model used by GPC. GPCGPAD was previously presented in [20], and in this work its results is extended for multi-input multi-output (MIMO) processes. As the formulation used in the proposed approach is based on transfer function models, the extension to the MIMO case is not as straightforward as it is in the case of state-space models. In addition, the generalisation proposed in this paper allows considering a different prediction horizon for each process variable and a different control horizon for each manipulated variable. Also, analysis of theoretical bounds on the number of iterations, some details of implementation, and a preconditioning method are discussed. Moreover, a comparison with GPCFAMA is presented, which is also a novel contribution of this work. The five main contributions of this work are: • the proposition of efficient constrained GPC algorithms with GPAD and FAMA, tailored for implementation using fast speed hardware with on-line optimisation; • a comparison of the proposed strategy with other MPC approaches and general-purpose solvers; • deterministic analysis of WCET for the proposed implementations in an FPGA based on the theoretical bound on the number of iterations for the algorithms given in the literature; • validation with a three-phase grid-tie LCL-filtered inverter case study, which is a system with fast dynamics that plays a fundamental role in renewable energy applications; • discussion of implementation features such as parallel matrix multiplication and a preconditioning approach that allow to reduce the computation time for the FPGA architecture considered in this study.
The rest of the paper is organised as follows: Section 2 presents the proposed GPC formulation with the accelerated dual gradient projection method, and with the fast alternating minimisation algorithm. Also, the algorithms for their implementations are presented. Section 3 describes some implementation details of the proposed algorithms. Section 4 presents a case study modelled in Simulink/MATLAB ® and simulated with the proposed algorithms in order to validate and compare them with commercial solvers. In Section 5, an implementation in a field programmable gate array (FPGA) is presented and computation time requirements are discussed. Section 6 concludes the paper.
Notation: let ℝ, ℝ − , ℝ n , ℝ n×m denote the sets of real numbers, the set of negative real numbers, the column real vectors of length n and, the real matrices of length n × m, respectively. For a vector v ∈ ℝ n , v i denotes the i th element, and for a matrix M ∈ ℝ n×m , M i, j denotes the element at the i th row and j th column. An identity matrix of length n × m is denoted by I n×m . A vector where all elements are zero is denoted by 0, with appropriate dimensions. The transpose of a matrix M is denoted by M T . For a positive definite matrix M, max (M) and min (M) denote the largest and smallest eigenvalue of M, respectively. For vectors, the operators max, min, ≥ and ≤ are defined to be element-wise. The Euclidean norm of a vector v is denoted by ‖v‖, while the spectral norm of a matrix M is denoted by ‖M‖. For a sampled signal x, x(k) is the value at the k th sampling instant. For a variable v, v (i ) denotes the i th instance of the same variable. For algorithm purposes, the symbol := denotes a value assignment to a variable.

PROPOSED ALGORITHMS
A MIMO system with n c inputs and n o outputs can be represented by the controlled autoregressive integrated moving average (CARIMA) model as follows: where y ∈ ℝ n o is a vector of the system outputs, u ∈ ℝ n c is a vector of the system inputs, A(z −1 ) is a diagonal polynomial matrix of dimensions n o × n o , polynomial matrix B(z −1 ) has dimensions n o × n c and polynomial matrix C(z −1 ) has dimensions n o × n o . D(z −1 ) is a diagonal matrix of dimensions n o × n o with each element representing the smallest dead time between the inputs related to the corresponding output, e(k) is a zeromean white noise, and Δ = 1 − z −1 is used to improve the modelling of non-stationary disturbances. The typical cost function of a MIMO GPC is given by: whereŷ (p) (k + h|k) represents the prediction of the system output p at a future discrete-time instant k + h, determined at k, w (p) (k + h) is the reference trajectory at k + h, Δu (l ) is the control increment for input l . N (p) = N u is the control horizon for input l . Assuming constant weights over the horizons, q (p) and q (p) are error and control effort weights, which can be grouped in positive definite diagonal block matrices Q (p) and Q (l ) , respectively.
If we define the control increment generalised vector as: the typical GPC QP can be represented by: where R ∈ ℝ n rin ×N u is the constraints matrix, r ∈ ℝ n rin is the constraints vector, and n rin is the number of inequality constraints. The condensed form to represent the constraints RΔu ≤ r is generic and allows the representation of many types of constraints, such as magnitude, slew rate, and overshoot of both input and output variables [14]. The details to obtain (2) and the GPC formulation for MIMO systems are presented in [21] and [22]. The set of QPs from GPC defined in (2), considering all control iterations, can be reformulated as a parametric optimisation of only one parameter with a proper change of variables. This representation is used to obtain the sub-optimality bounds on the number of iterations for the proposed algorithms. Considering a constant set-point w, the GPC QP, for all control itera-tions, can be viewed as a parametric optimisation problem: where the feedback variable e, as a way to represent the state of the system, is composed of the present output and finite series of past inputs and outputs and is given by (2), the vectors b and r, which are functions of e, now are represented as b T = 2(Ee − w) T Q G and r =r +Ēe. The matrix E can be obtained from the matrix of coefficients from the polynomial matrices F (z −1 ) and S(z −1 ) as E = [F S]. F (z −1 ) and S(z −1 ) can be obtained from a solution of a Diophantine equation as explained in [14, p. 49]. The matrixĒ is used when there are constraints which depend on the state of the system, such as output or control signal constraints. If only constraints on control increments are considered, thenĒ is set as a matrix of zeros. It is important to note that the matrices and vectors which define the constraints, costs and dynamics in (3) do not vary during the execution of the GPC controller and can be precomputed off-line.

GPCGPAD algorithm
The gradient projection method (GPM) is similar to the descent gradient method and can be used to solve constrained convex optimisation problems. The GPAD method presented in [10], applies the accelerated dual form on the gradient projection method. Considering the GPC optimisation problem (2), the first step to obtain the dual form is to define the associated Lagrangian: where ∈ ℝ n is the Lagrange multipliers associated to the inequality constraints. The dual optimisation problem, presented as a minimisation, can be stated as: which is a QP with strong duality, from the Slater's condition, since there is a strictly feasible point in the primal problem [23, The main step of the GPM applied to (5) becomes: where j is an iteration index, P is an orthogonal projection and L ≥ 0 is a Lipschitz constant, which ensures convergence with a feasible step in descent direction [10]. An approach to obtain L is the Frobenius norm of the Hessian: where n is the size of the square matrix and i, j are the indices of the Hessian elements.
Since the constraints associated with (5) are only ≥ 0, the projection is equivalent to a saturation of to ensure positive values [10], which can be represented mathematically as: Thus, (6) can be rewritten as: which can be divided in two steps: ) .
The acceleration step was proposed by Nesterov [12] and has been widely used in the context of fast MPC. The main point for the acceleration of the method is the non-observance of the relaxation sequence property and its replacement by the socalled estimated sequence. In this way, an a priori operation is inserted in the calculation: where is defined as = For the stop criteria of the algorithm, primal feasibility and primal optimality can be used. For primal feasibility it is possible to test if the constraints are met within constraints tolerance f , that is: For the primal optimality criterion, a property derived from the strong duality, can be used, which implies: A solution is considered o -optimal if: so by substituting f d ( ) from definition (5) and the cost function f (Δu) from (2) in (14) the solution will be o -optimal if: From the presented developments and the recursive algorithm to obtain the GPC matrices detailed in [21], GPCGPAD can be summarised as in Algorithm 1, where j max is the maximum number of iterations andū i is a unit step signal The sub-optimality bounds on the number of iterations for GPAD were proven in [10]. If they are applied to the problem in (3), considering 0 = 0, the upper bound for the number of iterations can be written as: that guarantees the averaged primal solution Δu j to be feasible within a prescribed threshold f . Another upper bound is which guarantees the averaged primal solution Δu j to be ooptimal. From (16) and (17) one can define an upper iteration bound It is important to notice that, given a GPC problem, all quantities to obtain the upper iteration bound are known except for * (e). The upper iteration bound for a given polyhedron of past conditions, , depends on the value of e which produces the largest optimal dual solution * (e). In order to find an upper bound for the dual solution Δ ( ), where: it is possible to solve [24]: ALGORITHM 1 Generalised predictive control with GPAD As shown in [24], (20) can be reformulated as a mixed integer linear programming (MILP) by introducing a binary vector such The resulting MILP to obtain the upper bound for ‖ * (e)‖ of the proposed algorithm is given by:

GPCFAMA algorithm
The Fast Alternating Minimisation Algorithm (FAMA) was proposed in [25] and was developed to solve optimisation problems as follows: s.t.Cx +Dz =c The acceleration is obtained by over-relaxing the Lagrange multiplier vectors after each iteration. FAMA can be viewed as an accelerated scheme equivalent to performing fast iterative shrinkage-thresholding algorithm (FISTA) on the dual [25]. Thus, by applying the Nesterov-type acceleration step to the dual variable , FAMA is obtained and its steps can be summarised as: In order to write the optimisation problem (2) in the form of (22) so that the results of FAMA can be used, the first step proposed is to represent the inequality constraints using an indica- can be rewritten as: A new variable z = RΔu − r and a new equality constraint can be defined to ensure the coupling between the variables: From the standard problem (25), the Lagrangian  and the augmented Lagrangian  can be defined as: With the Lagrangian and the FAMA basic steps (23), GPCFAMA can be recursively computed as: The step in (26) can be obtained analytically in a direct way: In a typical GPC problem, the Hessian H can be computed off-line and its inverse can be stored in memory. So, this step is obtained with simple matrix vector multiplication.
To compute step (27), notice that this kind of problem can be rewritten as a proximal operator [26]. The proximal operator of a scaled function h(x) is defined as: So, applying the proximal operator to an indication function I _ and a step size , step (27) can be represented as Given that I _(z) is an indicator function of the negative orthant ℝ n − , which is a closed nonempty convex set, the proximity operator becomes a projection onto ℝ n − [27]. So, (32) can be obtained by taking the negative part of each component or zero: The residuals of the optimality conditions can be used as stopping criteria. The primal residual is given by: and the dual residual is: So, a reasonable termination criterion is that the primal and dual residuals must be smaller than desired tolerances: To ensure convergence in the dual objective with rate O(1∕k 2 ), the parameter must be ρ < σ H ∕ max (R T R) with H being the strong convexity constant of H and max the largest eigenvalue of the matrix [25]. Assuming H strongly convex, can be obtained as = min (H )∕ max (R T R).
From the presented developments and the recursive algorithm to obtain the GPC matrices detailed in [21], GPCFAMA can be summarised as in Algorithm 2.
The convergence rate for FAMA was proven in [25] and which guarantees the solution j to be optimal within a prescribed threshold d . In Theorem 5.3 from [28] the convergence rate for the squared Euclidean distance on the primal variable ‖Δu j − Δu * ‖ 2 was presented with O(1∕k 2 ). Thus, another upper bound on the number of iterations that can be used is N p (e, p ), defined as: which guarantees the squared Euclidean distance on the primal variable to be optimal within a prescribed threshold p . The upper iteration bound depends on e and the optimal dual solution * (e). In order to find an upper bound for the worst case of * (e) with e in a given polyhedron , two approaches are proposed in [28]. The first one uses sum of squares relaxations and the second one uses sample-based estimation. The sample-based estimation approach can be easily applied for all problem dimensions. The procedure is to choose a confidence parameter ∈ [0, 1], a level parameter ∈ [0, 1] and take N s satisfying N s ≥ 1 − 1. The next step is randomly drawing N s samples {e 1 , … , e N s } ∈  and run the Algorithm 2 to calculate the corresponding {‖ * (e 1 )‖, … , ‖ * (e N s )‖}. Finally, one can compute max 1≤i≤N s {‖ * (e i )‖} which is an upperbound of an optimal solution with -level robust feasibility for * (e).

IMPLEMENTATION ISSUES
This section describes some implementation details of the proposed algorithms, which can be used to accelerate the computation time.

Parallel matrix multiplication
One of the most expensive operations in the computation time of the proposed algorithms is the matrix multiplication. Classical matrix multiplication algorithms, for dense matrices and one simple processor, are executed in three nested loops. The use of parallel computation can dramatically improve the time needed to execute this operation. In this work it was used an approach presented in [21]. The general idea is to group the independent operations, run them in parallel and pass the intermediate results sequentially. A schematic of the proposed architecture can be seen in Figure 1. In this example, a solution is presented for two 10 × 10 matrices, which presents a latency of 2 clock pulses (for fixed-point arithmetic). The proposed parallel algorithm presents an excellent complexity property given by O(⌈log 10 (n)⌉ + 1). For comparison, a multiplication between two 10 × 10 matrices in the parallel algorithm requires 2 pulses of clock, whereas the sequential algorithm requires 1000.

GPCGPAD implementation architecture
A finite state machine was proposed to implement the GPCG-PAD algorithm in a parallel architecture. The solution was centered on the sequential control of the actions in the state machine, so that the intermediate values of computation are coordinated while the operations of addition, subtraction and matrix multiplication execute in parallel. In Figure 2 the implementation of the state machine and the corresponding operations are represented schematically. It can be seen that multiplication operations require 2 pulses of clock, assuming all matrices of dimension 10 × 10, while sums and subtractions are executed in a single pulse. In state 11, the stop criterion is checked and, if it is satisfied, after step 12, the process is then terminated, signalling the control increment signal found. In states 2 and 9, the multiplication is done in only one pulse of clock, since these are multiplications of vectors by scalars. Thus, it is possible to t =T clock (2 + n iter (8 + 2(⌈log 10 (N u )⌉ + 1) where T clock is the period of clock used by the hardware, n iter is the number of iterations required for the algorithm to reach the stopping criterion, N u is the control horizon and n rin is the number of inequality constraints.

GPCFAMA implementation architecture
The parallel implementation of GPCFAMA used the same strategy as GPCGPAD and can be seen in Figure 3. In this case, three matrix multiplications depending on n rin are required and only one depending on N u . In state 12, the stopping criterion is verified and, if met, the process is terminated and the control increment signal found is signalled. It is possible to estimate the execution time of the algorithm as: t =T clock (2 + n iter (8 + (⌈log 10 (N u )⌉ + 1) + 3(⌈log 10 (n rin )⌉ + 1))), with the same definitions of (36).

Preconditioning
A fact that is well known for first-order methods is that preconditioning can offer significant speedup in the convergence rate [28], [10]. In [28], an optimal preconditioning is proposed for FAMA to solve MPC problems with polytopic constraints. The main idea is to enlarge the step size of the algorithm by decreasing max (R T R). In this paper, a simple preconditioning approach is presented. The first step is to obtain the Cholesky decomposition of the Hessian H = KK T , which allows (2) to be written as: (38)

FIGURE 4
Three-phase grid-connected inverter with LCL filter From a variable substitution, Δú = K T Δu, a reshaped problem is obtained: Although this preconditioning approach is not optimal, it can enlarge the step size in some cases, decreasing the number of iterations. Also, the Hessian becomes an identity matrix, hence the multiplications with H in the algorithms (step 6 in Figures 2 and 3) do not need to be implemented, which translates into a decrease in the computation time. As an example, a problem with a control horizon N u = 10 and with 10 constraints (n rin = 10), the computation time can be reduced in approximate 11% per iteration according to equations (36) and (37). It is important to emphasise that the Cholesky decomposition can be computed offline since the matrix H does not change.

SIMULATION CASE STUDY
The case study considered in this paper is a three-phase gridtie LCL-filtered inverter controlling the power injected to or consumed from the grid. These systems have a fundamental role in renewable energy applications and the schematic circuit is shown in Figure 4. PI controllers are widely used in the synchronous rotating dq frame for regulating the dand q-component currents. However, a PI controller loses its stability when the controller bandwidth is high. Alternatively, proportional-resonant (PR) based control strategy modelled in the stationary frame is a solution, but the performance of the PR controller is sensitive to the frequency variations in the grid [29]. Most MPC applications in power converters use the finite control set MPC (FCS-MPC) [30], [31]. In FCS-MPC, the possible control actions are finite and are the switching states. However, the FCS-MPC has a variable switching frequency that brings problems in the quality of the generated signal [18]. Besides, FCS-MPC algorithms use search methods to compute the optimisation and, thus, very small prediction horizons are used (usually 1 or 2). Another method is the continuous control set MPC (CCS-MPC), which has a fixed switching frequency. CCS-MPC has few applications in power converter controllers, such as the one presented in [19], [18] and [32], due that the number of calculations needed to solve the constrained optimisation problem on-line, which is incompatible with the small sampling times. Therefore, this case study was chosen for this paper to show how the proposed algorithms can be implemented in real-time and contribute to obtaining satisfactory results with fast computation time.
The dynamical equations of the system (Figure 4) in the synchronously rotating dq frame are: where the d -and q-axis components {i 1d , i 1q }, and {i 2d , i 2q } are the inverter currents, and the grid currents, respectively. The capacitor voltages components are {v cd , v cq }. The grid voltages are represented by {v gd , v gq } and can be assumed as disturbances for the control problem. The components {u d , u q } denote the voltages before the LCL filter and they are the manipulated variables. The variable represents the grid frequency in rad/s and the gate signals of the inverter are represented by {S a , S b , S c ,S a ,S b ,S c }. From (40), the discrete-time equivalent transfer matrix used was: The transfer functions were obtained from the Laplace transform of the dynamic equations. The discretisation was done assuming a zero-order hold model for the inputs and the sample period was chosen as T s = 100 s. The sample period is the same as the switching period used in the PWM. Thus, a time delay of one sample was included in the model to compensate for the computation time delay [31].
A simulation of the current control system of the three-phase inverter was implemented in Simulink with the SimPowerSystems library. The computed control signal, converted in dutycycle, was used with a carrier-based pulse width modulation (PWM) technique. Also, a two-level bridge based on insulated gate bipolar transistors (IGBTs) was implemented. A phase locked loop (PLL) was used to generate the sine and cosine references from the grid voltages. The simulation circuit parameters are shown in Table 1, where the LCL filter parameters are the same used in [29].
The measured load current was used as a reference for the control system. Thus, the inverter provides the power required by the load. A 5 kW load was used at the beginning of the simulation and, at time t = 0.1 s, another 5 kW load with a reactive portion of 1 kVAr was switched on. The simulation was run for 150 ms. Simulations were performed on a computer with a 2.40 GHz Intel Core i3 processor and 12 GB of RAM. The runtimes were measured using the tic-toc command of MATLAB. The proposed controllers were tuned with control horizons N (2) = = 1, and unknown future references w (1) (t + j ) and w (2) (t + j ). The weights and horizons were defined with equal values because the importance factor and the dynamics of the variables are equivalent.
In abc frame, the control signal can vary between [−V s , +V s ], which can easily be represented as a linear inequality constraint in the form RΔu ≤ r. However, in dq-coordinates, this constraint becomes a nonlinear inequality: . To use in the algorithms, the following approximation is proposed: Converter output currents and control signal voltages in the abc reference frame with j = 0, … , This is a valid approximation if the control signal value does not change too much from one sample to the other. The optimisation problem results in a matrix H ∈ ℝ 10×10 and 10 constraints. The proposed algorithm were compared with an interior point solver from MATLAB (Quadprog), and with the Gurobi solver. The same controller parameters, horizons and cost function were used for all algorithms. The Gurobi solver was tested with dual simplex (GurobiSD) and barrier method (GurobiBar) algorithms [33]. GPCGPAD was tunned for a primal residual tolerance o = 0.001 and feasibility tolerance f = 0.001. GPCFAMA was tuned with the same primal and dual residual tolerances given by p = d = √ N u 0.001. Gurobi and Quadprog tolerances were also changed to 0.001.
The results can be seen in Figure 5, where the current output of the converter and the control signal applied are presented in the abc reference frame. In Figure 6 the dq components of current output of the converter and the control signal are presented. It can also be seen in this figure that both references are tracked and that the constraints are active and limit the control signal when the setpoint changes. The behavior of the control system was shown only once because it is the same for all the algorithms.
An unconstrained state-space based MPC (uMPC) strategy [18] and a linear quadratic regulator (LQR) [34], which are alternative approaches found in the literature, were implemented. They both use state-space models of the system in the dq reference frame, and the nominal model is augmented with two integrator modes, resulting in a 10-state state-space model, so that the closed-loop system can track step references in each of the dq components. An anti-windup technique was used with both controllers to avoid saturation issues related to the windup of the integrator modes. The control signals profiles are presented in Figure 7 for the uMPC and in Figure 8 for the LQR. The performance index used for comparison was the root mean square error (RMSE). The reference tracking was analysed in dq-coordinates and the results can be seen in Table 2. The RMSE obtained for the GPC can be considered equivalent as the one obtained with the uMPC, with the d -axis component being 21% smaller for the GPC and the q-axis component being 13% smaller for the uMPC. Also, the RMSE obtained from GPC was 2.19 times smaller in the d -axis component than the LQR RMSE, and they present approximately the same value for the q-axis component. In addition to the good results obtained, the constrained GPC has the advantage of requiring fewer sensors and implicitly avoiding the windup issue.
A comparison between the active and reactive powers at the inverter output and at the load is shown in Figure 9. A load power track can be perceived, which means that the power required in the load is provided by the inverter output. Two Simulink PQ measurement blocks were used to measure the powers at the inverter output and the load.
The performance comparison of the algorithms is illustrated in Figure 10. The execution time of each algorithm is presented in Figure 10(a). It is important to mention that the time is only valid for comparison between the algorithms because the simulation was not done in real time. It is clear that the execution time is shorter for GPCGPAD and GPCFAMA than for Quadprog by more than one order of magnitude and for more than two orders for GurobiSD and GurobiBar. In Figure 10(b) the number of iterations are shown. Even though GPCGPAD and GPCFAMA require a larger number of iterations than the other algorithms during reference changes, they still present shorter computation times, since each iteration is much cheaper from a computational point of view. Table 3 shows the comparison of the average and maximum execution times, obtained from 10 simulations with each method. From the collected execution time data, the GPCFAMA maximum observed time, with 76 iterations, was 5.90 times faster than Quadprog, 94.69 times faster than Guro-biSD, and 84.72 times faster than GurobiBar. When compared with GPCGPAD with 240 iterations, GPCFAMA was 1.43 times faster.    The certification analysis experiment was done to analyse the tightness of the bounds and the increase on the number of iterations with different control and prediction horizons. The certification analysis was applied to the case study problem. The control horizon ranged from N u = 5 to N u = 10 for each control variable and the prediction horizon was kept fixed at N = 20 for the output variables. The theoretical bounds for GPCGPAD were obtained from (18) and the solution of the MILP defined in (21). Experimental results for GPCGPAD can be seen in Figure 11(a). The theoretical bounds for GPCFAMA were obtained from (35) and the sample based estimation approach with 10 000 samples. Experimental results for GPCFAMA can be seen in Figure 11(b). From the experiments it can be seen that the theoretical bounds on the number of iterations increase with the horizons and that they are relatively tight. For GPCGPAD the difference was less than one order of magnitude (4.92 times) and for GPCFAMA was less than 2 orders of magnitude (35.36 times). These theoretical bounds are very important to know, a priori, the WCET. With this information, it is possible to determine if the real-time requirements of the control system can be met.

FPGA IMPLEMENTATION
In order to evaluate the computation time of the algorithms in a high-performance hardware, Algorithms 1 and 2 were implemented in an FPGA. The FPGA used was an Altera, MAX 10 family, part number 10M50DAF484C7G. MAX 10 family presents embedded analog to digital converters and 50 000 programmable logic elements. The on-line optimisation part of the algorithms were implemented directly in a custom logic with a hardware description language called Verilog. To represent the numbers a 32-bit fixed-point arithmetic was used, with 16 bits for the decimals. A state machine was used to synchronise the algorithm and all the matrix operations were implemented for parallel computation.   (7) GurobiBar 178.97 ms 326.18 ms (10) The worst case observed in the simulation of Section 6 was replicated for implementation in the FPGA. The results obtained for the optimisation computation time can be seen in Table 4. A 50 MHz clock signal was used, and due to the parallelism implemented the execution time is quite fast, with a maximum value for the control signal computation of 76.84 s for GPAD and 24.36 s for FAMA. The execution times obtained are consistent with the expected values from (36) and (37). They are compatible with the case study switching frequency, which is 10 kHz even if we add the time used to compute the GPC and the free-response in parallel computation in the same way presented in [21] with 0.14 s and 0.80 s, respectively. Also, even a switching frequency of 20 kHz can be achieved with the GPCFAMA algorithm. Conversely, the theoretical WCET for both algorithms is not compatible with the sample time. It is important to mention that the theoretical WCET tends to be conservative, and in most cases, the control system can still be used. However, if hard real-time guarantees are required, the FPGA could be changed to another one with faster clock The results are also compatible with state-space solutions presented in similar hardware, as in [35] where an state-space MPC was used to control the vibration in flexible beam. The plant was modelled with 14 states and a prediction horizon of 14 samples and an interior point method was implemented in an FPGA with 70 MHz of clock frequency, which resulted in an optimisation process executed in 30.0 s. In [36], an MPC with active-set method was applied to control a motor servo system modelled with 2 states. The control horizon used was of 3 samples and the computation time, in an FPGA with 100 MHz, was 20.0 s.

CONCLUSION
This work proposed two fast computation GPC algorithms based on the accelerated dual gradient projection method and the fast alternating minimisation algorithm. Theoretical bounds on the number of iterations for the algorithms were presented

FIGURE 11
Certification analysis for the bound on the number of iterations and are quite important to define the control system components to meet real-time requirements. The proposed algorithms and the commercial solvers considered in this study showed similar results in terms of the resulting control signal, but the proposed methods are at least two orders of magnitude faster when the same hardware is used. The proposed methods allow straightforward implementation in an FPGA because they use only basic operations for on-line computation. The implementation in an FPGA proved to be quite fast for the case study presented and a switching frequency of 10 kHz was considered. Also, even a switching frequency of 20 kHz can be achieved with the GPCFAMA algorithm. These contributions make the constrained GPC suitable for embedded systems and may eventually increase its usage to control processes with fast dynamics.