Journal of Process Control

On combining self-optimizing control and extremum-seeking control – Applied to an ammonia reactor case study

In this paper, we combine self-optimizing control and extremum-seeking control in the context of real-time optimization. Self-optimizing control is based on controlling a single measurement or measurement combination whose optimal setpoint is insensitive to the expected disturbances. This gives a fast reaction to disturbances, but the optimal setpoint may change over time because of larger deviations from the nominal optimal point or due to unmodelled disturbances. Extremum-seeking control, on the other hand, belongs to a class of model-free methods that optimize the system based on directly measuring the cost. However, it converges slowly. In this paper, we propose to use an extremum-seeking controller to provide setpoint adjustments to the self-optimizing controller in order to improve the convergence rate. The key idea is to keep the process near the optimal region on a fast timescale using self-optimizing control and to fine-tune the setpoint on a longer timescale using extremum-seeking control. We verify the proposed method using an ammonia reactor case study.


Introduction
We want to optimize the operation of a process by minimizing a cost function J subject to process constraints. Often a real-time optimizer (RTO) or a model-based controller such as economic model predictive control (E-MPC) is proposed. Both approaches use a process model together with measurements from the process to determine the optimal operating conditions by solving a numerical optimization problem online. However, this is often computationally expensive, even with today's computing power. In addition, model-based controllers may be sensitive to unmodelled disturbances and structural mismatch, but this is not the main issue in this paper.
Several measurement-based alternatives to RTO have been developed over the past two decades that avoid online optimization by transforming the optimization problem into a feedback control problem. These are sometimes classified as direct input adaptation-based methods. Such methods are computationally cheap compared to numerical optimization-based methods, since the optimization is performed via feedback. Examples include extremum-seeking control and necessary conditions of optimality (NCO) tracking. In particular, extremum-seeking control has received much attention and several developments have been made [5].
Model-free extremum-seeking control methods rely solely on cost measurements obtained while exciting the process. Consequently, a clear advantage is that they are not affected by plant-model mismatch and unknown disturbances. However, since the steady-state gradient from the inputs to the cost is estimated from the cost measurements, timescale separation is required between the process dynamics, the excitation signal, and the convergence rate [6]. Thus, unless the process dynamics are very fast, the convergence to the optimal point is usually prohibitively slow [7,8]. The use of transient measurements may result in erroneous steady-state gradient estimation. In addition, external disturbances may affect the gradient estimation and lead to deviations from optimal operation [9]. Disturbance handling in extremum-seeking control may be improved using measured disturbances, as proposed in [9,10], and [11].
Self-optimizing control and extremum-seeking control have been developed relatively independently since 2000. The authors in [12] successfully combined self-optimizing control with NCO tracking in a hierarchical structure and demonstrated that measurement-based optimization techniques and model-based self-optimizing concepts are complementary. However, the gradient was estimated using finite differences, which gave relatively poor NCO tracking. The authors suggested more advanced gradient estimation and input adaptation methods as a future research direction for better overall performance. The use of extremum-seeking control on top of self-optimizing control was briefly discussed in [13] using the classical extremum-seeking method. However, the authors only considered measured disturbances for a single-input single-output system. Additionally, based on the simulation results presented, the authors in [13] did not consider a clear timescale separation between the extremum-seeking and self-optimizing controllers.
In this paper, we extend the work from [12] and [13] and provide a detailed description of the combination of a model-free extremum-seeking controller with model-based self-optimizing control. We propose an improved gradient estimation method based on least-squares estimation and provide a framework for multivariable systems. We then apply the proposed control structure to a multivariable ammonia reactor case study with three control inputs and consider both unmeasured and unmodelled disturbances. We compare the performance of the proposed method with self-optimizing control alone and with extremum-seeking control alone, and demonstrate clear performance improvements due to the timescale separation between the extremum-seeking and self-optimizing controllers.
The paper is structured as follows. Section 2 briefly describes the ideas of self-optimizing control as described in [3] and [14] and extremum-seeking control as described in [15] and [16]. Section 3 describes the framework in which we combine the two methods in a hierarchical structure. We then exemplify the proposed method using an ammonia synthesis reactor in Section 4. The results are discussed in Section 5 before concluding the paper in Section 6.

Background
Consider a process where the steady-state optimal operation can be formulated as

min_u J(x, u, d)   subject to   f(x, u, d) = 0    (1)

where J ∈ R is a scalar cost, x ∈ R^{n_x} denotes the vector of state variables, u ∈ R^{n_u} denotes the vector of manipulated variables, d ∈ R^{n_d} denotes the vector of disturbances, and f represents the n_x independent steady-state model equations. Note that we have no inequality constraints. More precisely, we assume that any active constraints are satisfied and that the n_u manipulated variables u are the remaining unconstrained degrees of freedom available for optimization [3].

Assumption 1.
There exists a smooth function l : R^{n_u} × R^{n_d} → R^{n_x} such that f(x, u, d) = 0 if and only if x = l(u, d).
We use the steady-state model equations f(x, u, d) = 0, enforced as equality constraints, to formally eliminate the states. The steady-state cost can then be expressed in terms of the inputs u and disturbances d only,

J_ss(u, d) = J(l(u, d), u, d)    (2)

The steady-state version of the optimization problem (1) is then equivalent to

min_u J_ss(u, d)    (3)

That is, the input u should be manipulated to optimize the steady-state performance for any given disturbance d. We make the following additional assumption:

Assumption 2. For any given disturbance d, the steady-state cost J_ss(u, d) is strictly convex in u.

Assumption 2 ensures that the problem is convex with a unique minimum at u = u_opt(d) for a given disturbance d.
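To make the state elimination concrete, the following toy sketch solves f(x, u, d) = 0 by Newton's method to obtain x = l(u, d), and then evaluates the steady-state cost J_ss(u, d) = J(l(u, d), u, d). The model f(x, u, d) = x³ + x − u − d and the quadratic cost are purely illustrative assumptions, not the paper's process model.

```python
def l_of(u, d, x0=0.0, tol=1e-10):
    # Solve the toy steady-state model f(x, u, d) = x**3 + x - u - d = 0
    # for x by Newton's method (f is monotone in x, so the root is unique).
    x = x0
    for _ in range(100):
        fx = x**3 + x - u - d
        if abs(fx) < tol:
            break
        x -= fx / (3.0 * x * x + 1.0)  # Newton step, derivative 3x^2 + 1 > 0
    return x

def J_ss(u, d):
    # Steady-state cost J_ss(u, d) = J(l(u, d), u, d) for an assumed cost.
    x = l_of(u, d)
    return (x - 1.0) ** 2 + 0.1 * (u - 0.5) ** 2
```

For example, l_of(2.0, 0.0) returns the root of x³ + x = 2, which is x = 1, so the states are formally eliminated and J_ss can be evaluated for any (u, d).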

Self-optimizing control
Self-optimizing control is a strategy of selecting a measurement combination c as controlled variables such that the effect of known but unmeasured disturbances d on the optimal operation is minimized. This is achieved by using the system model offline to compute an optimal measurement combination. The ideal self-optimizing variable would be the gradient J_u, which should be controlled to a constant setpoint of 0. However, in most applications the gradient cannot be measured. An alternative is to identify a controlled variable c ∈ R^{n_c} (with n_c = n_u) as a function of the available measurements y ∈ R^{n_y}. The simplest approach is to select a linear combination of measurements,

c = H y_m

where y_m = y + n^y is the vector of available measurements, which is generally corrupted by measurement noise n^y, and H ∈ R^{n_c×n_y} is the measurement combination or selection matrix. In addition to finding H, we must also decide on the setpoint c_s, which is typically chosen as the nominal optimal value, c_s = H y_0^opt, where the subscript 0 denotes the nominal operating point with d = d_0.
Several approaches can be used to calculate the optimal measurement combination c = Hy. The reader is referred to [17] for a comprehensive review. Most approaches are based on local linearization around the nominal optimal point. In this paper, we consider the exact local method introduced in [14] and further developed in [18] and [19]. In this method, the optimization problem (3) is approximated by a quadratic cost approximation and a linearized measurement model. Let the linearized measurement model be represented by

Δy = G^y Δu + G^y_d Δd

where G^y ∈ R^{n_y×n_u} and G^y_d ∈ R^{n_y×n_d} are the gain matrices from u to y and from d to y, respectively. The optimal selection matrix H, in terms of minimizing the loss for J_ss with respect to the expected disturbances and measurement noise, is then given by [18]

H^T = (Y Y^T)^{−1} G^y    (8)

where

Y = [F W_d   W_{n^y}]    (9)

and W_d and W_{n^y} are diagonal scaling matrices for the expected magnitudes of the disturbances and the measurement noise, respectively. F = ∂y^opt/∂d is the optimal sensitivity matrix, which describes how the measurements corresponding to optimal operation, y^opt, change with a unit change in the disturbance. The optimal sensitivity matrix may be determined analytically, or it may be determined numerically by perturbing the disturbances and re-solving the optimization problem as described in [20]. Note that the optimal H in (8) is not unique, but the non-uniqueness may be absorbed into the controller. As seen from the equations above, the optimal selection matrix H in (8) is based on the plant model G^y and the optimal sensitivity matrix F for the expected disturbances. Due to the linearization around the nominal optimal point, the controlled variable combination is only locally valid around this point. If a disturbance moves the process far from the nominal optimal point, the local model approximation may be poor, resulting in a higher steady-state loss.
Over time, as the plant-model mismatch increases, the increase in the loss may no longer be acceptable. This requires re-optimization and computation of new optimal setpoints c s . Additionally, any unmodelled disturbances that are not accounted for in the optimal sensitivity matrix cannot be handled efficiently.
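As a numerical illustration of (8) and (9), the sketch below computes H for a hypothetical system with one input, two measurements, and one disturbance; all gains and weights are assumed for illustration only. With F = 0 and unit noise weights, the formula reduces to H^T = G^y, which the sketch reproduces.

```python
def soc_h_exact_local(Gy, F, Wd, Wn):
    # Exact local method for the special case n_u = 1, n_y = 2, n_d = 1:
    # build Y = [F*Wd  Wn] (2x3), then H^T = (Y Y^T)^{-1} G^y.
    # Gy, F: 2x1 lists; Wd: scalar; Wn: 2x2 diagonal noise weight.
    Y = [[F[0][0] * Wd, Wn[0][0], 0.0],
         [F[1][0] * Wd, 0.0, Wn[1][1]]]
    # S = Y Y^T (2x2 symmetric matrix)
    S = [[sum(Y[i][k] * Y[j][k] for k in range(3)) for j in range(2)]
         for i in range(2)]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    Sinv = [[S[1][1] / det, -S[0][1] / det],
            [-S[1][0] / det, S[0][0] / det]]
    # H^T = S^{-1} G^y; return H as a 1x2 row vector
    return [Sinv[i][0] * Gy[0][0] + Sinv[i][1] * Gy[1][0] for i in range(2)]
```

For instance, soc_h_exact_local([[1.0], [2.0]], [[0.0], [0.0]], 0.5, [[1.0, 0.0], [0.0, 1.0]]) returns [1.0, 2.0], i.e. H = (G^y)^T when disturbances have no effect on the optimal measurements.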

Extremum-seeking control
Extremum-seeking control is a model-free optimization method, where the steady-state performance of the system is optimized purely based on measuring the cost. The objective is to drive the estimated steady-state gradient of the cost J u to zero. The main advantage of extremum-seeking control compared to many other real-time optimizers is that no plant model is required. This enables extremum-seeking control to optimize the performance of complex systems where the process model is not known accurately. The main disadvantages are that it requires that the cost function is measured and that the convergence can be very slow.
Unlike self-optimizing control, which is based on local linearization of the model around the nominal operating point, extremum-seeking control is based on local linearization of the measured cost around the current operating point. The input and the cost measurements are used to continuously estimate the steady-state gradient J u around the current operating point. The estimated gradient is then controlled to a setpoint of zero.
There are different ways of estimating the gradient from the input and cost measurements. The classical approach is based on exciting the system with a sinusoidal signal and using a correlation based on high-pass and low-pass filters to retrieve the steady-state gradient information [6]. An alternative extremum-seeking scheme was proposed in [16], where a linear least-squares estimation method was used to estimate the steady-state gradient, which allows for a more general class of excitation signals. The least-squares method is simple to implement and has fewer tuning parameters than the classical method. It also provides a natural platform for extending to multivariable systems. Improved performance using a recursive least-squares approach was also reported in [21]. Therefore, we use least-squares based extremum-seeking control in the rest of the paper.
In this work, we extend the least-squares based gradient estimation presented in [16] to multivariable systems. The goal is to estimate the gradient from the inputs ũ to the measured cost J̃. In least-squares based extremum-seeking control, the last N samples of data are used to fit a local linear cost model of the form

J̃ = J_ũ^T ũ + m    (11)

where J_ũ ∈ R^{n_u} is the vector of gradients from ũ to J̃ and m ∈ R is the bias. At the current sample time k, let J̃ = [J̃_k ⋯ J̃_{k−N+1}]^T ∈ R^N be the vector of the last N samples of the measured cost and Ũ = [ũ_k ⋯ ũ_{k−N+1}]^T ∈ R^{N×n_u} be the matrix of the last N samples of the input. A moving window of fixed length N is then used to estimate the gradient with the linear least-squares method [22]

Â = arg min_A ‖J̃ − Φ A‖²    (12)

where Â ∈ R^{n_u+1} is the vector of parameters to be estimated,

Â = [Ĵ_ũ^T  m̂]^T    (13)

and Φ ∈ R^{N×(n_u+1)} is the regressor matrix

Φ = [Ũ  1_N]    (14)

with 1_N a column vector of N ones. The analytical solution to the least-squares problem is given by

Â = (Φ^T Φ)^{−1} Φ^T J̃    (15)

The application of ordinary least squares requires that N > n_u. Note that, in theory, it is not necessary to use a dither signal with this approach, but for practical purposes it is recommended, and in our case study we use a sinusoidal dither signal with an appropriately chosen amplitude. Once the gradient vector Ĵ_ũ is estimated, n_u integral controllers can be used to drive the gradients to zero using the degrees of freedom ũ (setpoints to the lower-level controllers). The integral controller can in general be written as

ũ_{k+1} = ũ_k + T_s K_I Ĵ_ũ    (16)

where K_I ∈ R^{n_u×n_u} is the gain matrix and T_s is the sample time.
In this paper, we use decentralized control, where K_I is diagonal. A schematic representation of the least-squares based extremum-seeking control is shown in Fig. 1. An additional dither signal is added to the control input to provide sufficient excitation in the input and cost measurements for accurate gradient estimation. The frequency of the dither signal must be significantly slower than the plant dynamics such that the plant can be approximated as a static map, as described in [5,21] and the references therein. For a multi-input system, each input should have a unique perturbation frequency in order to estimate the gradient of the cost measurement with respect to each input.
In order to estimate the static gradient Ĵ_ũ using dynamic data, the integral gain K_I must be chosen small enough that the timescale of the gradient estimation is even slower than that of the dither signal. Since the linear model assumption is valid only locally around the current operating point, the gradient is estimated using only recent samples of data (i.e. a moving window of fixed length N). It was shown in [16] that the least-squares based extremum-seeking control is stable and that the error is small for a sufficiently small product K_I N of the adaptation gain and window length.
Since the gradient estimation relies entirely on the cost measurements, it requires accurate cost measurements. The convergence to the optimum will also be slow for a dynamic process.
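A minimal single-input sketch of the moving-window least-squares gradient estimate and integral update described above is given below. The static cost map J(u) = (u − 2)², the dither, and all tuning values are illustrative assumptions; the LS solution is computed in closed form as the regression slope over the window.

```python
import math

def ls_esc(steps=4000, Ts=1.0, Ki=-0.01, N=40, a=0.05, period=20.0):
    # Least-squares extremum-seeking on an assumed static map J(u) = (u-2)^2.
    # Negative Ki drives u downhill toward the minimizer u* = 2.
    u, U, J = 0.0, [], []
    w = 2.0 * math.pi / period
    for k in range(steps):
        uk = u + a * math.sin(w * k * Ts)   # dithered input for excitation
        jk = (uk - 2.0) ** 2                # measured cost (static map)
        U.append(uk)
        J.append(jk)
        if len(U) >= N:
            uw, jw = U[-N:], J[-N:]         # moving window of length N
            mu, mj = sum(uw) / N, sum(jw) / N
            g = sum((x - mu) * (y - mj) for x, y in zip(uw, jw)) \
                / sum((x - mu) ** 2 for x in uw)   # LS slope = gradient est.
            u += Ts * Ki * g                # integral update, drives g -> 0
    return u
```

With the (assumed) tuning above, the product |Ki| N = 0.4 is small, the dither is faster than the adaptation, and u converges close to the minimizer u* = 2.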

Proposed method
In this paper, we propose a hierarchical implementation with separate optimization and control layers, as proposed in [14] and shown in Fig. 2.
Due to the timescale separation required between the optimization and control layers [3], the extremum-seeking controller is placed in the slow optimization layer and thus replaces the conventional RTO. Self-optimizing control is in the faster setpoint control layer below and tracks the updated setpoint given by the extremum-seeking controller. In other words, the extremum-seeking controller uses the measured cost J to compute the setpoint c_s, which is provided to the self-optimizing controller. The controller output of the extremum-seeking controller is thus ũ = c_s in (11)–(15). It may be argued that the self-optimizing control layer is redundant, since an extremum-seeking scheme can directly manipulate the process to optimize the objective function. However, a purely data-driven approach ignores any a priori knowledge about the system and the effect of disturbances. In addition, the extremum-seeking controller does not make use of any measurements besides the cost measurement. Finally, its convergence to the optimum is slow. The proposed hierarchical combination of extremum-seeking control and self-optimizing control avoids these shortcomings and improves the convergence to the optimum. This is primarily due to the faster initial reaction of the self-optimizing layer to known (modelled) disturbances. Following a disturbance, the self-optimizing control quickly brings the operating point close to the optimal region, and on a slower timescale, the extremum-seeking control fine-tunes the setpoint and removes any loss associated with the self-optimizing control.
The extremum-seeking layer handles the plant-model mismatch and unmodelled disturbances and removes any steady-state loss by adjusting the setpoint c s . This also avoids re-optimization e.g. using real-time optimization.
In summary, we use the knowledge about the system to stay in the near-optimal region using self-optimizing control in the presence of disturbances. The extremum-seeking control helps to remove, or at least reduce the losses due to plant-model mismatch, it handles any unexpected disturbances, and it fine-tunes the optimal operating point. The key properties of the two methods are summarized and compared in Table 1, which shows that the self-optimizing control and extremum-seeking control are complementary rather than competing.
Note that the proposed method is not restricted to least squares-based extremum-seeking control; other extremum-seeking methods, such as classical extremum-seeking control or recursive least squares-based extremum-seeking control [6,21,5], may be used instead on top of the self-optimizing control layer in Fig. 2.
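The two-timescale hierarchy can be sketched on a toy first-order plant. Everything below is an illustrative assumption, not the ammonia model: a fast integral "SOC-like" loop tracks the setpoint c_s, while a slow least-squares ESC adjusts c_s using only the measured cost, which here penalizes deviation from an optimum shifted by an unmodelled disturbance.

```python
import math

def run_soc_esc(T=600.0, dt=0.05):
    d = 1.0                    # unmodelled constant disturbance (assumed)
    yopt = 1.0 + 0.5 * d       # true optimum, unknown to both layers
    y, u, cs = 1.0, 0.0, 1.0   # cs = 1 is the nominal (d = 0) optimal setpoint
    Ki_soc, Ki_esc, Ts_esc = 2.0, 0.005, 1.0   # fast inner / slow outer gains
    a, w = 0.05, 2.0 * math.pi / 40.0          # slow dither on the setpoint
    N = 800                                    # LS window = one dither period
    U, J = [], []
    for k in range(int(T / dt)):
        cs_d = cs + a * math.sin(w * k * dt)   # dithered setpoint
        u += dt * Ki_soc * (cs_d - y)          # fast tracking (SOC-like) layer
        y += dt * (-y + u + d)                 # toy first-order plant
        U.append(cs_d)
        J.append((y - yopt) ** 2)              # measured cost
        if len(U) >= N and k % int(Ts_esc / dt) == 0:
            uw, jw = U[-N:], J[-N:]
            mu, mj = sum(uw) / N, sum(jw) / N
            g = sum((x - mu) * (z - mj) for x, z in zip(uw, jw)) \
                / sum((x - mu) ** 2 for x in uw)   # LS gradient of J wrt cs
            cs -= Ts_esc * Ki_esc * g              # slow setpoint correction
    return cs
```

In this sketch, the inner loop rejects the disturbance on the fast timescale, while the outer ESC slowly moves c_s from the nominal value 1 toward the true optimum 1.5, mirroring the role division argued above.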

Stability issues
In this section, we provide some discussion on the stability of the combined self-optimizing control and extremum-seeking control layout presented in Fig. 2. As mentioned earlier, the extremum-seeking controller has three timescales, namely,

• fast – controlled plant dynamics
• medium – dither frequency
• slow – convergence to the optimum

A good extremum-seeking controller is tuned such that there is a clear timescale separation between these three timescales. The self-optimizing controller is included in the controlled plant and belongs to the fast timescale [6]. By introducing the self-optimizing control below the extremum-seeking control, the perturbation frequency and the adaptation (integral) gain K_I must be chosen such that the timescale separation justifies the static map assumption. Therefore, when seen from the slow timescale of the extremum-seeking controller, the closed-loop system comprising the self-optimizing controller and the plant is a static map J = h(c_s).
The stability results for the classical extremum-seeking controller in [6] and for the least squares-based extremum-seeking controller in [16] both assume a smooth stabilizing control law parameterized by a "performance parameter". This performance parameter is the handle used by the extremum-seeking controller. In our paper, the control law is given by the self-optimizing control layer, and the "performance parameter" is equivalent to the setpoint c_s for the self-optimizing variable.
To summarize, the existing stability results from [6] for classical extremum-seeking control and from [16] for least squares-based extremum-seeking control also hold for the combined hierarchical structure in Fig. 2 if the following two conditions are met:

1. The self-optimizing setpoint control layer is closed-loop stable.
2. The perturbation frequency of the extremum-seeking controller is sufficiently slow compared to the timescale of the controlled plant, which includes the stabilizing self-optimizing controller.
Choice of tuning parameters. According to the stability analysis in [16], the product K_I N is the only important quantity that must be chosen small enough. This is a reasonable measure, since a small product of the adaptation gain K_I and the time window N intuitively means that the measurements used for gradient estimation lie in a small neighbourhood of the current operating point u. As for the window length N for the past measurements, N must be sufficiently small such that the error in the gradient estimate is bounded, as described in detail in [16]. Alternatively, instead of a fixed moving window of the past N samples, a forgetting factor may be used in a recursive least-squares estimation framework, as adopted in [21]. As for any extremum-seeking control framework, the adaptation gain K_I must be chosen sufficiently small that there is a clear timescale separation between the dither and the convergence to the optimum. This is required to validate the static map assumption on which extremum-seeking control relies. For more detailed guidelines on tuning the adaptation gain, the reader is referred to [5].

Effect of disturbances
Very little of the literature on extremum-seeking control explicitly considers the effect of disturbances. Disturbances typically trigger fast dynamics, which may invalidate the assumption of the plant operating close to a static map. Consequently, the introduced fast dynamics lead to erroneous gradient estimation, especially with the least-squares gradient estimation method used in this paper. In other words, if the cost measurement in the gradient estimation time window is in a transient caused by disturbances, the least-squares method fits a wrong gradient Ĵ_ũ. The effect of abrupt disturbances on the extremum-seeking scheme has been well motivated in [9,11], and [10], along with some modifications to improve disturbance rejection. However, all these modifications require the disturbances to be measured. Measured disturbances may also be handled by the least squares-based extremum-seeking scheme described in Section 2.2, by explicitly including the measured disturbances as part of the regressor in (15) and replacing (11) with

J̃ = J_ũ^T ũ + J_d^T d + m

where J_d ∈ R^{n_d} is the vector of gradients from the measured disturbances d to the cost J̃. Unfortunately, unmeasured disturbances may still result in erroneous gradient estimation. Given that the extremum-seeking control problem at hand is essentially a static optimization problem, the only way to avoid this problem is to use steady-state detection, as in traditional steady-state RTO. The extremum-seeking scheme is triggered only if the cost measurement in the gradient estimation window is close to steady-state operation. Dynamic changes in the cost resulting from a disturbance are flagged by the steady-state detection routine, and the extremum-seeking scheme is temporarily halted until the cost measurement in the gradient estimation window again comes close to steady-state operation. By doing so, the static map assumption used by the extremum-seeking scheme remains valid.
This halt is typically known as the steady-state wait time and is commonly used in the traditional steady-state RTO paradigm. In fact, the method proposed in [11] is precisely a steady-state wait time routine implemented using a supervisory state machine.
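A simple steady-state gate of the kind described above could look like the following sketch; the 1% relative-spread threshold and the choice of statistic are illustrative assumptions.

```python
def near_steady(costs, rel_tol=0.01):
    # Flag steady state when the spread of the cost over the window is small
    # relative to its mean magnitude (a simple heuristic detector).
    lo, hi = min(costs), max(costs)
    scale = max(abs(sum(costs) / len(costs)), 1e-12)  # guard against J ~ 0
    return (hi - lo) / scale < rel_tol
```

The extremum-seeking update would then be executed only when near_steady(J_window) returns True, and halted (the steady-state wait) otherwise.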
In many processes with long settling times, the steady-state wait time can lead to very slow convergence to the optimum. Alternatively, in some process systems, one can use heuristics to avoid the steady-state wait time, such as bounding the magnitude of each individual gradient estimate Ĵ_ũ_i to a value J_{u_i,max}. The bounding limits aggressive input usage. Based on (16), we propose to introduce a maximum change in the input between samples, Δũ_{i,max}, and choose

J_{u_i,max} = Δũ_{i,max} / (K_{I,i} T_s)

Note that although this approach is used in our case study, as illustrated in Fig. 7, this additional heuristic of bounding the gradient is not part of the core methodology presented in this paper, but should be viewed as an alternative to the steady-state wait time.
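The gradient bound amounts to a clip applied before the integral update; a sketch for one input channel (function name and interface are illustrative) is:

```python
def bounded_step(g, Ki, Ts, du_max):
    # Clip the gradient estimate so the input move per sample, Ts*Ki*g,
    # never exceeds du_max in magnitude.
    jmax = du_max / (abs(Ki) * Ts)       # J_{u,max} = du_max / (|Ki| * Ts)
    g_clipped = max(-jmax, min(jmax, g))
    return Ts * Ki * g_clipped           # bounded input change
```

Even a wildly wrong transient gradient estimate then moves the input by at most du_max per sample, limiting the damage of erroneous estimates during disturbances.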

Case study -ammonia synthesis reactor
In this paper, we apply the proposed method to a three-bed ammonia reactor with heat integration. A flowsheet, including the control structure for the proposed method, is shown in Fig. 3. The model was first described in [23] for stability analysis of the ammonia reactor. It was recently used in [24] for the application of economic non-linear model predictive control (E-NMPC). The E-NMPC approach was able to reject disturbances and avoid limit-cycle behaviour while achieving optimal operation. However, plant-model mismatch and the non-linear optimization problem may limit its practical application. To avoid repeated numerical optimization and to handle disturbances and plant-model mismatch more efficiently, we consider in this paper self-optimizing control, extremum-seeking control, and a combination of both approaches. The incorporation of the reactor into the ammonia synthesis loop also requires adjustments to the setpoints of the self-optimizing controllers [25].
The objective is to maximize the extent of reaction for a given feed. A higher conversion per pass (equivalently, a higher extent of reaction for a given feed) means less reactant is recycled, giving a smaller feed to the reactor and, in turn, an even higher conversion per pass. This positive feedback on the extent of reaction minimizes the recycle flow, and therefore the cooling cost and the recycle compressor duty.

Summary of the model
The model consists of three sequential reactor beds and one heat exchanger. The inlet stream to the reactor system (denoted by subscript in) is split into four streams: one quench flow to each bed and a preheated flow to the first reactor bed. The quench split ratios correspond to the three manipulated variables u_0 = [u_{0,1} u_{0,2} u_{0,3}]^T. The three reactor beds are discretized into a cascade of continuously stirred tank reactors (CSTRs). We use the Temkin–Pyzhev kinetic expression for the reaction rate. The heat exchanger is modelled using the number of transfer units (NTU) method. The resulting model without controllers corresponds to a differential-algebraic system with x ∈ R^30 as dynamic state variables (the temperatures in the beds), z ∈ R^30 as algebraic state variables (the ammonia mass fractions in the beds), and u_0 ∈ R^3 as manipulated variables. A detailed model description can be found in [24]. The system is modelled using CasADi [26]. The nominal optimal point and the optimal sensitivity matrix F for self-optimizing control were computed using the IPOPT non-linear programming solver [27]. The plant model was simulated using IDAS [28].

Controller design
The potential instability in the case of disturbances, as described in [23], requires a stabilizing "slave" control layer below the self-optimizing control layer. It was shown in [24] that if the reactor is operated close to the nominal optimum without control, even small disturbances (compared to the large disturbances investigated in [23]) may result in reactor extinction. Hence, also when extremum-seeking control is used without self-optimizing control, a stabilizing "slave" control layer is required, resulting in a cascade control structure in all investigated control structures. The chosen "slave" controllers are temperature controllers and are explained in the following subsection. The slave temperature loops as well as the master self-optimizing control loops were tuned using the SIMC rules [29]. These controllers were directly included in the differential-algebraic model, increasing the number of differential variables by three for extremum-seeking control or by six for the combined self-optimizing and extremum-seeking control. The extremum-seeking controllers were implemented in discrete time, resulting in a discrete-continuous representation.

Slave temperature controller pairing and tuning
The slave controllers use the splits (bypasses) u_{0,i} to control the corresponding bed inlet temperature. This is a pure mixing process with instantaneous dynamics, and an integrating controller is recommended [29]. The desired closed-loop time constant for the three controllers was chosen as τ_c = 10 s. The resulting integral gains K_I for the three temperature loop controllers can be found in Table 2.
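For a (near-)static process with gain k, the SIMC rules reduce to pure integral action; a sketch of the resulting gain, assuming the first-order SIMC PI rule in the limit of vanishing process time constant, is:

```python
def simc_integral_gain(k, tau_c, theta=0.0):
    # SIMC for a near-static process with gain k and effective delay theta:
    # the PI rule Kc = tau1/(k*(tau_c+theta)), tauI = tau1 degenerates, as
    # tau1 -> 0, to a pure integral controller with
    #   KI = Kc / tauI = 1 / (k * (tau_c + theta))
    return 1.0 / (k * (tau_c + theta))
```

For example, a mixing gain of k = 2 (an assumed value) with τ_c = 10 s and negligible delay gives K_I = 0.05 s⁻¹.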

SOC controller pairing and tuning
In our case, the SOC controllers give the setpoints to the respective slave temperature controllers. The measurements y for self-optimizing control are selected to be the inlet and outlet temperatures of each reactor bed, i.e.

y = [T_{1,in} T_{1,out} T_{2,in} T_{2,out} T_{3,in} T_{3,out}]^T
Hence, only two measurements were used for the calculation of H i in (8). This local treatment of each bed does not necessarily result in overall optimal selection matrices H i . It would be possible to increase the number of measurements, e.g. using all 6 measurements for the calculation of the selection matrices. This will reduce somewhat the steady-state loss in self-optimizing control [19]. However, it may also lead to undesired dynamic behaviour through coupling and delays in the self-optimizing variables c.
The disturbances are the inlet conditions. The scaling matrices W_d and W_{n^y} in (9) were chosen according to the expected disturbance and noise magnitudes. To get a fast response in the self-optimizing control layer (as it will be combined with an upper extremum-seeking layer), the closed-loop time constant τ_c for each of the three controllers was set equal to its respective time delay, which is in the range of 300 s. The resulting PI parameters (K_p and K_I) can be found in Table 2.
The introduction of "slave" temperature controllers reduces the coupling between the self-optimizing controllers, as shown by the relative gain array [30]. The relative gain array of the self-optimizing controllers including the "slave" temperature loops has diagonal elements close to 1. Without the "slave" temperature loops, the diagonal elements are close to 2, which indicates stronger coupling between the controlled variables c.
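The relative gain array used for this pairing analysis is Λ = G ∘ (G⁻¹)ᵀ (elementwise product of the steady-state gain matrix with the transpose of its inverse). A 2×2 sketch with an assumed gain matrix:

```python
def rga_2x2(G):
    # Relative gain array for a 2x2 steady-state gain matrix G:
    # RGA = G .* (G^{-1})^T (elementwise / Hadamard product).
    a, b, c, d = G[0][0], G[0][1], G[1][0], G[1][1]
    det = a * d - b * c
    Ginv = [[d / det, -b / det], [-c / det, a / det]]
    return [[G[i][j] * Ginv[j][i] for j in range(2)] for i in range(2)]
```

Diagonal RGA elements close to 1 support diagonal pairing with weak interaction; each row and column of the RGA sums to 1, which is a useful sanity check.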

Extremum-seeking controllers tuning
The upper layer in the control structure in Fig. 2 consists of the extremum-seeking controllers. These slow integral controllers give the setpoints to either the base-layer temperature controllers (T_s, denoted T+ESC) or the self-optimizing controllers (c_s, denoted T+SOC+ESC). The estimation of the gradient Â according to (15) is performed using ũ as the setpoint of the respective slave controller (T or SOC). It is assumed that the disturbances are unmeasured. Hence, the fitted model is

J̃ = J_ũ^T ũ + m

where m is the bias and J_ũ the gradient. As the disturbances are not corrected for, this results in wrong gradient estimates when disturbances occur. One way to rectify this problem is to temporarily turn off the extremum-seeking controllers; alternatively, the gradient estimates may be bounded as discussed above. The tuning of the extremum-seeking controllers depends on many factors. We need to choose the number of past measurements N, the periods and amplitudes of the sinusoidal dithers, and the integral gains. All these parameters influence each other, which makes the selection difficult. The parameters were chosen by trial and error to achieve satisfactory performance and are given in Table 3 (controller tuning parameters for the extremum-seeking controllers with only temperature controllers, T+ESC, and with the additional self-optimizing control layer, T+SOC+ESC). The time horizon for the past measurements was chosen to be 1 h in all cases. This corresponds to N = 240 samples with a chosen integrator step time of 15 s. Equal tuning effort was spent on both T+ESC and T+SOC+ESC to achieve a fair comparison.

Results
In order to compare the proposed methods, two disturbances were investigated: a disturbance in the inlet mass flow rate ṁ_in, corresponding to a modelled disturbance in self-optimizing control, and an unmodelled disturbance in the reaction rate r. These disturbances were chosen because they correspond to the largest losses for the self-optimizing control structure (not shown), so the improvement from extremum-seeking control is most pronounced. In addition, both disturbances would result in reactor extinction if the stabilizing temperature controllers were not present. The integrated loss (cost difference) is used to compare the proposed methods. The first disturbance considered is a +20% step change in the inlet mass flow rate, as this disturbance results in the highest steady-state loss for the self-optimizing variables [25]. This disturbance is considered in the calculation of the SOC variables. The cost J and the integrated loss (28) are shown in Fig. 4. The cases with extremum-seeking control (solid lines) settle to the new optimum, in contrast to pure self-optimizing control (dashed red line). The combination of self-optimizing control and extremum-seeking control gives a large reduction in loss, measured in tons of produced ammonia; as seen in Fig. 4, this reduction corresponds to 4.95 t of ammonia over the investigated time frame of 18 h. One could argue that this is caused by suboptimal tuning parameters in the pure extremum-seeking control, but looking at the time at which the disturbance occurs shows that this is not the case. Fig. 5 shows a close-up of the response in the cost function for the first 1.2 h after the disturbance occurs. From this figure, it can be clearly seen that both ESCs (solid lines) initially follow their respective slave controllers, before deviating when the ESCs start changing the setpoints to the slave controllers. Both ESC control structures in fact move initially in the wrong direction, that is, towards a reduced extent of reaction.
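The integrated loss used as the comparison metric is simply the time integral of the difference between the achieved cost and the new optimal cost. A minimal numerical sketch, with a made-up first-order cost trajectory standing in for the simulated reactor response:

```python
import numpy as np

t = np.linspace(0.0, 18.0, 500)        # time in hours
J_opt = -10.0                          # new optimal cost (hypothetical)
J = J_opt + 2.0 * np.exp(-t / 4.0)     # cost settling towards the optimum

# Trapezoidal approximation of the integrated loss, i.e. the
# integral of (J - J_opt) dt over the simulation horizon
dJ = J - J_opt
loss = float(np.sum(0.5 * (dJ[1:] + dJ[:-1]) * np.diff(t)))
print(round(loss, 2))  # analytically: 8 * (1 - exp(-4.5)) ≈ 7.91
```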
This can be explained by the past measurements from before the disturbance, which are still used at this point. One way to circumvent this behaviour is to use a smaller time horizon (smaller N). This, on the other hand, results in a drift away from the optimal setpoint on a long timescale. Hence, a slightly suboptimal initial performance is preferable. A disturbance in the reaction rate r is an unmodelled disturbance, which is not considered in the calculation of the optimal selection matrices according to (8); it can be regarded as plant-model mismatch. The simulation results for a −20% step change in the reaction rate r are shown in Fig. 6. Similarly to the disturbance in the inlet mass flow ṁ_in, the control structure based on the proposed method, with both self-optimizing and extremum-seeking control, settles to the new optimum after 7 h, whereas extremum-seeking control alone requires around 13 h. During the time the controllers require to settle to the new optimum, the loss is reduced in the proposed control structure with SOC. Over 18 h, the proposed control structure reduces the loss by 6.71 tons of produced ammonia. It should be noted that, even though this disturbance was not included in the design phase, the self-optimizing control structure still reduces the loss. This can be explained by the generally favourable properties of self-optimizing feedback with regard to disturbances and plant-model mismatch.
In Section 3.2 we proposed a bounding heuristic in (18) for Ĵ_ũ,i to reduce the effect of erroneous gradient estimates caused by disturbances; this heuristic was used in the above simulations. Fig. 7 shows the bounds and the estimate for gradient 1, as well as the corresponding manipulated variable, c_s,1, for a +20% step change in the inlet mass flow rate. As we can see, the gradient estimate at the time of the disturbance (t = 3 h) is indeed outside the bound in (18). At t = 3.025 h it reaches a minimum value of Ĵ_ũ,i = −93.42 kg s−1 K−1, a value 300 times as large as the bound. The estimate is back within the bound one hour after the occurrence of the disturbance (t = 4 h), which corresponds to the time horizon of the past measurements according to (15). If no bounds were introduced, the wrong gradient estimate would require a very small integrator gain K_I for the extremum-seeking controller, resulting in very slow convergence. In Fig. 8 we compare the proposed correction scheme with steady-state detection and with no correction; the controller tunings were adjusted in both additional cases. As we can see in Fig. 8, no correction on the gradient results in worse performance: the extremum-seeking controllers have to be tuned slower to avoid moving too far in the "wrong" direction (see Fig. 8a), so the convergence to the new steady-state optimum is much slower. Steady-state detection, as an alternative, was implemented by deactivating the extremum-seeking controllers if the variance of the N past measurements of the controlled variables of the self-optimizing controllers is above a certain threshold (as commonly used in traditional RTO schemes). We see that the convergence rate to the new optimum is similar to that obtained by bounding the gradient; however, the loss is increased in the first hour, while the extremum-seeking controllers are deactivated.
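The two correction schemes compared in Fig. 8 can be sketched as below. The bound of 0.31 kg s−1 K−1 is back-computed from the figures quoted above (93.42/300); the variance threshold, controller gain, and initial setpoint are purely illustrative:

```python
import numpy as np

def bounded_gradient(Ju_hat, bound):
    """Clip the gradient estimate to [-bound, bound] so that a
    disturbance-corrupted estimate cannot drive a large setpoint step."""
    return float(np.clip(Ju_hat, -bound, bound))

def esc_active(c_hist, threshold):
    """Steady-state detection: keep the ESC update active only while the
    variance of the past controlled-variable measurements is below a
    threshold (as in traditional RTO schemes)."""
    return float(np.var(c_hist)) < threshold

# The corrupted estimate quoted above is clipped to the bound:
print(bounded_gradient(-93.42, 0.31))        # -> -0.31

# Integral update using the bounded estimate (illustrative gain/step):
K_I, t_int, c_s = 0.05, 15.0, 400.0
c_s -= K_I * bounded_gradient(-93.42, 0.31) * t_int
```

Bounding keeps the controller active (with a small, safe step) throughout the disturbance, whereas steady-state detection freezes it entirely, which explains the extra loss in the first hour of Fig. 8.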

General discussion
As shown, the hierarchical combination of self-optimizing control with extremum-seeking control improves the rejection of disturbances in the ammonia-reactor case study. This is caused by the (fast) rejection of the disturbance through self-optimizing control, combined with the final adjustment of the setpoints c_s by the extremum-seeking controllers. Is it still possible to speak of self-optimizing control in this context, when the setpoint is adjusted? Yes: the idea of self-optimizing control is to allow for less frequent changes in the setpoint, and Skogestad [3] explicitly mentions in the original paper on self-optimizing control the possibility of adjusting the SOC setpoints using an optimizing layer. This is especially important considering the incorporation of the reactor into the synthesis loop, in which the recycle is neglected [25]. The proposed method then adjusts the setpoints if it is not possible to solve an optimization problem for the overall process. Self-optimizing control is model-based. However, the proposed method is less reliant on the accuracy of the model than many other model-based approaches, because it only uses the model offline for the calculation of the optimal selection matrices H_i, while the extremum-seeking controller corrects for any plant-model mismatch that is not handled by the self-optimizing layer below, making the scheme more resilient to modelling errors. The setpoints of the self-optimizing controllers are handled by the extremum-seeking controllers in a model-free manner.
There remains however one limitation to the proposed methodology; it is necessary to measure (or estimate) the cost function for the extremum-seeking controller. This is an inherent limitation of extremum-seeking controllers.

Impact of the scaling matrices
The performance of the self-optimizing controllers depends on the scaling matrices given in Eqs. (24) and (25). As outlined in [14], these scaling matrices scale the combined disturbance and measurement noise vector to unit length. If the magnitude of a disturbance or of the measurement noise changes, the exact local method will weigh the disturbance wrongly and the resulting measurement selection matrix is not optimal. Correspondingly, the steady-state loss for the overestimated disturbance will be lower, whereas the steady-state losses for the other disturbances can be either higher or lower [25].
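To make the role of the scaling matrices concrete, the sketch below computes an optimal measurement combination H using the explicit solution of the exact local method, H^T = (F̃F̃^T)^{-1} G^y (G^{yT} (F̃F̃^T)^{-1} G^y)^{-1} J_uu^{1/2} with F̃ = [F W_d, W_n]. All matrices are random or made-up stand-ins for the case-study model (Eqs. (24) and (25) define the actual scalings); changing the entries of W_d is what shifts the weighting between disturbances:

```python
import numpy as np

rng = np.random.default_rng(0)

ny, nu, nd = 5, 2, 2                  # measurements, inputs, disturbances
Gy  = rng.standard_normal((ny, nu))   # measurement gains from inputs
F   = rng.standard_normal((ny, nd))   # optimal sensitivity to disturbances
Wd  = np.diag([1.0, 0.5])             # disturbance scaling (magnitudes)
Wn  = 0.01 * np.eye(ny)               # measurement-noise scaling
Juu = np.eye(nu)                      # Hessian of the cost w.r.t. inputs

Ft = np.hstack([F @ Wd, Wn])          # scaled augmented sensitivity F-tilde
Y = Ft @ Ft.T
# Explicit exact-local solution for the selection matrix H
YinvG = np.linalg.solve(Y, Gy)
Ht = YinvG @ np.linalg.inv(Gy.T @ YinvG)
H = (Ht @ np.linalg.cholesky(Juu)).T
# c = H y gives nu self-optimizing controlled variables; H satisfies
# H Gy = Juu^{1/2} while minimizing the loss over the scaled space.
print(H.shape)  # (2, 5)
```

Overestimating one diagonal entry of W_d makes the method trade steady-state loss for the other disturbances against robustness to the overweighted one, which is exactly the sensitivity discussed above.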
The combination of self-optimizing control with extremum-seeking control, however, alleviates the problem of a higher steady-state loss, as the extremum-seeking controllers adjust the setpoints of the self-optimizing controllers. Consequently, the integrated loss may be increased with wrongly chosen scaling matrices, but there will still be an improvement over pure extremum-seeking control.

Conclusion
We have shown that extremum-seeking control and self-optimizing control are complementary rather than competing. This is due to the different timescales at which the two control strategies operate. By combining self-optimizing control and extremum-seeking control, we are able to utilize the advantages of each method and improve the convergence to the optimum. Using a three-bed ammonia reactor case study, we demonstrated that the combined system can handle unmeasured disturbances and at the same time correct for plant-model mismatch.