Real-Time optimization as a feedback control problem – A review

Feedback optimizing control aims to achieve asymptotically optimal operation by directly manipulating the inputs using feedback controllers, without the need to solve numerical optimization problems online. The main question when designing feedback optimizing control is then what to control such that the economic objectives are translated into control objectives. Over the past two decades, active research stemming from different application areas has produced several approaches to feedback optimizing control, leading to a rich body of literature. The methods differ in the choice of controlled variables, the need for detailed process models, the type of measurements used, convergence speed, accuracy, ease of implementation, etc. This paper aims to provide a survey of these different approaches under the unified umbrella of feedback optimizing control, and provides an overview and comparison of the different feedback-based real-time optimization approaches for continuous processes.


Introduction
In the face of growing market competition and an increased focus on sustainability, energy efficiency, and safety, online optimization of process operations is becoming increasingly important. Real-time optimization (RTO) explicitly deals with the optimal economic operation of the process. A widely accepted definition of real-time optimization is that it is a workflow in which the decision variables are iteratively adjusted using the system model and/or real-time process measurements, in order to minimize the operational cost and satisfy constraints (Marlin and Hrymak, 1997; Chachuat et al., 2009; Krishnamoorthy, 2019).

Conventional real-time optimization (RTO)
The traditional real-time optimization (RTO) framework computes the optimal steady-state setpoints by solving a steady-state numerical optimization problem that comprises the following components:
1. Rigorous nonlinear steady-state process models,
2. Process, equipment, and environmental constraints,
3. An economic objective function that comprises the cost of raw materials, the value of the products, operational costs, environmental taxes, etc.
Traditional RTO implementations are thus based on steady-state nonlinear models. The model parameters are updated using data from time periods corresponding to steady-state process operation, in order to match the plant measurements and the model predictions. The updated model is then used to re-optimize for the new setpoints (Seborg et al., 2010, Chapter 19), (Marlin and Hrymak, 1997). The traditional RTO framework is thus a two-step approach:
• Step 1: Data reconciliation – Update the nonlinear model using steady-state measurements.
• Step 2: Numerical optimization – Compute new optimal setpoints using the updated model.
Most of the commercially available RTO software packages are based on the repeated identification and optimization scheme using steady-state models (Câmara et al., 2016). The justification for this approach is that for most continuous processes, the economic operation of the plant often occurs at some steady-state. The exceptions are processes with frequent grade changes, cyclic operations, and batch processes. As such, the objective in most continuous processes is simply to find the economically optimal steady-state operating point as a function of the given operating conditions. To this end, steady-state process optimization for continuous processes will be the main focus of this paper. The decision variables for the RTO layer are typically the setpoints for the controlled variables, which are then given to a setpoint-tracking control layer below. The control layer adjusts the manipulated variables in order to keep the measurements at the optimal setpoints computed by the RTO layer above. Fig. 2 shows the typical hierarchical decomposition into optimization and control layers.
Despite its economic benefits and promise, traditional real-time optimization is not as widely used in practice as one would expect. Consequently, the full potential of RTO is not exploited in the process industries (Darby et al., 2011; Krishnamoorthy et al., 2018a; Krishnamoorthy, 2019). The main technical challenges that limit industrial use of steady-state RTO for process operations include:

Challenge 1: Cost of developing the model (offline model development) – Developing good first-principles-based models is often challenging and expensive, especially for new application areas with limited domain knowledge. In addition, lack of knowledge or model simplification leads to mismatch between the physical models used in the optimization problem and the real system. With the increasing complexity of many industrial processes, simplified first-principles models are insufficient to accurately capture the system behavior.
Challenge 2: Model uncertainty, including wrong values of disturbances and parameters (online update of the model) – Since traditional RTO uses steady-state models, the model parameters must be updated using measurements that correspond to steady-state conditions. To this end, steady-state detection algorithms are employed to detect whether the process is operating at steady-state conditions; see for example Kelly and Hedengren (2013), Câmara et al. (2016) and the references therein. The time between steady-state conditions is known as the steady-state wait time. In a recent review paper on current practices of RTO, Darby et al. (2011) conclude that a fundamental limiting factor of RTO implementation is the steady-state wait time associated with the online model update. Câmara et al. (2016) also recently pointed out some of the issues with steady-state detection routines used in real industrial RTO systems.
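As an aside, the basic idea behind steady-state detection can be illustrated with a minimal sketch (ours, not the specific algorithms of Kelly and Hedengren (2013) or Câmara et al. (2016)): declare steady-state when the means of the two halves of a moving data window agree to within a few pooled standard errors. The function name, window split, and threshold are illustrative assumptions.

```python
import numpy as np

def is_steady(y, threshold=2.0):
    """Declare steady-state if the means of the two window halves differ
    by less than `threshold` pooled standard errors (a crude two-sample
    t-type test; illustrative only, not an industrial algorithm)."""
    y = np.asarray(y, dtype=float)
    half = len(y) // 2
    a, b = y[:half], y[half:2 * half]
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / half)
    if pooled == 0.0:
        return True  # perfectly flat signal is trivially at steady-state
    t_stat = abs(a.mean() - b.mean()) / pooled
    return t_stat < threshold

flat = np.ones(40)        # flat signal -> steady
ramp = np.arange(40.0)    # strong trend -> not steady
```

In a real RTO system, such a test would be applied per measurement over a sliding window, and the model update would only be triggered once all key measurements pass.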
Challenge 3: Numerical robustness, including computational issues of solving optimization problems – Solving numerical optimization problems to compute the optimal setpoints leads to high computational effort. Although the computational cost of solving steady-state optimization problems is considerably lower than that of dynamic optimization problems, the optimization problem may still fail to converge for large-scale processes due to numerical robustness and convergence issues (Marlin and Hrymak, 1997).
Challenge 4: Conflicts with the planning layer – In process applications, the planning and scheduling layer (which typically sits on top of the RTO layer in the decision-making hierarchy) deals with decisions such as feed purchases, production capacity, storage, and other operational planning over a time horizon of weeks and months, based on average plant behavior (Darby et al., 2011). RTO, on the other hand, deals with real-time decisions based on the current plant conditions. Since the economics are considered at both the RTO and planning levels, this can lead to inconsistencies, as what is optimal for a process unit or section may not be optimal when all production aspects are considered over a longer horizon.
An alternative to the traditional RTO paradigm is the so-called "feedback optimizing control", where the optimization objectives are translated into control objectives, thereby moving the optimization functionality into the control layer (Morari et al., 1980). In a recent review paper, Darby et al. (2011) also consider the question of moving the RTO functionality to the control layer (more specifically, MPC) as an alternative RTO formulation, resulting in a one-layer solution that "performs a type of hill-climbing optimization". Achieving optimal operation via feedback control circumvents Challenge 3, since numerical optimization problems are not solved online. Feedback optimizing control also avoids the steady-state wait time (Challenge 2), since optimal operation is achieved using real-time process measurements as feedback. Finally, as will be shown later, many feedback optimizing control approaches may not require detailed process models, hence addressing Challenge 1.
Challenges related to human aspects – In addition to the technological challenges mentioned above, human aspects such as technical competence and corporate culture are among the main factors limiting industrial practice. Conventional real-time optimization requires regular maintenance and monitoring. This is often provided by a team of skilled engineers and a dedicated group with expertise in modeling, control, and optimization, which the company may be lacking. Consequently, the expected benefits of such advanced tools may not be fully realized, and this often leads to the application being turned off (Shook, 2006; Forbes et al., 2015; Darby et al., 2011). As Marlin and Hrymak (1997) pointed out, "the greatest demand from optimization users is a system that can be easily understood". For this reason, many traditional process industries still prefer to use the simple feedback control tools available in a standard distributed control system (DCS) to optimize process operations. In such cases, feedback optimizing control provides a systematic procedure that enables asymptotic optimal operation using feedback control.
As shown in Fig. 1, online process optimization approaches can be broadly categorized into two classes, based on whether or not one needs to solve numerical optimization problems online. Traditional steady-state RTO and dynamic RTO (including economic MPC) belong to the former category, where numerical optimization problems are solved online in order to compute the optimal operating conditions. Feedback optimizing control, on the other hand, belongs to the latter category, where asymptotic optimal operation is achieved via feedback control. Here, the main idea is to achieve optimal operation by directly manipulating the inputs u based on feedback measurements y such that the process is operated optimally at steady-state. In other words, the objective is to eliminate the RTO layer by translating the economic objectives into control objectives. This is commonly known as "feedback optimizing control", "direct input adaptation", or "implicit RTO" (Morari et al., 1980; Chachuat et al., 2009; Srinivasan and Bonvin, 2019).
Feedback optimizing control for the unconstrained degrees of freedom is an active area of research and has been receiving significant interest over the past two decades. Different feedback-based RTO methods have been proposed not only in the process control literature, but also in the electrical and mechanical engineering literature (Ariyur and Krstic, 2003; Tan et al., 2010; Hauswirth et al., 2017). The main objective of this paper is to provide an overview and understanding of the different approaches to achieving asymptotic optimal operation using feedback control, without the need to solve numerical optimization problems online. Different model-based and model-free methods, and combinations of these, will be reviewed in this paper.
Much of the research in this field has been devoted to using the steady-state cost gradient J_u as the self-optimizing variable, with a few exceptions such as Alstad et al. (2009), Larsson and Skogestad (2000), and Halvorsen et al. (2003). Naturally, this survey could also be seen as a review of different gradient estimation methods for real-time optimization, where we also discuss and compare the performance of the different approaches in terms of accuracy, convergence time, ease of implementation, scalability, etc. It is important to note that the objective of this paper is not to argue that simple control structures are superior to model-based optimization routines, but rather to demonstrate that there exists a wide array of methods that can be used to asymptotically drive a process to its optimum using feedback control, as an alternative to model-based steady-state real-time optimization.
The remainder of the paper is organized as follows: the underlying concept of feedback optimizing control, relating to the first-order optimality conditions, is shown in Section 2. Section 3 provides a detailed survey of the model-based approaches. More precisely, Section 3.1 reviews the methods where rigorous nonlinear process models are used offline to select a linear combination of measurements as self-optimizing variables, whereas Section 3.2 reviews model-based gradient estimation methods. Section 4 reviews the model-free gradient estimation methods, where the steady-state cost gradient is estimated directly from the cost measurement. Combinations of model-based and model-free approaches are reviewed in Section 5. Section 6 deals with optimal operation under changing operating conditions. The performance of the different methods is compared using a benchmark Williams-Otto reactor example in Section 7. Detailed discussions on the key features of the different methods are provided in Section 8, before concluding the paper in Section 9.
Notational remark. In the rest of the paper, we use the following notation:

ũ – physical manipulated variables (primal variables)
u ⊆ ũ – unconstrained degrees of freedom
d – parameters and process disturbances
y_m – vector of process measurements

Feedback optimizing control

Morari et al. (1980) first introduced the concept of feedback optimizing control, where the main aim was to translate the economic objectives into process control objectives. This can be done by choosing the "right controlled variables", which, when controlled to a constant setpoint, lead to economically optimal operation at steady-state. The most important question is then: What variables to control, such that the economic objectives are translated into control objectives?
To answer this, consider the steady-state optimization problem

min_ũ J(ũ, d)  subject to  C(ũ, d) ≤ 0    (1)

where ũ ∈ R^(n_ũ) denotes the vector of manipulated variables (MVs), d ∈ R^(n_d) denotes the vector of disturbances, J : R^(n_ũ) × R^(n_d) → R is the scalar cost function, and C : R^(n_ũ) × R^(n_d) → R^(n_c) denotes the vector of constraints. The Lagrangian of the optimization problem is given by

L(ũ, λ, d) = J(ũ, d) + λᵀ C(ũ, d)    (2)

where λ ∈ R^(n_c) are the Lagrange multipliers of the inequality constraints, and the first-order (KKT) optimality conditions for this optimization problem are given by:

∇_ũ L(ũ, λ, d) = 0    (3a)  (stationarity)
C(ũ, d) ≤ 0    (3b)  (primal feasibility)
λ ≥ 0    (3c)  (dual feasibility)
λᵀ C(ũ, d) = 0    (3d)  (complementary slackness)

Assumption 1. The linear independence constraint qualification (LICQ), the strict second-order sufficient condition, and strict complementarity hold for the optimization problem (1).
The above assumption implies that the optimization problem (1) has a unique primal and dual solution (ũ*, λ*), and the objective is to asymptotically drive the plant to this unique local optimum by using feedback control. Note that feedback optimizing control only deals with asymptotic optimal operation and not with optimizing the transient behavior; the transient behavior is typically determined by the controller tuning parameters. There are two main paradigms that can be used to translate the economic objectives into control objectives:
1. Region-based control: In this paradigm, the idea is to first identify the different active constraint regions a priori and use a subset of the physical manipulated variables (primal variables) to control the set of active constraints (constraints that are optimally at their limiting value) in each region. The remaining unconstrained degrees of freedom in each region are then used to satisfy the first-order optimality condition of the reduced unconstrained optimization problem. This is discussed in detail in Section 2.1.
2. Primal-dual feedback optimizing control: In this paradigm, both the primal and the dual variables are considered as degrees of freedom, which are adjusted using feedback control to satisfy the first-order optimality conditions of the original constrained optimization problem. This approach does not require the different active constraint regions to be identified a priori, and is therefore more general than the region-based approach. This is discussed in detail in Section 2.2.

Region-based control
Active constraint control

If optimal process operation corresponds to the case where some of the constraints are active (i.e., at their limiting values), then the easiest choice for the controlled variables is the active constraints themselves.
Let n_a ≤ n_c denote the number of active constraints C_A(ũ, d) at the optimum for a given disturbance realization d. Since the constraints are typically measured in the process, for each active constraint we usually associate the constraint itself as the controlled variable, i.e., CV = C_A, which is controlled to its limiting value using feedback control. Active constraint control is a well-known idea and has been used in many examples; see Maarleveld and Rijnsdorp (1970), Arkun and Stephanopoulos (1980), Morari et al. (1980), Fisher et al. (1988), Jacobsen and Skogestad (2012), Reyes-Lúa et al. (2018), Krishnamoorthy et al. (2019a), to name a few.
The first step in designing active constraint control is to identify the potential combinations of active constraints that may be encountered. The rigorous approach to identifying the different active constraint regions as a function of the disturbances is by offline optimization, which requires good process models (Jacobsen and Skogestad, 2012). However, if reliable models are not available (cf. Challenge 1), then one can identify the relevant active constraint combinations based on good process understanding and "engineering intuition" (Skogestad, 2004). To be systematic, one can start by writing down all the 2^(n_c) active constraint combinations, and eliminate the combinations that are infeasible or unlikely (Reyes-Lúa and Skogestad, 2019b). For example, one can eliminate certain constraint combinations using the following guidelines:
• The maximum number of active constraints must satisfy n_a ≤ n_ũ for the problem to be feasible.
• Certain constraint combinations are not possible, e.g., the maximum and minimum constraints on the same variable cannot be active at the same time.
• Certain constraints are always active, e.g., the purity constraint on the most valuable product, to avoid product giveaway (Govatsmark and Skogestad, 2005), (Skogestad and Postlethwaite, 2007, Chapter 10).
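The screening procedure above can be sketched in a few lines of code. The following is our own illustrative implementation (the function, argument names, and example constraints are hypothetical): enumerate all 2^(n_c) combinations and apply the three guidelines as pruning rules.

```python
from itertools import combinations

def candidate_active_sets(constraints, n_u, exclusive_pairs=(), always_active=()):
    """Enumerate all 2^nc active-constraint combinations and prune those
    violating the screening guidelines: (i) at most n_u active constraints,
    (ii) mutually exclusive pairs (e.g. min/max on the same variable) never
    both active, (iii) constraints known to be always active must appear.
    Names and arguments are illustrative, not from the paper."""
    sets = []
    for r in range(len(constraints) + 1):
        for combo in combinations(constraints, r):
            active = set(combo)
            if len(active) > n_u:
                continue  # more active constraints than inputs -> infeasible
            if any(a in active and b in active for a, b in exclusive_pairs):
                continue  # e.g. T_max and T_min cannot both be active
            if not set(always_active) <= active:
                continue  # e.g. purity spec on the most valuable product
            sets.append(active)
    return sets

# Hypothetical example: 3 constraints, 2 inputs, one min/max pair,
# and a purity spec that is always active -> only 3 candidate regions remain.
regions = candidate_active_sets(
    constraints=["T_max", "T_min", "purity"],
    n_u=2,
    exclusive_pairs=[("T_max", "T_min")],
    always_active=["purity"],
)
```

For realistic problems the pruned list is what one would then verify by offline optimization or engineering insight.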
Note that it is sufficient to identify the expected active constraint combinations; one does not need to know exactly where the switching occurs. It is reasonable to assume that all the constraints are measured for monitoring purposes, and one can therefore detect whether the constraints are approaching their limiting values or not.
At the same time, it is also important to note that eliminating certain active constraint combinations without solving the optimization problem offline requires good process understanding and insight, and therefore it is not guaranteed that all the relevant active constraint regions will be considered. As such, this is more suited for small-scale unit operations with few constraints, where it is possible to identify the expected active constraint combinations easily.
Due to imperfect control and noise, it may be desirable to add a safety margin to the active constraint controllers. To avoid dynamic constraint violations, we may choose to implement a back-off, where the setpoints for the active constraints C_A are offset by a margin ε. This gives rise to a loss, which increases linearly in ε and is quantified by the corresponding Lagrange multipliers λ_A as

Loss = λ_Aᵀ ε    (4)

See Appendix A for the proof. Quantifying the loss due to back-off provides two important insights:
1. It tells us which constraints need to be tightly controlled.
2. It allows us to simplify the control structure design. That is, if the Lagrange multiplier for a given constraint is sufficiently small such that the loss is negligible, then we can allow a large back-off (Govatsmark and Skogestad, 2005).
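A small worked example of the back-off loss (4), with illustrative numbers of our own (two hypothetical active constraints): the constraint with the large multiplier must be controlled tightly, while a large back-off is cheap on the other.

```python
import numpy as np

# Illustrative back-off loss computation; the multipliers and margins
# are made-up numbers, not from the paper.
lam_A = np.array([12.0, 0.05])   # Lagrange multipliers [purity, flow]
eps   = np.array([0.01, 0.50])   # back-off margins for each constraint

loss = lam_A @ eps               # Loss = lambda_A^T * eps
```

Here backing off the "purity" constraint by only 0.01 costs 0.12, while a ten-times-larger back-off on the "flow" constraint costs just 0.025, so only the former needs tight control.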
To this end, by tightly controlling the active constraints at their limiting values, one then only needs to find what to control using the remaining n_ũ − n_a unconstrained degrees of freedom. In other words, the total n_ũ available degrees of freedom are partitioned into two subsets ũ = [u′ u]ᵀ, where u′ ∈ R^(n_a) is used to control the active constraints and u ∈ R^(n_ũ − n_a) is used to control the self-optimizing variables.

Unconstrained optimum
For the remaining n_u := n_ũ − n_a unconstrained degrees of freedom, denoted by u ⊂ ũ, we need to choose what to control. This is not obvious, and requires more knowledge about the process, either in the form of process models or direct cost measurements, in order to translate the economic objectives into feedback control objectives. This additional information can be used to identify suitable "self-optimizing" controlled variables c ∈ R^(n_u), and their corresponding setpoints c_sp. Skogestad (2000) defined self-optimizing control as when we can achieve (near) optimal operation (acceptable loss) with constant setpoints for the self-optimizing controlled variables. The self-optimizing variable can either be a single measurement (Section 3.1.1), a linear combination of measurements (Section 3.1.2), or some other optimization-specific feature such as the steady-state cost gradient, that translates the economic objectives into control objectives. The ideal self-optimizing variable is the steady-state cost gradient (Sections 3.2 and 4), which, when kept at a constant setpoint of zero, satisfies the necessary conditions of optimality (Halvorsen et al., 2003; François et al., 2012; 2005). For instance, consider the reduced problem (after controlling the active constraints)

min_u J(u, d)    (5)

where u ∈ R^(n_u) is the vector of unconstrained degrees of freedom. The KKT conditions for (5) state that the necessary condition of optimality is that the reduced gradient of the cost function is zero.
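The idea of using the cost gradient as the self-optimizing variable can be sketched as follows: a pure integral controller drives an estimate of J_u to its setpoint of zero, and thereby drives the input to the unconstrained optimum. The toy quadratic cost, disturbance value, and gain below are illustrative assumptions of ours, not from the paper.

```python
# Minimal sketch: integral feedback on the steady-state cost gradient.
# Toy cost J(u, d) = (u - d)^2, so J_u = 2*(u - d) and u* = d.
def grad_J(u, d):
    return 2.0 * (u - d)

u, d, KI = 0.0, 3.0, 0.2          # initial input, disturbance, integral gain
for _ in range(100):
    u -= KI * grad_J(u, d)        # u_{k+1} = u_k - KI * J_u (setpoint J_u = 0)
```

In practice grad_J would be replaced by any of the model-based or model-free gradient estimates reviewed later; the controller structure stays the same.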
Equivalently, one can also consider the full problem (1), in which case the n_u self-optimizing variables are given by a linear combination of the cost gradient with respect to all the manipulated variables ũ,

c = Nᵀ ∇_ũ J(ũ, d)

where N is a basis for the null space of the active constraint gradients (i.e., Nᵀ ∇_ũ C_A(ũ, d) = 0). This equivalent approach results in a more generalized framework that can be used for different active constraint combinations. Note that this is the same as the gradient of the reduced system (see for example Krishnamoorthy and Skogestad, 2020a; Jäschke and Skogestad, 2012; Marchetti et al., 2016; Marchetti et al., 2020, to name a few).
Theorem 1 (Generalized framework for region-based feedback optimizing control; Krishnamoorthy (2019), Krishnamoorthy and Skogestad (2020a)). Given Assumption 1, the steady-state optimization problem (1) for a given disturbance realization d can be transformed into a feedback control problem by controlling (in this order):
1. the active constraints C_A(ũ, d) to their limiting values, and
2. the self-optimizing variables c = Nᵀ ∇_ũ J(ũ, d) to c_sp = 0.

Proof. From Assumption 1, the Lagrangian of (1) is given by

L(ũ, λ, d) = J(ũ, d) + λ_Aᵀ C_A(ũ, d) + λ_Iᵀ C_I(ũ, d)

where we have partitioned the constraints into the set of active constraints (denoted by (·)_A) and inactive constraints (denoted by (·)_I), with λ_I = 0 by complementary slackness. The stationarity condition gives

∇_ũ J(ũ, d) + ∇_ũ C_A(ũ, d) λ_A = 0    (11)

Pre-multiplying (11) by Nᵀ gives

Nᵀ ∇_ũ J(ũ, d) + Nᵀ ∇_ũ C_A(ũ, d) λ_A = 0

and since N is chosen such that Nᵀ ∇_ũ C_A(ũ, d) = 0, we get c = Nᵀ ∇_ũ J(ũ, d) = 0. □

To summarize, we have thus far considered the case where the controlled variables were chosen by partitioning the set of constraints into active and inactive sets. This results in different active constraint regions as a function of the disturbances. In each region, n_a degrees of freedom are used to control the active constraints, and the n_ũ − n_a remaining unconstrained degrees of freedom are used to control the self-optimizing variables (such as the steady-state cost gradient of the reduced system). When the set of active constraints changes, we must switch the set of controllers accordingly. This can be achieved by using multivariable control such as MPC that tracks the constraints and self-optimizing variables in neighboring regions, or by using classical control elements such as selectors, split-range control, etc. This is discussed in detail in Section 6. In other words, by identifying and controlling the active constraints tightly, we only need to transform the simpler unconstrained optimization problem of the reduced system into a feedback control problem, and switch between the different controllers in each active constraint region accordingly. The general control structure under this paradigm is schematically represented in Fig. 3a.
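The null-space construction in Theorem 1 is easy to compute numerically. Below is a sketch of ours using an SVD-based null-space basis; the constraint gradient, cost gradient, and tolerance are illustrative numbers, not from the paper.

```python
import numpy as np

def null_space(A, tol=1e-10):
    """Return a matrix whose columns span the null space of A,
    computed from the SVD (illustrative helper)."""
    _, s, vh = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return vh[rank:].T

# One active constraint gradient (as a row) for a 3-input problem.
grad_CA = np.array([[1.0, 1.0, 0.0]])
N = null_space(grad_CA)          # 3 x 2 basis with grad_CA @ N = 0

grad_J = np.array([2.0, -1.0, 0.5])
c = N.T @ grad_J                 # the 2 self-optimizing CVs, setpoint 0
```

Driving c to zero then enforces stationarity of the reduced problem, exactly as in the proof above.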

Primal-dual feedback optimizing control
Alternatively, one can also transform the optimization problem into a feedback control problem without partitioning the set of constraints into active and inactive sets a priori. This approach is especially useful if there are many active constraint regions to switch between and/or if it is not possible to identify the different active constraint regions a priori (e.g., due to lack of good models). The main idea here is to consider the Lagrangian of the optimization problem (2) as an unconstrained optimization problem with n_ũ + n_c degrees of freedom. More precisely, the degrees of freedom for feedback optimizing control are now given by both the physical manipulated variables ũ (the primal variables in (1)) as well as the Lagrange multipliers λ (the dual variables in (1), also known as shadow prices). Consider the relaxed optimization problem (2), for which the stationarity conditions w.r.t. the n_ũ + n_c degrees of freedom (denoted jointly as [ũ λ]ᵀ) are given by

∇_ũ L(ũ, λ, d) = 0
∇_λ L(ũ, λ, d) = C(ũ, d) = 0

Therefore, the n_ũ degrees of freedom are used to control the stationarity condition such that ∇_ũ L(ũ, λ, d) = 0 for any given λ. Additional n_c feedback controllers are used to update λ such that the constraints C(ũ, d) are controlled to a constant setpoint of zero (i.e., active constraint control). We also have the additional dual feasibility constraint λ ≥ 0 (cf. (3c)). However, at any given time, only either λ ≥ 0 or C(ũ, d) ≤ 0 will be active, thanks to the complementary slackness condition (3d). Therefore, dual feasibility and complementary slackness are taken care of by using a max-selector, as shown in Fig. 3b. In other words, the "active constraint control" in this paradigm is achieved using the Lagrange multipliers λ (i.e., the dual degrees of freedom), instead of a subset of the primal degrees of freedom u′ ⊆ ũ.
Considering only integral action, the primal-dual feedback law in this case can be expressed as

dũ/dt = −K_ũ ∇_ũ L(ũ, λ, d)
dλ/dt = K_λ C(ũ, d),  with λ ≥ 0 enforced by the max-selector

which is also commonly known as the saddle-point flow (Venets, 1985; Feijer and Paganini, 2010).
To this end, we have n_ũ + n_c feedback controllers:
• n_ũ primal controllers, where ũ is used to control the gradient of the Lagrangian ∇_ũ L(ũ, λ, d) to a constant setpoint of zero, thereby satisfying the stationarity condition (3a).
• n_c dual controllers, where the dual variables λ are used to control the constraints C(ũ, d) at their limiting values while ensuring that λ ≥ 0 by using a max-selector, thereby ensuring primal feasibility (3b), dual feasibility (3c), and complementary slackness (3d).
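The primal-dual feedback law with a max-selector can be sketched as a discrete-time saddle-point iteration on a toy one-dimensional problem. The problem, gains, and iteration count below are illustrative assumptions of ours: min (u − 3)² subject to u − 2 ≤ 0, whose optimum is u* = 2 with λ* = 2.

```python
# Discrete-time sketch of the primal-dual (saddle-point) integrators
# with a max-selector on the dual variable. Toy problem and gains
# are illustrative, not from the paper.
u, lam = 0.0, 0.0
ku, klam = 0.05, 0.05
for _ in range(5000):
    grad_L = 2.0 * (u - 3.0) + lam          # grad of Lagrangian w.r.t. u
    u -= ku * grad_L                        # primal controller: drive grad_L -> 0
    lam = max(0.0, lam + klam * (u - 2.0))  # dual controller + max-selector
```

Note how the max-selector lets λ settle at zero whenever the constraint is inactive, recovering the unconstrained controller automatically.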
Given Assumption 1, this ensures the satisfaction of the first-order optimality conditions by using feedback controllers. The general control structure under this paradigm is schematically represented in Fig. 3b. Unlike the region-based approach, primal-dual control avoids the need for identifying the different active constraint regions a priori. Consequently, this approach also avoids the need to explicitly design a switching strategy to handle changes in the active constraint regions as the operating conditions vary. The choice of MV-CV pairing is also rather straightforward in this case: ũ is used to control ∇_ũ L(ũ, λ, d), and the Lagrange multiplier corresponding to a constraint is used to control that constraint.

Note that the dual controllers control the constraints indirectly by updating λ, which occurs on a slower time scale relative to the primal controllers (due to the cascade structure). Thus, in some cases it may be difficult to achieve tight control of the constraints, and to avoid dynamic constraint violations one may need to use back-off, resulting in an economic loss, quantified by (4).

It can be seen that this approach requires online estimation of the steady-state cost gradient ∇_ũ J(ũ, d), as well as the constraint gradients ∇_ũ C(ũ, d). Any of the model-based gradient estimation methods reviewed in Section 3.2 or the model-free gradient estimation methods reviewed in Section 4 can be used for this purpose. Note that, unlike the region-based approach, the primal-dual approach cannot use other self-optimizing variables such as a single measurement or a linear measurement combination. Therefore, the primal-dual approach cannot be used if the cost and constraint gradient estimates are not available. It is also important to note that the primal and dual controllers work together by gathering all the constraint and cost gradients to compute the gradient of the Lagrangian, and one is not independent of the other.
This makes the primal-dual approach less robust to faulty control loops. For example, if any control loop is broken, this will affect the overall system, unlike in region-based control, where any given active constraint controller will still work perfectly well in controlling its constraint even if some other control loop is broken. This is because in region-based control, each constraint is paired directly with an input. On the other hand, the region-based approach is less suitable for large-scale systems than the primal-dual approach.
Distributed feedback optimizing control

The primal-dual approach is more amenable to distributed systems and large-scale processes that are made up of several unit operations or sub-processes, coupled in one form or another. In many process applications, feedback controllers are typically designed in a decentralized fashion for small sub-processes/unit operations. If the overall process is coupled in one form or another, then optimal operation of the local subsystems does not necessarily contribute to the overall optimal operation. The loss due to lack of coordination was also pointed out and quantified by Morari et al. (1980).
To address this issue, one can decompose the large-scale process into several smaller sub-processes or unit operations, and formulate local optimization problems with coupling constraints that couple the different subsystems together. This enables one to design feedback optimizing control structures for the local subsystems, and use a central coordinator to coordinate the different subsystems. Decomposing a large-scale system into smaller subsystems and considering the different subsystems locally makes it easier to design feedback control structures, which is also often done in practice. A distributed feedback optimizing control scheme based on the primal-dual feedback control approach was recently proposed by Krishnamoorthy (2021a), where the "optimizing" controlled variables for the local subsystems are given as a function of the Lagrange multipliers of the coupling constraints. The primal controllers are designed locally for each subsystem, based on local cost and constraint gradient estimates. The dual controllers corresponding to the coupling constraints act as the central coordinator and update the Lagrange multipliers, as shown in Fig. 4. Together, this leads to overall system-wide asymptotic optimal operation. The coupling constraints typically arise from shared resources among the different unit operations or from physical couplings (e.g., the outflow from one unit is the inflow to another). In the former case, the optimization problem is formulated as an optimal exchange problem (Boyd et al., 2011, Section 7.3), whereas in the latter case, the optimization problem is formulated as a consensus problem (Boyd et al., 2011, Section 7.1). The interested reader is referred to Krishnamoorthy (2021a) for a more detailed description and analysis of the distributed feedback-based RTO scheme.
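A minimal sketch of the shared-resource (optimal exchange) case, with two hypothetical subsystems and a central coordinator updating the shared Lagrange multiplier. The costs, coupling limit, and gains are illustrative assumptions of ours, not the case study of Krishnamoorthy (2021a): each local cost is J_i = (u_i − 4)² and the shared resource limits u_1 + u_2 ≤ 6, so the coordinated optimum is u_1 = u_2 = 3 with price λ* = 2.

```python
# Distributed primal-dual sketch: local primal controllers drive the
# local Lagrangian gradients 2*(u_i - 4) + lam to zero; the central
# coordinator (dual controller) updates the shared price lam based on
# the coupling constraint u1 + u2 <= 6. All numbers are illustrative.
u = [0.0, 0.0]
lam, ku, klam = 0.0, 0.1, 0.1
for _ in range(3000):
    for i in range(2):                           # local primal controllers
        u[i] -= ku * (2.0 * (u[i] - 4.0) + lam)
    lam = max(0.0, lam + klam * (sum(u) - 6.0))  # central coordinator
```

Note that each subsystem only needs its own gradient and the current price λ; the coordinator only needs the aggregate resource use, which is what makes the scheme scale.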

Model-based methods
In model-based techniques, we assume that we have access to a model of the process given by

ẋ = f(x, u, d)
y = h(x, u, d)    (15)
J_m = ℓ(x, u, d)

where x ∈ R^(n_x) denotes the states, u ∈ R^(n_u) denotes the unconstrained degrees of freedom, d ∈ R^(n_d) denotes the disturbances, and y ∈ R^(n_y) denotes the vector of measurements. f, h, ℓ ∈ C² denote the smooth functions that describe the system dynamics, the measurements, and the cost, respectively. Note that the cost predicted by the model is denoted J_m, whereas the true cost measurement from the plant is simply denoted J, without any subscript. The unmeasured disturbances and model parameters are jointly denoted by d. Furthermore, we assume that we have more measurements than disturbances, i.e., n_y > n_d.
As shown in Fig. 1 , the model-based methods can be further classified into two categories, based on whether the models are used online or offline.

Models used offline for obtaining self-optimizing variables
In this class of methods, the controlled variables c for the unconstrained degrees of freedom are chosen offline based on process insights extracted from the process models (15). Using offline analysis, the objective is to design the controlled variables c , as well as their corresponding setpoints c sp .
As already mentioned, the ideal self-optimizing variable is the cost gradient c = J u . However, obtaining an analytic expression for J u may be tedious, and even if an expression is found, one may not have measurements of all variables in the expression for J u . In addition, the cost gradient J u may also be susceptible to measurement errors. An alternative approach is therefore to start from the available measurement y , and look for single measurements or measurement combinations that can be kept constant. To simplify the mathematical treatment, we usually only consider linear measurement combinations, c = Hy , where H is a constant measurement selection matrix. Note that also the optimal setpoint c sp must be obtained in this case.

Direct loss evaluation
When deciding what to control using the unconstrained degrees of freedom, the simplest option is to choose a subset of the available measurements c ⊂ y as the self-optimizing controlled variables (CVs), which are controlled to constant setpoints. The main idea here is to select all or a subset of the available measurements as candidate controlled variables, and repeatedly evaluate the cost for the expected disturbance scenarios when the different candidate controlled variables are kept constant. This analysis is performed offline for all the candidate CVs, and for all expected disturbances. Since there are n_u unconstrained degrees of freedom, subsets of n_u measurements are evaluated as candidate CVs for the expected disturbances. As such, this approach involves simulating the process several times for each set of candidate CVs to evaluate the cost, and replacing the incumbent CV set with one that gives a lower cost. The candidate measurements with the lowest economic loss are then used as self-optimizing controlled variables for online control. This method is also known as the brute-force approach . The economic loss here may either be the average loss or the worst-case loss.
Note that, with this method, one does not know how far one is from the true optimum, since we only evaluate the cost and select the set of candidate CVs that provides the lowest observed cost. This is one of the earlier methods for finding self-optimizing controlled variables for unconstrained optimization problems; it was used by Skogestad (2000) , and was also studied by Larsson and Skogestad (2000) and Govatsmark and Skogestad (2005) .
For processes with many candidate measurements, it may be tedious and time consuming to evaluate the economic loss for all the possible candidate controlled variables. A more numerically efficient approach is to use the "maximum-gain rule", which uses the steady-state gain (denoted by G ) from the unconstrained degrees of freedom to the candidate controlled variables ( Halvorsen et al., 2003 ). Here, the gain matrix is scaled as

G' = S_1 G J_uu^{-1/2}

where S_1 = diag{1/span(c_i)} and span(c_i) is the expected variation of the candidate controlled variable c_i. The maximum gain rule states that the self-optimizing variables should be selected so as to maximize the scaled gain matrix G', or more precisely, to maximize the minimum singular value of the scaled gain matrix ( Halvorsen et al., 2003 ). The maximum gain rule can be used to select a few good alternative candidate self-optimizing variables. The brute-force method can then be used to analyze the economic losses for the short-listed CV candidates. The interested reader is also referred to the recent survey paper on self-optimizing control by Jäschke et al. (2017) for a more detailed discussion of this approach.
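As an illustration, the scaled-gain screening can be sketched as follows (a minimal sketch assuming numpy; the matrices G, J_uu and the spans are hypothetical values, not taken from any example in this paper):

```python
import numpy as np
from itertools import combinations

# Hypothetical data: 2 unconstrained inputs, 3 candidate CVs
G = np.array([[2.0, 0.5],
              [0.1, 1.5],
              [1.0, 1.0]])          # steady-state gain from u to candidate CVs
J_uu = np.array([[4.0, 1.0],
                 [1.0, 2.0]])       # Hessian of the cost w.r.t. u
span = np.array([1.0, 0.5, 2.0])    # expected variation of each candidate c_i

# Scaled gain G' = S1 @ G @ J_uu^{-1/2}; any matrix square root of J_uu works,
# here a Cholesky factor is used.
S1 = np.diag(1.0 / span)
L_chol = np.linalg.cholesky(J_uu)
G_scaled = S1 @ G @ np.linalg.inv(L_chol.T)

# Maximum gain rule: pick the subset of n_u = 2 CVs that maximizes the
# minimum singular value of the corresponding rows of the scaled gain.
best = max(combinations(range(G.shape[0]), 2),
           key=lambda idx: np.linalg.svd(G_scaled[list(idx)],
                                         compute_uv=False)[-1])
print("best CV pair:", best)
```

The short-listed pair would then be checked with the brute-force loss evaluation described above.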

Linear measurement combination
In this approach, instead of selecting a single measurement, the controlled variable is chosen as a linear combination of the available measurements,

c = Hy
The objective is then to find the best measurement selection matrix H using the process models offline. Most approaches are based on local linearization around some nominal optimal point. The two most popular approaches for finding the optimal measurement combination are the nullspace method proposed by Alstad and Skogestad (2007) and the exact local method proposed by Halvorsen et al. (2003) and further developed by Alstad et al. (2009) and Yelchuru and Skogestad (2012) .

Remark 1. Note that in the case of single measurements selected using direct loss evaluation (cf. Section 3.1.1 ), the elements of the H matrix consist of 0s and 1s.
Nullspace method : The nullspace method is the simplest approach for finding the optimal selection matrix H for the case with no implementation error. This method assumes that the number of measurements satisfies n_y ≥ n_u + n_d.
Let the disturbance variation around some nominal point d_0 be denoted δd := d − d_0, and the corresponding change in the optimal measurements be denoted δy* := y* − y*_0. The optimal sensitivity matrix, which describes how the optimal measurements vary with the disturbances, is given by

F = ∂y*/∂d^T (16)

The optimal selection matrix H is selected such that it lies in the nullspace of the optimal sensitivity matrix F , i.e.

HF = 0 (17)

Alstad (2005 , Chapter 3) proved that computing the selection matrix H such that HF = 0 leads to zero loss for small disturbance changes δd. The setpoint for the optimal measurement combination c is often computed using the nominal optimal measurement, i.e. c_sp = H y*_0.

Exact local method : This method is an extension of the nullspace method which takes into account the measurement and implementation error, and avoids the requirement that n_y ≥ n_u + n_d. The starting point is to approximate the nonlinear economic optimization problem (5) with a quadratic programming (QP) problem that minimizes the average loss. This is based on a Taylor series expansion of the cost around the nominal optimum u*_0 for a given disturbance d , from which the loss function can be written as

Loss = (1/2)‖z‖², with z = J_uu^{1/2} (u − u*(d))

Using the linearized measurement model and c = H y_m = H(y + e^y), we can express z as a linear function of the normalized disturbance d̃ = W_d^{-1} δd and the normalized implementation error ẽ^y = W_{e^y}^{-1} e^y, where W_d and W_{e^y} are diagonal scaling matrices for the expected magnitudes of the disturbances and implementation errors, respectively. Since Loss = (1/2)‖z‖², we can see that the loss is minimized if ‖HY‖ is minimized, with Y := [F W_d  W_{e^y}], subject to J_uu^{1/2} (H G^y)^{-1} = I. Therefore, the minimization problem becomes

min_H ‖H Y‖_F subject to H G^y = J_uu^{1/2}

Remark 2. It is worth noting that the exact local method is the only approach that explicitly takes into account measurement noise and implementation error.
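A minimal sketch of the nullspace method (assuming numpy/scipy and a hypothetical sensitivity matrix F with n_y = 4 measurements and n_u = n_d = 2, so that n_y ≥ n_u + n_d holds):

```python
import numpy as np
from scipy.linalg import null_space

# Hypothetical optimal sensitivity matrix F = dy*/dd^T (4 measurements,
# 2 disturbances); in practice F comes from perturb-and-reoptimize runs.
F = np.array([[ 1.0,  0.0],
              [ 0.5, -1.0],
              [ 0.0,  2.0],
              [-1.0,  1.0]])

# H spans the left nullspace of F, i.e. H F = 0; each row of H defines
# one combined controlled variable c = H y.
H = null_space(F.T).T

# Setpoint from the nominal optimal measurement: c_sp = H y*_0
y_star_0 = np.array([1.0, 2.0, 0.5, 0.0])
c_sp = H @ y_star_0
print(H.shape, c_sp)
```

With n_y = 4 and n_d = 2, the left nullspace has dimension 2, matching the n_u = 2 controlled variables needed.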
Note that the optimal sensitivity matrix (16) may be determined numerically by perturbing the disturbances and reoptimizing. Alternatively, one can use an analytical expression. Assuming that we in y* include, in addition to the outputs y , also the inputs u , the analytical expression for the sensitivity matrix becomes

F = [ −J_uu^{-1} J_ud ; G_d^y − G^y J_uu^{-1} J_ud ] (25)

where G^y and G_d^y denote the steady-state gains from the inputs u and the disturbances d to the measurements y , respectively. The nullspace and the exact local methods are based on local linearization around some nominal optimal point. As a result, the steady-state loss increases as the process is operated far away from the nominal optimal conditions. In order to make the economic performance globally acceptable, global self-optimizing control methods have also been proposed, see e.g. Ye et al. (2015, 2017) . Ye et al. (2012) also proposed to approximate the necessary conditions of optimality using regression techniques such as least squares or artificial neural networks, which can then be used as self-optimizing controlled variables. In recent years, there have been several developments that focus on using measurement combinations as self-optimizing variables, leading to its own survey article ( Jäschke et al., 2017 ).

Neighboring extremal control (NE)
In this approach, the cost gradient around the nominal optimal conditions is estimated from the variations in d around the nominal operating point, where the gradient is zero. The change in the cost gradient δJ_u resulting from the parametric variations δd can be expressed as

δJ_u = J_uu δu + J_ud δd (26)

Since the parametric variations δd may be unknown, they are inferred from the input and measurement variations δu := u − u_0 and δy := y − y_0, respectively, using the linearized measurement model (20),

δd = (G_d^y)^† (δy − G^y δu) (27)

Substituting (27) in (26) gives

δJ_u = (J_uu − J_ud (G_d^y)^† G^y) δu + J_ud (G_d^y)^† δy (28)

which is the gradient relative to the nominal conditions. The estimated gradient (at nominal conditions) is then controlled to a constant setpoint of zero.
Comparing the optimal sensitivity matrix F in (25) with (28) , it can be seen that the nullspace method and the neighboring extremal control approach are equivalent ( François et al., 2014 ). That is, (28) can equivalently be written as

δJ_u = H [δu; δy], with H := [J_uu − J_ud (G_d^y)^† G^y    J_ud (G_d^y)^†]

and from (25) one can verify that HF = 0, which is the same condition as in the nullspace method.
Remark 3. The connection between the nullspace method and the neighboring extremal method shows that, although the nullspace method aims to find a linear measurement combination as the self-optimizing variable, it can also be seen as estimating the cost gradient (i.e. ideal-self optimizing variable). In other words, the nullspace method gives J u = 0 at nominal conditions, see also (Jäschke and Skogestad, 2011, Appendix B) .
Although the standard neighboring extremal control ( Gros et al., 2009 ) does not take measurement noise into account, variants of this approach that consider measurement noise have also been proposed ( de Oliveira et al., 2015 ).

Two-step approach
This approach is closely related to the traditional two-step RTO approach, where the steady-state cost gradient is computed analytically using the model Eq. (15). The disturbances d are estimated from the measurements y_meas using any parameter estimation algorithm, e.g. an update law of the form

d̂_{k+1} = d̂_k + K_e (y_meas − ŷ)

with d̂ being the estimated parameter vector and K_e the estimation gain. Once the model parameters d̂_k are estimated, the analytical Jacobian can be evaluated using the updated model. The estimated gradient Ĵ_u,model is then driven to a constant setpoint of zero using any feedback control law. This approach is schematically shown in Fig. 6 a. The two-step approach of estimating model parameters using a dynamic model in the context of feedback optimizing control was proposed by Adetola and Guay (2007) .

Feedback RTO using transient measurements (FRTO)
This approach, proposed by Krishnamoorthy et al. (2019c) , estimates the steady-state cost gradient using a nonlinear dynamic model and the process measurements y_meas, by linearizing the nonlinear dynamic model from the inputs to the cost. Any combined state and parameter estimation scheme (e.g. extended Kalman filter) may be used to estimate the states x̂ and the unmeasured disturbances d̂ using the dynamic model of the plant and the measurements y_m. Once the states and unmeasured disturbances are estimated, (15a) and (15c) are linearized to obtain a local linear dynamic model from the inputs u to the objective function J,

δẋ = A δx + B δu
δJ = C δx + D δu

where the system matrices are evaluated around the current estimates x̂ and d̂. Assuming that A is invertible, the corresponding steady-state gradient is then given as

Ĵ_u = D − C A^{-1} B (32)

This approach is schematically shown in Fig. 6 b. It is also worth noting that (32) was also used by Bamberger and Isermann (1978) , Garcia and Morari (1981) among others to estimate the steady-state gradient from local linear dynamic models (cf. Remark 4 ). The estimated gradient is then driven to zero using any feedback controller. The reader is referred to Krishnamoorthy et al. (2019c) for more detailed discussions, as well as a comparison with the traditional RTO framework.
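The steady-state gradient computation in (32) can be sketched as follows (assuming numpy; the linearized matrices are hypothetical placeholders for the model Jacobians evaluated at the current estimates x̂ and d̂):

```python
import numpy as np

def steady_state_gradient(A, B, C, D):
    """Steady-state gain (gradient) of the local linear model
    d(dx)/dt = A dx + B du,  dJ = C dx + D du,
    i.e. J_u = D - C A^{-1} B, assuming A is invertible."""
    return D - C @ np.linalg.solve(A, B)

# Hypothetical linearization around the current estimates (x_hat, d_hat)
A = np.array([[-1.0, 0.0],
              [ 0.5, -2.0]])
B = np.array([[1.0],
              [0.0]])
C = np.array([[2.0, 1.0]])
D = np.array([[0.0]])
print(steady_state_gradient(A, B, C, D))
```

The resulting scalar estimate Ĵ_u is what a feedback controller would drive to zero.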

Model-free methods
In this section, we review different model-free methods to achieve optimal operation using feedback control. In Section 3.2 , we saw how process models can be used online to estimate the steady-state cost gradient, which can be controlled to a constant setpoint of zero. Model-free methods on the other hand involve estimating the steady-state cost gradient directly from the cost measurement without the need for detailed process models.
The underlying principle of model-free gradient estimation is online experimentation. Simply put, the inputs are perturbed, and the corresponding change in the cost measurement is observed, which is then used to estimate the steady-state cost gradient. Therefore, all the model-free methods reviewed below assume that direct cost measurements J are available. Below, we review different model-free gradient estimation methods, and for the sake of simplicity we consider only the unconstrained case.

Finite-difference (FD)
The finite-difference approach is probably the oldest and most straightforward approach to estimate the steady-state cost gradient. Here, each input is perturbed by a small value Δu around the current operating point u_k and held over a time period T , where T is chosen such that the system reaches steady state within the time period. The gradient is then calculated from the corresponding variation in the cost measurement, e.g. using a forward difference,

Ĵ_u ≈ (J(u_k + Δu) − J(u_k)) / Δu
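A minimal sketch of finite-difference gradient estimation (assuming Python/numpy; the quadratic "plant" is a hypothetical stand-in for a cost measurement taken after the settling period T):

```python
import numpy as np

def fd_gradient(measure_cost, u, delta_u):
    """Forward-difference estimate of the steady-state cost gradient.
    measure_cost(u) is assumed to apply input u, wait over the period T
    for the plant to settle, and return the steady-state cost."""
    J0 = measure_cost(u)
    grad = np.zeros(len(u))
    for i in range(len(u)):
        u_pert = u.astype(float).copy()
        u_pert[i] += delta_u          # perturb one input at a time
        grad[i] = (measure_cost(u_pert) - J0) / delta_u
    return grad

# Toy static "plant": quadratic cost with optimum at u = [1, -2]
plant = lambda u: (u[0] - 1.0)**2 + 2.0 * (u[1] + 2.0)**2
print(fd_gradient(plant, np.array([0.0, 0.0]), 1e-6))   # ≈ [-2, 8]
```

Note that each gradient evaluation costs n_u + 1 steady-state experiments, which is exactly why the method becomes slow for plants with long settling times.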
The finite difference method is a popular approach due to its simplicity and ease of implementation, and has been used in several feedback optimizing control methods such as NCO-tracking control ( François et al., 2005;Jäschke and Skogestad, 2011 ) and hill-climbing control ( Kumar and Kaistha, 2014;Shinskey, 1996 ). The finite-difference approach has also been used in other numerical-optimization-based methods that use cost gradients, such as the integrated system optimization and parameter estimation (ISOPE) algorithm ( Roberts and Williams, 1981 ) and modifier adaptation ( Marchetti et al., 2009 ).
The finite-difference approach has been shown to provide gradient estimates with sufficient accuracy for systems with relatively fast dynamics and high signal-to-noise ratio (SNR). However, it is known to be inefficient for processes with long settling times (which require a large T ), and for measurements corrupted by large noise (which requires a large Δu).
In addition to the classical finite difference approach mentioned above, other variants of finite-difference approximation have also been used in the literature. For example, Roberts (2000) , Gao and Engell (2005) and Rodger and Chachuat (2011) used Broyden's formula to estimate the steady-state gradient from current and past measurements, without the need for additional perturbations. Brdyś and Tatjewski (1995) proposed an alternative variant, where past operating points are used to calculate the gradients without additional perturbations.

Classical extremum seeking control
Extremum seeking control, as the name suggests, is a control system that is used to seek and maintain the extremum value of a static map between the input and the cost function ( Ariyur and Krstic, 2003 ). This approach dates back to the 1950s with the work of Draper and Li (1951) , where extremum seeking control was studied in the context of adaptive control. This method gained huge popularity following the work of Krstić and Wang (2000) , and remains a popular approach to this day. Extremum seeking control has been applied to a wide array of application domains including, but not limited to, process control, aerospace, automotive, robotics, solar power, wind power, oil and gas, and medical and biomedical applications. Extremum seeking control has also been used in controller design, such as optimal tuning of PID controllers ( Killingsworth and Krstic, 2006 ). Since the 2000s, there have been several advancements in extremum seeking (ES) methods, including least squares-based ES ( Hunnekens et al., 2014 ), sliding-mode ES ( Fu and Özgüner, 2011;Pan et al., 2003 ), greedy ES ( Trollberg and Jacobsen, 2016 ), discrete-time ES ( Choi et al., 2002 ), Newton-based ES ( Ghaffari et al., 2012 ), and Lie-bracket approximation based ES ( Dürr et al., 2013 ), to name a few.
In the classical extremum seeking approach, a slow periodic dither signal in the form of a sinusoid a sin(ωt) is superimposed on the input signal,

u = û + a sin(ωt)

The frequency ω of the sinusoidal perturbation is chosen to be slow, such that the dynamic plant appears as a static map. This induces a periodic response in the cost measurement at the same frequency ω. A high-pass filter with cut-off frequency ω_h is used to remove the static bias η (also known as the DC component) from the cost measurement. The filtered cost (J − η) is then correlated with the input perturbation, and the DC component of the product of the two sinusoids is extracted using a low-pass filter with cut-off frequency ω_l, from which the cost gradient estimate is obtained as

Ĵ_u = LPF{ (2/a) (J − η) sin(ωt) }

The estimated gradient is then driven to a constant setpoint of zero using an integral controller. This is schematically represented in Fig. 7 a. As mentioned above, the dither frequency must be chosen relatively small such that the plant dynamics do not interfere with the extremum seeking scheme, and in addition the integral gain must be small such that the convergence to the optimum does not interfere with the sinusoidal perturbation. Hence, this approach has a clear timescale separation between:

• plant dynamics (fastest)
• sinusoidal perturbation (medium)
• convergence to the optimum (slowest)

In order to extend the dither-demodulation approach to the multivariable case, each input channel is perturbed individually. The perturbation frequencies must be chosen such that the different frequency components are unique, in order to ensure that the persistence of excitation condition is satisfied, i.e. ω_i ≠ ω_j, 2ω_i ≠ ω_j, and ω_i + ω_j ≠ ω_k for any distinct i , j, and k ( Ghaffari et al., 2012 ).
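The dither-demodulation loop above can be sketched as a simplified discrete-time simulation (assuming numpy; the static map J(u) = (u − 2)² is a hypothetical plant, and the high- and low-pass filters are reduced to first-order form for clarity):

```python
import numpy as np

# Tuning (hypothetical): dither amplitude/frequency, integral gain, step size
a, omega, k_int, dt = 0.5, 1.0, 0.1, 0.05
w_h, w_l = 0.1, 0.1              # high- and low-pass cut-off frequencies
u_hat, eta, grad_est = 0.0, 0.0, 0.0

for k in range(40000):
    t = k * dt
    u = u_hat + a * np.sin(omega * t)            # superimpose dither
    J = (u - 2.0)**2                             # cost "measurement"
    eta += dt * w_h * (J - eta)                  # eta tracks the DC bias,
                                                 # so (J - eta) is high-passed
    demod = (2.0 / a) * (J - eta) * np.sin(omega * t)
    grad_est += dt * w_l * (demod - grad_est)    # low-pass -> gradient estimate
    u_hat -= dt * k_int * grad_est               # integral control to J_u = 0

print(round(u_hat, 2))   # converges toward the optimum u* = 2
```

The three nested timescales are visible in the tuning: the dither (omega = 1) is slow relative to dt, and the integral gain k_int is small relative to the dither, so convergence is the slowest loop.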
Although extremum seeking control has gained popularity in electrical and mechanical systems, for most chemical and biochemical processes, the timescale separation requirement leads to prohibitively slow convergence to the optimum (typically convergence to the optimum in the range of several hours to days).

Least squares-based gradient estimation
In this approach, the steady-state cost gradient is estimated using a first-order least squares fit; it was presented in the context of extremum seeking control by Hunnekens et al. (2014) . The last N samples of the input u and the cost J are used to fit a local linear static model of the form

J = Ĵ_u^T u + m (39)

to which the analytical least-squares solution is given by

θ̂ = (Φ^T Φ)^{-1} Φ^T ψ (41)

where θ = [Ĵ_u^T  m]^T, the rows of the regressor matrix Φ contain the past input samples [u^T  1], and ψ contains the corresponding cost samples. This approach is schematically represented in Fig. 7 b. The application of the least squares method requires that N > n_u. In theory, this method does not require a constant perturbation using an additional dither signal. However, a small perturbation is recommended in practice, in order to track changes in the optimum and also to avoid an ill-conditioned problem in (41) .
Instead of solving a linear least squares problem using (41) , one can also solve a recursive least squares problem with a forgetting factor to estimate θ = [Ĵ_u^T  m]^T, as described by Chioua et al. (2016) and Dewasme et al. (2011) .
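A minimal sketch of the least-squares gradient fit (assuming numpy; the data are generated from a hypothetical linear cost, so the fitted slope recovers the true gradient exactly):

```python
import numpy as np

def ls_gradient(U, J):
    """Fit the local static model J ~ J_u^T u + m to the last N samples
    (rows of U, entries of J) and return the gradient estimate J_u."""
    N = len(J)
    Phi = np.hstack([U, np.ones((N, 1))])        # regressors [u^T, 1]
    theta, *_ = np.linalg.lstsq(Phi, J, rcond=None)
    return theta[:-1]                            # drop the intercept m

# Hypothetical data: cost J = 3*u1 - 2*u2 + 5 around the current operating point
rng = np.random.default_rng(0)
U = rng.normal(size=(20, 2))                     # N = 20 > n_u = 2
J = 3.0 * U[:, 0] - 2.0 * U[:, 1] + 5.0
print(ls_gradient(U, J))   # ≈ [3, -2]
```

In a recursive variant, the same fit would be updated sample by sample with a forgetting factor instead of refitting the whole window.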

Kalman filter-based gradient estimation
In this approach, a Kalman filter is used to estimate the steady-state cost gradient. The underlying idea is again to fit a local linear static model (39) around the current operating point, as in the least squares approach (cf. Section 4.3 ). Instead of using least squares estimation, Henning et al. (2008) proposed to use an extended Kalman filter (EKF), where the two unknowns x_1 = Ĵ_u and x_2 = m are the states of the Kalman filter. A discrete-time (random-walk) model for the two parameters is given by

[x_1; x_2]_{k+1} = [x_1; x_2]_k + w_k (42)

For the system (42) to be observable, two distinct measurements of the input-cost pairs at two different times are required ( Henning et al., 2008;Gelbert et al., 2012 ). Consequently, the measurement model is given by

[J_k; J_{k−i}] = [u_k^T  1; u_{k−i}^T  1] [x_1; x_2]_k + v_k (43)

where the input-cost measurement data at the current time step k and at time step k − i are used, and i can be chosen by looking at the observability Gramian, as explained by Gelbert et al. (2012) . The vectors w_k and v_k denote Gaussian white noise. An extended Kalman filter can now be used to estimate the "states" Ĵ_u and m using the "system model" (42) and the "measurement model" (43) .
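This scheme can be sketched as follows for a scalar input (assuming numpy; the noise covariances, dither, and toy plant are hypothetical choices, and the filter reduces to a linear Kalman filter here because the measurement model (43) is linear in the states once the inputs are recorded):

```python
import numpy as np

# State x = [J_u, m]; random-walk model (42) and pairwise measurements (43)
Q = 1e-3 * np.eye(2)            # process noise: lets the estimate drift
R = 1e-4 * np.eye(2)            # measurement noise covariance
x = np.zeros(2)
P = np.eye(2)

true_cost = lambda u: (u - 1.5)**2                 # toy plant
us = 0.5 + 0.05 * np.sin(0.3 * np.arange(400))     # dithered input sequence
Js = true_cost(us)

for k in range(1, len(us)):
    P = P + Q                                      # predict (state constant)
    # measurement model: input-cost pairs at times k and k-1 (i = 1)
    H = np.array([[us[k],   1.0],
                  [us[k-1], 1.0]])
    z = np.array([Js[k], Js[k-1]])
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)                        # update [J_u, m]
    P = (np.eye(2) - K @ H) @ P

print(x[0])   # ≈ local gradient dJ/du at u ≈ 0.5, i.e. about -2
```

The process noise Q acts like a forgetting factor: it keeps the filter responsive so that changes in the local gradient are tracked.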

Gradient estimation using fast Fourier transform (FFT)
So far, we can see that the model-free gradient estimation approaches involve perturbing the input with additional dither signals, and using the effect of the input perturbation on the cost to estimate the steady-state cost gradient. A natural and powerful approach to analyze the effect of periodic perturbations in any signal is to use the Fourier transform for spectral analysis, which tells us the contribution of each frequency component present in the signal. Therefore, one could periodically perturb the inputs and simply analyze the frequency spectrum of the cost measurement at the input perturbation frequency to obtain the cost gradient. In other words, the amplitude spectrum of the cost measurement provided by the fast Fourier transform (FFT) is equivalent to the magnitude of the cost gradient at the input perturbation frequency.
FFT is a fast, easy and robust numerical approach to extract the amplitude spectrum at different frequencies. This makes it a very favorable approach for multivariable systems, where the different inputs are perturbed with periodic signals with unique frequency components. The amplitude spectrum of the cost measurement at the different frequency components then provides the cost gradients with respect to each input. This intuitive nature of FFT has led to its use in a few engineering applications, see for example Corti et al. (2014) and Beaudoin et al. (2006) . Krishnamoorthy (2021b) formalized the FFT-based extremum seeking scheme and analyzed its properties in a static map setting.
In fact, the classical extremum seeking control presented in Section 4.2 can be seen as a special case of the Fourier transform, where the cost measurement is demodulated with a sinusoidal signal of the same frequency as the input perturbation, instead of a range of different frequencies.

Fig. 8. Model-free gradient estimation methods: (a) Gradient estimation using Fast Fourier Transform ( Krishnamoorthy, 2021b;Corti et al., 2014 ); (b) Gradient estimation from black-box system identification, e.g. ARX ( Bamberger and Isermann, 1978;Garcia and Morari, 1981 ).
In this approach, each input is perturbed with a unique periodic sinusoidal signal a_i sin(ω_i t). Consequently, the cost measurement is a function of all the sinusoidal frequencies, which are extracted by performing the FFT over a sliding window of fixed length with N past data samples (similar to the least squares method presented in Section 4.3 ). To perform the FFT, the cost measurement has to be detrended, such that it has zero mean. The detrending may be performed using any suitable method, for example a moving average filter. Note that this is analogous to using the high-pass filter in the classical extremum seeking control (cf. Section 4.2 ) to remove the static bias. The discrete Fourier transform then extracts the different frequency components of the detrended cost measurement J_0,

J(ω_l) = Σ_{t=0}^{N−1} J_0(t) e^{−j2πlt/N},  l = 1, ..., N (44)

Note that the product of the complex exponential function and the detrended cost signal is analogous to the demodulation of the dither and the cost measurement in the classical extremum seeking control (cf. Section 4.2 ). The magnitude of the gradient of the cost with respect to the i-th input, perturbed using ω_i, can then be obtained by looking at the amplitude spectrum |J(ω_i)|. Since the amplitude spectrum |J(ω_i)| > 0, the sign of the cost gradient is determined by comparing the phase spectrum of the cost φ_J(ω_i) with the phase spectrum of the input signal φ_{u_i}(ω_i). This is schematically shown in Fig. 8 a. Together, the gradient of the cost w.r.t. the i-th input is estimated as

Ĵ_{u_i} = sign(cos(φ_J(ω_i) − φ_{u_i}(ω_i))) · (2/(a_i N)) |J(ω_i)|

Alternatively, the estimated gradient can be computed directly as

Ĵ_{u_i} = Re( J(ω_i) / U_i(ω_i) )

where Re(·) is the real part of a complex number, and U_i(ω) is the discrete Fourier transform of the i-th input.
Much like in any multivariable model-free gradient estimation scheme, the input perturbation frequencies must be unique and should not lie in the harmonics of the other dither frequencies, i.e. ω_i ≠ ω_j, 2ω_i ≠ ω_j, and ω_i + ω_j ≠ ω_k for any distinct i , j, and k .
It is also important to note that the accuracy of the cost gradient at any particular frequency is sensitive to the choice of N. This is because the discrete Fourier transform treats the data window of length N as if it were periodic, and produces only l = 1, ..., N discrete frequency components in (44) . Therefore, if the exact perturbation frequency is not part of the discrete frequency array l = 1, ..., N, this leads to inaccurate gradient estimation at the perturbation frequencies. To avoid this, the minimum length N is given by the least common multiple of the different perturbation time periods 2π/ω_i. For a more detailed description and analysis of this approach, see Krishnamoorthy (2021b) .
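The FFT-based gradient estimation can be sketched as follows (assuming numpy; the two dither periods and the static cost map are hypothetical, and N is chosen as a common multiple of both periods so that the dither frequencies fall exactly on FFT bins):

```python
import numpy as np

N = 60                                          # common multiple of 20 and 12
t = np.arange(N)
a1, a2 = 0.1, 0.1
u1 = 1.0 + a1 * np.sin(2 * np.pi * t / 20)      # dither period: 20 samples
u2 = 2.0 + a2 * np.sin(2 * np.pi * t / 12)      # dither period: 12 samples
J = 3.0 * u1 - 2.0 * u2 + 7.0                   # toy static cost map

# Detrend the signals (analogous to the high-pass filter in classical ES)
JF  = np.fft.fft(J - J.mean())
U1F = np.fft.fft(u1 - u1.mean())
U2F = np.fft.fft(u2 - u2.mean())

# Bin indices of the two dither frequencies (3 and 5 cycles per window)
k1, k2 = N // 20, N // 12
g1 = np.real(JF[k1] / U1F[k1])                  # dJ/du1
g2 = np.real(JF[k2] / U2F[k2])                  # dJ/du2
print(g1, g2)   # ≈ 3, -2
```

Note that the chosen bins 3 and 5 also respect the harmonic conditions (neither is a second harmonic or sum of the others), so the two channels do not interfere.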

Gradient from multiple units
As mentioned earlier, model-free gradient estimation methods require perturbing the input in order to estimate the gradient. However, in some applications, these perturbations are not desired, especially when the effect of the perturbations is carried over to downstream processes. For example, in continuous processes, although some perturbations may be tolerated within unit processes in order to optimize the process, the perturbations may not be acceptable in the product supplied to the customer.
In processes with multiple units, one can carefully design the perturbations such that they cancel each other out overall. One such approach was presented by Srinivasan (2007) , where, in the presence of multiple units, the inputs to the units differ by an offset, and the gradient is then estimated using the finite difference method (cf. Section 4.1 ).
Similar synchronization methods for extremum seeking control were also recently presented by Pavlov et al. (2017) and Silva and Pavlov (2020) .

Gradient estimation using transient measurements
A common trait of all the model-free gradient estimation methods presented so far is that they require the assumption that the dynamic plant acts like a static map between the input and the cost. In order for this assumption to be valid, the input perturbation must be much slower than the plant dynamics, such that the dynamic plant can be approximated as a static map. Furthermore, the controller gain used to drive the steady-state gradient to zero must be sufficiently small such that the convergence to the optimum is much slower than the perturbation signal. In summary, this means that the overall convergence rate is about two orders of magnitude slower than the original plant dynamics ( Krstić, 2000; Tan et al., 2010 ).
Although this is not a major bottleneck for many electromechanical systems, for chemical and biochemical processes the settling times are often in the range of minutes to hours. The timescale separation thus leads to prohibitively slow convergence. Despite the very appealing characteristic of not requiring a detailed model, many model-free gradient estimation methods may therefore be impractical for real-time optimization in the chemical process industry.
In order to address this issue, one potential solution is to explicitly include the plant dynamics in the model-free gradient estimation scheme. The use of measurements to repeatedly identify a black-box local linear dynamic model around the current operating point for online optimization of slow chemical processes was first proposed by Bamberger and Isermann (1978) . Here, black-box system identification models such as ARX models are repeatedly identified online. The cost gradient is then estimated from the identified black-box models, and is driven to a constant setpoint of zero. This approach was further analyzed by Garcia and Morari (1981) , where the authors recursively identified local linear dynamic models to estimate the steady-state gradient and drive the process to its optimum using a gradient descent algorithm. Later, McFarlane and Bacon (1989) presented an empirical strategy for open-loop online optimization using black-box ARX models similar to the one proposed by Bamberger and Isermann (1978) and Garcia and Morari (1981) . As such, the method proposed by Bamberger and Isermann (1978) , Garcia and Morari (1981) and McFarlane and Bacon (1989) can be seen as a dynamic variant of extremum seeking control for Hammerstein plants, where the cost measurement is used to repeatedly identify a local linear dynamic model, which is then used to estimate the steady-state gradient (cf. (32) ). Model-free gradient estimation using transient measurements by identifying local linear dynamic models has also recently been used in the context of modifier adaptation ( Vaccari et al., 2020;Oliveira-Silva et al., 2021 ).
In this approach, a black-box ARX model of the form

A_poly(q) J(t) = B_poly(q) u(t) + e(t)

is repeatedly identified using process measurements. The inputs are perturbed using a pseudo-random binary signal (PRBS), and the perturbation frequency can be chosen in the same timescale as the plant dynamics, leading to one order of magnitude faster convergence to the optimum compared to the classical sinusoidal extremum seeking scheme, which requires two orders of magnitude timescale separation. The ARX model is identified repeatedly by solving the least squares problem

θ̂ = arg min_θ ‖ψ − Φ^T θ‖²₂ (49)

where ψ contains the past cost measurements, θ contains the ARX coefficients, and Φ contains the corresponding regressors (past inputs and costs). Introducing the notation A_poly(q) = 1 + a_1 q^{−1} + ⋯ + a_{n_a} q^{−n_a} and B_poly(q) = b_1 q^{−1} + ⋯ + b_{n_b} q^{−n_b}, with q^{−1} being the unit delay operator, the steady-state gradient is then estimated as

Ĵ_u = B_poly(1)/A_poly(1) = (b_1 + ⋯ + b_{n_b}) / (1 + a_1 + ⋯ + a_{n_a})

Remark 4. It is interesting to note that the identified ARX polynomials A_poly(q) and B_poly(q), when converted to a continuous-time state-space system¹, give the same expression for the steady-state gradient as (32) in the feedback RTO using transient measurements approach (cf. Section 3.2.2 ).
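A minimal sketch of the ARX-based gradient estimation (assuming numpy, with hypothetical model orders n_a = n_b = 1 and a noise-free first-order toy plant):

```python
import numpy as np

# PRBS-like dither in the same timescale as the plant dynamics
rng = np.random.default_rng(1)
u = np.where(rng.random(500) > 0.5, 1.0, -1.0)

# Toy plant: first-order lag, steady-state gain (gradient) = -0.4/(1-0.8) = -2
J = np.zeros(500)
for t in range(1, 500):
    J[t] = 0.8 * J[t-1] + (-0.4) * u[t-1]

# Least squares fit of J(t) = -a1*J(t-1) + b1*u(t-1), i.e. theta = [-a1, b1]
psi = J[1:]
Phi = np.column_stack([J[:-1], u[:-1]])
theta, *_ = np.linalg.lstsq(Phi, psi, rcond=None)
a1, b1 = -theta[0], theta[1]

J_u_hat = b1 / (1.0 + a1)        # B_poly(1) / A_poly(1)
print(J_u_hat)   # ≈ -2
```

Because the PRBS excites the plant within its own timescale, the transient data suffice to identify the dynamics, and the steady-state gradient is recovered without waiting for the plant to settle.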
Alternatively, one could also repeatedly identify an ARMAX model or any other black-box model to estimate the steady-state cost gradient. For example, Golden and Ydstie (1989) used a second-order Hammerstein model, i.e. a static nonlinearity in series with linear dynamics. However, with such higher-order ARX or ARMAX models, or second-order Hammerstein models such as the one used by Golden and Ydstie (1989) , the number of parameters that need to be repeatedly estimated increases. If the excitation of the process is not sufficient, then all the black-box model parameters may not be estimated accurately. In the context of real-time optimization, this is especially a problem as the system approaches its optimum, where the steady-state relation between the input and the cost is typically flat. This challenge is illustrated using simple counterexamples in Krishnamoorthy (2019 , Chapter 5).
One simple solution to this problem is to simply turn off the gradient estimation once the plant reaches its optimum, as done by Bamberger and Isermann (1978) and McFarlane and Bacon (1989) . However, in practice, it is desirable to keep estimating the gradient even after reaching the optimum, in order to ensure that changes in the optimum are tracked.
Alternatively, if the nominal linear dynamics are known, the dynamic part of the model can be fixed, such that we only estimate the unknown local steady-state effect,

J = Ĵ_u [ (b_1 q^{−1} + ⋯ + b_{n_b} q^{−n_b}) / (1 + a_1 q^{−1} + ⋯ + a_{n_a} q^{−n_a}) ]_fixed u (51)

Fixing the linear dynamic part enables us to effectively use transient measurements, thereby avoiding the steady-state wait-time issue ( Krishnamoorthy, 2019 , Chapter 5). In this case, the least squares problem (49) is solved with the filtered signals ψ(t) = J(t) + a_1 J(t − 1) + ⋯ + a_{n_a} J(t − n_a) and φ(t) = b_1 u(t − 1) + ⋯ + b_{n_b} u(t − n_b), with Ĵ_u as the only unknown.

¹ For example, using the idss and d2c commands in MATLAB.

More recently, van Keulen et al. (2020) proposed to apply a multisine dither signal to identify the local linear dynamic model of the plant, where the perturbation frequency is in the same timescale as the plant dynamics. The transfer function of the local linear dynamic system at the multisine dither frequencies is estimated using the Fourier transform of the past N samples of the input and cost measurements (much like the FFT-based gradient estimation described in Section 4.5 ). The local steady-state gradient is then estimated using online complex curve fitting, taking a frequency-domain approach to system identification.
Another approach to account for the plant dynamics is to compensate for the phase shift introduced by the plant dynamics, such that the static map assumption can be relaxed. For example, one can estimate the different harmonics using a Kalman filter, and drive the high-frequency harmonics to zero when the input perturbations are in the same timescale as the plant dynamics (Atta et al., 2015; Trollberg and Jacobsen, 2016). In other words, the gradient estimation problem is replaced by a problem of estimating the harmonics, and this is used to drive the process to its steady-state optimum.

Combination of model-free and model-based approaches
It can be seen that the model-based and model-free methods for the unconstrained optimization problem have their own advantages and disadvantages. In short, the main strength of the model-based methods is fast convergence, whereas the main weakness is the dependence on a model (making them susceptible to plant-model mismatch). On the other hand, the main strength of the model-free methods is that they circumvent the need for developing models (and hence are not susceptible to plant-model mismatch), whereas the main weakness is the prohibitively slow convergence to the optimum. Note that the model-based and model-free methods are complementary, not contradictory. In general, the model-free methods work on a slow time scale, whereas the model-based methods work on a fast time scale. This time scale separation can be exploited to combine the model-based and model-free methods in a hierarchical fashion. For example, any model-based method may be used on the fast timescale, and a slowly varying model-free method can be used on top to account for any plant-model mismatch.

Setpoint correction using model-free gradient estimation
When the models are used offline to determine the self-optimizing CV (cf. Section 3.1), the design is based on some nominal conditions. As mentioned earlier, when the operation drifts far away from the nominal operating condition, this leads to steady-state losses. Model-free gradient estimation approaches can be used to adjust the setpoint of the self-optimizing CVs in order to account for these steady-state losses, as shown in Fig. 9a. Such synergistic combinations of model-free methods and self-optimizing control have been shown to provide improved performance compared to either method used individually. For example, Jäschke and Skogestad (2011) proposed to combine NCO-tracking with self-optimizing control, and the improved performance was demonstrated using a CSTR example from Economou et al. (1986). Similarly, Straus et al. (2019) proposed to combine least squares-based extremum seeking control with self-optimizing control, and demonstrated the performance improvement on a 3-bed ammonia reactor case example. In both approaches, the setpoint of the self-optimizing variable c_sp was updated by the model-free method in a cascaded fashion, as shown in Fig. 9a.
Modifier adaptation is another RTO scheme that was proposed by Marchetti et al. (2009) to address the plant-model mismatch.
Here, the main idea is to estimate the plant gradients directly from the measurements, and use these to correct the optimization problem. For the unconstrained optimization problem (5), the modifier adaptation scheme iteratively solves a modified optimization problem (52), in which the cost is corrected using the so-called modifier terms. Here, the plant gradient Ĵ_u is estimated using any of the model-free gradient estimation techniques reviewed in Section 4.

Remark 5 (Modifier Adaptation). Although MA uses model-free gradient estimation algorithms, it is not a feedback optimizing control approach, since it is based on solving the numerical optimization problem (52) in real time. In this sense, modifier adaptation can be seen as a combination of the model-free gradient estimation approach with the traditional steady-state optimization (SRTO) shown in Fig. 2.
A similar approach to modifier adaptation, but one that does not require explicitly solving (52), would be to use the modifier term δJ = Ĵ_u − Ĵ_u,model as the setpoint to the gradient controller, as shown in Fig. 9b. By doing so, we add a bias correction to the model-based gradient estimate to account for the plant-model mismatch in the slow time scale.
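This bias-correction idea can be illustrated with a small hypothetical sketch (a scalar quadratic plant, a structurally wrong model, and gains chosen purely for illustration): a slow loop filters the modifier δJ from the plant-model gradient mismatch, while a fast integral controller drives the bias-corrected model gradient to zero. One sign convention is assumed here; the cited scheme may differ in detail.

```python
# Toy plant and (structurally wrong) model, both with measurable gradients
plant_grad = lambda u: 2.0 * (u - 3.0)   # true optimum at u = 3
model_grad = lambda u: 2.0 * (u - 2.0)   # model optimum at u = 2

u, dJ = 0.0, 0.0
k_fast, k_slow = 0.2, 0.05               # time-scale separation of the two loops
for _ in range(500):
    # Slow loop: filter the modifier from the plant-model gradient mismatch
    dJ += k_slow * ((plant_grad(u) - model_grad(u)) - dJ)
    # Fast loop: integral control drives the corrected gradient to zero
    u -= k_fast * (model_grad(u) + dJ)

print(round(u, 3))  # ≈ 3.0, the true plant optimum
```

The fast loop alone would settle at the model optimum u = 2; the slowly updated modifier shifts its stationary point to the true plant optimum.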

Discrepancy modeling framework
The discrepancy modeling framework is an alternative to updating the setpoints using a model-free gradient estimation method in a cascaded fashion. Here, the main idea is to approximate the "unmodeled" effects using direct cost measurements. That is, the residual between the cost measured from the plant, J, and the cost predicted by the model, J_model, is used to estimate the cost gradient Ĵ_u. For example, one can train a function approximator φ(u) that predicts the zeroth-order bias correction term ΔJ = J − J_model, and use the derivative of the trained function φ(u) to account for plant-model mismatch,
where φ(u) can be either a parametric function approximator (e.g. a neural network) or a non-parametric function approximator (e.g. a Gaussian process). The plant gradient is then estimated as the sum of the model gradient Ĵ_u,model and the gradient of the function approximator.
Such an approach for estimating the plant gradient was recently used in the context of modifier adaptation by Gao et al. (2016), Matias and Jäschke (2019), and del Rio Chanona et al. (2019). Although these methods were applied in modifier adaptation, the estimated gradient (54) can also be directly driven to a constant setpoint of zero using feedback control, which, to the best of our knowledge, has not been shown earlier.

Operation with changing active constraints
The set of active constraints change with changing operating conditions. Feedback optimizing control must be able to automatically handle these changes and ensure that the process is operated optimally at steady-state over different operating conditions. Section 6.1 details how this is handled in the case of region-based control (paradigm 1), and Section 6.2 details how this is handled in the case of primal-dual feedback control (paradigm 2).

Region-based control
As mentioned earlier in Section 2.1 , paradigm 1 first involves identifying the potential active constraint regions. Control structures are then designed in each active constraint region by partitioning the available degrees of freedom into constrained and unconstrained degrees of freedom (cf. Theorem 1 ). The set of active constraints change with changing operating conditions, which requires selecting different set of controlled variables and reconfiguration of the control loops.
The identification of the current active constraint region may be done by tracking the constraints and self-optimizing variables in all neighboring regions, and switching region if any of these become active (usually identified by a sign change (Manum and Skogestad, 2012)). Note that this means that the switching is based on the controlled variables (CVs). The controllers in each region may in principle be completely different and may be multivariable (e.g. MPC).
However, a simpler approach in terms of implementation is to pair each constraint with one particular input. This is the approach studied in this paper. In this case, we implement a controller for each CV (constraints and self-optimizing variables) and the switching is done based on the controller output ˜ u , that is, on the manipulated variables (MVs). The advantage is that one may use the same controllers in many regions, and the switching may be performed using standard max/min-selectors. The disadvantage is that we need to pair each potential constraint with one specific input. This may not be possible in some cases with many constraint combinations, because it requires that the constraints paired with a given input are never optimally active at the same time.

CV-MV pairing
This requires choosing a CV-MV pairing in each active constraint region. There is no systematic approach that provides a unique CV-MV pairing; however, in practice, the pairing can generally be chosen using the following rules of thumb:
• Pair-close rule – In order to avoid large time delays and sluggish control, it is wise to control a CV using an MV that is physically close to it (Skogestad and Postlethwaite, 2007).
• Non-negative relative gain array (RGA) – The pairing must be chosen such that the steady-state RGA of the resulting transfer matrix is non-negative and close to the identity matrix at crossover frequencies (Skogestad and Postlethwaite, 2007).
• Pair on large gain – One must also try to avoid pairing important CVs with MVs that quickly saturate, and instead pair such MVs with less important CVs that may be given up (Skogestad, 2004).
Additionally, the constraint grouping used when selectors switch between the different CVs (cf. step S1 in Section 6.1.3) also influences the MV-CV pairing. Note that there may be several possible MV-CV pairings that achieve the same objectives, and the pairing rules listed above can guide the selection of a good control structure design that reduces the number of logic blocks required to reconfigure the control loops.
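The RGA rule above is easy to check numerically: the relative gain array of a steady-state gain matrix G is the elementwise product of G and the transpose of its inverse. A small sketch with a hypothetical 2x2 gain matrix:

```python
import numpy as np

def rga(G):
    """Relative gain array: elementwise product of G and inv(G) transposed."""
    return G * np.linalg.inv(G).T

# Hypothetical 2x2 steady-state gain matrix (illustrative numbers)
G = np.array([[2.0, 0.5],
              [0.4, 1.0]])
print(rga(G))
# Diagonal elements close to 1 (here 10/9) suggest the diagonal pairing
# u1-y1, u2-y2; note that each row and column of the RGA sums to 1.
```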
Once the different control loops are designed for each active constraint region, we then have to design a switching strategy between the different active constraint regions. Switching between active constraint regions can be achieved by using classical advanced control elements, such as selectors, split-range control, valve position control etc. as detailed below.

MV-MV switching
MV-MV switching can be achieved using three alternative classical control elements, as detailed below.

Split-range control – Split-range control (SRC) dates back to the 1940s (Eckman, 1945) and is very commonly used in the process control industries. Here, a single controller produces a control signal v, typically between 0 and 100%, that is input to the split-range block, which then translates the control signal v into the physical manipulated variables u_i. A typical split-range block with two MVs is shown in Fig. 10a, where u_1 is used to control the CV and u_2 is saturated when the control signal is below the split value v*, whereas u_2 is used to control the CV and u_1 is saturated when the control signal is above the split value v*.

Valve position control – Valve position control (VPC), also known as input-resetting control, is often used in industrial practice to improve the dynamic performance, by allowing one MV to take care of the fast response and another of the long-term control. Here, however, it is used to extend the steady-state range when implemented as shown in Fig. 10b. Here, u_1 is used to control the CV, while u_2 is not used, i.e. it is kept at its desired limiting value, e.g. u_2^lim = 0. However, when u_1 is approaching its limit, u_2 controls u_1 to a setpoint slightly inside its limit u_1^lim, in order to prevent u_1 from saturating (Shinskey, 1996; Reyes-Lúa and Skogestad, 2019a).
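A split-range block of the type shown in Fig. 10a can be sketched as a simple static mapping; the split value and MV ranges below are illustrative assumptions:

```python
def split_range(v, v_split=0.5, u1_lim=(0.0, 1.0), u2_lim=(0.0, 1.0)):
    """Map a single controller output v in [0, 1] to two MVs.

    Below the split value, u1 moves over its full range while u2 stays
    saturated at its lower limit; above it, u1 is saturated at its upper
    limit and u2 takes over control of the CV.
    """
    if v <= v_split:
        u1 = u1_lim[0] + (v / v_split) * (u1_lim[1] - u1_lim[0])
        u2 = u2_lim[0]
    else:
        u1 = u1_lim[1]
        u2 = u2_lim[0] + ((v - v_split) / (1.0 - v_split)) * (u2_lim[1] - u2_lim[0])
    return u1, u2

print(split_range(0.25))  # (0.5, 0.0): u1 active, u2 saturated low
print(split_range(0.75))  # (1.0, 0.5): u1 saturated high, u2 active
```

Because the block is a static map, the single controller sees one continuous "virtual" input v, while the plant receives two physical inputs with an extended combined range.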
Controllers with different setpoints – Another alternative is to use an independent controller for each input, where the setpoints of the different controllers differ by a small amount, as shown in Fig. 10c. For example, u_1 is used to control the CV y at its optimal setpoint y_sp, and u_2 is used to control the CV y at a setpoint of y_sp + Δy_sp, where Δy_sp is large enough that only one controller is active at any given time. It is important to note that this approach leads to some steady-state loss when u_2 is used to control the CV at y_sp + Δy_sp. However, it may be possible to reduce the loss of having different setpoints. The simplest option is to have a master controller that slowly resets y_sp so that y_sp + Δy_sp returns to the truly desired setpoint. An alternative approach with individual controllers that completely avoids different setpoints is the somewhat different "baton strategy" (Reyes-Lúa and Skogestad, 2020), where each controller needs to identify when it has saturated. The reader is referred to Reyes-Lúa et al. (2018) and Reyes-Lúa and Skogestad (2019b) for more detailed information on using controllers with different setpoints to extend the steady-state operating range.

CV-CV switching
A CV constraint that is optimally active may no longer be active when a disturbance changes. In this case, a different variable (another CV constraint, or a self-optimizing CV) must be controlled in order to operate the process optimally for the new disturbance. To switch between the CVs, one can use individual controllers for each CV. The MV value that is implemented on the plant is then selected among the controller outputs using one or more selector blocks (Reyes-Lúa and Skogestad, 2019b).
To design selectors, we pair each MV with a set of constraints a priori, such that for a given MV u, the potential controlled variables are:
• N CV inequality constraints, denoted by y_i for all i = 1, ..., N
• at most one CV y_0 with a setpoint that may be given up (i.e. a self-optimizing CV)
• MV inequality constraints
A systematic procedure for designing selector blocks was recently proposed by Krishnamoorthy and Skogestad (2020b). The main steps of this procedure, for each MV u with different potential CVs, can be summarized as follows:

S1 (Constraint grouping) Group the candidate CV constraints into two sets, Y+ and Y−:
• The set Y+ consists of all constraints that are satisfied with a large input u. This includes a max-constraint on u (u_max).
• The set Y− consists of all constraints that are satisfied with a small input u. This includes a min-constraint on u (u_min).

S2 (SISO control loops)
Design individual SISO control loops to compute the input for each CV constraint (u_i) and for the CV setpoint controller (u_0). Note that it is important to implement anti-windup for all the controllers, such that a controller that is not selected does not keep integrating.

S3a (Choice of selector)
• Use a min-selector on u_i for constraints that are satisfied with a large input: ū = min over Y+ of u_i
• Use a max-selector on u_i for constraints that are satisfied with a small input: u̱ = max over Y− of u_i

S3b (Feasibility) For the set of constraints to be consistent, that is, to have feasible operation, we need ū > u̱ (Krishnamoorthy and Skogestad, 2020b), where ū is the output from the min-selector and u̱ is the output from the max-selector.
S4 (Optimality) When the problem is feasible, the optimal input is given by u = mid(u̱, u_0, ū), where u_0 is the control input computed by the controller that controls the self-optimizing CV, e.g. the steady-state cost gradient. This can be achieved using a mid-selector block in addition to the min- and max-selector blocks from step S3a that compute ū and u̱, respectively, as shown in Fig. 11a. Alternatively, this can also be achieved using min-max or max-min selector blocks in series, as shown in Fig. 11b and c, respectively. The three structures shown in Fig. 11 are equivalent when the constraints are feasible (cf. step S3b). If we only have one constraint set, Y+ or Y−, then we only need a single selector block, namely a min- or max-selector block, respectively.

S5 (Constraint priority) If the constraints are conflicting, that is, ū < u̱, then the three structures shown in Fig. 11 are not equivalent. In this case, the constraint priority can be used to decide the appropriate selector block. If the constraints in Y+ take a higher priority than the constraints in Y−, then we can use a max-min structure (Fig. 11c). If not, we can use a min-max structure (Fig. 11b).
If necessary, one can find another MV v to control the constraint given up by the original MV u . This would involve MV-MV switching as described above in Section 6.1.2 .
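The selector logic of steps S3a-S4 can be sketched compactly; the grouping of controller outputs into the sets Y+ and Y− below is hypothetical:

```python
def select_input(u0, u_plus, u_minus):
    """Selector logic for CV-CV switching (steps S3a-S4).

    u0      : output of the self-optimizing CV controller
    u_plus  : outputs of controllers for constraints satisfied by a large input (Y+)
    u_minus : outputs of controllers for constraints satisfied by a small input (Y-)
    """
    u_high = min(u_plus)    # min-selector over Y+ (most restrictive upper bound)
    u_low = max(u_minus)    # max-selector over Y- (most restrictive lower bound)
    # Feasibility check of step S3b; conflicting constraints need priorities (S5)
    assert u_low <= u_high, "conflicting constraints: apply priority (step S5)"
    return sorted((u_low, u0, u_high))[1]   # mid-selector of step S4

print(select_input(0.5, [0.8, 0.9], [0.1]))   # 0.5: self-optimizing CV active
print(select_input(1.2, [0.8, 0.9], [0.1]))   # 0.8: a Y+ constraint takes over
```

When the constraints are feasible, this single mid-selector is equivalent to the min-max and max-min series structures of Fig. 11b and c.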

Summary of switching strategies
To this end, the step-by-step procedure to design suitable switching strategies under paradigm 1 can be summarized as:
• Step 1 – Identify the set of active constraints (CVs with limits), self-optimizing CVs (CVs with setpoints) and the manipulated variables (MVs).
• Step 2 – Organize the constraints in a priority list. The priority list may be used to decide which constraints may be given up in situations where it may not be possible to control all the active constraints (e.g. when the number of active constraints exceeds the number of degrees of freedom, or if the constraints are conflicting as described in step S5 above).
• Step 3 – Identify the relevant active constraint regions.

Fig. 11. CV-CV switching using (a) min-max-mid selector blocks (b) min-max selector blocks (c) max-min selector blocks.
• Step 4 – Design the control structure for nominal operating conditions. Typically this would be the most common or important operating region.
• Step 5 – Design the control structures for the active constraint regions identified in step 3, and design MV-MV and/or CV-CV switching strategies between the different active constraint regions, as described above. Note that it may not always be possible to find simple switching schemes based on min- and max-selectors. In such cases, some alternative logic scheme may be needed in some constraint regions.
Recent works such as Reyes-Lúa and Skogestad (2019b), Reyes-Lúa and Skogestad (2019a), Reyes-Lúa et al. (2018), and Krishnamoorthy and Skogestad (2019, 2020b) provide a detailed discussion on switching between active constraint regions, along with a wide range of application examples in the context of feedback optimizing control. It is important to note that explicitly designing the switching strategies requires one to identify all the possible active constraint switches a priori and to carefully design the switching logic to account for all of them. As such, this approach is better suited for small-scale processes with only a few active constraint regions that can be easily managed using such switching logic.

Primal-dual feedback optimizing control
In the case of primal-dual feedback control as described in Section 2.2, the CV-MV pairing is rather straightforward: for the dual controllers, it is natural to pair each Lagrange multiplier with its corresponding constraint, and for the primal controllers, it is natural to pair diagonally (i.e. pair u_i with ∇_{u_i} L), as long as the determinant of the Hessian det(∇²_{uu} L) does not change sign. More precisely, the physical MVs ũ are paired to drive the controlled variables c = ∇_ũ L(λ, ũ, d) to zero, and the Lagrange multipliers (i.e. the dual variables) are paired with their corresponding constraints. By doing so, when an active constraint becomes inactive as the disturbance changes, the max-operator in (14) sets the corresponding Lagrange multiplier λ = 0, and hence changes in the active constraints are handled automatically, without the need for additional switching strategies.
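A minimal sketch of the primal-dual idea on a scalar toy problem; the problem, gains and iteration count are illustrative, and in the feedback control setting the two updates would be implemented as integral controllers acting on the real plant:

```python
# Toy problem: minimize J(u) = (u - 3)^2  subject to  g(u) = u - 2 <= 0
dJ = lambda u: 2.0 * (u - 3.0)   # cost gradient
g = lambda u: u - 2.0            # constraint value
dg = lambda u: 1.0               # constraint gradient

u, lam = 0.0, 0.0
k_u, k_lam = 0.1, 0.5            # primal and dual integral gains (illustrative)
for _ in range(2000):
    # Dual controller: integrate the constraint error, project onto lam >= 0
    lam = max(0.0, lam + k_lam * g(u))
    # Primal controller: drive the Lagrangian gradient dJ + lam*dg to zero
    u -= k_u * (dJ(u) + lam * dg(u))

print(round(u, 4), round(lam, 4))  # converges to u = 2.0, lam = 2.0 (KKT point)
```

When the constraint becomes inactive (e.g. if its limit moves beyond u = 3), the max-operator automatically drives λ to zero and the primal loop recovers the unconstrained optimum, mirroring the automatic active-set handling described above.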

Illustrative example -Williams-Otto reactor
In this section, we use the benchmark Williams-Otto reactor example (Williams and Otto, 1960) to provide an illustrative comparison of the different model-based and model-free feedback optimizing control approaches discussed above. The Williams-Otto reactor converts the raw materials A and B into the useful products P and E, along with the by-products C and G, through a series of reactions, the last of which is C + P → G with rate constant k_3 = 2.6745 × 10^12 e^(−11111/T_r) (the full reaction model is given in Appendix B.1). The process is controlled using the feed stream F_B of pure component B and the reactor temperature T_r, i.e. the process has two physical manipulated variables. The feed stream F_A of pure component A is a disturbance to the process. The objective is to maximize the profit from the valuable products P and E, subject to purity constraints on G and A in the product stream. The steady-state optimization problem is thus formulated with the purity constraints x_G ≤ 0.08 and x_A ≤ 0.12, and the objective is to asymptotically drive the system to its optimal operating point using feedback control.
We will first consider only the case where the constraint x_G = 0.08 is the only active constraint, which is controlled using the reactor temperature T_r as shown in Fig. 12. We then have one unconstrained degree of freedom, namely the feed stream F_B, for which we compare the performance of the different model-based and model-free approaches detailed in Sections 3 and 4. This is schematically represented in Fig. 12, where the self-optimizing CV c is given by the different methods. The simulations shown in this section were performed using MATLAB v2020b.

Remark 6. Note that the focus of the simulation results throughout this paper is the asymptotic behavior of the different approaches, and not the transient behavior. As such, one may find alternative tuning parameters for the controllers that provide a different transient behavior.

Model-based approaches
In this section, we consider the case with no plant-model mismatch. The model used in this section is based on the three-reaction model shown in Appendix B.1. For the unconstrained degree of freedom, F_B, we compare the use of two offline methods, namely the nullspace method (Section 3.1.2) and neighboring extremal control (Section 3.1.3), and two online methods: the two-step approach (Section 3.2.1) and feedback RTO (Section 3.2.2).
The nullspace method and neighboring extremal control were designed around the nominal operating condition of F_A = 1.3 kg/s. Fig. 13 shows the simulation results using the different model-based approaches. The true steady-state optimum is shown in black dotted lines.
The simulation starts at the nominal condition of F_A = 1.3 kg/s, and one can see that all four methods asymptotically converge to the optimum. At time t = 2 h, the disturbance changes to F_A = 1 kg/s. As the process is driven away from the nominal operating conditions, self-optimizing control and neighboring extremal control lead to some steady-state offset from the true optimum, since they are based on a local linear approximation around the nominal operating conditions. The online model-based methods, on the other hand, are able to asymptotically drive the process to its optimum, since they are based on a local linear approximation around the current operating point. In general, it can be seen that the model-based approaches converge to the optimum on the same time scale as the plant dynamics.

Model-free approaches
In this section, we now compare the model-free approaches for the unconstrained optimum. Fig. 14 shows the simulation results where the steady-state cost gradient is estimated using four different approaches, namely classical extremum seeking control (Section 4.2), the least squares estimation approach (Section 4.3), the fast Fourier transform (Section 4.5) and the Kalman filter approach (Section 4.4). It can be clearly seen that all four methods are able to asymptotically drive the process to its optimum without the need for rigorous nonlinear process models. However, the convergence to the optimum is significantly slower compared to the model-based approaches shown in Fig. 13.
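For reference, a textbook perturbation-based extremum seeking loop (dither, demodulation, low-pass filtering, integration) can be sketched as follows on a static toy cost map; the map and the tuning below are illustrative assumptions, unrelated to the Williams-Otto simulations:

```python
import math

# Static toy cost map with its minimum at u = 2 (stands in for the plant)
J = lambda u: 5.0 + (u - 2.0) ** 2

dt, omega, a, k = 0.01, 5.0, 0.2, -0.2    # sample time, dither freq/ampl., gain
u_hat, grad, y_lp = 1.0, 0.0, J(1.0)
for i in range(100000):
    s = math.sin(omega * i * dt)
    y = J(u_hat + a * s)                  # perturbed cost measurement
    y_lp += dt * (y - y_lp)               # low-pass filter removes the DC part
    grad += dt * (2.0 * (y - y_lp) * s / a - grad)  # demodulated gradient est.
    u_hat += dt * k * grad                # integrator drives the gradient to 0

# u_hat settles near the optimum u = 2, with a small dither-induced ripple
```

The slow convergence noted above is visible directly in the structure: the dither frequency, the filter bandwidths and the integrator gain must be separated in time scale, so the loop necessarily converges much more slowly than the plant dynamics.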

Combination of model-based and model-free approaches
We now consider the case with plant-model structural mismatch by considering a model with only two reactions and five components, as shown in Appendix B.2 , whereas the plant is given by the three reaction system as shown in Appendix B.1 .
We first use the model-based gradient estimation method (from Section 3.2.2 ), where the gradient is estimated using the structurally wrong model without any additional correction. The simulation results are shown in Fig. 15 (red solid), where it can be clearly seen that the true plant optimum (shown in black dotted) is not reached due to the model structural mismatch.
We now consider two approaches to account for the steady-state error induced by the structural mismatch. The first approach is based on the setpoint correction using the modifier term δJ, which is computed from the plant gradient and the model gradient (cf. Section 5.1). The simulation results using the modified setpoint are shown in Fig. 15 (black solid), where it can be seen that by modifying the setpoint using the estimated plant gradient, we are able to asymptotically drive the system to its true optimum (shown in black dotted) despite using a structurally wrong model.
The second approach used here is based on the discrepancy modeling framework (cf. Section 5.2). Here we use a radial basis function network (RBFN) as the function approximator φ(u) to account for the discrepancy between the model-predicted cost and the measured cost, ΔJ = J − J_model, as done by Matias and Jäschke (2019). The estimated gradient is given by (54), as described in Section 5.2. The simulation results using this approach are shown in Fig. 15 (red dashed), where it can be seen that the plant is asymptotically driven to its true optimum (shown in black dotted) despite using a structurally wrong model.

Handling changes in active constraint sets
So far, we have considered a case where the set of active constraints remained the same. We now consider a case where the disturbance varies as shown in the bottom right subplot of Fig. 17, which causes the set of active constraints to change. The two paradigms (cf. Sections 2.1 and 2.2) for handling changes in the active constraint set are compared in this section.

Region-based control
In this approach, we have to predetermine the different active constraint regions, design control structures for each active constraint region, and switch between the different regions using classical control elements, as detailed in Section 6.1.

Fig. 16. Control structure design: (a) Region-based control with selectors to switch between the active constraint regions (cf. paradigm 1 in Fig. 3a); (b) Primal-dual feedback control approach (cf. paradigm 2 in Fig. 3b).
Here we show how to identify the different active constraint regions without using process models. Since there are two constraints, there are at most 2^2 = 4 possible active constraint regions, namely:
1. x_G and x_A active
2. only x_G active
3. only x_A active
4. no active constraints
However, the purity limit on x_G is very low, such that one can expect the constraint on x_G to always be active. Therefore, we can eliminate regions 3 and 4, design control structures for regions 1 and 2 only, and design a suitable switching strategy between these two regions. It is important to note that eliminating active constraint regions 3 and 4 without solving the optimization problem offline requires good process understanding and insight.
In region 1, optimal operation occurs when the purity constraints on x G and x A are active. Therefore, this region corresponds to active constraint control. One possible control structure design here is to control x G at its limit of 0.08 kg/kg using the reactor temperature T r and control x A at its limit of 0.12 kg/kg using the feed rate F B .
In region 2, optimal operation occurs when the purity constraint on x_G is active; this constraint is controlled using the reactor temperature, and we have one unconstrained degree of freedom, namely F_B, which should be used to control a self-optimizing variable. Here we use the steady-state cost gradient computed using the model (assuming no structural mismatch).
Automatic switching between the two regions can be achieved using a max-selector. Fig. 16a shows the proposed control structure design using this approach. The simulation results are shown in Fig. 17 (red dashed), where it can be seen that as the disturbance changes from F_A = 1.4 kg/s to F_A = 1.9 kg/s at time t = 3 h, the max-constraint on x_A becomes active, and the max-selector switches from the gradient controller to the x_A composition controller, hence achieving asymptotic optimal operation as the operating condition changes. Similarly, at time t = 6 h, the disturbance reduces to F_A = 1.5 kg/s, and the selector switches from the x_A composition controller back to the gradient controller.

Primal-dual feedback optimizing control
Alternatively, one can use the primal-dual feedback controllers as detailed in Section 2.2, where we consider the dual variables λ_xA and λ_xG as additional degrees of freedom, which are used to control the constraints to their limits while ensuring that the multipliers remain non-negative using a max-selector. The primal MVs F_B and T_r are used to control the gradient of the Lagrangian function to a constant setpoint of zero. The proposed control structure using the primal and dual variables is shown in Fig. 16b. Here, the cost and constraint gradients are estimated using a model (assuming no structural mismatch). The simulation results are shown in Fig. 17 (black solid), where it can be seen that as the disturbance changes from F_A = 1.4 kg/s to F_A = 1.9 kg/s at time t = 3 h, and again to F_A = 1.5 kg/s at time t = 6 h, the proposed control structure achieves asymptotic optimal operation without the need to identify the active constraint regions a priori, and without the need to explicitly design additional switching strategies. However, this approach is less robust to faulty control loops, since the constraints are controlled indirectly by updating the Lagrange multipliers. For example, we simulated a case where F_B saturates at 4 kg/s for the same disturbance profile. In the primal-dual approach, the constraints on both x_G and x_A were violated, since the constraints are controlled indirectly by updating λ. In the region-based approach, on the other hand, the constraint on x_G remained feasible, since it is directly controlled by T_r and is therefore not affected when F_B saturates. This is as expected, and the simulation results are not shown here for the sake of brevity.

Discussion
In this section, we discuss the main distinguishing properties of the different feedback optimizing control approaches.

Rigorous process models
Model-free methods, as the name suggests, do not need rigorous nonlinear models, circumventing the modeling-related challenges. Rigorous nonlinear process models are required only by the model-based methods. The offline model-based approaches reviewed in Section 3.1 require steady-state process models to choose the best measurement combination as self-optimizing controlled variables. In the online model-based approaches, the models are used online to estimate the cost gradient, which is then controlled to a constant setpoint of zero. If steady-state models are used online, then the models can be updated only using steady-state process measurements. This leads to steady-state wait time issues (Darby et al., 2011). Therefore, in order to use transient measurements, it is recommended to use dynamic models for online model-based methods (Krishnamoorthy et al., 2019c).

Fig. 17. Comparison of region-based control (cf. paradigm 1 in Fig. 3a) and the primal-dual feedback control approach (cf. paradigm 2 in Fig. 3b) on the benchmark Williams-Otto reactor example.

Measurement requirements
Since optimal operation is achieved using feedback control, the choice of measurements that are available affects the applicability of the different methods.
For active constraint control, direct measurements of the constraints are required. It is reasonable to assume that the constraints are typically measured in most applications, since these are required for monitoring the process operations anyway.
For the unconstrained optimum, models can be used offline to select a linear combination of a subset of the available measurements (cf. Section 3.1). Online model-based gradient estimation methods, which rely on the use of state and parameter estimators, require sufficient measurements such that the states and parameters being estimated are observable. Since the model maps the measurements to the cost, direct cost measurements are not required for model-based methods (cf. Section 3.2).
On the other hand, the model-free methods require the cost to be directly measured. However, note that in many chemical processes, a direct measurement of the cost is often not available, especially if the cost function comprises several terms. A typical cost function of a chemical process has the form

J = p_Q Q + p_F F − p_1 P_1 − p_2 P_2,

where Q, F, P_1 and P_2 are the flow rates (in kg/s) of utility, feed, and products 1 and 2, respectively, and p_Q, p_F, p_1 and p_2 are the corresponding prices (in $/kg). This means that accurate flow measurements of all the streams appearing in the cost function are required in order to employ model-free methods.
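To make the measurement-accuracy point concrete, consider a back-of-the-envelope sketch with purely hypothetical flows and prices (the cost form follows the typical expression above):

```python
# Hypothetical stream flows (kg/s) and prices ($/kg), for illustration only
Q, F, P1, P2 = 2.0, 10.0, 6.0, 3.5
p_Q, p_F, p_1, p_2 = 0.1, 0.5, 1.2, 0.8

# Operating cost: utility plus feed, minus product revenue
J = p_Q * Q + p_F * F - p_1 * P1 - p_2 * P2
print(J)  # -4.8, i.e. a profit of 4.8 $/s

# A 1% error in the measured product flow P1 alone shifts the computed
# cost by p_1 * 0.01 * P1 = 0.072 $/s, i.e. 1.5% of the profit itself,
# because the profit is a small difference between large terms.
```

The amplification of small flow-measurement errors into the measured profit is what makes direct cost measurement demanding in practice.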

Accuracy
The offline model-based methods are based on local linearization around some nominal optimal point. Consequently, controlling a linear measurement combination leads to steady-state losses if the operation drifts far away from the nominal optimal point, even if there is no plant-model mismatch.
Furthermore, in the presence of plant-model mismatch, the estimated gradient differs from the actual plant gradient, leading to suboptimal operation. In the presence of structural mismatch, parameter estimators (such as the ones used in the two-step method and the feedback RTO method) are not sufficient to close the optimality gap, as clearly demonstrated by Roberts and Williams (1981) and Chachuat et al. (2009) , to name a few. This was also demonstrated in Fig. 15 . Consequently, in online model-based methods, where the cost gradient is estimated using the model, plant-model mismatch implies that the plant may not be driven to its true optimum.
Since model-free methods estimate the plant gradient directly from the cost measurement, the estimated gradient in theory reflects the true plant gradient, under the assumption that the gradient estimation algorithm is properly implemented. Otherwise, estimation errors arise, leading to suboptimal operation. For example, a poor choice of the tuning parameters in extremum seeking control has been shown to lead to inaccurate gradient estimation ( Krstić, 2000 ). This important caveat is often overlooked, and there is a clear need to better understand the sensitivity of the gradient estimation accuracy to the tuning parameters in many model-free gradient estimation methods.
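The sensitivity to tuning can be illustrated with a minimal dither-demodulation gradient estimator on a static quadratic map (a sketch, not taken from the cited references; all numerical values are hypothetical). Here the averaging-window length plays the role of the tuning parameter:

```python
import math

def demod_gradient(u0, a, omega, T_window, dt=1e-3,
                   cost=lambda u: (u - 2.0) ** 2):
    """Dither-demodulation estimate of dJ/du at u0.
    a: dither amplitude, omega: dither frequency [rad/s],
    T_window: averaging window (a key tuning parameter)."""
    n = int(T_window / dt)
    acc = 0.0
    for k in range(n):
        t = k * dt
        y = cost(u0 + a * math.sin(omega * t))      # perturbed cost measurement
        acc += (2.0 / a) * y * math.sin(omega * t)  # demodulate with the dither
    return acc / n

period = 2 * math.pi / 5.0
# true gradient of (u - 2)^2 at u = 1.5 is -1.0
good = demod_gradient(1.5, a=0.1, omega=5.0, T_window=10 * period)     # integer periods
bad = demod_gradient(1.5, a=0.1, omega=5.0, T_window=10.25 * period)   # fractional window
```

With a window spanning an integer number of dither periods the estimate is accurate; a fractional window leaves an un-averaged dither component in the estimate and biases it by several percent, even on this noise-free static map.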
Furthermore, as mentioned earlier, a typical cost function in a chemical process plant may comprise several flow rate terms. Often, the operational profit is made by shifting small amounts of feed to the most valuable product, and very accurate flow measurements are required in order to capture this. In practice, data reconciliation using nonlinear process models may then be needed to get accurate flow estimates. In such cases, it is important to keep in mind that the model-free methods are not truly "model-free" and may suffer from structural mismatch, since a nonlinear process model is needed to obtain a measurement/estimate of the cost.

Disturbance rejection
Model-based methods can handle unmeasured but anticipated disturbances, i.e., disturbances whose effect is captured by the model, whereas model-free methods can also handle unexpected and unmeasured disturbances. However, it has been shown that changes in any disturbance may affect the accuracy of the gradient estimation in model-free methods. This is because the effect of the disturbances on the cost measurement is often not explicitly accounted for in the model-free gradient estimation routines. That is, the model-free methods typically assume that any change observed in the cost measurement is induced by the input. For example, abrupt changes in disturbances may temporarily corrupt the gradient estimate, before it converges back to the true gradient.
Disturbance measurements, if available, may be used to improve the gradient estimation by explicitly accounting for the change in the cost measurements caused by the disturbances ( Krishnamoorthy et al., 2016 ). Alternatively, one can use event-based supervisory control to temporarily halt the gradient estimation following abrupt/abnormal changes in the cost measurement, similar to the steady-state detection used in traditional RTO. This approach has been used in the context of extremum seeking control by Marinkov et al. (2014) . It is also important to note that disturbances occurring at the same frequency as the input perturbation lead to inaccurate gradient estimation, since this makes it impossible to distinguish the effect of the disturbance from that of the input on the observed cost measurement.
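The frequency-aliasing issue can be seen in a minimal demodulation sketch (hypothetical numbers, static quadratic cost): a slow disturbance largely averages out of the estimate, while a disturbance at the dither frequency biases it:

```python
import math

def gradient_with_disturbance(u0, a, omega, d=lambda t: 0.0,
                              n_periods=10, dt=1e-3):
    """Dither-demodulation gradient estimate of J(u) = (u - 2)^2 at u0,
    with an additive disturbance d(t) on the cost measurement."""
    n = int(n_periods * (2 * math.pi / omega) / dt)
    acc = 0.0
    for k in range(n):
        t = k * dt
        y = (u0 + a * math.sin(omega * t) - 2.0) ** 2 + d(t)
        acc += (2.0 / a) * y * math.sin(omega * t)
    return acc / n

# true gradient at u0 = 1.5 is -1.0
clean = gradient_with_disturbance(1.5, a=0.1, omega=5.0)
# a slow disturbance largely averages out of the estimate ...
slow = gradient_with_disturbance(1.5, a=0.1, omega=5.0,
                                 d=lambda t: 0.3 * math.sin(0.5 * t))
# ... but even a small disturbance at the dither frequency is
# indistinguishable from the input effect and biases the estimate
aliased = gradient_with_disturbance(1.5, a=0.1, omega=5.0,
                                    d=lambda t: 0.05 * math.sin(5.0 * t))
```

Note that the aliased disturbance has a much smaller amplitude (0.05) than the slow one (0.3), yet it is the one that corrupts the estimate, because the demodulation cannot separate it from the dither.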

Convergence time
The process knowledge available in the form of rigorous process models enables model-based methods to converge significantly faster than their model-free counterparts. In model-based methods, the convergence time is predominantly determined by the PID controller tuning.
Model-free methods presented in Sections 4.1 -4.6 assume that the dynamic plant behaves like a static map. The time scale separation required for this assumption to be valid makes the convergence of model-free methods very slow. In the chemical process industry, the settling times are typically in the range of minutes to hours, which often leads to prohibitively slow convergence, in the range of hours to even days. This is one of the main reasons why model-free methods are not widely used in the process industries, despite their popularity in other fields. Addressing this challenge can broaden the applicability of model-free gradient estimation approaches in the chemical process industry.

Perturbation
Model-based gradient estimation methods in general do not require additional perturbation signals to estimate the steady-state cost gradient, whereas all model-free gradient estimation methods require additional perturbations in order to estimate the steady-state cost gradient accurately. As a rule of thumb, the amplitude of the perturbations must be such that the signal-to-noise ratio SNR > 1. For multivariable processes, it is also important that the different inputs are perturbed with unique frequencies in order to be able to extract the gradient from each input channel. Although some dither-free ( Hunnekens et al., 2014 ) or diminishing-dither approaches ( Wang et al., 2016 ) have been studied in the literature, in practice some kind of additional perturbation is always required in order to track changes in the optimum. In many process industries, however, the additional perturbations may not be desirable, since they can affect the product flow and quality specifications, and degrade process equipment such as valves and pumps, leading to frequent maintenance and production shutdowns. This potentially limits the applicability of model-free gradient estimation methods in some process applications.
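A minimal sketch of multivariable perturbation, assuming a static two-input quadratic map and hypothetical dither frequencies: each input is perturbed at its own frequency, and the corresponding gradient component is recovered by demodulating the shared cost measurement at that frequency:

```python
import math

# Static two-input cost map (unknown to the controller); the true gradient
# at (u1, u2) = (0, 0) is (dJ/du1, dJ/du2) = (-2, 4)
def cost(u1, u2):
    return (u1 - 1.0) ** 2 + (u2 + 2.0) ** 2

a = 0.05                 # common dither amplitude
w1, w2 = 3.0, 7.0        # distinct frequencies, neither a harmonic of the other
dt = 1e-3
T = 20 * math.pi         # window containing integer periods of both dithers
n = int(T / dt)
g1 = g2 = 0.0
for k in range(n):
    t = k * dt
    y = cost(a * math.sin(w1 * t), a * math.sin(w2 * t))  # shared cost measurement
    g1 += (2.0 / a) * y * math.sin(w1 * t) / n            # demodulate channel 1
    g2 += (2.0 / a) * y * math.sin(w2 * t) / n            # demodulate channel 2
```

Because the frequencies are distinct and non-harmonic, the cross terms average out over the common window and each channel recovers its own gradient component from the single cost signal.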

Ease of implementation and tuning
Development of rigorous nonlinear models is a major bottleneck for implementing model-based methods. Once the models are available, the implementation is straightforward. Methods such as self-optimizing control using a linear gradient combination are perhaps the easiest to implement, since they use only standard controllers such as PID, which have been in use for several decades. Standard PID tuning rules, such as the SIMC tuning rules ( Skogestad, 2003 ), can be used to tune the PID controllers.
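For a first-order-plus-delay model g(s) = k exp(-theta*s)/(tau1*s + 1), the SIMC rules give the PI settings directly; the sketch below follows Skogestad (2003), with the desired closed-loop time constant tau_c as the single tuning parameter (the example numbers are hypothetical):

```python
def simc_pi(k, tau1, theta, tau_c=None):
    """SIMC PI tuning (Skogestad, 2003) for the first-order-plus-delay model
    g(s) = k * exp(-theta*s) / (tau1*s + 1).
    tau_c is the desired closed-loop time constant; tau_c = theta gives
    Skogestad's recommended "tight" tuning."""
    if tau_c is None:
        tau_c = theta
    Kc = (1.0 / k) * tau1 / (tau_c + theta)     # controller gain
    tauI = min(tau1, 4.0 * (tau_c + theta))     # integral time
    return Kc, tauI

# e.g. a hypothetical gradient-to-input response with gain 2,
# time constant 10 and effective delay 1 (time units arbitrary):
Kc, tauI = simc_pi(k=2.0, tau1=10.0, theta=1.0)
```

The min() in the integral time is what distinguishes SIMC from plain IMC-PI: it avoids excessively slow integral action for lag-dominant processes.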
Model-free methods, on the other hand, circumvent the challenge of developing rigorous models. However, model-free gradient estimation algorithms may require several tuning parameters, such as the controller gain, perturbation frequency and amplitude. In addition to these, the different methods have their own tuning parameters. For example, the classical extremum seeking control requires tuning the cut-off frequencies for the high-pass and low-pass filters. The finite difference method has the time period T as a tuning parameter. The least squares method, FFT-based gradient estimation, and dynamic model identification require tuning the moving window length N. Dynamic model identification also requires choosing the model structure and the model order, which may not be trivial. All these tuning parameters affect the accuracy and robustness of the gradient estimation, as well as the speed of convergence to the optimum. The model-free gradient estimation methods are predominantly tuned by trial and error, and the lack of tuning guidelines can make the implementation non-trivial.
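As a sketch of one such method (with hypothetical data), a moving-window least-squares gradient estimator fits a line to the last N input-cost pairs, with the window length N as the main tuning parameter:

```python
from collections import deque

class LeastSquaresGradient:
    """Estimate dJ/du by fitting a line to the last N (u, J) samples.
    The window length N trades noise rejection against tracking speed."""
    def __init__(self, N):
        self.buf = deque(maxlen=N)   # oldest sample drops out automatically

    def update(self, u, J):
        self.buf.append((u, J))
        if len(self.buf) < 2:
            return None              # not enough data yet
        n = len(self.buf)
        su = sum(u for u, _ in self.buf)
        sJ = sum(J for _, J in self.buf)
        suu = sum(u * u for u, _ in self.buf)
        suJ = sum(u * J for u, J in self.buf)
        denom = n * suu - su * su
        if abs(denom) < 1e-12:
            return None              # inputs not sufficiently excited
        return (n * suJ - su * sJ) / denom   # least-squares slope

est = LeastSquaresGradient(N=5)
g = None
for u in [0.9, 0.95, 1.0, 1.05, 1.1]:    # perturbed inputs around u = 1
    g = est.update(u, 3.0 * u + 0.5)     # locally linear cost with slope 3
```

The `denom` guard makes the excitation requirement explicit: if the inputs in the window are (nearly) constant, no gradient can be extracted.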

Handling nonconvexity
So far, we assumed that the stationary point u * of the unconstrained optimization problem (5) is also a local minimum (cf. Assumption 1 ). This assumption may not hold for some processes. In that case, estimating the steady-state cost gradient and driving it to a constant setpoint of zero does not always yield the optimal operating point. By driving the cost gradient to zero, the system in principle only converges to a stationary point. This means that if Assumption 1 does not hold, then the stationary point may be a saddle point or even a local maximum.
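A simple safeguard is to also check the second-order condition at the converged stationary point, e.g. with finite differences on the measured cost. The sketch below (using a hypothetical cost function) shows two stationary points that both drive the gradient controller's error to zero, while only one satisfies the curvature condition:

```python
def grad(f, u, h=1e-5):
    """Central-difference estimate of dJ/du."""
    return (f(u + h) - f(u - h)) / (2 * h)

def curvature(f, u, h=1e-4):
    """Central-difference estimate of d2J/du2."""
    return (f(u + h) - 2 * f(u) + f(u - h)) / h ** 2

# J has two stationary points: a local maximum at u = -1, a local minimum at u = +1
J = lambda u: u ** 3 - 3 * u

g_max, c_max = grad(J, -1.0), curvature(J, -1.0)  # gradient ~0, curvature < 0: maximum
g_min, c_min = grad(J, 1.0), curvature(J, 1.0)    # gradient ~0, curvature > 0: minimum
```

Both points would satisfy a gradient controller with a zero setpoint, so the sign of the estimated curvature is what distinguishes the minimum from the maximum.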
Bayesian optimization is an alternative model-free approach that can be used to drive the process to the global minimum without the need for detailed process models. Bayesian optimization is a black-box optimization approach where the real-time cost measurement is used to update a Gaussian process (GP) model. That is, it builds a surrogate model of the objective function, which is updated online by conditioning on the observed cost measurements. An acquisition function defined using the GP surrogate model is then optimized to compute the next control input ( Mockus, 2012; Jones et al., 1998; Frazier, 2018; Shahriari et al., 2015 ). The acquisition function is chosen to trade off exploitation vs. exploration in order to find the global minimum. However, Bayesian optimization is not strictly a feedback control approach, since it involves optimizing the acquisition function at each time step to compute the next input.
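For completeness, a minimal self-contained sketch of such a loop (the kernel, hyperparameters, acquisition function, and plant cost below are all hypothetical; a practical implementation would use a dedicated GP library):

```python
import math

def rbf(a, b, ell=0.3):
    """Squared-exponential kernel."""
    return math.exp(-0.5 * ((a - b) / ell) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(X, y, xs, noise=1e-6):
    """GP posterior mean and variance at xs, conditioned on data (X, y)."""
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(K, y)
    ks = [rbf(x, xs) for x in X]
    mu = sum(ks[i] * alpha[i] for i in range(n))
    v = solve(K, ks)
    var = rbf(xs, xs) - sum(ks[i] * v[i] for i in range(n))
    return mu, max(var, 0.0)

def plant_cost(u):        # "measured" cost; optimum at u = 0.3
    return (u - 0.3) ** 2

X, y = [0.0, 1.0], [plant_cost(0.0), plant_cost(1.0)]   # initial probe inputs
grid = [i / 100.0 for i in range(101)]
for _ in range(10):
    # lower-confidence-bound acquisition: trade off the posterior mean
    # (exploitation) against the posterior uncertainty (exploration)
    def lcb(x):
        mu, var = gp_posterior(X, y, x)
        return mu - 2.0 * math.sqrt(var)
    u_next = min(grid, key=lcb)
    if u_next in X:       # re-sampling an existing point: stop
        break
    X.append(u_next)
    y.append(plant_cost(u_next))
u_best = X[min(range(len(X)), key=lambda i: y[i])]
```

Note that each iteration solves an inner optimization (here, a grid search over the acquisition function), which is precisely why Bayesian optimization falls outside the feedback-control paradigm discussed in this paper.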

Scalability to large-scale systems
The scalability of feedback optimization in general is rather limited compared to the traditional RTO framework, and there are different facets to the scalability issue, as discussed below.
Complicated control structure design - In large multivariable plants with several inputs and constraints, the number of relevant active constraint regions increases. This implies poor scalability for the region-based control (paradigm 1) discussed in Section 2.1 . Firstly, identifying all the relevant active constraint regions a priori can be challenging. Secondly, designing control structures for each active constraint region and designing switching strategies between the different control configurations can lead to a complicated control structure design. Furthermore, it may not be possible to find a pairing for the selectors that is unique in all constraint regions. This makes maintenance and monitoring of the control loops more challenging, and one may then be better off with the traditional RTO framework.
Alternatively, the scalability issues can be handled by using the primal-dual feedback control paradigm detailed in Section 2.2 . This is because this approach does not require identifying the active constraint regions a priori . Secondly, the CV-MV pairing is rather straightforward in this approach, as discussed in Section 6.2 . Furthermore, the controlled variables for both the dual controllers and the primal controllers remain the same throughout (cf. (14) ). This was also one of the main motivations behind the distributed feedback optimizing control structure proposed by Krishnamoorthy (2021a) .
Multivariable gradient estimation - In most model-free gradient estimation methods, the gradient estimation gets more challenging as the number of inputs increases. This is due to the unique perturbation frequency required to extract the effect of each input on the cost measurement. Due to the time scale separation requirement, the integral gain is limited by the slowest perturbation frequency, affecting the convergence speed. Furthermore, in large-scale processes, the cost may be measured several units downstream from the input perturbation. This causes additional time delays in the process dynamics, which further slows the convergence due to the time scale separation requirement. In such large-scale processes, it is also important to ensure that the regulatory control loops that do not have an impact on steady-state economics (such as level control or pressure control) do not attenuate or amplify the perturbations, as this can lead to erroneous gradient estimation. Therefore, the model-free gradient estimation algorithms are typically better suited for unit operations, where the cost is measured locally.

Summary
In this paper, we provided a comprehensive overview of the different approaches that aim to achieve asymptotic optimal operation using feedback control, without the need to solve numerical optimization problems online. We conclude this review by first pointing out that there exists a wide array of methods that can be used to achieve asymptotic optimal process operation using feedback control, as an alternative to traditional steady-state RTO. Secondly, we showed that the different methods have their own advantages and disadvantages, and differ on key properties ranging from the choice of controlled variables, the degrees of freedom, the type of measurements used, the need for process models, convergence speed, accuracy, ease of implementation, scalability etc. Understanding the key distinguishing properties of the different feedback optimizing control methods is important to choose the right tool for the right problem, such that the benefits of online process optimization can be exploited in a wide range of applications.

Declaration of Competing Interest
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.
We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all of us.
We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, with respect to intellectual property. In so doing we confirm that we have followed the regulations of our institutions concerning intellectual property.
We understand that the Corresponding Author is the sole contact for the Editorial process (including Editorial Manager and direct communications with the office). He/she is responsible for communicating with the other authors about progress, submissions of revisions and final approval of proofs.