Derivative-Free Optimization With Proxy Models for Oil Production Platforms Sharing a Subsea Gas Network

The deployment of offshore platforms for the extraction of oil and gas from subsea reservoirs presents unique challenges, particularly when multiple platforms are connected by a subsea gas network. In the Santos basin, the aim is to maximize oil production while maintaining safe and sustainable levels of CO2 content and pressure in the gas stream. To address these challenges, a novel methodology has been proposed that uses boundary conditions to coordinate the use of shared resources among the platforms. This approach decouples the optimization of oil production in the platforms from the coordination of shared resources, allowing for more efficient and effective operation of the offshore oilfield. In addition to this methodology, a fast and accurate proxy model has been developed for gas pipeline networks. This model allows for efficient optimization of the gas flow through the network, taking into account the physical and operational constraints of the system. In experiments, the use of the proposed proxy model in tandem with derivative-free optimization algorithms resulted in an average error of less than 5% in pressure calculations, and a processing time up to 1000 times faster than that of the phenomenological simulator. These results demonstrate the effectiveness and efficiency of the proposed methodology in optimizing oil production in offshore platforms connected by a subsea gas network, while maintaining safe and sustainable levels of CO2 content and pressure in the gas stream.


I. INTRODUCTION
One of the main problems faced by vertically integrated oil companies (i.e., companies that control production, transport, storage and refining) is the supply of crude oil from offshore oilfields to refineries, that is, the management of the crude oil supply [1]. For offshore oil production, as in the Brazilian Pre-Salt layer [2], oil companies operate with floating production, storage and offloading vessels
(FPSOs), or simply platforms, to produce and store crude oil. After production, crude oil is then transferred to onshore terminals via submarine pipelines or special vessels equipped with dynamic positioning systems. As pipelines are not a viable alternative for oil transportation in deepwater oilfields, a fleet of vessels is deployed to transfer crude oil to terminals [3].
In addition to oil transportation logistics, such offshore production systems share an underwater gas flow network for transporting the produced gas to onshore terminals [4]. An example is the Santos Basin where several deep water FPSOs and fixed platforms are in operation. Due to the high content of contaminants in the produced gas, the platforms are equipped with dedicated units for sequestration of the CO2, which is reinjected into the reservoirs, but a part remains in the exported gas. To meet the restrictions on the maximum concentration of CO2 at the onshore terminal, gas exports from the Pre-Salt platforms must be coordinated and mixed with gas devoid of contaminants, produced by a limited number of platforms operating in Post-Salt reservoirs. Such restrictions lead to the complex problem of coordinating the production of the platforms, seeking to maximize the total oil production from the multi-reservoir asset while ensuring the CO2 limits in the produced gas, among others.

A. OVERVIEW AND RELATED WORKS
The production optimization of multiple platforms sharing a gathering network and processing facilities is not a new topic in the literature. According to the multilevel control hierarchy [5], the related works fall into one of two layers depending on the time span of predictions and the nature of the models.
The first layer is the long-term optimization which concerns decisions extending over years of operation and which makes use of reservoir models. Regarding the long-term, [6] addressed the optimization of the production rates that maximize the recoverable oil volumes in the context of multiple reservoirs. In [7], the value chain of such oilfields was optimized in the long-run, while managing CO2 injection sites and considering other shared facilities. In [8], strategies were presented to optimize gas transportation in order to honor contractual agreements. In [9], models were developed to optimize the transportation network of natural gas on the Norwegian Continental Shelf. Oilfield management in long-term operations has also been the focus of derivative-free optimization (DFO) [10], [11], [12], [13]. In [14], for instance, different DFO methods were discussed, including, but not limited to, Hooke-Jeeves direct search, genetic algorithms and particle swarm optimization. In [15], DFO was applied to gas-lift optimization in an integrated environment, including reservoirs, wells and facilities, by means of the Mesh Adaptive Direct Search (MADS) algorithm. In [16], a DFO algorithm based on the trust-region method was applied for robust well control optimization of oil reservoirs under geological uncertainty.
The second layer is the short-term optimization which regards the daily and weekly operations and which considers the production infrastructure, such as the submarine flow network and topside processes. Our work belongs to this layer, with a focus on production optimization of multiple offshore platforms that share a subsea gas pipeline network while considering composition constraints, such as limits on the CO2 content in the gas delivered to onshore terminals. Besides being of academic interest, this problem has practical applications in offshore operations, such as the multi-reservoir oilfields in the Santos Basin. In [4], a methodology was proposed to coordinate the production from multiple offshore platforms, which share a subsea gas pipeline network and are jointly limited by processing constraints at the onshore facilities. This methodology relies on surrogate models for production platforms and piecewise-linear approximation of fluid flow, yielding a mixed-integer linear programming (MILP) formulation to which standard algorithms can be applied. An application of this MILP-based methodology to the operations in the Santos Basin proved to be effective, but computationally costly. Aiming for locally optimal solutions with the direct use of multiphase flow simulators, an application of DFO algorithms was proposed in [17] for that problem. The DFO approach brings about flexibility and reduces the effort of model synthesis and maintenance. The application of a DFO algorithm to the Santos Basin obtained solutions of nearly-optimal quality, but the computational cost was exceedingly high due to the direct use of a complex phenomenological simulator. In [18], a methodology for hyper-tuning of oil well simulators was presented using DFO. Besides meeting demands from the regulatory agency, accurately adjusted well models empower engineers to plan production and derive piecewise-linear surrogates that can be used in MILP production optimization.
In [19], a DFO exact penalty algorithm was proposed and applied to production optimization of an offshore platform that receives the production stream of gas-lifted oil wells. Being an exact penalty method, it can accept infeasible iterates and handle constraints that are not given explicitly, allowing such constraints to be evaluated through simulation during the optimization procedure.
This work aims to develop proxy models [20], [21] for gas pipeline networks that enable fast computation of pressures as a function of flows and concentrations of components of interest. The proposed proxy models capture the structure of the governing equations of such networks, while at the same time using system identification to yield polynomial models from simulated data and field measurements. As such, the proxy models can relieve algorithms from the need of using simulation software, promoting fast and reliable optimization using DFO algorithms.

B. PROBLEM STATEMENT
The problem being addressed in this paper is the optimization of production in offshore oil platforms that share a subsea gas network. This production system is modeled using a directed graph, with the nodes representing production platforms, terminals, and junctions where material and pressure balance are enforced. The edges in the graph represent pipelines, wells, valves, and equipment such as separators and electric submersible pumps. The set of vertices in the graph is divided into three subsets: source nodes representing production platforms, sink nodes representing terminals, and internal nodes that do not accumulate material. The set of edges is also divided into two subsets: equipment such as chokes and pumps, and pipelines. The goal of the optimization problem is to maximize oil production, or an economic function, of an entire network of reservoirs and production platforms, while honoring various constraints such as capacity limits and pressure requirements. The oil is offloaded from the production platforms to the onshore terminal by shuttle tankers, whereas the produced gas is transferred by a subsea pipeline network.
In other words, the problem of interest consists in determining the optimal operation of an oilfield in order to maximize overall oil production or profits. This can be formulated as an optimization problem of the form:

  max f(x)
  s.t. g_i(x) ≤ 0, i = 1, ..., m,
       h_j(x) = 0, j = 1, ..., l,

where x is the vector of decision variables, f(x) is the objective function representing the profit to be maximized, g_i(x) are the inequality constraints, and h_j(x) are the equality constraints that express physics relations, operational constraints, and limits that characterize the system behavior.
One challenge of the optimization problem being addressed is that the constraints may not be available in analytic form. This can make it difficult to solve the problem using traditional optimization methods, which typically rely on the ability to compute the gradient of the objective function and the constraints. In cases where the constraints are not available in analytic form, it may be necessary to employ alternative optimization approaches such as heuristics or metaheuristics, which do not rely on the availability of gradients. In this specific paper, we rely on DFO formulations for that matter.
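As a toy illustration of the derivative-free idea, the following sketch implements a minimal Hooke-Jeeves-style pattern search with an exact penalty; all functions and numbers are illustrative stand-ins, not the paper's platform models.

```python
# Illustrative sketch (not the paper's algorithm): a tiny pattern search with
# an exact penalty, showing how DFO handles a problem whose objective and
# constraints are only available as black-box evaluations (no gradients).

def penalized(x):
    # toy problem: maximize 10 - (x0-3)^2 - (x1-2)^2 subject to x1 <= 1.5
    obj = 10.0 - (x[0] - 3.0) ** 2 - (x[1] - 2.0) ** 2
    viol = max(0.0, x[1] - 1.5)      # constraint violation
    return -obj + 1000.0 * viol      # minimize negative objective + penalty

def pattern_search(f, x0, step=1.0, tol=1e-6):
    """Hooke-Jeeves-style coordinate moves with step halving."""
    x, n = list(x0), len(x0)
    while step > tol:
        improved = False
        for i in range(n):
            for d in (+step, -step):
                trial = x[:]
                trial[i] += d
                if f(trial) < f(x):
                    x, improved = trial, True
        if not improved:
            step *= 0.5
    return x

x = pattern_search(penalized, [1.0, 1.0])
print([round(v, 3) for v in x])  # → [3.0, 1.5] (constraint active at optimum)
```

The search only compares function values, so the same loop works when `penalized` wraps a simulator call instead of an analytic expression.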

C. CONTRIBUTIONS
The key contributions of this work to the state of the art are:
• A methodology based on systems identification for proxy modeling of gas flow networks, which takes into account the structure of the governing equations of gas flow in pipelines.
• An analysis showing that the proposed proxy models are extremely fast to compute and achieve good accuracy, in terms of flows and pressures, with respect to data obtained from a high-fidelity multiphase flow simulator.
• A computational analysis demonstrating that DFO combined with the proxy models is an effective optimization tool for multi-reservoir oilfields sharing a subsea gas network, which can maximize a joint objective and handle simulated constraints.

D. ORGANIZATION
Section II presents a conceptual formulation for production optimization of multiple platforms sharing a subsea gas pipeline network. Section III provides background on the physics of gas flow in pipelines and methodologies for system identification, which are key for the design of proxy models. Section IV discusses how to convert the conceptual formulation into a concrete formulation using simulation and proxy models, which can be optimized with standard algorithms. Section V introduces a production system in the Santos Basin as the case study, and further develops and analyzes proxy models to approximate the simulation models of gas pipeline networks. Section V-C reports results from the solution of concrete formulations for the case study with DFO algorithms. Finally, Section VI presents final remarks.

II. PROBLEM FORMULATION
Based on [4], this section presents a formulation for production optimization of offshore oil platforms that share a subsea gas network. The formulation arises from a directed graph G = (V, E) that models the production system. The nodes represent production platforms, terminals, and junctions where material balance is physically enforced. Connecting two nodes, arcs model pipelines that transfer fluids from source to sink and can also represent wells, valves, and equipment such as separators and electric submersible pumps (ESP) [22]. Because of the graph structure, the formulation is flexible and general enough to model heterogeneous production systems consisting of platforms, wells, subsea and topside facilities. We follow the standard notation of network-flow models [23], which concerns material balance, and specialized works [24], [25] that consider pressure, temperature and energy balance in oil fluid flow. The set of vertices V is split into three subsets: V_src are source nodes representing production platforms; V_snk are the sink nodes representing terminals; and V_int are the internal nodes that do not accumulate material, such as manifolds. The edge set E is divided into two subsets: E_eqp denotes equipment such as chokes and pumps; and E_pipes represents pipelines. Table 1 presents the notation for vertex and edge sets of the graph-based model for heterogeneous production networks. Sets are defined in calligraphic typeface, vectors in boldface, and scalars in normal typeface.
A pressure variable p_v and a flow vector q_v are associated with each node v ∈ V. Flow rates are characterized by a set P of phases, typically oil, gas, and water, and a set C_p of components of interest within phase p, such as the CO2 and C3 contaminants. The vector q_v = (q_{v,p}, q^c_{v,p} : p ∈ P, c ∈ C_p) tracks the phases and components, but its meaning depends on the node: q_v is the fluid flow traversing an intermediate node (manifold); the total production flow from a source (platform); or the flow entering a sink (terminal).
The pressure drop Δp_e = p_u − p_v in a pipeline e = (u, v) ∈ E_pipes is a function of the flow q_e and physical properties of the pipeline, such as diameter, geometry, and roughness. The pressure drop is a function of a control variable when the arc e ∈ E_eqp models an equipment, such as a choke valve. Table 2 lists the decision variables, which play a part in the definition of physical and operating constraints.
• Flow balance must hold at internal nodes v ∈ V_int:

  Σ_{(u,v)∈E} q_{(u,v)} = Σ_{(v,u)∈E} q_{(v,u)}   (1)

• The flows in the arcs (v, u) leaving a platform v ∈ V_src must add up to the platform's outlet production:

  Σ_{(v,u)∈E} q_{(v,u)} = q_v

• The flows in the arcs (u, v) entering a terminal v ∈ V_snk must add up to the terminal's total inlet q_v:

  Σ_{(u,v)∈E} q_{(u,v)} = q_v

• Flows and pressures in arcs (u, v) ∈ E must be consistent with the pressure at upstream and downstream nodes:

  Δp_{(u,v)} = p_u − p_v

• The pressure drop in a pipeline (u, v) ∈ E_pipes is a function of flow and downstream pressure, and a function of a control variable if (u, v) is an equipment.

The fraction of a component c ∈ C_p is defined as:

  z^c_{e,p} = q^c_{e,p} / q_{e,p} and z^c_{v,p} = q^c_{v,p} / q_{v,p}

for arc and node flows, respectively. For an arc e, q_{e,p} is the flow of phase p, q^c_{e,p} is the flow of component c in phase p, and z^c_{e,p} ∈ [0, 1] is the fraction. Variables for nodes are defined likewise.
Certain process limits at production platforms and terminals can be expressed in the form of constraints on component fractions. Additionally, compositional balance constraints must also hold for phase p ∈ P and component c ∈ C_p at network nodes:

• For a production platform v ∈ V_src, material balance must hold for component c of phase p:

  q^c_{v,p} = Σ_{(v,u)∈E} q^c_{(v,u),p}

• In an internal node v ∈ V_int, material balance must hold for the flow of component c in phase p:

  Σ_{(u,v)∈E} q^c_{(u,v),p} = Σ_{(v,u)∈E} q^c_{(v,u),p}

• Likewise, for a terminal node v ∈ V_snk:

  q^c_{v,p} = Σ_{(u,v)∈E} q^c_{(u,v),p}

• Compositions of flows leaving a source or internal node v ∈ V_src ∪ V_int must be consistent:

  z^c_{(v,u),p} = z^c_{v,p}, for all (v, u) ∈ E, v ∈ V_src,
  z^c_{(v,u),p} = z^c_{v,p}, for all (v, u) ∈ E, v ∈ V_int.

From the first equation, the component fractions in streams emanating from a platform must all be the same, a common hypothesis in flow splitting [26]. For an internal node, the second equation ensures that streams of distinct compositions can reach the node, such as a manifold, where they get mixed into a homogeneous outlet.
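The material-balance and mixing rules above can be illustrated with a minimal sketch; node names, flow values, and CO2 fractions below are illustrative assumptions, not field data.

```python
# Minimal bookkeeping sketch on a directed production graph.
# Nodes: two platforms (sources), one manifold (internal), one terminal (sink).
edges = {
    ("P1", "M"): {"gas": 100.0, "co2_gas": 10.0},   # 10% CO2 stream
    ("P2", "M"): {"gas": 50.0,  "co2_gas": 1.0},    # 2% CO2 stream
    ("M", "T"):  {"gas": 150.0, "co2_gas": 11.0},   # mixed outlet
}

def node_balance(node):
    """Inflow minus outflow of gas at a node; zero at internal nodes."""
    inflow = sum(q["gas"] for (u, v), q in edges.items() if v == node)
    outflow = sum(q["gas"] for (u, v), q in edges.items() if u == node)
    return inflow - outflow

# Internal node must not accumulate material (Eq. (1)-style balance):
assert abs(node_balance("M")) < 1e-9

# Mixing at the manifold homogenizes the CO2 fraction of the outlet stream:
z_out = edges[("M", "T")]["co2_gas"] / edges[("M", "T")]["gas"]
print(round(z_out, 4))  # → 0.0733
```

Note how the blended fraction (≈7.3%) lies between the two inlet fractions, which is exactly the mixing effect the compositional constraints capture.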

A. CONSTRAINTS ON ARCS AND NODES
Bound constraints are imposed on the pressures and flows of each node v ∈ V and each arc (u, v) ∈ E. Constraints are also imposed on the composition of flows leaving production platforms and entering terminals, expressed in terms of the vector z_v = (z^c_{v,p} : p ∈ P, c ∈ C_p) of compositions for all phases at node v.

B. BOUNDARY CONDITIONS AT PRODUCTION PLATFORMS
First proposed in [4], boundary conditions can model the interface of a production platform v ∈ V_src with the subsea gas network in terms of its outlet flow q_v, pressure p_v, and vector z_v of compositions, collected in the boundary condition x_v. Boundary conditions are a powerful tool to decouple a system into a set of subsystems, bringing about flexibility to connect heterogeneous subsystems and computational advantages to be explained later. Let there exist local control settings for wells, subsea and topside equipment that enable the platform to operate at the boundary condition x_v ∈ X_v. The feasible set X_v is assumed closed and bounded, a plausible assumption given that a platform v is restricted by physical limits.
Notice that multiple control settings might exist for a boundary condition x_v, each potentially inducing a distinct objective value for the production unit. Whatever objective function is selected for a platform v, we assume that the optimal objective f_v(x_v) is known for the boundary condition x_v, namely the maximum gain yielded by platform v when it operates at x_v. This objective can be expressed in terms of production (cubic meters of oil per day) or any function modeling an economic gain.
The computation of control settings that maximize the objective for a given x v , i.e. calculating the optimal f v (x v ), is a challenging task. It entails solving a potentially hard optimization problem of varying complexity that depends on the platform v. For instance, in offshore platforms operating with satellite wells, the optimization is relatively simple because surface operating conditions are determined by a fixed pressure at the separation unit [27]. In more complex environments, wells have subsea completion that demands the modeling of pressure, subsea and topside processing equipment, flow splitting, and routing to manifolds and headers, among other features [24], [25], [26].

C. CONCEPTUAL FORMULATION
This work focuses on the production optimization of large offshore oilfields, consisting of several platforms producing from multiple reservoirs that share a subsea gas pipeline network and compete for limited resources. This production optimization problem is conceptually formulated as:

  (P): max Σ_{v ∈ V_src} f_v(x_v)
       s.t. H(θ) = 0, G(θ) ≤ 0,
            x_v ∈ X_v, for all v ∈ V_src,

where: (i) q, p, and z are vectors with the flows, pressures, and compositions for all network nodes and arcs; (ii) x = (x_v : v ∈ V_src) is the vector with all boundary conditions; and (iii) θ = (x, q, p, z) collects all of the decision variables. The vector functions H and G model the network constraints (1)-(13), such as flow conservation, pressure balance, and composition restrictions. Further, the boundary conditions at sink nodes, such as their fixed operating pressure, are embedded in the network constraints. The objective expresses the overall economic or production goal of the oilfield.

D. DISCUSSION
The production optimization problem P is regarded as conceptual, or abstract, because the sets X v of boundary conditions, their respective objectives f v , and pressure-drop relations are not explicitly known. Such relations are typically implemented in black-box simulators, or else need to be inferred from field measurements. By generating proxy models, the conceptual problem is cast as a concrete optimization problem to which algorithms can be applied. The kinds of algorithms depend greatly on the nature of the proxy models, which are thus pivotal for the efficiency of a solution methodology.
We follow the modeling of platforms as sets X_v of boundary conditions as proposed in [4], using proxy models based on piecewise-linear approximation of the objective f_v(·), which enables the optimization of heterogeneous units. Explicitly accounting for the detailed model of production platforms would render problem P intractable, given the complexity and sheer size of the MINLP problems for local optimization of platforms. This work contributes by proposing a methodology, based on systems identification and the properties of fluid flow in pipelines, that leads to the synthesis of proxy models for pipelines that are sufficiently accurate and extremely fast to compute. For one, the pipeline proxy models can be derived from field measurements or from data produced by high-fidelity fluid flow simulators. For another, the resulting proxy models can serve as extremely fast simulation tools in what-if analyses or be embedded in P to obtain a concrete optimization problem.
In what follows, we propose a methodology for the synthesis of proxy models and the use of DFO for solving the resulting production optimization problem. The optimization algorithm will choose boundary conditions that ensure a feasible operation for the platforms and subsea network, while yielding an optimal system-wide operation.

III. BACKGROUND
This section presents basic concepts on the hydraulics of gas pipelines and networks, along with fundamentals on system identification that are key for developing proxy models.

A. GAS PIPELINES
The hydraulics of gas flow in pipelines is relevant for choosing a suitable structure for pipeline models, which will be identified from simulated data and real measurements. The focus is on equations and methods related to pressure calculation, given flow rates and CO2 content. We start from equations of hydraulics for single-segment pipelines and then address multiple-segment networks.

1) HYDRAULICS
The theory of natural gas flowing in pipelines is in a very mature state. The Fritzsche equation, which computes the flow rate for a given differential pressure, dates back to 1908, for example. Several phenomena have been studied in detail in the past decades, leading to equations and models for a vast number of different flow scenarios.
For the sake of objectivity, a scenario of modest complexity is chosen: not so complex that physical intuition about the equations is lost, yet not so simple that important information, which could potentially be used as prior knowledge in proxy modeling, is lost. The proposed scenario makes the following assumptions:
1) Single-segment Pipeline: Long offshore gas pipelines are often built of many connected pieces of pipe with different physical properties; those smaller pieces are called segments. In order to compute flows given pressures, for example, the available methods use converging algorithms. Here, however, a single segment is considered to represent a pipeline, and therefore equations can be used instead of algorithms.
2) Single-phase Gas Flow: Platforms can produce natural gas in purely gas phase, liquid phase, or a mixture of them. Besides in the reservoir, natural gas can change its phase inside a pipeline depending on local pressure and temperature. With a mixture of phases, the so-called multi-phase flow, the equations become far more complex than those of single-phase flow. The hypothesis here is that single-phase equations can describe most behaviors satisfactorily, assuming there is a dominant flow regime in a particular segment.
3) Real Conditions: Under high pressure and at low temperature, which is the case in subsea pipelines, the ideal gas law is not applicable. A nonlinear term known as the compressibility factor is defined as the ratio of the gas volume at a given temperature and pressure to the volume the gas would occupy if it were ideal at the same conditions. Given the relative magnitude of pressure and temperature variations in subsea pipelines, this term cannot be neglected beforehand.
4) Pipeline Not Elevated: Elevation is the difference in height between two points of a pipeline segment. The undersea elevation variation is normally small in comparison to the length of the pipeline itself and is therefore neglected.
5) Isothermal Flow: Under isothermal flow, the pressure drop can be calculated using constant gas properties such as the compressibility factor, instead of integrating the state equation for each infinitesimal piece of the pipeline. Because the whole purpose of the proxy is to avoid the use of algorithms, the assumption is that the flow is isothermal, in other words, that the gas temperature is constant along the pipeline.
6) Turbulent Flow Regime in Smooth Pipes: Among the categories of turbulent flow in pipes, this work chooses turbulent flow in smooth pipes as better representing the scenario of interest. Other categories could be selected, such as turbulent flow in fully rough pipes and transition flow between smooth and rough pipes.

Figure 1 illustrates a pipeline segment with dimensions. Before presenting the equations for gas flow in pipelines, let us introduce the nomenclature in Table 3.
The momentum equation applied to a portion of a pipe of length dx, inside which a compressible fluid such as natural gas flows in steady-state conditions, is:

  dp + ρv dv + ρg dz + f ρ v^2 / (2D) dx = 0

The third term, the potential energy, which depends on height, is neglected according to the scenario characterization presented above. The last term, the viscous dissipation, is conveniently simplified in order to ease integration by noting that the mass flux ρv is constant along the pipe, which leads to the equation:

  d(p^2)/dx = −k_0 f G T ζ q^2 / D^5   (17)

Integrating Eq. (17) results in the known general gas flow equation,

  q = k_1 [ (p_1^2 − p_2^2) D^5 / (f G T L ζ) ]^(1/2)   (18)

in which the meaning of the constants k_i is not relevant for the purpose of system identification. From Eq. (18), the influence of flow rate on pressures is quite obvious. The influence of CO2, however, is hidden in G and ζ. The compressibility factor ζ can be further detailed as [28]:

  ζ = 1 / [ 1 + (p_avg · 344400 · 10^(1.785 G)) / T^3.825 ]   (19)

It can be noticed that both Eqs. (18) and (19) depend on the natural gas specific gravity G, which is defined as:

  G = M / M_air   (20)

where M_air is the air molecular weight, a constant equal to 28.9625 g/mol, and M, the gas molecular weight, is given by the mixture of gases:

  M = Σ_{i=1}^{I} y_i M_i   (21)

where M_i is the molecular weight and y_i the molar fraction of the i-th component, and I is the number of components. Table 4 shows the molecular weight values for the most important components of natural gases.
The first key to figuring out the influence of CO2 on pressures regards the natural gas specific gravity G. From Eq. (21), the molecular weights reported in Table 4, and assuming that the molar fraction of methane is much greater than that of the other components, it is possible to conclude that the higher the CO2 content, the higher the specific gravity G. Thus, increasing the CO2 content increases G, which in turn decreases the flow rate q for a given pressure, as seen in Eq. (18). Conversely, in Eq. (19) the compressibility factor ζ shrinks exponentially with the increase of the CO2 content, which thereby increases the flow rate.
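The effect of CO2 on the specific gravity, per Eqs. (20)-(21), can be checked numerically; the compositions below are illustrative, while the molecular weights are standard values.

```python
# Sketch of Eqs. (20)-(21): gas specific gravity from molar composition.
M = {"CH4": 16.043, "CO2": 44.010}   # molecular weights (g/mol)
M_AIR = 28.9625                      # air molecular weight (g/mol)

def specific_gravity(y):
    """G = (sum_i y_i * M_i) / M_air for molar fractions y."""
    m_gas = sum(y[comp] * M[comp] for comp in y)
    return m_gas / M_AIR

g_low = specific_gravity({"CH4": 0.95, "CO2": 0.05})
g_high = specific_gravity({"CH4": 0.70, "CO2": 0.30})
assert g_high > g_low  # more CO2 -> heavier gas -> larger G
print(round(g_low, 4), round(g_high, 4))  # → 0.6022 0.8436
```

This confirms the first "key" above: raising the CO2 fraction from 5% to 30% increases G by roughly 40%, which pushes the flow rate down in Eq. (18).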
So far the clues do not lead to a conclusion simple enough to drive the design of the proxy model structure. Another candidate is the Darcy friction factor f present in the denominator of Eq. (18). For turbulent flow in smooth pipes, the friction factor is given by the Colebrook-White equation, which in the smooth-pipe limit reads:

  1/√f = −2 log_10 ( 2.51 / (Re √f) )

This equation could have been the second key to figuring out the influence of CO2 on pressures, but it is far too complicated; combined with the ambiguity of the first key, the final conclusion is that, for this part of the pressure model, an approach based on data should be favored. Such an approach is called gray-box system identification.
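Because the relation above is implicit in f, it is typically solved by fixed-point iteration; a minimal sketch for the smooth-pipe case, with an illustrative Reynolds number, is:

```python
# Sketch: iterating the implicit smooth-pipe relation
#   1/sqrt(f) = -2*log10(2.51 / (Re*sqrt(f)))
# until the Darcy friction factor f converges.
import math

def friction_factor_smooth(re, f0=0.02, iters=50):
    """Fixed-point iteration starting from an initial guess f0."""
    f = f0
    for _ in range(iters):
        f = (-2.0 * math.log10(2.51 / (re * math.sqrt(f)))) ** -2
    return f

f = friction_factor_smooth(1e6)  # fully turbulent smooth-pipe regime
print(round(f, 5))
```

The need for even this small iteration loop illustrates why f complicates an analytic treatment and why a data-driven (gray-box) structure is preferred for this part of the model.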

2) PIPELINE NETWORK
So far the basic scenario has considered a single pipeline segment. A slightly more complex scenario, in which two different pipeline segments are connected, is shown in Figure 2.
If the pressure P_{1,1} and the flow rate q_1 are known, then it is possible to calculate P_{1,2} = P_{2,1} with Eq. (18) and, after that, to calculate P_{2,2}. As such, P_{2,2} no longer comes from an equation, but from a cascade algorithm that uses this equation. Still, this is a one-pass algorithm which, from a proxy system identification perspective, can be tackled as a hierarchically structured model. Considering gas pipeline networks found in scenarios of interest, such as oilfields in the Santos Basin, streams from multiple pipelines flow into a single pipeline, as illustrated in the simple network shown in Figure 3. In this context, assuming that the pressure P_{3,2} and the flows q_1 and q_2 are known, the objective is to calculate the pressures P_{1,1} and P_{2,1}. If all simplifications are kept, the solution can also come from a one-pass algorithm. It first computes the pressures P_{1,2} = P_{2,2}, which depend on P_{3,2} and the sum of q_1 and q_2. With these pressure values, P_{1,1} and P_{2,1} can then be calculated.
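The one-pass cascade just described can be sketched as follows, using a simplified quadratic pressure-drop relation in the spirit of Eq. (18); the k constants and boundary values are illustrative, not field data.

```python
# Sketch of the one-pass cascade: given the terminal-side pressure and the
# source flow rates, walk the network upstream segment by segment.
import math

def upstream_pressure(p_down, q, k):
    """Invert the simplified segment relation p_up^2 - p_down^2 = k * q^2."""
    return math.sqrt(p_down ** 2 + k * q ** 2)

# Network of Figure 3: segments 1 and 2 join and feed segment 3.
p32 = 50.0            # known pressure at the terminal end (illustrative)
q1, q2 = 2.0, 3.0     # known source flow rates
k1, k2, k3 = 4.0, 4.0, 1.0  # illustrative segment constants

p_junction = upstream_pressure(p32, q1 + q2, k3)  # P_{1,2} = P_{2,2}
p11 = upstream_pressure(p_junction, q1, k1)       # P_{1,1}
p21 = upstream_pressure(p_junction, q2, k2)       # P_{2,1}
print(round(p11, 3), round(p21, 3))  # → 50.408 50.606
```

Each pressure is computed exactly once in upstream order, which is what makes the proxy non-iterative and hence fast compared with nodal-analysis simulators.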
If the assumptions from Section III-A1 no longer hold, then several other variables start to play a role in the computation process. For instance, the gas and liquid-phase rates change along the segment according to the pressure gradient, possibly involving complex behavior such as flashing. The change of phase in itself also influences the pressure gradient, and an integrated equation such as (18) cannot be applied.
Simulators [29] handle such complex settings by iterative algorithms that integrate fluid property values along the pipe segment length. They also use another iterative method, called nodal analysis, that guesses flow rates and converges iteratively to a set of consistent pressures. This iterative nature is the reason for the high computational burden: hundreds or even thousands of iterations are often needed to converge. Not only the number of iterations, but also the complexity of the equations involved in the calculations plays a major role in the total computational load.
To be fast, the chosen proxy model structure needs to avoid this iterative nature and simplify the involved equations as much as possible. The next sections detail how to achieve these objectives.

B. SYSTEMS IDENTIFICATION
System identification is a methodology for building mathematical models of dynamic systems using measurements of the system's input and output signals [30]. The field of system identification can be broadly divided into two main categories: linear and nonlinear. Linear models should be considered first; only when they do not yield a satisfactory performance should nonlinear models be used, which are significantly more complex. As will be seen later, the relation between pressures and flow rates of natural gas flowing in pipeline networks is nonlinear. What follows is a brief description of the sequence of tasks performed in nonlinear system identification for static models.

1) MODEL INPUTS AND OUTPUTS
The inputs and outputs are straightforward for the problem of concern. The inputs are: (i) the natural gas flow rates q_v from the offshore platforms v ∈ V_src, assuming a single-phase regime (P = {gas}); and (ii) the fraction z^c_v of the component c ∈ C in the gas stream q_v, with CO2 as the component of interest. The output is the export pressure p_v of the production platforms.

2) MODEL ARCHITECTURE
The selection of the architecture is a subjective decision that should be made based on three aspects.
The first aspect regards how the system's output changes over time once the input is set; in other words, whether the model is static or dynamic. (i) Static modeling is one in which the time between an excitation signal and its respective output response is not important. The outputs change instantaneously with input variation, or reach a steady state after a lag known as the settling time, which is a property of a stable system. (ii) Dynamic modeling is when the temporal behavior of the system is taken into account. Such models tend to be more complex, given that the trajectory of the states of the system is also modeled during the settling time [31].
The second aspect relates to the overall modeling approach. (i) White-box models are fully derived from first principles, such as physical and chemical laws, leading to a theoretical modeling of all equations and parameters [31]. (ii) Black-box models are based solely on measurement data; both model structure and parameters are determined from experimental modeling. This approach is appropriate when no prior knowledge about the system is available, or when the first-principles equations are highly complex [31], [32]. (iii) Grey-box models represent a compromise between white- and black-box models by integrating various kinds of available information. Typically, the model's structure is determined from prior knowledge, whereas measurement data are used for parameter identification [32].
Finally, the third aspect considers the mathematical functions that define the model. (i) Linear models are the simplest and should be used whenever possible. (ii) Polynomials extend linear models, becoming more flexible as the degree increases [31]. (iii) Look-up tables are essentially a large collection of samples combined with some interpolation and optimization technique. (iv) Neural networks (NNs) are mathematical arrangements inspired by biological neurons, with distinct structures that can be particularly interesting for static (multilayer perceptron) or dynamic models (recurrent neural networks). NNs with a large number of layers, known as deep learning, have achieved great success in complex problems, but they are difficult and slow to train, and the incorporation of prior knowledge is limited. Recently, [33] proposed a regularization that brings prior knowledge of the underlying physics into the training of NNs.

3) MODEL STRUCTURE
The basic principle here is to compose a complex model out of many simpler low-dimensional models. Ways to organize the smaller models include the multiplicative correction model, refinement submodels, parameter scheduling, projection-based structures, and additive and hierarchical structures. Hierarchical structures are particularly helpful for modeling physical phenomena, mainly because they break the model down into physically meaningful pieces.

4) PARAMETER ESTIMATION
This step is not as subjective as the previous ones: it is the application of well-known techniques to the model skeleton built so far, with inputs, architecture, and structure already defined.
A key concept is the loss function, which serves to assess the quality of the model parameters and guide optimization algorithms. The loss function gives the error value as a function of the parameters of the model. In a broad sense, the error is the difference between the output of the model and the output of the process for a given input:

e_j = y_j − ŷ_j(θ).

Arguably, the most common form of the loss function is the sum of squared errors:

V(θ) = Σ_{j=1}^{N} e_j²,   (24)

where θ is the vector of model parameters and N is the number of training data samples. When the error between process and model output is linear in the parameters, and the sum of squares is applied as a loss function, the optimal parameters can be computed by means of least squares. For the sake of simplicity, we consider the case where the system input u and output y are scalars. Given an input u_j, the predicted model output ŷ_j is given by

ŷ_j = Σ_{i=1}^{n} θ_i x_{i,j},

where the x_{i,j} are called regressors, given by

x_{i,j} = g_i(u_j),

with g_i(·) being problem-dependent functions. For instance, if g_i(u) = u^{i−1} is a monomial, the prediction becomes a polynomial function of the inputs. The regressors can be arranged in matrix form as ŷ = Xθ, where X collects the regressors x_{i,j}. The least-squares problem for the loss function (24) becomes

min_θ ‖y − Xθ‖²,

whose normal equations XᵀXθ = Xᵀy can be solved efficiently by means of Cholesky factorization and iterative methods.
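The least-squares procedure above can be sketched in a few lines. The example below fits a polynomial model via the normal equations and a Cholesky factorization, as the text suggests; the quadratic ground truth and noise level are illustrative choices, not data from the paper.

```python
import numpy as np

# Training data: scalar input u and a noisy quadratic response y
# (hypothetical ground truth theta = [1.0, 2.0, -0.5])
rng = np.random.default_rng(0)
u = np.linspace(0.0, 2.0, 50)
y = 1.0 + 2.0 * u - 0.5 * u**2 + rng.normal(0.0, 0.01, u.size)

# Regressor matrix X with g_i(u) = u^(i-1): columns [1, u, u^2]
X = np.vander(u, N=3, increasing=True)

# Normal equations X^T X theta = X^T y, solved via Cholesky factorization
A = X.T @ X
b = X.T @ y
L = np.linalg.cholesky(A)  # A = L L^T
theta = np.linalg.solve(L.T, np.linalg.solve(L, b))

print(theta)  # close to [1.0, 2.0, -0.5]
```

For larger, ill-conditioned regressor matrices, an orthogonal factorization (e.g., `numpy.linalg.lstsq`) is numerically safer than forming the normal equations explicitly.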

5) MODEL VALIDATION
The aim here is to check whether or not all previous steps were carried out successfully. This is achieved in two steps.
(i) The first step uses a criterion to measure the quality of the model on training data. The criterion can be a visual inspection of data plots, the sum of squared errors (SSE), the coefficient of determination (R²), the degrees of freedom in the error (DFE), the degrees-of-freedom-adjusted coefficient of determination, the root mean squared error (RMSE), or the expected error (EE).
(ii) If the chosen criterion achieves good enough results on training data, the second step is to reevaluate the model on fresh data, also called test data. The coefficient of determination measures the share of the system's response variation that is explained by the model. An intuitive relation is

R² = 1 − (unexplained variation)/(total variation).

For instance, R² = 0.9 means that 90% of the system variation is explained by the model. In mathematical notation,

R² = 1 − Σ_{j=1}^{N} (y_j − ŷ_j)² / Σ_{j=1}^{N} (y_j − ȳ)²,   (32)

where ȳ is the mean of the training data outputs {y_j}. The expected error (33) is a measure of the expectation of the model output error; because its value is expressed in the same units as the system's output, it is intuitively interpretable.
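These two validation criteria are easy to compute directly. The sketch below implements R² per Eq. (32); for EE it assumes the mean absolute error (a common choice that matches the "same units as the output" interpretation, though the paper's exact definition is not reproduced here).

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: share of output variation explained."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

def expected_error(y_true, y_pred):
    """Assumed EE: mean absolute error, in the units of the output."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Tiny illustrative test set
y = np.array([10.0, 12.0, 14.0, 16.0])
yhat = np.array([10.5, 11.5, 14.5, 15.5])
print(r_squared(y, yhat))       # 0.95
print(expected_error(y, yhat))  # 0.5
```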

IV. SOLUTION FRAMEWORK
This section addresses the synthesis of a concrete formulation for the conceptual problem (14), which can be solved with standard algorithms for production optimization. The nature of the approximation of the objective and constraints is pivotal with respect to the class of algorithms that are applicable. The equations (14c) that relate flows, pressures, and compositions of the gas pipeline network can be modeled directly with a phenomenological simulator, or with the proxy models developed in the previous section. Conversely, the objective function f v is hard to compute since it entails solving a complex MINLP problem [25] for a given boundary condition x v , which depends on the particular structure of the platform v.

A. MODELING THE OBJECTIVE FUNCTION
Since it would be impractical to solve f_v for several x_v iteratively, an alternative is to compute this objective offline for a finite sample set X̂_v ⊂ X_v of feasible boundary conditions. Then f_v can be approximated by a multidimensional piecewise-linear function built on the point set S_v = {(x_v, f_v(x_v)) : x_v ∈ X̂_v}. Notice that the resulting piecewise-linear function is a proxy model for the objective. The modeling of a piecewise-linear function in mathematical programming is, however, a topic of considerable complexity. The first issue regards the partitioning of the domain X_v into a set of polyhedra, which can be an arbitrary polyhedral collection, a triangulation, or a regular lattice, among others. The second issue concerns the formulation of f_v in mathematical programming with continuous and binary variables [34], [35]. Several formulations are found in the literature, depending on the domain partition, including the convex combination, the disaggregated convex combination, the logarithmic convex combination, and the special ordered set of type 2 (SOS2) [34]. Instead, if the problem is solved with a DFO algorithm, then f_v(x_v) can be obtained by convex combination of the points closest to x_v in S_v that form a simplex.
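The convex-combination evaluation used with the DFO algorithm can be illustrated in one dimension, where the simplex containing a query point is simply the pair of bracketing samples. The sample values below are placeholders, not field data; in the multidimensional case the same idea applies over the vertices of the enclosing simplex.

```python
import numpy as np

# Hypothetical offline samples of a platform objective f_v over a
# 1-D boundary condition (e.g., gas export rate) -- illustration only.
x_samples = np.array([1.0, 2.0, 3.0, 4.0])          # boundary conditions
f_samples = np.array([100.0, 180.0, 220.0, 230.0])  # precomputed oil output

def piecewise_linear(xq):
    """Evaluate the proxy by convex combination of the two closest
    samples bracketing xq (the 1-D simplex containing it)."""
    j = np.searchsorted(x_samples, xq)
    j = int(np.clip(j, 1, len(x_samples) - 1))
    x0, x1 = x_samples[j - 1], x_samples[j]
    lam = (x1 - xq) / (x1 - x0)  # convex weight on the left vertex
    return lam * f_samples[j - 1] + (1.0 - lam) * f_samples[j]

print(piecewise_linear(2.5))  # 200.0, halfway between 180 and 220
```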

B. MODELING THE PIPELINE NETWORK
This work considers the use of the multiphase flow simulator Pipesim to solve the pipeline network equations (14c), and thereby compute the node pressures and pressure drops for a given vector x = (x v : v ∈ V src ) of boundary conditions. Together with a piecewise-linear approximation of the objective function f v , this black-box model leads to a concrete problem that can be solved with DFO algorithms. This approach, which relies on the use of a black-box simulator in the optimization loop, was proposed in [17], which serves as a reference model and a means of comparison.
Conversely, our work proposes the use of the proxy model developed in the previous section in place of the black-box simulator. The resulting concrete problem is explicit in the sense that network relations are given by algebraic equations, without simulation models that are computationally costly to run. Given the good accuracy of the proxy models identified for the pipelines in the Santos Basin, this alternative brings about efficiency gains. In fact, two solution methods result from the pipeline proxy model. The first combines the proxy models for the pipeline pressure drops (Eq. (5)) with a mixed-integer formulation of the objective functions f_v. Instead of a MILP formulation for f_v, the second uses the set S_v as a look-up table to compute f_v(x_v) by convex combination of the points closest to a given boundary condition x_v, as in the case of direct use of a black-box simulator.

C. CONCRETE FORMULATIONS
Considering the models presented above for the objective function and pipeline network, three concrete formulations for the production optimization problem (14) are readily derived:
• Simulation-Based (SB): It consists of using the Pipesim simulator to solve the pipeline network equations, with algorithmic piecewise-linear modeling of the objective function, in tandem with a DFO algorithm.
• Proxy-Based (PB): Akin to the simulation-based strategy, but instead the pipeline network equations are solved approximately and algebraically using the proxy-models identified from simulation and field data. The resulting concrete formulation can be optimized with DFOs.
• Mixed Approach (MA): The mixed approach combines both simulation and proxy-based strategies. Alternate searches can be performed between the two methods, or a sequential approach can be followed (i.e., first optimizing the proxy model and using the results as a starting point for the simulation-based procedure).

V. CASE STUDY
A. SANTOS BASIN
The Santos Basin is a sedimentary basin on the Brazilian continental shelf, extending for hundreds of kilometers along the coast. Large oil and gas reserves were discovered in deep waters about 300 km off the coast, below the salt layer, the so-called Pre-salt. The Pre-salt carbonate reserves are probably the most important oil discovery in Brazil [36]. From the production point of view, this new frontier poses technological challenges that are still being addressed [2]. Figure 4 gives a schematic representation of a gas pipeline network in the Santos Basin. FIX is a fixed platform producing gas and condensate from wells with subsea completion. This platform is connected to the onshore terminal by a subsea pipeline of 167 km. FIX was designed to receive gas from the Floating Production, Storage and Offloading (FPSO) platforms. The gas with high CO2 content produced from Pre-salt reservoirs by FPSOs 1, 2 and 3 is gathered in a manifold M_PS. The gas from the Pre-salt is then mixed with the low-CO2 gas from the Post-salt produced by FPSO 4 and FIX, and transferred to the onshore terminal. The oil produced by the FPSOs is transported to refineries by special vessels.
The challenge for integrated operations rests on maximizing overall production from the oil reservoirs, while accounting for physical and operational constraints in the production platforms, pipelines, and terminals. Of concern is the concentration of contaminants in the gas delivered to the terminal, which must be within the operational range of the onshore facilities.
The network transports gas from the platforms to the onshore terminal, which imposes limits mainly on CO2 content, but potentially on other components in the future as the reservoirs mature, such as C3. Thus the phase set is P = {gas} and the content of interest is C_gas = {CO2}. The boundary condition x_v of a platform v consists of the gas outlet flow q_v,gas, the outlet pressure p_v, and the CO2 content z^CO2_v,gas in the gas phase. The onshore terminal t = OS operates at a fixed pressure p_t = 45 bar.
The case study and its associated characteristics including wells and pipelines are based on actual production systems operated by Petrobras. For this reason, further details of the processes are confidential and cannot be disclosed.

B. PROXY MODEL SYNTHESIS
Here, the methodology for systems identification is applied following the steps previously presented in Subsection III-B, with support from the theory on gas pipeline hydraulics. The aim is to design a proxy model for pressure calculation in the subsea gas pipeline of the Santos Basin.

1) MODEL INPUTS
As illustrated in Figure 4, the boundary conditions with the network inputs are:
1) x_v = (p_v): the operating pressure at the terminal v = OS.
2) x_v = (q_v, z^CO2_v,gas): the exporting gas rate and CO2 content from the platforms v ∈ {FPSO-1, FPSO-2, FPSO-3} that operate in the Pre-salt reservoirs.
3) x v = (q v ): the exporting rate of the platforms v ∈ {FIX, FPSO-4} that produce gas without contaminants.

2) MODEL OUTPUTS
The output consists of the total gas delivered to the terminal, its CO2 content, and the pressures at the source and intermediate nodes of the network. Mathematically, the output vector is y = (q_t,gas, z^CO2_t,gas, p_v : v ∈ V_src). For the proxy model formulation, we assume that there is no loss in the gas flows (i.e., the total flow is the sum of the individual platform flows) and no phase transition. These assumptions allow the gas delivered to the terminal and its CO2 content to be approximated with the following algebraic equations:

q_t,gas = Σ_{v ∈ V_src} q_v,gas,
z^CO2_t,gas = Σ_{v ∈ V_src} z^CO2_v,gas q_v,gas / q_t,gas.

As for the pressure outputs, a proxy model is presented below based on the system identification methodology described above. Notice that when the experiments employ the phenomenological simulator for benchmark purposes, no such assumptions are made and the pressures are taken from the resulting simulations.
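The mass-conservation approximations above amount to a sum and a flow-weighted average, as the sketch below shows. Platform names match the case study, but the flow and CO2 numbers are illustrative, not field data.

```python
# Mass-conservation proxy for the terminal node: total gas is the sum of
# platform flows, and CO2 content is the flow-weighted average.

def terminal_gas_and_co2(flows, co2_fractions):
    """flows: {platform: q_v_gas}, co2_fractions: {platform: z_CO2_v}."""
    q_total = sum(flows.values())
    z_total = sum(flows[v] * co2_fractions.get(v, 0.0) for v in flows) / q_total
    return q_total, z_total

# Illustrative numbers only; Post-salt producers carry ~0 CO2 here
flows = {"FPSO-1": 6.0, "FPSO-2": 5.0, "FPSO-3": 5.0, "FPSO-4": 2.0, "FIX": 2.0}
co2 = {"FPSO-1": 0.10, "FPSO-2": 0.12, "FPSO-3": 0.08}

q_t, z_t = terminal_gas_and_co2(flows, co2)
print(q_t)            # 20.0
print(round(z_t, 3))  # 0.08
```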

3) MODEL ARCHITECTURE
Regarding the model architecture, three decisions are made: 1) Temporality: because the settling time is much shorter than the interval between setpoint changes in daily production operations, a static model is chosen rather than a dynamic one. 2) Overall modeling approach: white-box modeling is overly complex and slow for optimization purposes, which motivates the design of a proxy model. Grey-box is favored over black-box modeling because of the prior knowledge available in terms of physical principles, which can determine the model structure. 3) Mathematical base functions: considering the general flow equation (18), polynomial functions seem appropriate to model the relation between pressures and flow rates. Conversely, the equations relating pressures and CO2 content are far more complicated than those relating flows and pressures. For these reasons, polynomial functions of flow rate and CO2 content will be used as proxy models for pressures.

4) MODEL STRUCTURE
The model structure is based on two decisions: 1) Polynomial degrees: Equation (18) shows a quadratic relation between pressures and flow rate in pipelines, so a polynomial of second degree is chosen. Given the complexity of the relation between pressure and CO2 content indicated by prior knowledge, a second-degree polynomial is initially chosen to represent the nonlinear behavior; the degree may be increased depending on performance. 2) Submodel structure: with 9 inputs and 5 outputs (pressures at network nodes), the model dimensionality is low enough to be handled without submodel composition. Nevertheless, it is worth examining the behavior of a single variable to assess how well a polynomial model would fit the process. Figure 5 shows how the exporting pressure at FPSO 1 behaves as its flow rate and CO2 content vary, keeping the boundary conditions of the remaining platforms constant. From Figure 5 one concludes that the raw data is not a good fit for a polynomial model. This highly nonlinear behavior arises from the superposition of several complex equations and algorithms used in white-box simulators, which jointly generate data that is difficult to interpret.
This work proposes a simple submodel defined as a string of segments without junctions that preserves enough homogeneity for the integrated equation (18) to remain applicable. In such a composition, the original network must be broken into as few pieces as possible, as shown in Figure 6; otherwise, too many submodels would have to be identified.
The pipeline segments from a to f, highlighted in red, are modeled individually. Each submodel has only one output: the difference between its upstream and downstream pressures, called the differential pressure ΔP. The inputs are the flow rate q, the CO2 content z, and the downstream pressure P_ds. The use of the downstream pressure as an input variable decouples a submodel from the adjacent ones. Figure 7 shows the behavior of one of these submodels with data gathered from the simulator. The curve is much smoother than in Figure 5 and, given its curvature, apparently fits a polynomial model quite well.
Considering the developments above, the second-degree polynomial equation (35) is used as a proxy model for the pressure drop ΔP in a pipeline segment, with the three inputs (q, z, P_ds):

ΔP = θ_1 + θ_2 q + θ_3 z + θ_4 P_ds + θ_5 qz + θ_6 qP_ds + θ_7 zP_ds + θ_8 q² + θ_9 z² + θ_10 P_ds².   (35)

This submodel equation is instantiated for all six pipeline segments, yielding ΔP_a, ..., ΔP_f (36). The final composite model, able to calculate the pressures of the platforms rather than of individual segments, is organized hierarchically: starting from the terminal pressure, the pressure at each platform is obtained by adding the ΔP of the segments along the path from the terminal to that platform, one equation per platform.
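The segment proxy and its hierarchical use can be sketched as follows. The full quadratic basis in (q, z, P_ds) and the synthetic "simulator" ground truth below are illustrative assumptions; the paper's fitted coefficients (Table 5) are not reproduced.

```python
import numpy as np

def quad_features(q, z, p_ds):
    """Full second-degree polynomial basis in (q, z, P_ds): 10 terms."""
    return np.stack([np.ones_like(q), q, z, p_ds,
                     q * z, q * p_ds, z * p_ds,
                     q**2, z**2, p_ds**2], axis=-1)

# Synthetic training data standing in for simulator samples of one segment
rng = np.random.default_rng(1)
q = rng.uniform(1, 10, 500)
z = rng.uniform(0, 0.2, 500)
p_ds = rng.uniform(40, 100, 500)
dp = 0.8 * q**2 + 30.0 * z - 0.05 * p_ds + 2.0  # assumed ground truth

X = quad_features(q, z, p_ds)
theta, *_ = np.linalg.lstsq(X, dp, rcond=None)

# Hierarchical use: upstream pressure = downstream pressure + predicted drop
def upstream_pressure(qv, zv, pv_ds):
    feats = quad_features(np.atleast_1d(float(qv)), np.atleast_1d(float(zv)),
                          np.atleast_1d(float(pv_ds)))
    return float(pv_ds + feats @ theta)

print(round(upstream_pressure(5.0, 0.1, 45.0), 2))  # 67.75
```

Chaining `upstream_pressure` segment by segment, from the terminal toward each platform, reproduces the hierarchical composition of the final model.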

C. COMPUTATIONAL ANALYSIS
In this section, the outcome of proxy models for estimating pressure in gas pipeline networks is evaluated in relation to the optimization of production in a multi-reservoir oilfield situated in the Santos Basin. Three distinct formulations are solved using a DFO algorithm, and the accuracy of these models is subsequently assessed.

1) PROXY MODELING
The values of θ in each of ΔP_a, ..., ΔP_f in Equation (36) are now estimated. In order to train the proxy model, a set of data containing inputs u_i and outputs y_i is supplied by the phenomenological simulator. For each submodel a to f, 8000 samples were collected from the combination of 20 × 20 × 20 inputs (q, z and P_ds), except for the gas stream in segment e, which does not carry CO2. From this database, the values of all θ are calculated by least squares, and the results are shown in Table 5. Based only on data from the simulator, the criteria R² and EE as defined by Equations (32) and (33) are applied to each submodel a to f. The results are shown in Table 6. As far as R² is concerned, the worst fit is for submodel f; even so, the model explains 98% of the ΔP response. Regarding EE, submodel d performed worse than the others: the value 2.1456 is the expected error, in pressure units, for this pipeline segment.

2) TEST CASES
The nominal case is denoted by BASE. PRE_gas is a variation of the nominal case in which the total gas coming from the Pre-Salt is increased by 15%, while the Post-Salt gas is reduced in the same proportion. The initial guess POS_gas represents a boundary condition in which the gas coming from the Post-Salt is increased by 15%, and the gas with high contaminant content is decreased in the same proportion. The same analogy holds for PRE_CO2 and POS_CO2, i.e., these guesses represent boundary conditions in which the CO2 concentration is increased and decreased by 15% with respect to the nominal case, respectively.
However, since the CO2 concentration is constant for the Post-Salt platforms, only a variation in the CO2 content of the Pre-Salt platforms is considered. It should be mentioned that the values of physical quantities are given in the International System of Units (SI). Gas flow is measured in million standard cubic meters per day (MSm³/d), CO2 concentration in percent (%), and time in seconds. Three test cases are considered in the experiments:
• C1: the first case does not impose constraints on the flow rate (q^max_t,gas) or CO2 concentration (z^max_t,CO2) at the terminal; only pressure constraints are in place.
• C2: the second one limits the total gas reaching the onshore terminal with respect to the first case.
• C3: the third case constrains the maximum CO 2 content allowed in the gas stream reaching the terminal, while relaxing the upper bound on the total gas delivered.
All three cases impose a maximum pressure of 250 bar at the production platforms. Overall, the problem to be optimized consists of:

max Σ_{v ∈ V_src} f_v(x_v)
s.t.: H(q, p, z) = 0,   (37c)
G(q, p, z) ≤ 0,
q_t,gas ≤ q^max_t,gas,
z^CO2_t,gas(q, p, z) ≤ z^max_t,CO2,   (37f)

where f_v(x_v) represents the total oil production from platform v (Sm³/d) when it operates according to the boundary condition x_v, and the terminal pressure is fixed at p_t = 45 bar. Other objectives can be readily accommodated, such as maximizing an economic gain that considers the revenue from hydrocarbon sales net of the costs of gas compression and water treatment before discharge.
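When problem (37) is handed to a DFO solver, the constraints are often enforced with an extreme barrier: any candidate violating them is assigned +inf so the search simply discards it. The sketch below illustrates this wrapping; the bounds are the 250 bar pressure limit from the case study plus a placeholder CO2 cap, and the function names are assumptions.

```python
import math

P_MAX = 250.0  # bar, platform pressure bound from the case study
Z_MAX = 0.10   # illustrative CO2 cap at the terminal, not a field limit

def barrier_objective(oil_output, platform_pressures, z_terminal):
    """Extreme-barrier objective: maximization cast as minimization of
    the negative production; infeasible points get +inf."""
    feasible = (all(p <= P_MAX for p in platform_pressures)
                and z_terminal <= Z_MAX)
    return -oil_output if feasible else math.inf

print(barrier_objective(1000.0, [240.0, 245.0], 0.08))  # -1000.0 (feasible)
print(barrier_objective(1200.0, [255.0, 245.0], 0.08))  # inf (pressure violated)
```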

3) OPTIMIZATION RESULTS
For all test cases and concrete formulations, the resulting production optimization problem (37) was solved with the OrthoMADS algorithm [37], which belongs to the class of algorithms known as MADS (Mesh Adaptive Direct Search) [38]. MADS seeks an optimal point of a function by evaluating and comparing its value at different points. In mesh adaptive methods, all points lie on a mesh that is refined iteratively. Overall, the goal of each iteration k is to find a point x that improves on the best point x_k known thus far (i.e., f(x) < f(x_k)). OrthoMADS improves the poll stage of the MADS algorithm by considering polling directions that are deterministic and orthogonal to each other [39]. The algorithm uses the Halton quasi-random sequence, which covers the space more evenly than a truly random sequence, to generate tentative solution vectors. Our work used the NOMAD package [40], which implements the OrthoMADS algorithm in C++. The experiments were implemented in Python 3.6 using OrthoMADS and performed in a virtualized Windows environment with 2 cores and 8 GB of RAM. Table 7 presents the results obtained by optimizing problem (37) with direct use of Pipesim as the black-box simulator (Simulation-Based). These results define the benchmark against which the results obtained with the support of proxy models will be compared.
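The poll-and-refine idea behind MADS can be conveyed with a minimal direct-search sketch. This is a simplification for intuition only: it polls along fixed coordinate directions, whereas OrthoMADS builds dense orthogonal direction sets from a Halton sequence on an adaptive mesh, as implemented in NOMAD.

```python
import numpy as np

def orthogonal_poll_search(f, x0, step=1.0, tol=1e-3, max_iter=200):
    """Toy direct search in the spirit of MADS polling: evaluate f along
    +/- orthogonal (coordinate) directions, move to any improving point,
    and shrink the step on failed polls."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    it = 0
    while step > tol and it < max_iter:
        improved = False
        for d in np.vstack([np.eye(len(x)), -np.eye(len(x))]):
            cand = x + step * d
            fc = f(cand)
            if fc < fx:               # successful poll: accept and re-poll
                x, fx, improved = cand, fc, True
                break
        if not improved:
            step *= 0.5               # failed poll: refine the mesh
        it += 1
    return x, fx

sol, val = orthogonal_poll_search(
    lambda v: (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2, x0=[0.0, 0.0])
print(np.round(sol, 2))  # close to [1., -2.]
```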

PROXY-BASED METHOD
For the proxy model optimization (Proxy-Based), a safety threshold variable γ is introduced in the pressure constraints to compensate for the cumulative error in the network pressures, which are calculated according to the ΔP regressions given in Eq. (36). After optimizing problem (37) with the proxy model, the obtained solution is tested with the simulator for more accurate pressure measurements, i.e., to check solution feasibility. Figure 8 presents the workflow of the proposed proxy-based strategy. With the addition of γ, the pressure constraints are tightened by the factor (1 − γ), with γ ∈ (0, 1). Next, we illustrate the impact of the parameter γ on solution feasibility. Table 8 shows the results for the proxy-based optimization following the workflow from Figure 8, with γ ranging from 0% to 5%. Notice that, for some initial scenarios and conditions, feasibility of the solution with respect to the simulator is achieved by increasing the value of γ.
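The γ-sweep workflow can be sketched with stubs standing in for the proxy optimizer and the simulator. Everything here is a hypothetical stand-in (the proxy optimum is assumed to sit at the tightened bound, and the simulator is assumed to read ~2% higher pressure than the proxy); only the 250 bar bound comes from the case study.

```python
# Sketch of the gamma-sweep workflow: tighten the proxy pressure bound by
# a safety margin, "optimize", then verify the candidate with the simulator.

P_MAX = 250.0  # bar, platform pressure bound

def optimize_with_proxy(gamma):
    """Stub: pretend the proxy optimum sits right at the tightened bound."""
    return {"pressure_proxy": (1.0 - gamma) * P_MAX}

def simulator_pressure(solution):
    """Stub: the simulator reads ~2% higher pressure than the proxy."""
    return 1.02 * solution["pressure_proxy"]

for gamma in (0.0, 0.01, 0.03, 0.05):
    sol = optimize_with_proxy(gamma)
    feasible = simulator_pressure(sol) <= P_MAX
    print(gamma, feasible)
# 0.0 False / 0.01 False / 0.03 True / 0.05 True
```

With a 2% proxy-to-simulator gap, margins above roughly that gap restore feasibility, which mirrors the behavior reported in Table 8.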
An interesting aspect is that the optimization with the proxy model produced better results than the benchmark, arguably because of the smoothing effect of the regression. When using the simulator itself, the optimizer can get stuck at certain points on the grid, prevented from leaving the region by the non-smoothness of the simulations, which can be observed in Figure 5.

HYBRID METHODS
Considering optimization based jointly on simulation and proxy models, two types of model combination stand out: • The first is a sequential method. A good starting point x_0 for the boundary conditions is first sought through the proxy model, which has a relatively short computational time. Then, a simulation-based optimization is invoked from this starting point.
• The second is a concurrent method, in which the proxy model assists the simulation-based optimization. The surrogates are used by the algorithm to dynamically adapt parameters, perform local searches, and identify promising candidate solutions [41], [42]. Table 9 reports the results of the sequential method, whereby the solution obtained with the proxy model formulation is used as an initial guess for the simulation-based approach. Despite being initialized with a relatively good starting point x_0, the algorithm performs several iterations before converging to a local solution. In this approach, optimal points from the proxy model that violate the base restrictions are also considered, leaving the task of recovering feasibility to the DFO algorithm coupled with the simulation model. Table 10 shows the results obtained with the DFO algorithm using proxy and simulation models jointly. The proxy model can help the optimizer achieve better results than the benchmark, exploiting the smoothness of the proxy model to escape locally optimal points on the grid. An advantage of this approach is the guarantee of feasibility at termination, since the algorithm ultimately optimizes the simulation model. However, the combined method incurs additional computational time.
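The sequential hybrid can be sketched with one-variable stand-ins: a cheap smooth surrogate is optimized first, and its optimum warm-starts a short refinement on the "true" model. Both objectives below are hypothetical toys, not the paper's formulations.

```python
# Sequential hybrid sketch: optimize the cheap proxy first, then warm-start
# the expensive simulation-based search from the proxy optimum.

def minimize_1d(f, x0, step=1.0, tol=1e-4):
    """Tiny 1-D direct search: step left/right, halve the step on failure."""
    x, fx = x0, f(x0)
    while step > tol:
        moved = False
        for cand in (x + step, x - step):
            if f(cand) < fx:
                x, fx, moved = cand, f(cand), True
                break
        if not moved:
            step *= 0.5
    return x

proxy_model = lambda x: (x - 3.0) ** 2  # smooth surrogate
simulation = lambda x: (x - 3.1) ** 2   # "true" model, slightly shifted

x_proxy = minimize_1d(proxy_model, x0=0.0)                # cheap search
x_final = minimize_1d(simulation, x0=x_proxy, step=0.25)  # short refinement

print(round(x_proxy, 2))  # 3.0
print(round(x_final, 2))  # 3.1
```

The refinement starts close to the true optimum, so only a short (expensive) simulation-based search is needed, which is the appeal of the sequential method.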
Although this approach can improve on the benchmark results, it can have a detrimental effect on the objective in some cases. Considering this, together with the higher computational time, searching for a starting point with the pure proxy model may be more appropriate than the concurrent mode in this particular case study.

VI. CONCLUSION
This work developed proxy models of gas pipeline networks for fast computation of the pressures at inlet nodes, and of the gas flow rate and CO2 content at the outlet node. For pressure, the proxy model was derived with the system identification methodology, incorporating the governing equations of gas flow into the model's structure. For CO2 and gas flow, simple algebraic equations based on mass conservation and mixing proportions provided good approximations. Using the resulting proxy models, the variables of interest can be calculated far more efficiently than with a high-fidelity simulator, given the required boundary conditions at the nodes that represent production platforms and terminals.
The use of the proxy models with a DFO algorithm proved effective. Characterized by the gas export rate and CO2 content, the operating conditions of the production platforms were optimized to maximize overall oil production, while enforcing pressure bounds at the platforms and gas-rate and CO2 limits at the terminal. A suitable safety margin on the pressure bounds was introduced to compensate for the prediction error of the proxy model. Besides optimizing far faster than with direct use of the simulator, the DFO algorithm converged more consistently to solutions of higher production.
For future work, different DFO methods could be employed and compared, such as trust-region and augmented Lagrangian algorithms. Additionally, other characteristics could be considered in the model, such as moisture, contaminants besides CO2, and multiphase flow.
CARLOS LUGUESI received the technician degree in electronics and the degree in industrial electrical engineering with emphasis on electronics from the Federal Technological University of Paraná, in 2000 and 2005, respectively. He is currently an Automation Engineer with Holic Automação Industrial. He has experience in electrical engineering, with emphasis on industrial electronics, and electronic systems and controls.

LAIO ORIEL SEMAN received the Ph.D. degree in electrical engineering from the Federal University of Santa Catarina, in 2017. He is currently a University Professor with the University of Vale do Itajaí, Brazil. His research interests include strategies for static and dynamic optimization, along with applications in traffic control, cyber-physical systems, and oil and gas production systems.
JOSÉ TORREBLANCA GONZÁLEZ is currently a Lecturer of electronic technology with the Department of Applied Physics, University of Salamanca. He worked as a Technical Engineer in the design, assembly, and repair of all types of refrigeration systems, from the assembly of industrial air conditioners to the assembly of freezing and conservation centers at private enterprise for two years. He has extensive knowledge in the operation of power electronics circuits and works on several research projects. One is dedicated to renewable energies, especially photovoltaic installations, another project is involved in the development and creation of test benches to study the behavior of photovoltaic modules in real sunlight. On the other hand, he is dedicated to intelligent devices and the study of sensors that can be used to help people, more specifically patients with different types of illnesses such as diabetes. Currently, he also dedicates another part of his time to the study of energy efficiency and implementation of systems with lower consumption and higher efficiency. He is also a Collaborating Researcher with the Expert Systems and Applications Laboratory (ESALab), University of Salamanca, Spain. His research interests include distributed systems with a focus on data privacy, communication, and programming protocols, involving scenarios and applications for the Internet of Things, smart cities, big data, cloud computing, and blockchain.