Machine learning for industrial sensing and control: A survey and practical perspective

With the rise of deep learning, there has been renewed interest within the process industries to utilize data on large-scale nonlinear sensing and control problems. We identify key statistical and machine learning techniques that have seen practical success in the process industries. To do so, we start with hybrid modeling to provide a methodological framework underlying core application areas: soft sensing, process optimization, and control. Soft sensing contains a wealth of industrial applications of statistical and machine learning methods. We quantitatively identify research trends, allowing insight into the most successful techniques in practice. We consider two distinct flavors for data-driven optimization and control: hybrid modeling in conjunction with mathematical programming techniques and reinforcement learning. Throughout these application areas, we discuss their respective industrial requirements and challenges. A common challenge is the interpretability and efficiency of purely data-driven methods. This suggests a need to carefully balance deep learning techniques with domain knowledge. As a result, we highlight ways prior knowledge may be integrated into industrial machine learning applications. The treatment of methods, problems, and applications presented here is poised to inform and inspire practitioners and researchers to develop impactful data-driven sensing, optimization, and control solutions in the process industries.


Motivation
Data analytics and machine learning (ML) ideas are not new to the process industries. The review paper by Venkatasubramanian [1] provides an excellent overview of the history, successes, and failures of various attempts over more than three decades to use ideas from artificial intelligence (AI) in the industry. In particular, statistical techniques such as principal component analysis, partial least squares, canonical correlation analysis, and time series methods for modeling, such as maximum likelihood estimation and prediction error methods, have been extensively used in industry [2]. Several classification and clustering algorithms, such as k-means, support vector machines, and Fisher discriminant analysis, are also widely used in industry [3,4]. And several nonlinear approaches, such as kernel methods, Gaussian processes, and adaptive control algorithms, such as reinforcement learning, have been applied in some niche applications [5,6,7].
Despite the longstanding success of many statistical techniques in industry, there is also considerable interest in developing sensing and control technologies based on more recent ML architectures [1,8,9]. Broadly speaking, these aspirations are driven by the promises of increased autonomy: increased operational efficiency, consistency, and safety; improved scalability beyond linear methods; and upskilling of plant personnel [10]. Consequently, this paper addresses the need to dissect and organize the general use of modern ML techniques in industrial applications. In doing so, such a treatment will inform practitioners of the latest research trends and their potential practical impact. Conversely, researchers in core areas will benefit from a holistic view of successful ML techniques and the industrial requirements they satisfy.

Overview and scope
This paper is a significant extension of Gopaluni et al. [11]: in addition to a more detailed and expansive treatment of the literature, we discuss the practical success of various methods. Note that this is primarily a problem-driven survey; however, we have provided sufficient references for interested readers on the underlying methods discussed here. Moreover, we have included additional exposition on some of these methods in the supplementary material.
Hybrid modeling is first introduced to provide a conceptual framework underlying core application areas, namely:
1. Soft sensing
2. Process control (which here also includes process optimization)
In our survey, we identify several methodological areas of research: statistical learning and machine learning, deep learning and its variants, and reinforcement learning. Algorithms from each of these methodological areas are used to varying degrees among the core applications. Soft sensing encompasses more statistical and machine learning methods, with some discussion of deep learning. On the optimization and control side, we discuss hybrid modeling in tandem with mathematical programming and reinforcement learning. This is by no means an exhaustive survey of the recent research on these topics. However, we have tried our best to include some of the most critical developments of ML tools in the process industries. In that vein, we only discuss methods that have seen industrial use or have received considerable research attention within process systems engineering, either in real life or in simulations. Therefore, speculation about the potential use of very recent developments in the broader ML community, such as ChatGPT or other large language models, is beyond the scope of this paper. However, we provide insight into the practical deployment of ML techniques in the process industries.
This paper surveys a large number of algorithms. Table 1 gives a convenient list to reference across all sections. Throughout this paper, artificial intelligence is the broadest term for classifying machines that aim to mimic human intelligence. It is intended to predict, automate, and optimize the tasks humans have traditionally performed, such as speech recognition, image recognition, decision-making, and translation. Machine learning is an area of artificial intelligence and computer science where algorithms are developed to extract patterns from data and make predictions. Supervised learning is a branch of machine learning comprised of algorithms for determining a predictive model based on labeled data with known outcomes. On the other hand, unsupervised learning is a branch of machine learning devoted to learning patterns from unlabeled data.

Mathematical modeling approaches
The core applications of this paper are soft sensing and process optimization and control. These areas rely on dynamic mathematical models to infer measurements, make decisions, and synthesize controllers. Therefore, before describing the prominent machine learning (ML) techniques in these areas, it is useful to introduce the foundational assumptions and architectures underlying such models.

Knowledge-driven, data-driven, and hybrid modeling
Knowledge-driven (mechanistic or white box) modeling based on first principles and data-driven (or black box) modeling constitute two opposite strategies. Developing mechanistic models requires a deep understanding of the processes at play. It is often labor-intensive, but embodying first principles may enable extrapolation beyond the conditions under which these models are trained. By construction, mechanistic models have a fixed structure and comprise a fixed number of parameters, often with a physical or empirical interpretation. For this reason, they may also be classified as parametric models.
By contrast, data-driven models require little physical knowledge and are fast to deploy or maintain. But a larger dataset is also typically needed for their construction, and their validity may not extend far beyond the conditions under which they are trained. The structure of a data-driven model does not need to be dictated by a priori knowledge but may be tailored to the training data at hand instead. A further distinction is whether a data-driven model tries to describe data with a set of parameters of fixed size, regardless of the size of the training dataset, in which case it is categorized as parametric, or whether its structure and number of parameters may evolve with the size of the dataset, commonly referred to as nonparametric [13,14]. The so-called nonparametric regression models fall in the second category, whereby the predictor does not take a predetermined form, using techniques such as nearest-neighbor interpolation, local regression, and Gaussian process (GP) regression. However, the distinction between parametric and nonparametric models in statistics and ML is not without intricacies. For instance, a linear SVM is a typical example of a parametric model, having a fixed number of parameters (a weight for each input dimension). In contrast, an RBF-kernel SVM may be considered nonparametric since the number of parameters grows with the size of the training set (a weight for each training point).
The basic idea behind hybrid models is to combine knowledge-driven and data-driven models in such a way as to overcome their respective limitations. This strategy is also frequently referred to as gray box or block-oriented modeling in the literature. At the same time, the term hybrid semi-parametric modeling has been coined to describe those hybrid models where the data-driven component is nonparametric [15]. Multi-fidelity modeling has also developed fast in recent years and is akin to hybrid modeling. The idea is to use a (possibly inaccurate) knowledge-driven model as low-fidelity and correct it with (noisy) process data, considered to be higher fidelity [16]. In particular, this strategy has been applied in uncertainty propagation, inference, and optimization and is also instrumental in small data problems (see supplementary material).
It is worth noting that hybrid modeling has been investigated for over 25 years in chemical and biological process engineering [17,18,19,20]. The claimed benefits of hybrid modeling in these application domains include faster prediction capability, better extrapolation capability, better calibration properties, easier model life-cycle management, and higher benefit/cost ratio to solve complex problems; see recent survey papers on the development and applications of hybrid models by von Stosch et al. [15], Solle et al. [21], Schuppert and Mrziglod [22], Zendehboudi et al. [23], Ahmad et al. [24], and Bradley et al. [25]. Hybrid models may be used to enable soft sensors (see Section 3) or model-based optimization and control (see Section 4) in a first principles approach.

Hybrid modeling paradigms

Traditional serial and parallel hybrid models
The usual classification of hybrid model structures is either as serial or parallel [26]. In the serial approach, the data-driven model is most commonly used as an input to the mechanistic model (see Figure 1A), for instance, a material balance equation with a kinetic rate expressed using a data-driven model. This structure is especially suited to situations where precise knowledge about specific underlying mechanisms is lacking, yet sufficient process data exists to infer the corresponding relationship [17,20]. However, when the mechanistic part of the model presents a structural mismatch, one should not expect the serial approach to perform better than a purely mechanistic approach. In the parallel approach, by contrast, the output of the data-driven model is used to correct the predictions of the mechanistic model [18,19], most often in the form of an additive correction (see Figure 1B). This structure can significantly improve the prediction accuracy of a mechanistic model when the data-driven component is trained on the residuals between process observations and mechanistic model predictions. However, this accuracy may not be better than the sole mechanistic model when the process conditions differ drastically from those in the training set.
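The parallel structure can be illustrated with a minimal sketch. It assumes a hypothetical first-order decay law as the mechanistic model and a straight-line least-squares fit as the data-driven correction; both choices, and all names, are illustrative and are not taken from the cited works.

```python
import math

def mechanistic(t, k=0.5, c0=1.0):
    """Hypothetical first-principles model: first-order decay c(t) = c0 * exp(-k * t)."""
    return c0 * math.exp(-k * t)

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (stands in for the data-driven component)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def fit_parallel_hybrid(ts, observations):
    """Train the correction on residuals: observation minus mechanistic prediction."""
    residuals = [y - mechanistic(t) for t, y in zip(ts, observations)]
    a, b = fit_line(ts, residuals)

    def predict(t):
        # Additive correction of the mechanistic prediction.
        return mechanistic(t) + (a + b * t)

    return predict

# Synthetic "plant" data with a mismatch the mechanistic model does not capture.
ts = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
obs = [mechanistic(t) + 0.1 + 0.05 * t for t in ts]

hybrid = fit_parallel_hybrid(ts, obs)
mech_err = max(abs(mechanistic(t) - y) for t, y in zip(ts, obs))
hybrid_err = max(abs(hybrid(t) - y) for t, y in zip(ts, obs))
```

On this synthetic data the correction removes the mismatch within the training range. As noted above, nothing guarantees that the fitted correction remains valid far outside that range.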
Historically, the most common data-driven modeling techniques embedded in hybrid models have been multilayer perceptron (MLP) and RBF-based regression [15]. Recent representative applications include the development of a serial hybrid model to predict hydraulic fractures created by injecting fluid into a reservoir that accounts for the leak-off rate of the fracturing fluid using an MLP [27] and the development of a serial hybrid model of the thin film growth process coupling a macroscopic gas phase model described by partial differential equations to a microscopic thin-film model described by stochastic partial differential equations via an MLP [28]. Naturally, many other statistical and ML techniques have also been investigated in this context. For instance, Ghosh et al. [29] used subspace identification to construct the data-driven component in a parallel hybrid model and demonstrated the approach on a batch polymerization reactor. Lopez et al. [30] developed a serial hybrid model of a lignocellulosic fermentation process, whereby the glucose concentration is estimated from spectroscopic data using a partial least squares regression model. GP regression has also attracted attention due to its ability to estimate the predictor's variance, for example, in bioprocess engineering applications [31].
Parallel hybrid models can significantly alleviate the issue of maintaining a complex mechanistic model since the data-driven component is trained to capture model mismatch in the first place, possibly in a nonparametric manner. For dynamic systems in particular, a popular approach entails training the data-driven model on the residuals between the predicted and observed states at given time instants [32]. Notice that such a data-driven model could either comprise algebraic or differential equations. By contrast, serial hybrid models can prove more challenging to design, especially when the outputs of the data-driven component cannot be observed directly [33]. In such a case, training and assessing the performance of the data-driven component requires one to simulate the full serial hybrid model and compare its outputs to the available observations. Identifying the unknown model parameters within such hybrid models has relied on regularized regression techniques, such as LASSO and LARS [34].
Another challenge shared by serial and parallel hybrid modeling paradigms is automatically detecting the best structure for the data-driven component. Generally speaking, minimizing the number of parameters needed to capture the underlying mechanisms is desirable, that is, to neither underfit nor overfit the data. Classical approaches to help discriminate among multiple nonparametric model structures include the Akaike Information Criterion and the Bayesian Information Criterion. Willis and von Stosch [35] proposed an approach based on sparse regression and mixed-integer programming to simultaneously decide the structure and identify the parameters for a class of rational functions embedded into a serial hybrid model. Recently, Zhang et al. [36] applied hybrid modeling in combination with sparse identification of nonlinear dynamics [SINDy; 37] to a photo-production bioprocess, whereby a sparse quadratic correction of the kinetic model is identified using mixed-integer nonlinear programming techniques. More generally, there is significant scope for extending sparse and symbolic regression techniques to enable the construction of hybrid models. Notably, the platform ALAMO [38] can enforce constraints on the response variables to incorporate first principles knowledge, thereby revealing hidden relationships between regression parameters that may not be directly available to the modeler. One approach to incorporating such constraints is via semi-infinite programming [39]. Another promising direction entails using sum-of-squares optimization techniques to tackle this problem [40,41].
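As a small illustration of how the two information criteria trade goodness of fit against parameter count, the sketch below uses their common least-squares forms, which assume independent Gaussian residuals and drop additive constants. The candidate structures and their residual sums of squares are hypothetical numbers chosen for illustration only.

```python
import math

def aic(rss, n, k):
    """Akaike Information Criterion for a least-squares fit (Gaussian residuals assumed)."""
    return n * math.log(rss / n) + 2 * k

def bic(rss, n, k):
    """Bayesian Information Criterion; penalizes each parameter by ln(n) instead of 2."""
    return n * math.log(rss / n) + k * math.log(n)

# Hypothetical candidates: (name, number of parameters k, residual sum of squares).
candidates = [("linear", 2, 4.10), ("quadratic", 3, 1.25), ("cubic", 4, 1.24)]
n = 30  # assumed number of training samples

best_aic = min(candidates, key=lambda c: aic(c[2], n, c[1]))[0]
best_bic = min(candidates, key=lambda c: bic(c[2], n, c[1]))[0]
```

In this made-up case both criteria prefer the quadratic structure: the cubic model lowers the residual only marginally, which does not offset the penalty for its extra parameter.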

Emerging trends
The traditional hybrid modeling approach has put a mechanistic model at its core. It uses data-driven elements to either describe specific unknown or poorly understood mechanisms or correct the predictions of the mechanistic model. Another way of incorporating domain knowledge and mechanistic models is feature engineering, where the inputs to the data-driven elements are augmented by terms that would also appear in mechanistic models; for instance, think of enthalpy, which is not a measurement but a useful term in energy balances. Hybrid models whereby the mechanistic model is now used as an input to the data-driven component have become increasingly popular in recent years (see Figure 1C). This approach includes physics-informed neural networks where the underlying conservation equations are imposed as extra constraints on the MLP's parameters [42], like the classical orthogonal collocation theory on finite elements using piecewise polynomials [43,44]. Co-Kriging techniques have also been developed where a GP trained using data from a mechanistic model is combined with a second GP trained using process data (or a high-fidelity model) [45]. Such an approach also enables multi-fidelity modeling using linear or nonlinear autoregressive techniques [46,47] and deep GPs [48], and is finding applications, for instance, in the optimization of complex black box simulators and legacy codes. Another body of research has been concerned with learning a dynamic system by accounting for prior information, for instance, the regression of polynomial dynamic systems with prior information using sum-of-squares optimization methods [49].
Since there is no universal framework, a recurring challenge with hybrid modeling is selecting the appropriate paradigm (for example, a physics-driven versus data-driven backbone, or a serial versus parallel structure) for a particular application, such as one with small versus large datasets or noisy versus high-quality data. This selection process still lacks a solid theoretical basis, although systematic computational comparisons of various hybridization techniques have emerged in recent years [25]. Finally, looking beyond current hybrid models, Venkatasubramanian [1] argued for the development of hybrid artificial intelligence systems that would combine not only mechanistic with data-driven models but also causal model-based explanatory systems or domain-specific knowledge engines. Likewise, the mechanistic model could be replaced by a graph-theoretical model, such as signed digraphs, or a production system model, creating entirely new research fields.

Soft sensors in process industries
Soft sensing represents the most fundamental application of machine learning (ML) techniques in the process industries. By extension, optimization and control add complexity to a soft sensing core. As a result, based on our analysis and own experience, soft sensing has seen the greatest industrial penetration of ML applications. We quantitatively analyze which ML methods have seen practical success and which are currently being researched. We offer practical considerations and insights for implementing soft sensors in practice to balance the apparent industrial-academic disconnect.

Motivation for soft sensing
In the process industries, some variables are difficult to measure online due to technological limitations or the high cost of sensors. These variables indicate a product's intermediate or final quality and must be continuously monitored and controlled. In such circumstances, mathematical models are developed using easy-to-measure variables. These models provide a continuous estimate for quality variables in real time. The mathematical models devoted to the estimation of plant variables are called soft sensors [50,51]. The process industries, such as refineries, steel plants, polymer industries, or cement industries, remain the dominant users of soft sensors (see Figure 2).
Similar to hybrid modeling, soft sensors can be categorized as knowledge-driven and data-driven. Knowledge-driven soft sensors (or white box models), such as Kalman filters, are based on first principles models that describe the physical and chemical laws that govern the process, such as mass and energy balance equations. In contrast, data-driven soft sensors (or black box models) have no information about the process and are based on empirical observations (historical process data). A third type of soft sensor, called hybrid models (or gray box models), uses a data-driven method to estimate the parameters of a knowledge-driven model. This special combination is closely related to the general concept of hybrid modeling, as discussed in Section 2. For instance, a model may incorporate physics-based simulations and process measurements.

A quantitative overview of soft sensing
Literature was collected by gathering articles published between 2015 and 2023 in relevant journals from publishing houses like Elsevier, Springer, Wiley, Taylor and Francis, MDPI, World Scientific, Hindawi, De Gruyter, AMSE, and IEEE. For the publication search, keywords such as "soft sensor", "virtual sensor" or "inferential model" were used. The statistics shown in Figure 3 were computed based on the collected literature.
These statistics indicate that the research conducted in soft sensing between 2015 and 2023 was primarily focused on data-driven models. This is unsurprising, as data-driven soft sensors can often capture complex and unexplained process dynamics more succinctly. In contrast, knowledge-driven soft sensors require much expert process knowledge, which is not always available. In addition, knowledge-driven soft sensors are difficult to calibrate, especially for complex nonlinear processes. Note that hybrid model-based soft sensors received the least research attention. Data-driven soft sensors can be further categorized based on the learning technique used for modeling.
Tables 2 and 3 show the current trends in data-driven soft sensing. Table 1 contains the full forms for the acronyms used in Tables 2 and 3. The research in soft sensing has dramatically shifted from statistical to ML methods. Artificial neural networks (ANNs) received the greatest attention among ML methods. The class of feedforward single-hidden-layer neural networks (shallow networks), which encompasses the multilayer perceptron (MLP), GRNN, ELM, radial basis function neural network (RBFNN), and wavelet neural network (WNN) listed in Table 1, has more applications in soft sensing than recurrent neural networks (RNNs) and deep learning. Aside from ANNs, the support vector machine (SVM) is the second most widely used ML method for developing inferential models.
Transfer learning is slowly gaining applications in inferential measurements. Transfer learning refers to the scenario where knowledge gained while performing one specific task is exploited to carry out a different but related task. In particular, when data collection is difficult for the task of interest, transfer learning can still work by sharing information from relevant data in other domains [52]. Transfer learning has yet to be applied to the online prediction of process variables.
Static (time-invariant) soft sensors are developed using data from a single operating mode. However, their prediction accuracy degrades over time as the process shifts to a new operating region. Adaptive soft sensors tackle this issue by updating their parameters based on new samples. Less than one-third of soft sensors are adaptive, most of which use a just-in-time strategy to update model parameters in response to samples arriving in real time (see Figure 4 and Table 4). Therefore, computationally feasible methods are required. In particular, partial least squares (PLS) is the preferred algorithm.
In Table 5, publications on each data-driven technique have been grouped into three categories: publications based on simulation data, publications based on industrial data, and publications that reported industrial implementation. Notice that most soft sensors have been developed and tested on industrial data. Still, only some of them (PLS, MLP, WNN, SVM, relevance vector machine (RVM), Gaussian process regression (GPR), and regression tree (RT)) have made it into actual industrial implementation. Of course, there may be a publication bias for academic examples, as not all real-world industrial applications may be reported on.
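The just-in-time strategy mentioned above can be sketched as follows: for each query, a local model is built on demand from the most similar historical samples, and the database grows as new labeled samples arrive. This is a deliberately simplified illustration. The class name, the Euclidean similarity measure, and the distance-weighted average used as the local model are all assumptions; published just-in-time soft sensors typically fit a local regression model such as PLS instead.

```python
import math

class JustInTimeSensor:
    """Minimal just-in-time (lazy learning) soft sensor sketch."""

    def __init__(self, k=3):
        self.k = k            # number of neighbours used for each local model
        self.database = []    # historical (inputs, output) samples

    def add_sample(self, x, y):
        """Store a newly labeled sample; no global retraining is needed."""
        self.database.append((tuple(x), y))

    def predict(self, x_query):
        """Build a local model around the query and return its estimate."""
        if not self.database:
            raise ValueError("no historical samples available")
        ranked = sorted(self.database, key=lambda s: math.dist(s[0], x_query))
        neighbours = ranked[: self.k]
        # Inverse-distance weights; the small constant avoids division by zero.
        weights = [1.0 / (math.dist(x, x_query) + 1e-9) for x, _ in neighbours]
        return sum(w * y for w, (_, y) in zip(weights, neighbours)) / sum(weights)

sensor = JustInTimeSensor(k=2)
for x, y in [((1.0, 1.0), 10.0), ((1.2, 0.9), 11.0), ((5.0, 5.0), 50.0)]:
    sensor.add_sample(x, y)

before = sensor.predict((5.1, 5.0))    # only one relevant neighbour so far
sensor.add_sample((5.2, 5.1), 52.0)    # new sample from the current operating region
after = sensor.predict((5.1, 5.0))     # local model now uses two nearby samples
```

Because the model is rebuilt at every query, the cost of each prediction grows with the size of the database, which is one reason computationally light local models are favored in this setting.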

Computational cost of soft sensors
The training time refers to the time taken to determine optimal values for the parameters of a soft sensor model. Once the developed soft sensor is implemented online in a distributed control system, it is used to estimate key process or quality variables at regular sampling intervals. The time required to get the estimates is called soft sensing time. [54], slow feature analysis [55], independent component analysis [56], and factorial analysis [57] can be developed in a single iteration, they require relatively low computational time compared to LASSO [58] and GMM [59] techniques, which involve using iterative optimization algorithms to determine the model parameters. In general, ML methods need more computational time than statistical methods [53]. Further, the computational complexity of ML methods is influenced by the factors listed below [60]:
• Amount of training data.
• Number of features or input variables.
• Type of training algorithm employed.
• Number of layers.
• Number of neurons (size) in layers.
• Type of device used (such as CPU or GPU).
The ELM is considered the fastest ML algorithm because its hidden-layer parameters are assigned randomly and do not need to be learned. The second fastest ML method is the GRNN, which has a single learnable parameter (spread or width of a radial basis function). Then come RTs and decorrelated neural network ensembles, which can be constructed more easily than shallow neural networks like RBFNN, MLP, WNN, and adaptive network fuzzy inference system (ANFIS). As RBFNN uses hybrid learning (not hybrid modeling), that is, unsupervised learning for the middle layer and supervised learning (linear regression) for the last layer, it is usually faster than MLP, ANFIS, and WNN, which use iterative gradient descent algorithms. SVM is the slowest of the kernel-based ML methods (SVM, GPR, and RVM). Bayesian networks rely on the expectation-maximization algorithm to optimize their parameters, which takes a little more training time than RTs. Dynamic ML methods, such as RNNs, involve more operations than their static ML counterparts, so they require more memory and computational power [14,61]. Similarly, deep neural networks (DNNs) often include several layers and hence contain many parameters. A large amount of training data is necessary to train DNNs. Therefore, DNNs are recognized as the most computationally expensive of all the data-driven techniques.

Industry implementation of soft sensors
In industries, soft sensors are developed by in-house control engineers or third-party contractors (service engineers) from service providers such as Honeywell or Yokogawa. These service providers use their own software to build the soft sensors. When the existing technology used by service providers is inadequate to handle a problem or in-house control engineers have no knowledge of other soft sensing algorithms, the industries provide research funding to universities, research organizations, or startups to develop sophisticated soft sensors to model complex nonlinear processes. The following steps outline how soft sensors are developed and implemented in industries.
1. After recognizing a need for a soft sensor application, a team consisting of a panel operator, process engineer, control engineer, and project manager is formed. The process engineer prepares a charter to define the core objectives, scope, responsibilities, and timeline of the project. This outlines the benefits that the soft sensor project can offer. All the benefits are usually quantified in terms of how much money can be saved. For example, this cost-benefit analysis typically involves weighing the upfront costs (hardware, software, consultants) and continued costs (software licenses, in-house domain experts to handle support and maintenance) against anticipated improved revenue and throughput, as well as reduced cost of the soft sensor. Once the team is satisfied with the benefits, the soft sensor project launches.
2. The next step in executing a soft sensor project involves obtaining process knowledge or expert experience knowledge to identify input variables that have a noteworthy influence on output variables [62]. The use of process knowledge or expert experience avoids the inclusion of redundant input variables in soft sensor modeling, leading to reduced model complexity and improved accuracy. In the absence of such knowledge, ML algorithms such as LASSO, hybrid LASSO, and ridge regression can be used to identify and remove input variables that have negligible impact on the output variable.
3. The third step entails process data collection and preprocessing. The process data are often abundant but poor in information. This is due to significant disturbances, outliers, and missing values. Soft sensors developed using these data may provide incorrect estimates for quality variables. The outliers and missing values from the raw industrial data should be removed to obtain clean data for developing the soft sensor.
Although it may not be theoretically rigorous, the usual practice is to detect and delete samples with outliers [63]. Missing values are treated in the same fashion. This approach ensures that the clean data are free of outliers and missing values.
4. The data collection in industrial settings is often associated with multi-rate sampling. If the sampling frequency of the input variables is higher than that of the output variable, then it is necessary to synchronize the variables. Down-sampling may be used to deal with the multi-rate sampling problem. In the down-sampling approach, samples of the input variables that do not have the respective measurements of the output variable are removed [53].
5. After the process data are preprocessed, they are split into training and validation subsets. The training subset is used to construct a soft sensor model whereas the validation subset is used to evaluate the prediction performance of the soft sensor model. This is called offline validation. The usual practice is to develop a linear model first. If the linear model cannot produce accurate estimates, then more complex statistical or ML algorithms are used.
6. If the soft sensor model delivers satisfactory performance in the offline validation, it is implemented in a distributed control system. Then the performance of the soft sensor is monitored for some time period. If the soft sensor exhibits poor performance, then modifications are made. This is online validation. For offline and online validation, metrics such as the correlation coefficient and root mean squared error are used to quantify the performance of soft sensors [53]. In addition, qualitative analysis is considered to see if soft sensor estimates follow the lab data trend. If the soft sensor estimates are poor, the input data are first examined for possible reasons, such as sensor failures, data transmission problems, outliers, plant shutdowns, and plant upsets. Poor estimates can be characterized by low correlation to lab data, estimates out of the operational range, or significant deviation from lab data. If the input data are good, the following strategies are used to get accurate and reliable estimates:
• Retraining of the soft sensor using the latest data.
• Changing the soft sensor modeling algorithm.
• Using a different training algorithm.
• Changing the parameter initialization method.
• Using approaches that can avoid or reduce overfitting.
Regardless of the type of soft sensor, practicing engineers usually follow the above approach to assess the performance of soft sensors.
7. If the online soft sensor consistently provides reasonable results, the soft sensor is used as a measuring device in a control loop. After successfully implementing the soft sensor-based control application, the soft sensor application is handed over to the panel operator. The human-in-the-loop aspect described above is crucial in translating research results into practical applications.
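The data-handling part of this workflow (deleting bad samples, down-sampling to the lab sampling instants, fitting a linear model first, and offline validation with root mean squared error and correlation coefficient) can be sketched on synthetic data as follows. All function names, the fixed validity limits used to flag outliers, the chronological train/validation split, and the noise-free data are assumptions made for illustration.

```python
import math

def remove_bad_samples(rows, low, high):
    """Delete samples with missing values or inputs outside assumed valid limits."""
    return [(t, x) for t, x in rows if x is not None and low <= x <= high]

def downsample_to_lab(rows, lab):
    """Keep only input samples whose timestamp has a lab (output) measurement."""
    return [(x, lab[t]) for t, x in rows if t in lab]

def fit_linear(pairs):
    """Least-squares fit y = a + b * x, the usual first candidate model."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    b = sxy / sxx
    return my - b * mx, b

def rmse(y_true, y_pred):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(y_true, y_pred)) / len(y_true))

def correlation(u, v):
    """Pearson correlation coefficient."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

# Synthetic data: one process input sampled every minute, lab value every 5 minutes.
rows = [(t, 0.1 * t) for t in range(60)]
rows[7] = (7, None)       # missing value
rows[20] = (20, 999.0)    # outlier
lab = {t: 2.0 + 0.3 * t for t in range(0, 60, 5)}

clean = remove_bad_samples(rows, low=0.0, high=10.0)
pairs = downsample_to_lab(clean, lab)
train, valid = pairs[:8], pairs[8:]   # chronological split (an assumption)

a, b = fit_linear(train)
y_valid = [y for _, y in valid]
y_hat = [a + b * x for x, _ in valid]
valid_rmse = rmse(y_valid, y_hat)
valid_corr = correlation(y_valid, y_hat)
```

With real plant data the validation metrics would of course be far from perfect; the same two numbers can then be tracked during online validation to decide whether any of the remedial strategies listed in step 6 is needed.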

Challenges in soft sensor development
Challenges that are often encountered in soft sensor developments are discussed below.
• Lack of labeled data is the main challenge that must be dealt with in order to build good soft sensor models.Quality variables are less frequently measured than easily measurable process variables, such as temperature, pressure, flow rate, and level.A sample of a quality variable is collected once every shift (that is, 8 hours) or 24 hours.Because of the long sampling interval, an insufficient amount of practical labeled data is available.A soft sensor trained with a limited amount of labeled data may not be able to capture the underlying relationship between the input variables and the output variable.To deal with this problem, a virtual sample generation method may be used to obtain estimated output values for the corresponding input data [64].As an alternative, semi-supervised learning may be used to construct the soft sensor.Unsupervised learning algorithms like PCA, autoencoders, stacked autoencoders, or deep belief networks can extract features from unlabeled input data.These features are related to the output variable by any data-driven linear or nonlinear model [14].
• Operating conditions of the industrial process may change depending on the demand for products, prices of raw materials, and so on. A soft sensor developed using data from one operating condition may not perform well when the operating condition changes. In this situation, multimode soft sensors can be used to get accurate estimates [65].
• Soft sensor maintenance is crucial to continuously attain reasonable estimates, as the performance of an online soft sensor may degrade over time. As a result, estimates obtained by a poorly performing soft sensor do not follow lab data trends. To circumvent this hurdle, the soft sensor is retrained with recent data and deployed online. A more popular approach to maintain the accuracy of the soft sensor is to adopt a bias updating strategy. In the bias updating strategy, the soft sensor outputs are brought closer to the lab data [66].
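One common way to realize the bias updating strategy mentioned in the last item is an exponentially filtered additive bias, sketched below. This specific update rule, the class name `BiasUpdatedSoftSensor`, and the filter factor `alpha` are assumptions made for illustration; the scheme in [66] may differ.

```python
class BiasUpdatedSoftSensor:
    """Wraps a fixed soft sensor model with an additive, filtered bias term."""

    def __init__(self, model, alpha=0.3):
        self.model = model    # any callable mapping inputs x to a raw estimate
        self.alpha = alpha    # filter factor in (0, 1]; larger reacts faster
        self.bias = 0.0

    def predict(self, x):
        """Bias-corrected estimate."""
        return self.model(x) + self.bias

    def update(self, x, y_lab):
        """Move the bias toward the latest mismatch between lab value and raw model."""
        error = y_lab - self.model(x)
        self.bias = (1.0 - self.alpha) * self.bias + self.alpha * error

def raw_model(x):
    # Hypothetical model that has drifted and now underestimates by 2.0.
    return 2.0 * x

sensor = BiasUpdatedSoftSensor(raw_model, alpha=0.5)
for x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]:
    y_lab = 2.0 * x + 2.0     # made-up lab measurement
    sensor.update(x, y_lab)
```

The model parameters stay untouched; only the offset is adapted each time a new lab sample arrives, which is why this is cheaper than retraining.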

Data-driven and hybrid modeling approaches for optimization and control
We revisit data-driven and hybrid modeling in the context of solving optimization and control problems. We further introduce reinforcement learning as an emerging paradigm for solving challenging control tasks. In the same way hybrid modeling represents a spectrum between knowledge-based and data-based modeling, model-based optimization, model predictive control, and reinforcement learning all encompass model-based and model-free methodologies. Naturally, these techniques are also compatible with hybrid modeling approaches, offering new challenges and research opportunities.

Model-based optimization
A large number of hybrid modeling applications have been geared towards offline process optimization. Here, a hybrid model is appealing because key operational variables in terms of process performance may be included in the mechanistic part of the model. This is to retain sufficient extrapolation capability while capturing other parts of the process using data-driven techniques, for example, to reduce the computational burden. Local (gradient-based) or stochastic search techniques have traditionally been applied to solve the resulting model-based optimization problems. But a recent trend has been using complete search techniques to overcome convergence to a local optimum and guarantee global optimality in problems with trained machine learning models embedded, such as multilayer perceptrons (MLPs) [67,68,69], Gaussian processes (GPs) [70], or gradient-boosted trees [71]. Applications in chemical engineering include the optimization of simple reactor operations and process flowsheets [67] and optimal catalyst selection [71].
It should be noted that developing a data-driven or hybrid model to speed up the optimization of a more fundamental model is akin to conducting a surrogate-based optimization. The latter constitutes an active research area in process flowsheeting, computational fluid dynamics, and molecular dynamics [72]. Surrogate-based optimization methods can be broadly classified into local and global approaches. Global approaches proceed by constructing a surrogate model based on an ensemble of mechanistic simulations before optimizing it, often within an iteration where the surrogate is progressively refined. Several successful implementations rely on MLPs [73], GPs [74,75,76], or a combination of various basis functions [39,77] for the surrogate modeling. Practical applications have been for rigorous design of distillation columns [75,76] and flowsheet or superstructure optimization of chemical processes [74,73]. By contrast, local approaches maintain an accurate surrogate of the mechanistic model within a trust region, whose position and size are adapted iteratively. This procedure entails reconstructing the surrogate model as the trust region moves around. Still, it can offer global convergence guarantees, for example, when the surrogates meet the full linearity property [78]. Applications of this approach to chemical process optimization include solvent-based CO2 capture [79] and integrated carbon capture and conversion [80].

Model predictive control and real-time optimization
The real-time optimization (RTO) and nonlinear/economic model predictive control (MPC) methodologies use a process model at their core. So far, most successful implementations of RTO and MPC have relied on mechanistic models [81,82,83]. But there has been interest in data-driven approaches, which use surrogate models trained on historical data or mechanistic model simulations to drive the optimization. The types of surrogate models used in such data-driven MPC include MLPs [84,85] and GPs [86,87]. However, comparatively little work has been published on embedding hybrid models into MPC to reduce data dependency and infuse physical knowledge for better extrapolation capability [88,89]. Teixeira et al. [90] applied batch-to-batch optimization to bioprocesses by relying on hybrid models where an adjustable mixture of nonparametric and parametric models represented the cell population subsystem. In the RTO area, Cubillos et al. [91] investigated the use of parallel hybrid models with embedded MLPs on the Williams benchmark plant, but then they had to use stochastic search methods to solve the resulting optimization problems. Recently, Zhang et al. [89] took the extra step of using the same hybrid model simultaneously in the RTO and MPC layers and demonstrated the benefits for a simulated CSTR and distillation column. Note that most of these applications consider serial hybrid models with embedded MLPs to approximate complex nonlinearities in the system. Nevertheless, there is a dearth of industrial or experimental implementations of such technologies to date.
An RTO methodology that exploits the parallel approach of hybrid semiparametric modeling at its core is modifier adaptation [92]. Unlike classical RTO, modifier adaptation does not adapt the mechanistic model but adds correction terms (the modifiers) to the cost and constraint functions in the optimization model. The original work used process measurements to estimate linear (gradient-based) corrections [93]. Gao et al. [94] proposed combining quadratic regression models trained on available plant data with a nominal mechanistic model to account for curvature information and filter out the process noise. Likewise, Singhal et al. [95] investigated data-driven approaches based on quadratic surrogates as modifiers for the predicted cost and constraint functions and devised an online adaptation strategy for the surrogates inspired by trust-region ideas. Implementations of this RTO methodology for industrial systems include load sharing for gas compressors [96] and solid-oxide fuel cells [97].
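To make the idea of linear (gradient-based) cost modifiers concrete, the following is a minimal unconstrained sketch on a scalar toy problem. Everything in it is a hypothetical assumption chosen for illustration: the quadratic plant and model costs, the finite-difference gradient estimate, and the input filter `gain`. It is not taken from the cited works.

```python
def plant_cost(u):
    # Hypothetical "true" plant cost, assumed accessible only via measurements.
    return 2.0 * (u - 2.0) ** 2

def model_cost_grad(u):
    # Gradient of the nominal (mismatched) model cost (u - 1)^2.
    return 2.0 * (u - 1.0)

def plant_cost_grad(u, h=1e-4):
    # Plant gradient estimated by central finite differences on measurements.
    return (plant_cost(u + h) - plant_cost(u - h)) / (2.0 * h)

def solve_modified_problem(lam):
    # Minimize (u - 1)^2 + lam * (u - u_k) over u.
    # Stationarity gives 2 (u - 1) + lam = 0, so u = 1 - lam / 2.
    return 1.0 - lam / 2.0

def modifier_adaptation(u0, gain=0.3, iterations=30):
    u = u0
    for _ in range(iterations):
        # First-order modifier: plant gradient minus model gradient at u_k.
        lam = plant_cost_grad(u) - model_cost_grad(u)
        u_opt = solve_modified_problem(lam)
        # Filter the input update to damp oscillations.
        u = u + gain * (u_opt - u)
    return u

u_final = modifier_adaptation(u0=0.0)
```

Although the model alone would place the optimum at u = 1, the iterates settle at the plant optimum u = 2, because at a fixed point the modified model gradient matches the plant gradient. Without filtering (`gain=1.0`), this particular toy problem oscillates, which is one reason a filter is commonly included.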
More recently, Ferreira et al. [98] were the first to consider GPs, trained from past measurement information, as the cost and constraint modifiers. Using nonparametric regression models to describe the plant-model mismatch in RTO applications makes sense insofar as the mismatch is generally structural. Del Rio Chanona et al. [99,100] developed this strategy further by introducing modifier-adaptation schemes that rely on trust regions to reflect how well the GPs capture the cost and constraint mismatch. Recently, Petsagkourakis et al. [101] proposed to use co-Kriging to drive the surrogate modeling, where a first (low-fidelity) GP emulating the mechanistic process model is integrated within a second (high-fidelity) GP that is trained using the process measurements. The benefits of using GPs in this context lie in their ability to perform real-time uncertainty quantification and allow chance constraints to be satisfied with high confidence. By and large, these developments share much common ground with surrogate-based optimization techniques (see Section 4.1), with the added complexity that the process data are noisy and the process optimum might change over time. Finally, it is worth noting that the potential benefits of this RTO technology have been mostly investigated through numerical simulation, which cannot substitute for experimental and industrial validation; such validation should be the subject of future research.

Reinforcement learning
Reinforcement learning (RL) is a class of numerical methods for the data-driven sequential decision-making problem [102]. The RL agent (algorithm) aims to find an optimal policy, or controller, based on industrial process data collected through interactions with its environment.
Note that RL represents a more general class of techniques than hybrid modeling-based optimization. Briefly, RL includes algorithms for synthesizing control policies without explicit reliance on a model of the process dynamics. The supplementary material contains a more precise background on RL; readers are also referred to Sutton and Barto [102].
Finding such a policy requires solving the Bellman equation based on the principle of optimality. However, the equation is often intractable as it leads to a high-dimensional optimization problem [103]. Recent advances in machine learning (ML) enable feature analysis of raw sensory-level data using deep neural networks (DNNs). The aid of DNNs facilitates efficient numerical methods for approximately solving the Bellman equation. Therefore, the scalability of RL algorithms has been significantly improved. As a result, so-called deep RL is an emerging technology that has shown remarkable performance in real-world and simulated applications such as robotics, autonomous driving, and board games [104,105,106].
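For reference, the Bellman optimality equation can be written in its standard discounted, discrete-time form as below. The notation is an assumption introduced here and may differ from the supplementary material: state s, action a, successor state s', stage reward r, discount factor γ in [0, 1), optimal value function V* and optimal action-value function Q*.

```latex
V^{*}(s) = \max_{a \in \mathcal{A}} \; \mathbb{E}\left[\, r(s,a) + \gamma\, V^{*}(s') \;\middle|\; s, a \,\right],
\qquad
Q^{*}(s,a) = \mathbb{E}\left[\, r(s,a) + \gamma \max_{a' \in \mathcal{A}} Q^{*}(s',a') \;\middle|\; s, a \,\right].
```

An optimal policy selects, in each state, an action attaining the maximum, that is, any a in argmax over a of Q*(s, a). Deep RL methods replace V* or Q* with a neural network approximation.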
Deep RL has naturally gained attention from the process control community. In this section, we survey applications of RL in process control, and we discuss advances and challenges in RL as they potentially pertain to process control applications.

Reinforcement learning for process control
With high demands on the performance of process systems, efficient optimization is becoming increasingly essential. The ultimate goal of any process control system is to develop a controller capable of attaining optimality in large-scale, nonlinear, and hybrid models with constraints, fast online calculation, and adaptation. This ideal controller should be amenable to a closed-loop solution and robust to online disturbances.
Mathematical programming-based control methods, such as MPC and direct optimization, are popular because they adequately address many of these requirements. Sections 4.1 and 4.2 discuss the mathematical programming paradigm in more detail. RL has been studied in parallel because it has contrasting features compared to mathematical programming methods [107]. According to the review and perspective studies of Shin et al. [5], Nian et al. [8], Spielberg et al. [6], and Yoo et al. [108], RL offers three advantages. First, a closed-loop state feedback policy can be obtained for generic stochastic control problems, whereas mathematical programming approaches yield an open-loop solution. Most of the computation is done offline by learning the policy through offline data or simulation. Assuming that the environment used for offline training is identical to that of the online implementation, the policy is optimal. Second, the mathematical programming formulation for stochastic control problems often becomes prohibitively large to be solved within a decision interval. On the other hand, uncertainties are implicitly or explicitly quantified by the value or policy functions in RL approaches. The trained RL policy can be implemented with minimal online computation required. Third, RL is flexible to varying levels of system knowledge, including model-free, partial model-free, and model-based RL. Table 6 summarizes the comparison between RL and mathematical programming methods.
Several pioneering pieces of work by Wilson and Martinez [109], Kaisare et al. [110], and Peroni et al. [111] proposed applying model-free RL to process control problems over discretized state and action spaces. Q-learning was implemented for the tracking control of a fed-batch bioreactor [109] and the free-end maximization problem of a fed-batch bioreactor [110,111]. Lee et al. [112] and Lee and Lee [113] extended the concept of applying model-free RL to dual adaptive control and scheduling problems. It was shown that the approximation of the value function could provide robust control despite the presence of process noise and model changes. RL methods that guarantee robustness in dynamic optimization were later studied in Nosair et al. [114] and Yang and Lee [115].
Some recent applications of RL rely on a linear approximator to solve optimal control problems with a continuous state space model [116,117,118,119]. In particular, Zhu et al. [116] applied a model-free RL variant called factorial fast-food dynamic policy programming to a vinyl acetate monomer process. The algorithm improves scalability by factorizing the action space, which counters the exponential growth of its size. Meanwhile, model-free deep RL applications have become increasingly studied in the process control field. Table 7 summarizes some recent work in this area. In the remaining sections, we elaborate on the use of deep RL in process control.
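Tabular Q-learning over discretized state and action spaces, as used in these early studies, can be sketched as follows. The five-state chain environment, the reward, and all hyperparameters are hypothetical stand-ins chosen for illustration; they do not represent the bioreactor problems in [109,110,111].

```python
import random

N_STATES = 5        # discretized states 0..4; state 4 is the terminal goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right

def step(state, action):
    """Deterministic toy dynamics with a reward of 1 on reaching the goal."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def q_learning(episodes=2000, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = rng.randrange(N_STATES - 1)      # random non-terminal start
        for _ in range(100):                     # cap on episode length
            action = rng.choice(ACTIONS)         # purely exploratory behavior
            next_state, reward, done = step(state, action)
            # Off-policy target: bootstrap from the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q_table = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
```

Because Q-learning is off-policy, the table can be learned from random actions while the greedy policy read off afterwards always moves toward the goal. The table has one entry per state-action pair, which is what limits this approach to coarse discretizations.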

Practical implementation of reinforcement learning
One promising application of RL is the synthesis of existing control structures [141,142,143,144,145]. For example, proportional-integral-derivative (PID) controllers constitute the lowest level of control structures, and augmenting these with RL methods immediately gives practical results. PID tuning is a suitable testbed for RL applications, as there exists a suite of tuning methods and industrial autotuners to benchmark against [136]. PID controllers are also standard in practice, meaning the base layer control is not substituted for a more complex strategy, for example, based on DNNs (see Figure 5).
Model-free RL was applied to schedule a set of PID gains obtained a priori.
Another application is to construct hierarchical control structures with RL methods. Shafi et al. [149] introduced a two-layer structure for optimizing the bitumen recovery rate of a primary separation vessel. A supervisory RL agent optimizes the recovery rate, while a low-level RL agent computes the interface level actuation. Kim et al. [150] proposed a different type of two-layer structure for a product maximization problem of a fed-batch bioreactor. A model-based RL agent solves the high-level optimization problem, and an MPC tracks the trajectory of the high-level optimizer, rejecting real-time disturbances.
Several studies compare RL methods based on practical performance criteria. Wang et al. [151] compared 14 model-free and model-based RL algorithms based on the following criteria: nominal performance, sample efficiency (total training time, training time per step), robustness against noise, and asymptotic performance. Lawrence et al. [136] proposed nominal performance, stability, perturbation to the system, initialization, hyperparameters, training duration, practicality, and specialization as key criteria for evaluating RL methods for process control problems. In addition, Dogru et al. [131] used the extent of exploration: the ratio of the visited over the total operational state and action spaces.
It is worth noting that RL implementations on physical systems are sparse. Some works in process control applications are validated on physical systems [147,152,120,8,136,131]. These references tend to focus on PID tuning or low-dimensional state/action spaces. A cascaded tank system is the most common environment. There are several plausible reasons for the lack of real-world RL applications: the added engineering and software development is not always feasible to accommodate; the algorithmic complexity of RL algorithms exacerbates the issue; and practical and theoretical problems, such as sample efficiency, convergence, and closed-loop stability, are pressing concerns. Indeed, most deep RL algorithms can achieve impressive final performance on complex tasks, but at the cost of extensive hyperparameter tuning and significant variation between implementations [153]. In the following section, we highlight a few methods that are geared towards making RL more reliable and scalable: synthesis between model-based and model-free learning; transfer learning and meta-RL; and offline RL.
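The extent-of-exploration ratio can be illustrated with a small sketch. It assumes the operational state and action spaces have been discretized into bins and that visits are logged as (state bin, action bin) pairs; this binning, the function name, and the data are hypothetical, and the exact definition used in [131] may differ.

```python
def extent_of_exploration(visited_pairs, n_state_bins, n_action_bins):
    """Fraction of distinct (state bin, action bin) cells visited during training."""
    total_cells = n_state_bins * n_action_bins
    return len(set(visited_pairs)) / total_cells

# Made-up visitation log; the repeated pair (0, 0) is counted once.
visited = [(0, 0), (0, 1), (1, 1), (0, 0), (2, 3)]
ratio = extent_of_exploration(visited, n_state_bins=4, n_action_bins=5)
```

A low ratio indicates that the agent has seen only a small part of the operating envelope, so its policy is untested elsewhere.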

Challenges and advances in deep reinforcement learning
Applying RL to industrial settings has many practical, technological, and theoretical challenges. We refer to Shin et al. [5] and Nian et al. [8] for further reading. Here, we mainly focus on the sample efficiency of RL algorithms. Sample efficiency refers to the amount of data needed to train an RL agent. The supplementary material contains a more general discussion about ML with limited data.
Classical algorithms for value-based methods, such as Q-learning, and policy-based methods, such as REINFORCE, enjoy theoretical convergence guarantees. However, convergence can be slow due to high variance in value estimates or limited to the tabular setting or linear function approximation [102]. Nonetheless, these methods provide the foundation for deep RL algorithms. Deep RL attempts to scale up RL methods to high-dimensional problems as a synthesis with the deep learning framework. The first notable result is an extension of Q-learning, named DQNs, introduced by Mnih et al. [121]. DQNs are limited to discrete action spaces but showed impressive results in tasks with high-dimensional sensory input data, such as Atari games.
More recent algorithms, such as the deep deterministic policy gradient (DDPG) algorithm [125], allow for continuous action spaces. Despite the advances made by DDPG, it is notoriously difficult to use, for example, due to sensitivity to hyperparameters and overestimation of Q-function values [153]. This limits the viability of DDPG for real-world applications such as process control, as a physical system cannot be extensively probed. However, the concurrent algorithms, TD3 [130] and soft actor-critic [138], built off DDPG to improve the overall training robustness and sample efficiency. Despite these advances, model-free RL algorithms alone are not sufficiently data-efficient and, therefore, not yet useful in real industrial applications [154]. In the rest of this section, we identify several areas of RL research aimed at this issue.
Although formulating a dynamic model can be a bottleneck in the RL algorithm, model-based methods require far fewer interactions with the plant [154]. Several model-based RL algorithms have been developed, focusing on solving the continuous-time counterpart of the Bellman equation called the Hamilton-Jacobi-Bellman (HJB) equation. Since they aim to solve the HJB equation adaptively, the methods are called approximate dynamic programming (ADP) [155,156,157]. ADP algorithms vary in their levels of model utilization, ranging from heuristic dynamic programming and dual heuristic programming to globalized dual heuristic programming [158,159]. Stochastic optimal control is an extension for handling stochastic differential equations, a continuous-time description for uncertainty. Policy improvement with path integrals (PI²) is a sampling approach to solving the stochastic HJB equation [160]. PI² has shown remarkable data efficiency and performance for robot learning.
Another line of work has focused on unifying model-free and model-based approaches [161,162]. The main motivation is that model-free algorithms often achieve superior final (asymptotic) performance over model-based approaches but suffer from relatively poor sample efficiency. Bao et al. [129] utilized ideas from D'Oro and Jaśkowski [161] wherein a dynamics model is used to improve the action gradient estimation of the critic network. While integrating dynamic models into traditionally model-free algorithms has proved promising, these algorithms are designed to train an agent using online interactions on a system-by-system basis. More general strategies aim to reduce the cost of calibrating RL agents to novel environments by utilizing historical datasets, training over many related systems, or transferring previously trained agents to new ones.
Offline RL (sometimes called batch RL) aims to learn an optimal policy from historical data alone [163]. Although off-policy algorithms like DDPG can theoretically learn from historical data, online exploration is critical unless constraints are imposed on the learned policy [164]. An offline strategy for pre-training RL agents with historical process data, followed by online fine-tuning of the policy, is proposed by Mowbray et al. [134]. On the other hand, transfer learning is a framework for speeding up the training of RL agents. By pre-training a policy, such as in a simulation environment, one can use this as the initial policy on the true system of interest. This idea is demonstrated for batch bioprocess optimization [127]. One can efficiently mitigate plant-model mismatch by fine-tuning the initial policy on the real system.
Meta-learning, or learning to learn, is an ML strategy for leveraging prior training experience to learn a new "task" quickly [165]. Meta-RL is a strategy for training a "meta agent" to synthesize experience from many related systems to adapt its policy to novel systems rapidly. For example, Finn et al. [166] develop a simple and highly influential algorithm for any neural network architecture that directly optimizes for initial parameters such that they can quickly be adapted to new tasks with a small amount of data, showing superior performance over standard transfer learning in classification and RL tasks. Duan et al. [167] propose strategies for learning a latent context variable as part of the meta-policy architecture, thereby capturing the "task" structure and enabling the meta-RL agent to adapt its policy with new process data. This framework is appealing in process control applications because many systems may have a known structure, making training over a distribution of related systems feasible. Consequently, this end-to-end framework removes a model identification step during the online implementation of the RL agent by leveraging prior training experience. Meta-RL has also seen recent applications to process control [168].
While significant strides have been made to make these algorithms more sample-efficient, they are not yet practical. Motivated by this challenge, we have outlined different ways in which models can be integrated into otherwise model-free algorithms. Moreover, meta-RL, offline RL, and transfer learning, while still emerging, are promising avenues for MPC applications. These areas have tremendous potential for applications that can redefine automation in the process industries.
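The idea of optimizing initial parameters for fast adaptation, as in Finn et al. [166], is usually written as a two-level update. The single-gradient-step form below is the simplest variant. The notation is an assumption introduced here: model f with parameters θ, tasks T_i drawn from a task distribution p(T), task loss L, inner step size α, and meta step size β.

```latex
\theta_i' = \theta - \alpha \, \nabla_{\theta} \mathcal{L}_{\mathcal{T}_i}\!\left(f_{\theta}\right)
\qquad \text{(task-level adaptation)}

\theta \leftarrow \theta - \beta \, \nabla_{\theta} \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}\!\left(f_{\theta_i'}\right)
\qquad \text{(meta-update)}
```

The meta-update differentiates through the adaptation step, so θ is pushed towards values from which a small amount of task-specific data yields a low loss. In a process control setting, each task could correspond to one plant drawn from a family of related systems.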

Discussion
Soft sensing and process control encompass statistical learning, machine learning, deep learning, and reinforcement learning to varying degrees. Table 8 shows the respective high-level prominence in these two application areas. Although Table 2 indicates significant interest in the soft sensing literature around deep learning, Table 5 shows methods like PLS and SVM have received the most industrial use. However, the prominent use of industrial data is still promising. Meanwhile, our survey of process control indicates a more significant emphasis on deep learning and reinforcement learning in the literature. Simulation-based studies are commonplace in this context, as discussed in Section 4.3.2. Table 8 and the above discussion show a duality between sensing and control in the context of machine learning methods. To fully capture the benefits of modern machine learning methods, a unified framework that encompasses modeling, sensing, and control is required. Reinforcement learning is well-suited to bridge the gap between sensing and control through a global reward-based objective (rather than treating prediction and control performance as independent goals). Applications in sensing do not necessarily contradict the model-free nature of reinforcement learning, which is most appealing. Rather, this characteristic makes it versatile for processing and optimizing real system data. To illustrate this point, Xie et al. [171] propose using reinforcement learning for sensing, even though it has typically been described in the context of control. Moreover, Esfahani et al. [172] utilize reinforcement learning for both state estimation and control under a single closed-loop performance objective.
On the other hand, Section 4.3 discussed the complexity of reinforcement learning algorithms. More broadly, deep learning and reinforcement learning algorithms are rife with complexity and hyperparameters, making it difficult to parse their fundamental inner workings [153,173]. A promising avenue toward unifying sensing and control is distilling reinforcement learning pipelines and reimagining techniques from other branches of machine learning. Truly robust and powerful methods will follow from such a critical rapprochement of the longstanding statistical learning methods in Table 1 and newer concepts in deep learning and reinforcement learning. An instance of this aspiration in action is the work of Eysenbach et al. [174], who show that a novel use of binary classification and policy iteration is capable of achieving state-of-the-art performance.

Conclusions
Recent advances in machine learning give us renewed optimism for achieving higher levels of automation in the process industries. To distill this general goal, we have surveyed soft sensing and process control through a practical lens. Soft sensing represents the most dominant area regarding industrial applications of statistical and machine learning techniques. On the other hand, considerable research attention has been given to deep learning applications, but with limited industrial successes. Through synthesizing research trends and industrial requirements, we have strived to enable academics and practitioners alike to develop sophisticated yet practical methods for building better models and controllers.

Figure 1: Typology of hybrid models (see von Stosch et al. [15]). A and C represent serial structures: under A, a data-driven model is used as input to a knowledge-driven model; C is the reverse. B represents a parallel structure in which knowledge-driven predictions are corrected by data-driven predictions.

Figure 2: Distribution of soft sensor applications.

Figure 3: Research publication in soft sensors from 2015 to 2023.

Figure 4: Distribution of global and adaptive soft sensors.

Figure 5: Application of RL for tuning PI controllers in a lab setting. The policy plays the role of a PI controller and receives updates towards improved performance. J is a general long-term cost function and k_p, k_i are controller gains. Adapted from [136].
NPL & RBG gratefully acknowledge the financial support of the Natural Sciences and Engineering Research Council of Canada (NSERC) and Honeywell Process Solutions. JML gratefully acknowledges the research facilities for this work provided by the Institute of Engineering Research at Seoul National University. BC gratefully acknowledges funding by the Engineering and Physical Sciences Research Council (EPSRC) under grants EP/T000414/1 and EP/W003317/1. BH, FA and SKD gratefully acknowledge financial support from the Natural Sciences and Engineering Research Council of Canada (NSERC) under grants IRCPJ 417793-15 and ALLRP 561080-20.

Table 1: Full forms for acronyms. Divided into three sections, top to bottom: 1) statistical learning, 2) machine learning & deep learning, and 3) reinforcement learning & control methods.

Table 2: Distribution of data-driven methods for soft sensors, split between statistical and ML methods.

Table 3: Distribution of various types of ANNs for soft sensors.

Table 4: Distribution of statistical and ML methods in local modeling of adaptive soft sensors.

Table 5: Breakdown of methods for soft sensors according to the level of industrial applications.

Table 6: A comparison of RL and mathematical programming.

Table 7: Model-free deep RL applications in process control. Asterisk (*) indicates a model-based modification to the nominal algorithm. Highlighted rows indicate validation on a physical system.