A Bayesian inferential sensor for predicting the reactant concentration in an exothermic chemical process

In many chemical reactors, concentration measurements are conducted off-line in a laboratory, which involve manual work and can therefore be conducted only infrequently. We propose a Bayesian inferential sensor to predict the reactant concentration in the inlet stream of an exothermic chemical process. The inferential sensor is based on the Bayesian inverse approach and the autoregressive integrated moving average (ARIMA) model. It enables the prediction of the reactant concentration at the frequency of automated on-line measurements, which is typically much higher than that of laboratory measurements. We demonstrate the method on real industrial process data from catalytic hydrogenation of aromatic compounds. The predicted aromatics concentration in the inlet stream, generated based on the latest on-line measurements and two-week-old laboratory data, has a coefficient of determination of 0.936 and a root mean square error of 0.654 mass-%.


Introduction
Reliable characterization of process fluids in a chemical plant often requires off-line laboratory sampling. The characterization may include, for example, density and species composition measurements. As laboratory samplings involve manual work, they can typically be performed only infrequently (e.g., once per day). From the process control and monitoring point of view, such a sampling frequency may be insufficient. In a modern industrial plant, other physical quantities, such as flow rates, temperatures, and pressures, are sampled automatically in various locations, using sampling intervals on the order of minutes, or even seconds. Methods that predict, in real time and based on frequent on-line measurements, the quantities that are measured infrequently off-line or not at all are referred to as inferential sensors, or soft sensors.
The development of inferential sensors for industrial applications started around three decades ago [1,2]. As industrial processing plants are instrumented with a growing number of sensors, the possibilities for inferential sensors are also expanding. At a general level, inferential sensors can be divided into first-principle models and black-box models, as well as their hybrids, gray-box models. The first-principle models are derived using physics-based algebraic equations. Therefore, these models are application-specific, and their definition requires phenomenological knowledge of the process. These models typically assume that the process is in a steady-state condition [1,3]. The black-box models, on the other hand, are purely based on obtained measurement data and do not use phenomenological knowledge of the process. Kadlec et al. [1] review the literature on data-driven inferential sensors (i.e., black-box models) for industrial processes. Fortuna et al. [4] published a textbook on monitoring and control of industrial processes using inferential sensors. Souza et al. [3] provide a literature review of the methods to design and evaluate regression-based inferential sensors.
In the following, we focus on inferential sensors, proposed in the literature, that have been demonstrated to predict process fluid concentrations. Fortuna et al. [5] propose two neural network-based inferential sensors to predict the stabilized gasoline and butane concentrations in a debutanizer distillation column. Salvatore et al. [6] trained neural networks to predict the sulfur content in the product stream of a diesel hydrotreating unit based on on-line temperature, flow rate, and pressure measurements. Soares and Araújo [7] propose a black-box prediction model that consists of an adaptive ensemble of extreme learning machines. The authors demonstrate the use of the model as an inferential sensor to predict quantities in six industrial processes, in four of which the predicted quantity is a concentration. Osorio et al. [8] propose a neural network-based inferential sensor to predict the ethanol concentration in a batch distillation process using four temperature measurements. Mojto et al. [9] investigate the effectiveness of several data-based inferential sensors (i.e., black-box models) to identify the product composition in industrial distillation columns. Cao et al. [10] propose a method that achieves improved accuracy and reduced model complexity in comparison to reference methods. The authors test their method on the Tennessee Eastman process model [11]. Several other inferential sensors (e.g., [12-15]) have been proposed and tested on the above-mentioned application of the debutanizer distillation column, the data of which is openly available [4]. Bidar et al. [16] report a comparison of the prediction accuracy of different inferential sensors proposed in the literature for this application.
Bayesian inferential sensors also take into account the uncertainty of measurements and other process phenomena. The basic idea is to update the prior distribution of the quantity of interest based on observed data, in order to obtain the posterior distribution. Thus, instead of yielding just a point estimate, the output of a Bayesian inferential sensor is a distribution, from which the uncertainty of the estimate can be assessed. For a review of Bayesian inferential sensors, the reader is referred to the publication by Khatibisepehr et al. [2]. In a broader scope, Armstrong and Hibbert [17] review Bayesian methods in analyzing chemistry data.
A few works on Bayesian inferential sensors are worth mentioning here. di Sciascio and Amicarelli [18] propose a Bayesian inferential sensor to determine the biomass concentration in batch biotechnological processes. The inferential sensor consists of a method to fill in missing data (due to uneven sampling frequencies) and a prediction method based on Gaussian process regression. Khatibisepehr et al. [19] propose a Bayesian method to design an inferential sensor that adaptively switches between models tuned for different operating modes and enables the inclusion of prior knowledge. Deng et al. [20] use a Bayesian model updating strategy to calibrate an inferential sensor using the latest measurement data from the plant. Kaneko and Funatsu [21] propose an inferential sensor that applies Bayes' rule in an ensemble learning method combining the predictions from models trained on different segments of historical data.
While inferential sensors are often used to characterize the product stream (e.g., for quality control) [22], our attention in this work is on the determination of the reactant concentration at the inlet of an exothermic chemical reactor. The on-line measurement of such a quantity is often costly or unreliable. The prediction is useful at least for the following purposes. First, it provides real-time information on the state of the upstream processes. Second, disturbances in the reactant concentration can be quantified and used in feed-forward control of the process. Third, in the case of a catalytic process, the more frequent estimates of the reactant concentration, in comparison to performing laboratory sampling only, can be used to improve the monitoring of catalyst deactivation.
In an exothermic chemical process, the measured temperature difference over the reactor is a particularly useful observation for the determination of the reactant concentration. However, as the reactant concentration is a causal factor of the temperature difference, the problem is formally described as an inverse problem [23] (i.e., one where the cause is determined inversely from an observed effect).
In this work, we propose a Bayesian inferential sensor to determine the reactant concentration in the inlet stream of an exothermic chemical process, based on on-line temperature, flow, and pressure measurements and past off-line reactant concentration and density measurements. We determine the prior distribution of the reactant concentration using the time series prediction method autoregressive integrated moving average (ARIMA) [24]. After observing the temperature difference over the reactor, as well as other quantities measured on-line, we update our belief of the reactant concentration using Bayes' rule and a first-principle thermal model of the reactor. Our work is inspired by Bayesian solution methods to inverse problems studied in the field of heat transfer engineering [25] (e.g., the works by Wang and Zabaras [26,27]). The method we propose in this work is also applicable to an endothermic process; the key requirement is that a substantial amount of heat is generated or absorbed. For the sake of brevity, we refer to the process as an exothermic process.
The structure of the paper is the following. In Section 2, we start by providing short introductions to the theory of the main statistical methods used in this work. In Section 3, we describe a simple thermal model of an exothermic reactor, which is based on first principles, and propose a Bayesian inferential sensor to predict the reactant concentration. In Section 4, we present the results of a case study on an exothermic process from the petrochemical industry. Finally, we discuss the proposed method and the obtained results in Section 5.

Theory
Our work is based on two statistical frameworks: the Bayesian approach to inverse problems and the ARIMA time series model. We will introduce these in the following two sections.

Bayesian approach to inverse problems
As indicated in the introduction, inverse problems are characterized by the cause being determined by observing the effect. Such problems are typically mathematically ill-posed. The often-used deterministic solution approaches handle the ill-posedness of the problem via regularization (see, e.g., the Tikhonov regularization [29] and the truncated singular value decomposition [23]). The purpose of regularization is to ensure the uniqueness of the solution and the stability of the solution procedure. However, these methods usually aim to find a single solution to the problem without modeling the uncertainty of the solution.
In the Bayesian approach to solving inverse problems, the problem is first modeled in terms of probabilities, taking into account the sources of uncertainty (e.g., the measurement noise and model inaccuracies). The solution is then sought via statistical inference. While the deterministic approaches yield a single solution, the Bayesian approaches yield a distribution of solutions.
By definition, the cause is at least partially unknown in inverse problems. Furthermore, the cause may consist of multiple unknown variables. Following the taxonomy by Kaipio and Fox [25], let us refer to the cause of interest as the primary unknown and the other causes as auxiliary unknowns. Let us denote these causes by $x$ and $\xi$, respectively, such that the former is a continuous variable and the latter a vector of continuous variables. Even if we are not interested in the auxiliary unknowns, they are sources of uncertainty and need to be modeled. We denote the effect by $y$ and its measurement noise by $e$, both of which are continuous variables. The additive noise model is then

$$y = f(x, \xi) + e, \tag{1}$$

where $f$ is the forward model, defining the mapping $(x, \xi) \mapsto y$.

We assume that $(x, \xi)$ and $e$ are mutually independent and that the marginal distribution $e \sim \pi_e(e)$ is known. This means that the effect $y$ conditioned on the cause $(x, \xi)$ is distributed like $e$ shifted by $f(x, \xi)$, and thus the likelihood function is

$$\pi(y \mid x, \xi) = \pi_e\!\left(y - f(x, \xi)\right). \tag{2}$$

Bayes' theorem for continuous variables states that

$$\pi(x, \xi \mid y) = \frac{\pi(y \mid x, \xi)\,\pi(x, \xi)}{\pi(y)}, \tag{3}$$

where $\pi(x, \xi)$ and $\pi(x, \xi \mid y)$ are the prior and posterior distributions of the cause $(x, \xi)$. The marginal distribution $\pi(y)$ is a normalizing constant, and thus the posterior distribution is proportional to the numerator on the right-hand side of Eq. (3):

$$\pi(x, \xi \mid y) \propto \pi(y \mid x, \xi)\,\pi(x, \xi). \tag{4}$$

Substituting Eq. (2) into (4) yields

$$\pi(x, \xi \mid y) \propto \pi_e\!\left(y - f(x, \xi)\right)\pi(x, \xi). \tag{5}$$
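The posterior in Eq. (5) can be illustrated numerically. Below is a minimal sketch that evaluates the unnormalized posterior on a grid, assuming a hypothetical linear forward model $f(x) = 2x$ with no auxiliary unknowns, Gaussian noise, and a Gaussian prior; all numbers are illustrative, not from the paper.

```python
import numpy as np

def unnormalized_posterior(x, y, sigma_e=0.5, mu0=0.0, sigma0=1.0):
    """Unnormalized posterior of Eq. (5): likelihood times prior.

    Assumes a hypothetical forward model f(x) = 2x, noise e ~ N(0, sigma_e^2),
    and prior x ~ N(mu0, sigma0^2).
    """
    likelihood = np.exp(-0.5 * ((y - 2.0 * x) / sigma_e) ** 2)
    prior = np.exp(-0.5 * ((x - mu0) / sigma0) ** 2)
    return likelihood * prior

# Observing y = 2 pulls the posterior mode toward x = 1 (the likelihood peak),
# slightly shrunk toward the prior mean 0.
xs = np.linspace(-3.0, 3.0, 601)
post = unnormalized_posterior(xs, y=2.0)
x_map = xs[np.argmax(post)]
```

Because the likelihood is much more precise than the prior here, the maximum a posteriori estimate lands near $x = 16/17 \approx 0.94$ rather than exactly at 1.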

ARIMA
A time series is a sequence of observations, typically recorded at equally spaced time points. Methods of predicting future values of a time series are valuable, e.g., in engineering, economics, and the natural sciences. Most of the prediction methods are based on the feature that adjacent observations in a time series are dependent. One such method is the autoregressive integrated moving average (ARIMA) model, which was popularized by Box and Jenkins [24] in the 1970s. The authors present a systematic approach for model identification, parameter estimation, and model checking. In this section, we present a short introduction to the ARIMA model.
Let us consider a series of $n$ successive observations $y_1, y_2, \ldots, y_n$ of a stochastic process, and define a deviation variable $z_t$ at time point $t$ as

$$z_t = y_t - \mu,$$

where $\mu$ is a reference value. The autoregressive-moving average (ARMA) model is a combination of the autoregressive (AR) model and the moving average (MA) model and is defined as

$$z_t = \phi_1 z_{t-1} + \cdots + \phi_p z_{t-p} + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}, \tag{8}$$

where $a_t, \ldots, a_{t-q}$ is a sequence of independent random errors, also referred to as shocks, $\phi_1, \ldots, \phi_p$ and $\theta_1, \ldots, \theta_q$ are the model coefficients, and $p$ and $q$ are the orders of the AR and MA parts, respectively. The ARMA models are applicable to time series from a stationary process, i.e., a process having time-independent parameters, such as the mean and the variance. Thus, when using ARMA models, the reference value $\mu$ is typically defined as the mean of the stationary process. However, many processes are non-stationary. A useful operator for such processes is the differencing operator $\nabla^d$, where $d$ is the order of differencing. The first-order ($d = 1$) and second-order ($d = 2$) differences of $y_t$ are

$$\nabla y_t = y_t - y_{t-1} \quad \text{and} \quad \nabla^2 y_t = \nabla y_t - \nabla y_{t-1} = y_t - 2y_{t-1} + y_{t-2}.$$

The higher-order operators are obtained by continuing the differencing.
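The differencing operator above is simple to compute in practice; a minimal sketch on an illustrative series with a quadratic trend:

```python
import numpy as np

# First- and second-order differencing of a short illustrative series.
z = np.array([1.0, 3.0, 6.0, 10.0])
d1 = np.diff(z, n=1)   # first difference:  z_t - z_{t-1}
d2 = np.diff(z, n=2)   # second difference: z_t - 2*z_{t-1} + z_{t-2}
```

The first difference removes a linear trend and the second difference removes a quadratic one; here the second difference is already constant.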
If $d \geq 1$, $\nabla^d z_t = \nabla^d y_t$ because the reference value $\mu$ cancels in the first differencing. Now, by defining a new variable

$$w_t = \nabla^d z_t,$$

we can write the ARIMA model as (cf. Eq. (8))

$$w_t = \phi_1 w_{t-1} + \cdots + \phi_p w_{t-p} + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}.$$

The ARIMA model is thus also applicable to time series from a non-stationary process, given that the order of differencing is appropriate. A commonly used notation for such a model is ARIMA($p$, $d$, $q$). As an example, the ARIMA(1, 1, 2) model is defined as

$$\nabla z_t = \phi_1 \nabla z_{t-1} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2}.$$

In order to fit an ARIMA($p$, $d$, $q$) model to time series data, we need to choose the parameters $(p, d, q)$, defining the model structure, and tune the corresponding hyperparameters $\mu$, $\phi_1, \ldots, \phi_p$, $\theta_1, \ldots, \theta_q$, and $\sigma_a^2$, the last of which is the variance of the independent random errors. For this purpose, we use the automated model-selection function auto.arima [30].
The function first determines the parameter $d$ such that it is the lowest integer value that still transforms the data into a stationary time series, using, e.g., the augmented Dickey-Fuller test [31] or the Kwiatkowski-Phillips-Schmidt-Shin test [32]. Then, the function seeks the combination of parameters $p$ and $q$ that, e.g., minimizes the Akaike information criterion (AIC) [33], defined as

$$\mathrm{AIC} = 2k - 2\log(L), \tag{14}$$

where $L$ is the likelihood of the model with fitted hyperparameters and $k$ is the number of fitted hyperparameters. Thus, the procedure requires tuning of the model hyperparameters for all tested parameter combinations $(p, q)$. The tuning is conducted by maximizing the log-likelihood, $\log(L)$, of the model fit.
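The criterion in Eq. (14) is trivial to compute once the maximized log-likelihood is known; a minimal sketch:

```python
def aic(log_likelihood, k):
    """Akaike information criterion, AIC = 2k - 2 log(L) (Eq. (14)).

    log_likelihood: maximized log-likelihood log(L) of the fitted model.
    k: number of fitted hyperparameters.
    """
    return 2 * k - 2 * log_likelihood
```

A lower AIC is better: the criterion rewards fit quality (via log(L)) while penalizing model complexity (via k), which is why it is used to compare the tested $(p, q)$ combinations.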
Once the parameters $(p, d, q)$ and the corresponding hyperparameters are determined, the model can be used to predict future values of the process, $y_{t+1}, \ldots, y_{t+l}$, where $t$ is the current time point and $l$ is the number of consecutive future points to be predicted. Assuming that the random shocks are independent and have a zero mean, a standard criterion for the best forecast is the one that minimizes its mean squared error [34, Section 5.8]. The $l$-periods-ahead forecast, made at time $t$, is then the conditional expectation

$$\hat{y}_t(l) = \mathrm{E}\left[\, y_{t+l} \mid y_t, y_{t-1}, \ldots \,\right]. \tag{15}$$

The ARIMA model also yields an estimate of the uncertainty related to the forecast. This estimate is based on the estimated variance of the random shocks, $\sigma_a^2$, which is obtained when fitting the model to the data. The variance of the $l$-periods-ahead forecast error is

$$\mathrm{Var}\!\left(e_t(l)\right) = \sigma_a^2 \left( 1 + \sum_{j=1}^{l-1} \psi_j^2 \right), \tag{16}$$

where the weights $\psi_1, \ldots, \psi_{l-1}$ are computed from the weights $\phi_1, \ldots, \phi_p$ and $\theta_1, \ldots, \theta_q$. The variance increases the further into the future we aim to predict. For more information on time series forecasting, the reader may wish to consult the textbooks by Box et al. [24] and Montgomery et al. [34].
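The growth of the forecast uncertainty in Eq. (16) can be sketched directly. In practice the $\psi$ weights would be derived from the fitted $\phi$ and $\theta$ coefficients; here they are illustrative numbers only.

```python
import numpy as np

def forecast_error_variance(sigma_a2, psi, l):
    """Variance of the l-periods-ahead forecast error (Eq. (16)):
    sigma_a^2 * (1 + psi_1^2 + ... + psi_{l-1}^2).
    """
    psi = np.asarray(psi, dtype=float)
    return sigma_a2 * (1.0 + np.sum(psi[: l - 1] ** 2))

# Illustrative psi weights; the variance grows with the forecast horizon l.
var_3 = forecast_error_variance(1.0, psi=[0.5, 0.25, 0.1], l=3)
var_4 = forecast_error_variance(1.0, psi=[0.5, 0.25, 0.1], l=4)
```

For $l = 1$ the sum is empty and the variance reduces to $\sigma_a^2$, the one-step-ahead case.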

Methods
In this section, we describe the proposed Bayesian inferential sensor. We start by deriving a first-principle thermal model of an exothermic reactor in Section 3.1. We then formulate the posterior density for the reactant concentration in Section 3.2 and describe the steps performed in the Bayesian inferential sensor in Section 3.3.

Thermal model of an exothermic reactor
Let us consider exothermic reactors with two alternative configurations. The first has one vessel with inlet and outlet streams (Fig. 1(a)). On-line temperature and pressure sensors are located at the inlet and outlet streams of the reactor. We denote the temperatures by $T_\mathrm{in}$ and $T_\mathrm{out}$ and the pressures by $p_\mathrm{in}$ and $p_\mathrm{out}$, respectively. For the sake of generality, the inlet stream consists of liquid and gas phases, the mass flow rates of which, $\dot{m}_\mathrm{l}$ and $\dot{m}_\mathrm{g}$, respectively, are measured separately.
The second configuration is otherwise the same but consists of two vessels, with a connecting pipe in between them (Fig. 1(b)). The temperatures and pressures are measured also at the outlet of the first vessel and the inlet of the second vessel. We denote these temperatures by $T_\mathrm{mid1}$ and $T_\mathrm{mid2}$ and the pressures by $p_\mathrm{mid1}$ and $p_\mathrm{mid2}$, respectively.
We make the following general assumptions on the process and its state. First, we assume that the reactor is in a steady-state condition. Second, we assume that a possible change in the ratio of the liquid and gas phases due to the chemical reaction is negligible. Thus, the inlet and outlet streams are assumed to have the same mass flow rates $\dot{m}_\mathrm{l}$ and $\dot{m}_\mathrm{g}$. Third, we assume that the chemical reaction does not notably change the thermal properties of the liquid or the gas phase. Fourth, we assume that the walls of the vessels are adiabatic (the heat loss of the connecting pipe in the second configuration is still modeled). Fifth, we assume that the reactant A is in the liquid phase.
Let us start the derivation by writing the enthalpy balance of the two-vessel reactor (Fig. 1(b)) as

$$\frac{\mathrm{d}H}{\mathrm{d}t} = \dot{m}_\mathrm{l}\left(\hat{H}_\mathrm{l,in} - \hat{H}_\mathrm{l,out}\right) + \dot{m}_\mathrm{g}\left(\hat{H}_\mathrm{g,in} - \hat{H}_\mathrm{g,out}\right) + Q - Q_\mathrm{pipe}, \tag{17}$$

where $\hat{H}_\mathrm{l,in}$ and $\hat{H}_\mathrm{l,out}$ are the specific enthalpies of the liquid phase in and out of the reactor, respectively, and $\hat{H}_\mathrm{g,in}$ and $\hat{H}_\mathrm{g,out}$ are the equivalents of the gas phase. $Q$ is the heat generation of the exothermic reaction and $Q_\mathrm{pipe}$ is the heat loss from the connecting pipe to the ambient. In the one-vessel configuration (Fig. 1(a)), the heat loss has the value $Q_\mathrm{pipe} = 0$, as the connecting pipe does not exist. The specific enthalpy of the liquid, having the specific heat $c_\mathrm{l}$ and density $\rho_\mathrm{l}$, at location $i$ is

$$\hat{H}_{\mathrm{l},i} = c_\mathrm{l} T_i + \frac{p_i}{\rho_\mathrm{l}}. \tag{18}$$

We write the enthalpy of the gas stream at location $i$ in a slightly different form as

$$\hat{H}_{\mathrm{g},i} = c_\mathrm{g} T_i + \frac{R\,T_i}{M_\mathrm{g}}, \tag{19}$$

where $c_\mathrm{g}$ and $M_\mathrm{g}$ are the specific heat and molar mass of the gas, respectively, and $R$ is the gas constant. We note that the pressure term of Eq. (18) is equivalent to the last term of Eq. (19) due to the ideal gas law. Following the steady-state assumption, the enthalpy change in the unit is negligible ($\mathrm{d}H/\mathrm{d}t = 0$). Substituting Eqs. (18) and (19) into Eq. (17) and reorganizing the terms yields

$$Q - Q_\mathrm{pipe} = \dot{m}_\mathrm{l}\left(c_\mathrm{l}\left(T_\mathrm{out} - T_\mathrm{in}\right) + \frac{p_\mathrm{out} - p_\mathrm{in}}{\rho_\mathrm{l}}\right) + \dot{m}_\mathrm{g}\left(c_\mathrm{g} + \frac{R}{M_\mathrm{g}}\right)\left(T_\mathrm{out} - T_\mathrm{in}\right). \tag{20}$$

Let us denote the temperature and pressure changes by $\Delta T = T_\mathrm{out} - T_\mathrm{in}$ and $\Delta p = p_\mathrm{out} - p_\mathrm{in}$, respectively. For the sake of readability, we also introduce a new variable

$$C = \dot{m}_\mathrm{l} c_\mathrm{l} + \dot{m}_\mathrm{g}\left(c_\mathrm{g} + \frac{R}{M_\mathrm{g}}\right). \tag{21}$$

With these symbols, Eq. (20) can be written as

$$Q - Q_\mathrm{pipe} = C\,\Delta T + \frac{\dot{m}_\mathrm{l}\,\Delta p}{\rho_\mathrm{l}}. \tag{22}$$

Let us now express also the heat generation $Q$ using other physical quantities. The conservation law for the reactant A in the reactor is

$$\frac{\mathrm{d}n_\mathrm{A}}{\mathrm{d}t} = \dot{n}_\mathrm{A,in} - \dot{n}_\mathrm{A,out} - r_\mathrm{A}, \tag{23}$$

where $r_\mathrm{A}$ is the reaction rate in the reactor (having the unit of mol/s), and $\dot{n}_\mathrm{A,in}$ and $\dot{n}_\mathrm{A,out}$ are the molar flow rates of the reactant at the inlet and outlet of the reactor, respectively. Given the molar mass of the reactant, $M_\mathrm{A}$, the molar and mass flow rates have the following relationship at location $i$:

$$\dot{n}_{\mathrm{A},i} = \frac{\dot{m}_\mathrm{l}\,c_i}{M_\mathrm{A}}. \tag{24}$$

Assuming again a steady-state condition ($\mathrm{d}n_\mathrm{A}/\mathrm{d}t = 0$), and substituting Eq.
(24) into Eq. (23) yields

$$r_\mathrm{A} = \frac{\dot{m}_\mathrm{l}\left(c_\mathrm{in} - c_\mathrm{out}\right)}{M_\mathrm{A}}, \tag{25}$$

where $c_\mathrm{in}$ and $c_\mathrm{out}$ are the reactant concentrations at the inlet and outlet of the reactor, respectively. We assume that the heat generation $Q$ is proportional to the reaction rate $r_\mathrm{A}$ as follows:

$$Q = q\,M_\mathrm{A}\,r_\mathrm{A}, \tag{26}$$

where $q$ is the mass-specific heat generation rate of the reaction (having the unit of J/kg) and $M_\mathrm{A}$ is the molar mass of the reactant. Substituting Eq. (25) into (26) yields

$$Q = q\,\dot{m}_\mathrm{l}\left(c_\mathrm{in} - c_\mathrm{out}\right). \tag{27}$$

As the heat loss $Q_\mathrm{pipe}$ is non-zero only in the two-vessel configuration, we present its derivation and the associated assumptions in the Appendix. Substituting Eqs. (27) and (A.3) (defined in the Appendix) into Eq. (22) yields

$$\Delta T = \frac{1}{C}\left( q\,\dot{m}_\mathrm{l}\left(c_\mathrm{in} - c_\mathrm{out}\right) - \frac{\dot{m}_\mathrm{l}\,\Delta p}{\rho_\mathrm{l}} - Q_\mathrm{pipe}\!\left(\Delta T_\mathrm{pipe}\right) \right), \tag{28}$$

where $\Delta T_\mathrm{pipe} = T_\mathrm{mid2} - T_\mathrm{mid1}$. Finally, we convert Eq. (28) into the following function:

$$\Delta T^* = f^*\!\left(q, c_\mathrm{in}, c_\mathrm{out}, \rho_\mathrm{l}, \boldsymbol{\theta}\right), \tag{29}$$

where $q$, $c_\mathrm{in}$, $c_\mathrm{out}$, and $\rho_\mathrm{l}$ are uncertain process variables and $\boldsymbol{\theta} = [\Delta p, \dot{m}_\mathrm{g}, \dot{m}_\mathrm{l}, c_\mathrm{g}, c_\mathrm{l}, M_\mathrm{g}, \Delta T_\mathrm{pipe}, R]^\top$ is an array of deterministic process variables and constants, the (measurement) uncertainty of which we assume negligible in comparison to those of the uncertain process variables. Our reasoning is that all variables in $\boldsymbol{\theta}$ are typically measured using automated sampling, which ensures that a measurement is available at every time point of interest. In contrast, the measurement of the uncertain process variables $c_\mathrm{in}$, $c_\mathrm{out}$, and $\rho_\mathrm{l}$ may have occurred multiple days or weeks ago. The mass-specific heat generation rate $q$ cannot be explicitly measured (we describe its estimation in Section 3.3). From this point onward, the variable $\Delta T$ denotes only the measured temperature difference (over the reactor), whereas the function $\Delta T^*$ is the modeled temperature difference. Eq. (29) is referred to as the forward model. The purpose of this work is to determine the reactant concentration at the inlet of the reactor, $c_\mathrm{in}$, based on the measured temperature difference $\Delta T$. As the cause is determined by observing the effect, we use inversion theory.
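To make the structure of the forward model concrete, below is a minimal Python sketch for the one-vessel configuration ($Q_\mathrm{pipe} = 0$), assuming the energy balance takes the form $Q = C\,\Delta T + \dot{m}_\mathrm{l}\,\Delta p/\rho_\mathrm{l}$ described by Eqs. (21), (22), and (27). All variable names and numerical values are hypothetical, not taken from the process.

```python
R = 8.314  # universal gas constant, J/(mol K)

def forward_model(q, c_in, c_out, rho_l, theta):
    """Sketch of the modeled temperature difference (Eq. (29)), one-vessel
    case with Q_pipe = 0. Dictionary keys are hypothetical names.

    q: mass-specific heat generation rate (J/kg)
    c_in, c_out: reactant mass fractions at inlet/outlet (kg/kg)
    rho_l: liquid density (kg/m^3)
    """
    m_l, m_g = theta["m_l"], theta["m_g"]   # mass flow rates (kg/s)
    c_l, c_g = theta["c_l"], theta["c_g"]   # specific heats (J/(kg K))
    M_g = theta["M_g"]                      # gas molar mass (kg/mol)
    dp = theta["dp"]                        # pressure change over reactor (Pa)
    Q = q * m_l * (c_in - c_out)            # reaction heat, cf. Eq. (27)
    C = m_l * c_l + m_g * (c_g + R / M_g)   # heat-capacity flow, cf. Eq. (21)
    return (Q - m_l * dp / rho_l) / C       # balance of Eq. (22) solved for dT

# Illustrative numbers only (roughly hydrogen-like gas properties).
theta = {"m_l": 10.0, "m_g": 1.0, "c_l": 2000.0, "c_g": 14000.0,
         "M_g": 0.002, "dp": -1.0e5}
dT = forward_model(q=1.0e6, c_in=0.10, c_out=0.02, rho_l=800.0, theta=theta)
```

With these illustrative inputs the model predicts a temperature rise of roughly 20 K; in the inverse problem, the same function is evaluated inside the likelihood to compare against the measured temperature difference.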

Posterior distribution of quantities measured off-line
Let us now write the posterior distribution (Eq. (5)) in terms of the variables used in Section 3.1. We substitute the generic effect $y$ by $\Delta T$ and the generic cause $(x, \xi)$ by $(q, c_\mathrm{in}, c_\mathrm{out}, \rho_\mathrm{l})$ in Eq. (5), which yields

$$\pi\!\left(q, c_\mathrm{in}, c_\mathrm{out}, \rho_\mathrm{l} \mid \Delta T\right) \propto \pi_e\!\left(\Delta T - \Delta T^*(q, c_\mathrm{in}, c_\mathrm{out}, \rho_\mathrm{l}, \boldsymbol{\theta})\right) \pi\!\left(q, c_\mathrm{in}, c_\mathrm{out}, \rho_\mathrm{l}\right). \tag{30}$$

It is worth noticing that the symbol $\boldsymbol{\theta}$ is not included in the prior distribution because it is an array of deterministic variables. We further assume that the mass-specific heat generation rate $q$, the reactant concentrations at the inlet $c_\mathrm{in}$ and outlet $c_\mathrm{out}$, and the density of the liquid $\rho_\mathrm{l}$ are mutually independent:

$$\pi\!\left(q, c_\mathrm{in}, c_\mathrm{out}, \rho_\mathrm{l}\right) = \pi(q)\,\pi(c_\mathrm{in})\,\pi(c_\mathrm{out})\,\pi(\rho_\mathrm{l}). \tag{31}$$

This assumption will be revisited in Section 5, where we also present a correlation study between the variables. We know a priori that the variables $q$, $c_\mathrm{in}$, $c_\mathrm{out}$, and $\rho_\mathrm{l}$ are non-negative. Therefore, we model their prior distributions by truncated Gaussian distributions (with a lower limit of 0) and the likelihood function by a Gaussian distribution:

$$\pi\!\left(q, c_\mathrm{in}, c_\mathrm{out}, \rho_\mathrm{l} \mid \Delta T\right) \propto \exp\!\left(-\frac{\left(\Delta T - \Delta T^*\right)^2}{2\sigma_{\Delta T}^2}\right) \prod_{v \in \{q,\, c_\mathrm{in},\, c_\mathrm{out},\, \rho_\mathrm{l}\}} \exp\!\left(-\frac{\left(v - \mu_v\right)^2}{2\sigma_v^2}\right) \tag{32}$$

when $q, c_\mathrm{in}, c_\mathrm{out}, \rho_\mathrm{l} \geq 0$, and 0 otherwise, where $\sigma_{\Delta T}$ is the standard deviation of the temperature sensor, and $\mu_v$ and $\sigma_v$, $v \in \{q, c_\mathrm{in}, c_\mathrm{out}, \rho_\mathrm{l}\}$, are the means and standard deviations of the prior distributions. Due to the truncation of the Gaussian priors, no analytical solution exists for the posterior distribution (Eq. (32)). Nevertheless, it can be approximated using statistical sampling methods.
In this work, we sample the posterior distribution with the No-U-Turn Sampler (NUTS) [35], which is a Markov chain Monte Carlo (MCMC) algorithm that closely resembles Hamiltonian Monte Carlo (HMC). In HMC, the problem of sampling from a target distribution is transformed into the problem of simulating Hamiltonian dynamics [36], which mitigates the inefficiency related to the random-walk behavior of many MCMC algorithms (e.g., the Metropolis-Hastings algorithm [37,38] and Gibbs sampling [39]). HMC involves at least two parameters to be tuned by the user, i.e., the step size $\epsilon$ and the number of steps $L$ for which the Hamiltonian system is simulated. The NUTS algorithm has a built-in criterion for stopping the simulations. Therefore, it does not require the number of steps $L$ to be specified by the user. In addition, the algorithm tunes the step size $\epsilon$ adaptively while sampling a target distribution. Due to these features, the algorithm is particularly useful for practitioners without expertise in tuning HMC algorithms. The algorithm has been shown empirically to be at least as efficient as a standard HMC method [35].
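To illustrate sampling from a truncated target such as Eq. (32) without a probabilistic programming library, the sketch below uses a plain random-walk Metropolis sampler, i.e., precisely the random-walk style of MCMC that NUTS is designed to improve upon, on a one-dimensional truncated-Gaussian target. All numbers are illustrative.

```python
import numpy as np

def log_target(x, mu=2.0, sigma=1.0):
    """Log-density of a Gaussian truncated at zero (unnormalized),
    mimicking the truncated priors of Eq. (32)."""
    if x < 0:
        return -np.inf  # truncation: zero probability below 0
    return -0.5 * ((x - mu) / sigma) ** 2

def metropolis(n_draws, step=0.8, x0=1.0, seed=0):
    """Random-walk Metropolis: a simple stand-in for NUTS."""
    rng = np.random.default_rng(seed)
    x, draws = x0, []
    for _ in range(n_draws):
        prop = x + step * rng.standard_normal()
        # Accept with probability min(1, target(prop) / target(x)).
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        draws.append(x)
    return np.array(draws)

draws = metropolis(5000)
```

Proposals below zero are always rejected, so the chain respects the truncation automatically; its sample mean settles slightly above the untruncated mean of 2, as expected for a distribution with its left tail cut off.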

Bayesian inferential sensor
As already described in the introduction, temperature, flow rate, and pressure measurements are typically sampled automatically on-line at industrial chemical reactors. Therefore, we assume here that the quantities $\Delta T$, $\dot{m}_\mathrm{l}$, $\dot{m}_\mathrm{g}$, $\Delta T_\mathrm{pipe}$, and $\Delta p$ are measured frequently using automated sampling (AS), whereas the quantities $c_\mathrm{in}$, $c_\mathrm{out}$, and $\rho_\mathrm{l}$ are measured infrequently using laboratory sampling (LS) (see the symbols in Eq. (29)). We assume $R$, $c_\mathrm{g}$, $c_\mathrm{l}$, and $M_\mathrm{g}$ to be constant parameters. The directed acyclic graph (DAG) of the Bayesian inferential sensor is depicted in Fig. 2, for a case where each set of LSs is taken simultaneously with an AS.
The mass-specific heat generation rate $q$ cannot be explicitly measured. Therefore, we make a probabilistic prediction of it at the time points of LS. The prediction is generated by statistical sampling of the posterior distribution (Eq. (32)) and marginalizing out $c_\mathrm{in}$, $c_\mathrm{out}$, and $\rho_\mathrm{l}$. The means $\mu_v$, $v \in \{c_\mathrm{in}, c_\mathrm{out}, \rho_\mathrm{l}\}$, of the prior distributions are those sampled at the time point in question. The corresponding standard deviations $\sigma_v$, $v \in \{c_\mathrm{in}, c_\mathrm{out}, \rho_\mathrm{l}\}$, are represented by the measurement uncertainty. The mean $\mu_q$ is the mean of the estimate at the previous time point of LS, and $\sigma_q$ is a sufficiently large value, in order to avoid too restrictive a prior distribution for $q$. The procedure ensures that we have a measurement or a prediction of all quantities appearing in Eq. (29) at the time points of LS.
At the time points of only automated sampling (AS), we use the ARIMA model to make probabilistic predictions of the variables $c_\mathrm{in}$, $c_\mathrm{out}$, $\rho_\mathrm{l}$, and $q$ (the time series values of $q$ are the means of the past predictions of $q$ by the Bayesian inversion). We use the resulting predictions as the prior distributions of the variables, parameterized by the means $\mu_v$ and standard deviations $\sigma_v$, $v \in \{q, c_\mathrm{in}, c_\mathrm{out}, \rho_\mathrm{l}\}$. We then sample the posterior distribution (Eq. (32)) and marginalize out $q$, $c_\mathrm{out}$, and $\rho_\mathrm{l}$. The resulting distribution is the final prediction of the reactant concentration $c_\mathrm{in}$.
When making a time series prediction by the ARIMA model, we determine the order of the model by the auto_arima function from the Pmdarima library [40]. We use the augmented Dickey-Fuller test [31] to find the smallest parameter $d$ that still transforms the data into a stationary time series at the significance level of 0.05 (the null hypothesis being that the time series is non-stationary). The auto_arima function then tunes the hyperparameters of the model for all combinations of $p \in \{0, \ldots, p_\mathrm{max}\}$ and $q \in \{0, \ldots, q_\mathrm{max}\}$, where $p_\mathrm{max}$ and $q_\mathrm{max}$ are the maximum values of the parameters $p$ and $q$, respectively. We define the auto_arima function to choose $(p, q)$ from the tested combinations based on the corrected AIC (AICc) [41,42], in order to avoid bias when the number of samples is small. As the optimization algorithm to maximize the log-likelihood, we use the limited-memory BFGS with optional box constraints [43], which is the default choice in auto_arima.
The ARIMA model predicts future values of an equally spaced time series. In our case, these values are the laboratory samplings. However, we are also interested in the quantities of the laboratory samplings at the time points of only automated samplings. Therefore, we use linear interpolation to extend the probabilistic predictions from the ARIMA model to time points with only automated sampling (Fig. 3). We do this for both the mean and the standard deviation of the prediction. The standard deviation at the latest measured value corresponds to the measurement uncertainty of the sampling. Algorithm 1 summarizes the steps of the inferential sensor conducted at a given time point $t$.
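The interpolation step can be sketched as follows, with numpy's linear interpolation applied to both the mean and the standard deviation between two laboratory-sampling time points. All time points and values are illustrative.

```python
import numpy as np

# Two surrounding laboratory-sampling (LS) time points, in days.
t_ls = np.array([0.0, 14.0])
mu_ls = np.array([5.0, 7.0])   # predicted means at the LS points
sd_ls = np.array([0.2, 0.9])   # std. dev.: measurement unc. -> forecast unc.

# Daily automated-sampling (AS) grid between the LS points.
t_as = np.arange(0.0, 15.0, 1.0)
mu_as = np.interp(t_as, t_ls, mu_ls)  # interpolated prediction means
sd_as = np.interp(t_as, t_ls, sd_ls)  # interpolated prediction std. devs.
```

At the left endpoint the standard deviation equals the measurement uncertainty of the latest sampling, and it grows linearly towards the forecast uncertainty at the next LS time point, mirroring the behavior shown in Fig. 3.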

Case study: Catalytic hydrogenation of aromatic compounds
We evaluate the methodology proposed in Section 3 on a dataset recorded from an industrial process of catalytic hydrogenation of aromatic compounds. The process is an exothermic reaction A + H → B, where a set of aromatic compounds A is hydrogenated (hydrogen being denoted by H) at a catalyst surface to yield a set of de-aromatized compounds B [44]. The reaction occurs in a fixed-bed reactor with arrangements of one to two tubular vessels installed in series (similar to Fig. 1). H is fed to the process in abundance. The flow rates of H and hydrocarbons, $\dot{m}_\mathrm{g}$ and $\dot{m}_\mathrm{l}$, are measured separately using automated sampling. Likewise, the temperature and pressure differences over the reactor, $\Delta T$ and $\Delta p$, are measured by automated sampling. The mass concentrations of A (in the stream of hydrocarbons) at the inlet and outlet of the reactor, $c_\mathrm{in}$ and $c_\mathrm{out}$, as well as the density of the liquid phase at the inlet, $\rho_\mathrm{l}$, are measured using laboratory sampling. When operating such a reactor, the catalyst deactivates due to phenomena such as poisoning, fouling, and thermal degradation [45]. Our model does not explicitly consider catalyst deactivation but takes into account the residual aromatics in the outlet stream, the increased concentration of which can be caused by catalyst aging. We refer the reader to the paper by Haitsiukevich et al. [44] for modeling of the catalyst activity in the same process.
The dataset contains measurements recorded during four years of operation. The automated measurements (i.e., temperatures, pressures, and flow rates at the different locations of the reactor) are recorded once per hour. The laboratory measurements of the inlet aromatics concentration $c_\mathrm{in}$ and the density $\rho_\mathrm{l}$ are conducted roughly once in 14 days, whereas the outlet aromatics concentration $c_\mathrm{out}$ is measured roughly once in seven days. We use the word 'roughly' as these sampling intervals have some variability (e.g., the operation period includes maintenance shutdowns, and some laboratory samplings were conducted with an interval of 21 days). Nevertheless, when making the predictions by the ARIMA model, we treat the past data as if the sampling interval were even. We use the upper limits of $p_\mathrm{max} = 3$ and $q_\mathrm{max} = 3$ for the numbers of autoregressive and moving average terms, respectively. Furthermore, in order to ensure the robustness of the method, we define the order of the ARIMA model to be (0, 0, 0) (i.e., a white-noise model) if the time series has fewer than 10 samples.
We use an implementation of the NUTS algorithm in the probabilistic programming library PyMC3 [46] to sample the posterior distribution in Eq. (32), using the following parameters. First, we perform 1000 tuning steps to adjust the step size $\epsilon$ (for these steps, we define the target acceptance rate to be 0.95). Second, we sample 1000 draws from the posterior distribution. Both the tuning steps and the actual draws are conducted at every time point $t$ when a new prediction of $q$ or $c_\mathrm{in}$ is made. We extract the following metrics from the draws: (i) the mean of the sampled draws is the mean of the prediction; (ii) the 15.87% and 84.13% fractiles of the draws are the lower and upper bounds of the one-sigma confidence interval (i.e., there is a confidence of 68.3% that the value is within the bounds).
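The metric extraction from the posterior draws amounts to a mean and two percentiles; a minimal sketch, using synthetic Gaussian draws as a stand-in for the NUTS output:

```python
import numpy as np

# Synthetic stand-in for 1000 posterior draws (mean 3.0, std. dev. 0.5).
rng = np.random.default_rng(1)
draws = rng.normal(3.0, 0.5, size=1000)

mean = draws.mean()                            # point prediction
lo, hi = np.percentile(draws, [15.87, 84.13])  # one-sigma interval bounds
```

For a Gaussian posterior the interval width `hi - lo` approximates two standard deviations, which is a quick sanity check on the sampled chain.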

Fig. 4. The mean of the predicted versus the actual reactant concentrations at the inlet of the reactor, $c_\mathrm{in}$. The subfigures (a) and (b) correspond to predictions with two- and four-week-old laboratory samplings, respectively. The predictions are plotted separately for the ARIMA and the ARIMA+BI methods. Labels 'min' and 'max' are the extreme values measured during the four-year operation period (for confidentiality reasons, we cannot disclose the actual values).

Table 2
Prediction accuracy of the aromatics concentration at the inlet of the reactor, $c_\mathrm{in}$, before (i.e., the ARIMA model) and after observing the automated measurements (i.e., the ARIMA+BI model). The reference method always predicts the previous measured value of the aromatics concentration.

The dataset contains a total of 97 laboratory samplings of the inlet aromatics concentration $c_\mathrm{in}$. In order to assess the prediction accuracy of $c_\mathrm{in}$, we use the rolling-origin evaluation method [47]. The origin refers to the last sample of the time series that belongs to the training set. All data after the origin belong to the test set. The rolling-origin evaluation is an out-of-sample evaluation method, in which new predictions are successively generated while moving the origin forward. We generate successive predictions for the aromatics concentration $c_\mathrm{in}$ using laboratory samplings that are at least $\Delta t$ time units old. We consider two values for this minimum age of the data. First, $\Delta t = 13$ days represents the typical two-week sampling interval of the inlet aromatics concentration. Second, $\Delta t = 27$ days represents a situation where the latest inlet aromatics concentration is missing. We have subtracted one day from both values, due to the intraday variation in the sampling times. For brevity, we refer to these values as $\Delta t = \{2, 4\}$ weeks in the remainder of this paper. For automated samplings, $\Delta t$ is always zero (i.e., we use the latest observations). Due to shortages of time series values at the beginning of the period and after a maintenance break, we study a reduced number of 92 pairs of predicted and actual values of the inlet aromatics concentration $c_\mathrm{in}$.
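The rolling-origin scheme can be sketched compactly. The "model" below is the reference method from Table 2, a naive (persistence) forecast that always predicts the previous measured value; the series and origin are illustrative.

```python
import numpy as np

def rolling_origin_naive(series, first_origin):
    """Rolling-origin evaluation of a naive one-step-ahead forecast.

    The origin (last training sample) moves forward one step at a time;
    at each origin, the forecast is the last observed value, and it is
    scored against the next actual value. Returns the RMSE over all folds.
    """
    series = np.asarray(series, dtype=float)
    preds, actuals = [], []
    for origin in range(first_origin, len(series) - 1):
        preds.append(series[origin])        # naive forecast: last observed
        actuals.append(series[origin + 1])  # next actual value (test set)
    errors = np.array(preds) - np.array(actuals)
    return float(np.sqrt(np.mean(errors ** 2)))
```

In the paper's setting, the naive forecast would be replaced by the ARIMA or ARIMA+BI prediction at each origin, with the minimum data age enforced when selecting the training samples.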

Results
Fig. 4 shows the correlation plots of the actual values and the predictions from the ARIMA model and the Bayesian inverse approach (abbreviated as ARIMA+BI) with two- and four-week-old laboratory samplings. Table 2 lists the corresponding root mean square error (RMSE) and the coefficient of determination R². The table also lists the prediction accuracy of a reference method, which always predicts the latest sampled value of the inlet aromatics concentration. Such a method is often used in practice.
The results indicate that the prediction accuracy of the ARIMA method is only marginally better than that of the reference method. Nevertheless, the prediction accuracy improves significantly when the prediction from the ARIMA model is updated by the Bayesian inverse approach (ARIMA+BI). Based on both metrics, the prediction accuracy of all three methods decreases slightly when using four-week-old laboratory samplings instead of two-week-old ones.
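The two accuracy metrics reported in Table 2 can be computed as follows (a plain-Python sketch; the numbers are made up for illustration and are not the paper's data):

```python
import math

def rmse(actual, predicted):
    # Root mean square error between paired samples.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r_squared(actual, predicted):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

actual = [4.0, 5.0, 6.0, 7.0]
predicted = [4.1, 4.9, 6.2, 6.8]
print(round(rmse(actual, predicted), 4))       # 0.1581
print(round(r_squared(actual, predicted), 3))  # 0.98
```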
Fig. 5 visualizes the predictions of the correlation plot in Fig. 4(a) as a time series, also showing the one-sigma confidence intervals of the predictions. In addition to improving the prediction accuracy, the Bayesian inverse approach also reduces the uncertainty of the predictions. At nearly all time points, it adjusts the prediction towards the ground truth. However, we also observe overshooting at a few time points (see, e.g., the predictions at around 40 months).
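How a Bayesian update can simultaneously shift a forecast towards the evidence and shrink its uncertainty can be illustrated with a conjugate Gaussian update. This is a deliberate simplification of the paper's sampled posterior (Eq. (32)), and all numbers below are invented:

```python
def gaussian_update(prior_mean, prior_sd, obs_mean, obs_sd):
    # Posterior of a Gaussian prior combined with a Gaussian likelihood:
    # precision-weighted mean, summed precisions.
    prior_prec = 1.0 / prior_sd ** 2
    obs_prec = 1.0 / obs_sd ** 2
    post_prec = prior_prec + obs_prec
    post_mean = (prior_mean * prior_prec + obs_mean * obs_prec) / post_prec
    return post_mean, (1.0 / post_prec) ** 0.5

# ARIMA forecast taken as the prior; information inferred from the
# automated measurements taken as the likelihood (made-up numbers).
mean, sd = gaussian_update(prior_mean=10.0, prior_sd=1.0,
                           obs_mean=11.0, obs_sd=0.5)
# The posterior mean moves towards the observation, and the posterior
# standard deviation is smaller than both input standard deviations.
```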
As mentioned earlier, the purpose of the Bayesian inverse approach is to predict the reactant concentration at the inlet of an exothermic reactor in between the laboratory samplings. Fig. 6 visualizes the predictions made in between the laboratory samplings at an interval of one day during two representative time periods. Here, we use the laboratory samplings immediately when they become available (i.e., a minimum data age of zero). The linear interpolation (see Section 3.3) is visible in the predictions from the ARIMA model. The predictions indicate that the aromatics concentration fluctuates in between the samplings during the first time period (Fig. 6(a)). In contrast, the variation in the predicted aromatics concentration is smoother during the second time period (Fig. 6(b)). However, a temporary reduction in the concentration can be seen on the 10th day. It is worth noticing that during these time periods the predictions just before a new laboratory sampling shift gradually towards the sampled aromatics concentration.
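The linear interpolation between the latest measured value and the next forecast (Section 3.3) can be sketched as below; the forecast value is a hypothetical placeholder for the ARIMA one-step-ahead mean, and the 14-day interval mirrors the paper's typical two-week sampling interval:

```python
def interpolate_prediction(last_value, next_forecast, frac):
    # Linear interpolation between the latest measured value (frac = 0)
    # and the one-step-ahead forecast (frac = 1).
    return last_value + frac * (next_forecast - last_value)

# Daily predictions in between laboratory samplings taken 14 days
# apart; the forecast is a made-up stand-in for the ARIMA mean.
last_lab_value = 8.0
arima_forecast = 9.4
daily = [interpolate_prediction(last_lab_value, arima_forecast, d / 14)
         for d in range(15)]
# daily starts at the last laboratory value and shifts gradually
# towards the forecast, matching the behavior visible in Fig. 6.
```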

Discussion
Let us revisit the assumptions of this work that we consider the most critical. First, we assume in Section 3.1 that the reactor is in a steady-state condition. Arguably, a rapid change in the reactant concentration at the inlet of the reactor would temporarily make the accumulation terms of the mass and energy balances non-zero. While the former is difficult to monitor by automated sampling, the latter could possibly be taken into account by including a signal of the time derivative of the average reactor temperature in the thermal model (assuming that the liquid mass and its thermal properties inside the reactor remain the same). Second, also in Section 3.1, we assume that the walls of the reactor are adiabatic. The walls of the studied reactor (Section 4) are well insulated. However, if the heat loss through the reactor walls is significant in some other application, it should be taken into account in the thermal model. Third, in Section 3.2, we assume that the mass-specific heat generation rate, the reactant concentrations at the inlet and at the outlet, and the liquid density are mutually dependent. For the case study, the pairwise Pearson correlation coefficients between these variables are shown in Table 3. The absolute values of the correlation coefficients are in general small. However, the value of 0.524 between the inlet aromatics concentration and the liquid density can be considered a moderate correlation. Nonetheless, the good prediction accuracy of the ARIMA+BI method (see Table 2) supports the validity of the assumptions made in this work for the studied application.
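For reference, a pairwise Pearson correlation coefficient of the kind reported in Table 3 can be computed with a few lines of plain Python (the example series below are invented, not the case-study signals):

```python
import math

def pearson(x, y):
    # Sample Pearson correlation coefficient between two equal-length series.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Two made-up, strongly correlated series.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8]
r = pearson(x, y)  # close to 1 for strongly correlated series
```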
In Section 3, we use the rolling-origin evaluation method to determine the RMSE and R² metrics. These metrics are determined based on 92 pairs of predictions and actual measurements. Although the dataset is relatively small, its collection during normal operation of the plant lasted four years, illustrating the difficulty of collecting relevant industrial data for scientific purposes. The predictions made in between the laboratory samplings, shown in Fig. 6(a), cannot be directly verified. Nevertheless, we presume the metrics obtained by

Fig. 3. Linear interpolation in between the latest measured value and predictions obtained by an ARIMA model.

Fig. 5. Probabilistic predictions of the inlet aromatics concentration by the ARIMA and ARIMA+BI models. The predictions are visualized by showing their mean (horizontal tick at the center) and the one-sigma (68.3%) confidence interval (vertical line). Labels 'min' and 'max' are the extreme values measured during the operation period.

Fig. 6. Probabilistic predictions for time points in between laboratory samplings during two representative time periods.
Algorithm 1 inferentialSensor()
if a laboratory sampling (LS) and an automated sampling (AS) occur at time t then
    // estimate the prevailing mass-specific heat generation rate
    posterior of the heat generation rate ← sample Eq. (32) and marginalize out the inlet concentration, the outlet concentration, and the liquid density
    return None
else if an automated sampling (AS) occurs at time t then
    // construct the time series forecasts
    for each variable in {heat generation rate, inlet concentration, outlet concentration, liquid density} do
        fit an ARIMA model to past data (measured/estimated) on the variable
        predict the next value of the variable using the ARIMA model and interpolate at t
    end for
    // predict the reactant concentration at the inlet
    posterior of the inlet concentration ← sample Eq. (32) and marginalize out the heat generation rate, the outlet concentration, and the liquid density
    return the posterior of the inlet concentration
end if
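The control flow of Algorithm 1 can be sketched in Python as follows; `sample_posterior` and `forecast` are hypothetical placeholders for sampling Eq. (32) and for the ARIMA fit-predict-interpolate step, respectively, and the signal names are our own labels:

```python
def inferential_sensor(lab_and_auto_coincide, sample_posterior, forecast):
    """Illustrative control flow of Algorithm 1 (placeholders throughout).

    sample_posterior(target) stands in for sampling Eq. (32) and
    marginalizing out the remaining variables; forecast(name) stands in
    for fitting an ARIMA model and predicting/interpolating at the
    current time point.
    """
    if lab_and_auto_coincide:
        # A laboratory sampling (LS) and an automated sampling (AS)
        # coincide: estimate the prevailing mass-specific heat
        # generation rate; no concentration prediction is returned.
        sample_posterior("heat_rate")
        return None
    # Only an automated sampling (AS) is available: forecast every
    # signal, then predict the inlet reactant concentration.
    for name in ("heat_rate", "c_in", "c_out", "density"):
        forecast(name)
    return sample_posterior("c_in")
```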

Table 1
Model parameters and measurement uncertainties. For confidentiality reasons, we cannot disclose some of the values in the table (marked by N/A).
Table 1 lists the model parameters and measurement uncertainties. The standard deviations used are conservative estimates of the measurement uncertainty. They are based on measurement standards and discussions with plant engineers.

Table 3
Pearson correlation coefficients, pairwise, between the variables: mass-specific heat generation rate, inlet aromatics concentration, outlet aromatics concentration, and liquid density.