Multi-objective optimisation with hybrid machine learning strategy for complex catalytic processes

.

• A hybrid machine learning strategy is developed for complex chemical processes.• Continuum lumping kinetic is embedded into the neural network framework.
• NSGA-II is used for multi-objective optimisation of the hydrocracking process.Catalytic chemical processes such as hydrocracking, gasification and pyrolysis play a vital role in the renewable energy and net zero transition.Due to the complex and non-linear behaviours during operation, catalytic chemical processes require a powerful modelling tool for prediction and optimisation for smart operation, speedy green process routes discovery and rapid process design.However, challenges remain due to the lack of an effective modelling and optimisation toolbox, which requires not only a precise analysis but also a fast optimisation.Here, we propose a hybrid machine learning strategy by embedding the physics-based continuum lumping kinetic model into the data-driven artificial neural network framework.This hybrid model is adopted as the surrogate model in the multi-objective optimisation and demonstrated in the benchmarking of a hydrocracking process.The results show that the novel hybrid surrogate model exhibits the mean square error less than 0.01 by comparing with the physics-based simulation results.This well-trained hybrid model was then integrated with non-dominated-sort genetic algorithm (NSGA-II) as the surrogate model to evaluate and optimise the yield and selectivity of the hydrocracking process.The Pareto front from the multi-objective optimisation was able to identify the trade-off curve between the objective functions which is essential for the decision-making during process design.Our work indicates that adopting the hybrid machine learning strategy as the surrogate model in the multi-objective optimisation is a promising approach in various complex catalytic chemical processes to enable an accurate computation as well as a rapid optimisation.

Introduction
CO 2 in the atmosphere is now reaching levels 412.5 ppm in 2020, which is 50% higher than before the industrial revolution [1].Global warming is likely to occur in conjunction with elevated atmospheric CO 2 .An immediate action aiming at reduction of emissions to achieve a net zero transition is required to mitigate the increase of average global temperature at 1.5 • C by 2050 as stated in the Paris agreement [2].Due to this urge, many world-leading oil and gas companies have committed to working in collaboration with energy industry partners and associations, government and regulators to switch the fossil fuels to renewable energy and aiming to become a net zero emissions by 2050 [3][4][5].
Catalytic chemical processes are the core of the business for both traditional oil and gas as well as the renewable energy sectors, where petroleum, biomass or other renewable feedstocks are converted into value-added energy and chemical products catalytically.Catalytic chemical processes involve a great number of complex and closely interconnected thermochemical process.For example, one of the most prominent technologies, biomass gasification, involves a complex chemistry and a large variety of reactions, such as Boudouard reaction, steam gasification, hydrogasification, oxidation reactions, water-gas shift reaction, methanation reactions and steam reforming reactions [6][7][8].Besides, catalytic chemical processes are highly complex and affected by many non-linear and underlying interactions.For instance, in hydrocracking, numerous factors such as the feed oil quality, catalyst degradation, and the operating conditions will affect the reaction process and product yields [9,10].In addition, there are multiple economic and environmental process metrics that must be considered during an optimisation of the catalytic chemical process.For example, in co-pyrolysis process, high yield of energy recovery cannot be improved without having a detrimental effect on the cost of catalyst [11,12] and carbon footprint [13].Due to the complexity and non-linear behaviour, it is still challenging to predict and optimise the catalytic chemical processes in a real industrial environment.There is a high desire to develop a novel modelling method empowered by machine learning which can integrate well with the multi-objective optimisation algorithms for a smart operation, sustainable science discovery and fast preliminary screening process in the catalytic chemical processes.
In the multi-objective optimisation of catalytical chemical processes, the actual industrial operations require concurrent optimisation of several design objectives which are often conflicting in nature.A powerful multi-objective optimisation algorithm called non-dominatedsort genetic algorithm (NSGA-II) has been designed to converge on the Pareto front using a Pareto ranking and crowding distance computations [14].Recently, this algorithm has been employed in the self-optimisation of the catalytic Sonogashira reaction to efficiently identify the optimal operating conditions in a multivariate parameter space [15].Besides, NSGA-II is a highly flexibility algorithm where it can integrate with other algorithms to allow a rapid and efficient optimisation.For example, NSGA-II was combined with deep learning [16,17], response surface methodology [18] and artificial neural network [19][20][21] to optimise the performance of catalytic energy devices.Notably, the algorithm is able to identify the trade-off curve between the objectives which aid the active learning process during process design.However, the optimisation work could not be robustly done without having an efficient and fast surrogate model.
A surrogate model is an approximate model that maps the process inputs to an objective function.They provide an inexpensive and faster alternative to suggest the next point of evaluation instead of the highthroughput experiments which is costly and time consuming.The recent development of artificial intelligence has created huge opportunities to adopt machine learning as a robust surrogate model [22,23].A well-established machine learning algorithm, known as artificial neural network (ANN) is one of the popular surrogate modelling approaches apart from support vector machines [17] and Bayesian network [24].ANN is also a powerful machine learning tool to effectively resolve the relationship between the process parameters and the products yield and quality.Studies have proven the capability of ANN approach in the prediction of product yields of catalytic chemical processes [25][26][27].However, the ANN is a data-driven 'black box' method where the descriptions of the reaction process in the ANN model is not clear with poor interpretability.Besides, this method depends heavily on the quality and quantity of data.Thus, the model could be easily over or under fitted, resulting in poor prediction and generalisation ability.
Lumped kinetic modelling is one of the most common physics-based methodologies used to describe the catalytic chemical process [9,28,29].Such classical 'white box' method is often derived from theoretical analysis of process principle and can effectively reflect the reaction phenomena.The two main lumping modelling strategies applied to the catalytic chemical process are the discrete lumping method [30][31][32] and continuum lumping kinetic method [9,33].Compared to discrete lumping method, continuum lumping kinetics describe the reaction rates with similar carbon number, rather than describing the kinetics of each single compound.This makes continuum lumping kinetics more flexible and being a more generic model compared to discrete lumping kinetics [34].This lumping method effectively simplifies the complex catalytic chemical process and makes them much tractable, which is of great significance to optimise the reactive system.In reality, the actual multi-physical catalytic chemical system may involve numerous interactions amongst these sub-processes.Such complexity will lead to long modelling cycle.In addition, the lumped kinetic model requires simplified assumptions and neglects some factors selectively which will

Table 1
The values of the kinetic parameters at different temperature, pressure and H 2 : Feed adopted from [9].lead to the loss of model accuracy.Therefore, the condition changes in the catalytic chemical process are difficult to tracked in time, which results in the ineffectiveness of this physics-based model.Adopting the pure data-driven framework or lumped kinetic method in the catalytic chemical process will lead to the loss of model efficiency due to these intrinsic limitations in both approaches.A technique that can harness the advantages of both black and white box while reducing their shortcomings is desirable for the optimisation of complex catalytic chemical process with understood subdomains.Here, we proposed a novel modelling strategy known as grey box, or hybrid model, to combine the advantages of domain knowledge and empirical information that enables better data correlation guided by a clear reflection of the process principles.It therefore significantly improves the efficiency and accuracy in the prediction and multi-objective optimisation in the catalytic chemical process.A validated continuum lumping model generates the sufficient amount of data required for the training of an ANN algorithm.A training database developed from a continuum lumping kinetic model allows the ANN algorithm to quickly predict the process performance with the given input parameters, making up for the inadequacy of the physical model in term of a fast response.With the adoption of a NSGS-II, a multi-objective optimisation of the system regarding the target parameters can be further realised.Based on this idea, the present paper demonstrates the concept of hybridizing ANN algorithm with continuum lumping kinetics model to predict and identify the trade-off curve between objective functions through the multi-objective optimisation of NSGA-II in one of the most remarkable examples from the catalytic chemical processes, known as hydrocracking process.The results from this work will contribute positively to the energy and AI research community.Apart from the catalytic chemical process, the method of integrating a physics-based model and datadriven approach also generally applicable in renewable energy technologies, fuel generation systems and decarbonisation processes.

Model development
Fig. 1 shows the workflow of the proposed scheme.Firstly, a hybrid surrogate model was developed by embedding the classic physics-based model into a data-driven machine learning framework.In the physicsbased model, i.e., continuum lumping kinetic model was employed to compute the product concentration of hydrocracking from the linear relation of process parameters and kinetic parameters [35][36][37].For the validation, the simulation result from continuum lumping kinetic model was compared with our existing experimental data [9].Meanwhile, a dataset of the inputs and outputs was generated from continuum lumping kinetic model.Using this dataset, an artificial neural network (ANN) was utilised to train and resolve the relationships between the process parameters and product distribution of hydrocracking process in the machine learning framework.New outputs were generated using the hybrid model, and parametric study and conversion were conducted.Finally, the trained hybrid model was used as the surrogate model in the optimisation of the hydrocracking performance with an evolutionary algorithm, non-dominated-sort genetic algorithm (NSGA-II).

Development of hybrid surrogate model
Our hybrid surrogate model was developed from the physics-based continuum lumping kinetic model and data-driven artificial neural network.

Continuum lumping kinetics
A continuum lumping kinetic (CLK) was utilised in the modelling of hydrocracking process.This method is used to describe the kinetic behaviour of lump of chemically-similar species.For example, in the hydrocracking of paraffin, the disappearance rate of total compounds having similar carbon number was measured, rather than describing the kinetic of each single compound.
Characterising the reactants represents the starting point in the continuum lumping methodology.Here, the properties of feedstock mixtures are characterised through the chain length of its components, which also unequivocally identified by their carbon number.The procedure was developed originally by Laxminarasimhan et al. [33] and later adopted by Mustafa et al. [9] for the hydrocracking process, where the true boiling point and carbon chain length are used, respectively as the continuous label [9,33].
The normalised chain length of each species is defined as: where L s and L l represent the shortest and longest chain in the reaction mixture.
Here, k max represents the rate constant for the species with the longest chain.The mass balance equation for the species with reactivity, k can be expressed as follows: A yield distribution function, P(k, K) has been introduced to determine the amount of formation of the species with reactivity k from the species with reactivity K greater than k.c(k, t) describes the concentration of species with a reactivity k.D(k) is the species type distribution where it represents the transformation from the space L (chain length, or carbon number) to the space k.The distribution transforms the discrete distribution of hydrocarbons into a continuous distribution, expressed by the following relation between genetic component i and θ: where N is the number of species.A simple power-law relation is one which has been widely used to describe the relationship of k and θ: where α is a model parameter.
D(k) is a power-law relation where it describes the reactivity increases as the carbon number increase.D(k) is obtained by substituting the derivative of Eq. (4) into Eq.(3) as follows: which clearly shows the role of k on the distribution.N is the total number of species in the mixture.α and k max are the equation parameters.
The yield function, P(k, K) is described as follows: where a 0 , a 1 are model parameters which determine the location of the peak in the interval k ∈ (0, K). δ is a small finite quantity that accounts for the possibility that P(k, K) could take a small finite value when k = 0.In this work, the reactants are assumed to undergo the cracking only.Since the carbon number is chosen as the label, therefore, it is impossible to distinguish between isomers.A quadrature algorithm method was used to evaluate the integral part in the main equation, Eq. ( 2) and the differential equation was solved by using Runge-Kutta method.
A linear equation between process parameters and kinetic parameters was employed based on the previous work from the work of Mustafa et al. [9].The kinetic parameters are demonstrated in Table 1.CLK model generated 625 datasets where each set composed of 4 studied operating condition and their corresponding concentration of 70  hydrocracking products ranging from C 1 to C 70 .This creates a data matrix of 625 × 74.

Artificial neural network
Here, an one-hidden-layer back-propagation artificial neural network (BP-ANN) was adopted with four inputs and seventy outputs.The inputs are process parameters that significantly affect the hydrocracking process, i.e., the temperature, pressure, hydrogen to feed ratio and weight hourly space velocity (WHSV).The outputs are product concentration after different process parameters applied in the hydrocracking simulation ranging from carbon 1 to carbon 70.The single hidden layer contains 10 nodes, with sigmoid function as the activation function.The cost function MSE (mean square error) is used to measure the difference between the real value and predicted value, and then, the parameters are updated in a back-propagation process.The MSE is defined as follows, where y i and ŷi represent CLK simulation results and ANN prediction results respectively, and the n stands for the number of samples.A 625 × 74 data matrix is split into two sets with a ratio of 0.8:0.2, the former of which is used for ANN training and the latter for testing.A desired error and learning rate are set as 0.01 and 0.05, respectively.After 1636 iterations, good prediction results of ANN can be obtained and readily to be utilised as the surrogate model in the optimisation.

Development of optimisation algorithm
Compared to other multi-objective optimisation algorithms such as simulated annealing and Bayesian optimisation, genetic algorithm has more advantages because of its ability in parallelism and larger set of solution space which make it well-suited for this case study [38,39].Non-dominated sorted genetic algorithm (NSGA-II) was adopted in the multi-objective optimisation.The previous well-trained ANN model was employed as the fitness function as follows,  Here, the aim of the optimisation was to optimise the yield and selectivity of the high-value ends (HE) product from C 5 to C 22 , which includes naphtha, kerosene and diesel.The design variables which were optimised are temperature, pressure, hydrogen to feed ratio and WHSV.
The key parameters of NSGA-II are the population size, number of generations, crossover distribution index, mutation distribution index, tournament size, the values of which are 100, 100, 20, 20 and 2, respectively.

Experimental background
The hydrocracking of Fischer-Tropsch waxes was conducted by us and presented in previous literature [9].The hydrocracking experiments were carried out in a bench scale tickle bed reactor operated in down flow mode.The reactor was filled with 9 g of powered catalyst with 0.625 mm average particle size.Liquid and gas products were both analysed by gas chromatography, GC HP-5890.The feed used throughout the experiments was a Fischer-Tropsch wax mixture ranging from C 5 up to C 70 .The operating ranges of a conditions were temperature (324-354 • C), pressure (40-54 bar), H 2 : feed ratio (0.06-0.15 kg/kg) and WHSV (1-3 h − 1 ).The catalyst was a typical bifunctional system made up of Platinum (0.6%) loaded with amorphous silica-alumina.

Continuum lumping kinetics model validation
For validation, the continuum lumping kinetic (CLK) model adopts the same operating conditions in accordance with the experiments, and the product distribution from CLK simulation and experiment are compared at an operating temperature of 324 • C, as indicated in Fig. 2. The detailed operating conditions used in the model are provided in Table 2. Fig. 3 has proven the reliability of CLK-generated data and readily to be utilised as the training dataset for the ANN model.

Hybrid model validation
During the training of hybrid model, the errors between physicsbased model results and hybrid model data is minimised until the desire mean square error of 0.01 is achieved.Here, the actual data represents the simulation data from physics-based continuum lumping kinetics.Fig. 3(a) shows the training progress for the first instance of data, where the mean square error of 0.009992 and its corresponding correlation coefficient of 0.990008 were obtained after 1636 iterations.The mean square error function behaviour over the 500 training data were demonstrated in Fig. 3(b).The average final correlation coefficient and mean square error are 0.990038 and 0.009962, respectively.The result of hybrid model and simulation data from the physics-based CLK model were compared and represents in Fig. 3(c).It has verified that the hybrid model is ready to be adopted as the evaluation function in the multi-objective optimisation empowered by NSGA-II.

Parametric study
The parametric effects on the significant parameters such as temperature, pressure and hydrogen to feed ratio were studied in the hybrid model.

Effects of temperature
The hybrid model had been utilised to study hydrocracking performance at different temperature from 324 to 354 • C, with 20 interval points, whilst the other operating conditions are kept constant (Pressure = 47.5 bar, H 2 : Feed ratio = 0.105 kg/kg, WHSV = 2 h − 1 ).Temperature was found to significantly affect the paraffin conversion and product distribution, as shown in Fig. 4. The hydrocracking conversion is defined as according to the following equation: Based on Fig. 4(a), as the temperature is increase, the general observed trend shows that the concentration of all components with longer chain decrease whilst the concentration of components with shorter chain length increases.A higher temperature results in higher rates of hydrocracking.The paraffins modelled conversion is reported in Fig. 4(b).The conversion increases approximately of 26% when the temperature increases of 30 • C in agreement with evidences reported in the literature [9,40].

Effects of pressure
The effect of the pressure on the product distribution and paraffin conversion are presented in Fig. 5(a) and (b), respectively.Different reactor pressures from 40 bar to 54 bar with 20 interval points were studied while the other operating parameters are kept constant (T = 342 • C, H 2 : Feed ratio = 0.105 kg/kg and WHSV = 2 h − 1 ).As the pressure increased, the concentration of the hydrogen increases which in turn increases the rate of hydrogenation rather than the rate of cracking.Besides, when the pressure increases, the concentration of the lighter components increases, making such components susceptible of cracking again.This means that components having length chain between C 13 and C 20 are cracking again to produce components with lower length chain.From Fig. 5(b), the conversion of C 22+ decrease with increasing the pressure after 45 bar due to the increase of the fugacity of hydrogen that negatively affects the dehydrogenation equilibrium of the feed [41].In other words, increasing the pressure increases the fugacity of hydrogen that affects the dehydrogenation equilibrium of the feed and then the total conversion [42,43].

Effects of H 2 : feed ratio
Model results for different H 2 : feed ratio from 0.06 kg/kg to 0.15 kg/ kg with 20 points in between at T= 342 • C, P= 47.5 bar and WHSV = 2 h − 1 are shown in Fig. 6(a).The reactivity of the components with longer chain increases as the hydrogen-to-hydrocarbon ratio increase.This results in the increase of the conversion when the ratio of hydrogen to feed increases which can be observed from Fig. 6(b).Dividing the components into groups based on the carbon number, namely C 1 -C 4 (fuel gas), C 5-C 9 (naphtha), C 10-C 14 (kerosene), C 15-C 22 (diesel), and C 23+ (residue) gives a better understanding on the behaviour of the mixture.Fig. 6(c) shows how the increase of the hydrogen to feed ratio affects the hydrocracking, when the product is divided into five groups.Increasing the hydrogen to feed ratio, the weight percent of four groups (namely, fuel gas, naphtha, kerosene, and diesel) will increase, whilst the weight percent of residue decreases.

Multiobjective optimisation using hybrid surrogate model and NSGA-II
NSGA-II is an elitist genetic algorithm which solves multi-objective optimisation problems using Pareto ranking and crowding distance computations.This algorithm is combined with the well-trained hybrid surrogate model as the acquisition function which are optimised instead of the real hydrocracking process to suggest the next point of evaluation.
In industrial practice, maximisation of kerosene, diesel and naphtha and minimisation of off-gas, liquified petroleum gas (LPG) and H 2 consumption are the desired in the hydrocracking process [44].Hence, a compromise is often needed between valuable products yield and H 2 consumption or production rate of light-ends.In this work, the maximisation of high-value end (HE) products and minimisation of low-value end (LE) products is studied in term of yield and selectivity, similar to the optimisation targets in the literature [44].Here, the yield is defined as the sum of HE products, including the naphtha (C 5-C 9 ), kerosene (C 10 -C 14 ) and diesel (C 15 -C 22 ) over the consumed reactants.Whereas the selectivity is the ratio of valuable HE products to the sum of LE products, such as LPG (C 1 -C 4 ) and unconverted reactants.
The optimisation target is the operating parameters including temperature, pressure, H 2 : feed ratio and WHSV.Meanwhile, the range of the operating parameters is given in Table 3.It is important to highlight the combination of NSGA-II with hybrid model algorithms significantly reduces the computational time.For NSGA-II, it takes about 0.02 s to predict an optimised point, whereas, for CLK model, it takes more than >60 s to perform only the calculation, let alone the tremendous time required for searching for the optimised operating condition.Fig. 7 shows the Pareto front is found by performing a multiobjective optimisation of the hybrid surrogate model predictions using the NSGA-II algorithm.Each point on the Pareto front curve corresponds to an optimal solution.They are better than the points which are not on the Pareto front.Besides, Pareto front is able to identify a trade-off curve between yield and selectivity of the hydrocracking process.An increase in the yield of HE products is at the cost of a decrease in the selectivity, and vice versa.Nevertheless, Pareto front provide a guideline for the optimal design of hydrocracking reaction.The optimal solution can be identified by subjecting the yield and selectivity to some requirements and constraints in reality, such as maximum value allowed for the desired and side products formation.The corresponding operating parameters for the optimum selectivity and yield of HE from Fig. 7 is demonstrated in Fig. 8 in a 4D space.The results of Pareto front were further validated by comparing them with the CLK simulation results with an average MSE of 0.012.Fig. 9 demonstrates the yield and selectivity of HE products at different conversion under optimal operating conditions.Based on Fig. 9, maximum yield of 74.5% could be achieved with the minimum value of selectivity at 0.95.While maximum selectivity of 1.07 could be obtained with the compensation of the smallest yield value of 70.3%.This has shown that two objective functions could not be improved without having detrimental effect on each other.

Conclusion
A hybrid model, combining a physics-based CLK method and datadriven ANN was developed and adopted as the surrogate model in the multi-objective optimisation of hydrocracking process.The CLK model was developed and validated by comparing with the experimental data.A 625 × 74 database was generated using the verified CLK model for the training of the ANN model.An effective neural network with a high correlation coefficient (>0.99) was obtained.The trained ANN model was used to analyse the hydrocracking performance at different operating conditions, including temperature, pressure and H 2 : feed ratio.The yield and selectivity were found to significantly affected by the temperature (324 -354 • C), pressure (40 -54 bar) and H 2 : feed ratio (0.06 -0.15 kg/kg).After the parametric study, the trained ANN model was applied in the subsequent multi-objective optimisation to optimise the yield and selectivity of the light and middle products from hydrocracking.Finally, an optimum yield and selectivity at 71.7% and 0.985, respectively were obtained at the optimum temperature, pressure, H 2 : feed ratio and WHSV are 343.97• C, 50.8 bar, 0.135 kg/kg, and 2.43 h − 1 , respectively.Notably, NSGA-II also able to identify the trade-off curve to aid the active learning process during process design.The proposed method provides an alternative strategy for fast and effective prediction and optimisation of hydrocracking process.This AI-based method shows a great potential and readily to be applied in the other complex catalytic chemical processes for energy and decarbonisation applications.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1 .
Fig. 1.Overall workflow of the development of hybrid surrogate model and adopted as the evaluation function in the multi-objective optimisation empowered by NSGA-II.

Fig. 3 .
Fig. 3. (a) Desire error of 0.01 is achieved for the first instance after 1636 iterations.(b) Error function behaviour of 500 sets of training data (c) The comparison between hybrid model data and simulation data from the CLK model at the operating condition of 324 • C, 47.5 bar, 0.105 kg/kg and 2 h − 1 .

Fig. 5 .
Fig. 5. (a) Overall product distribution of the hydrocracking process (b) conversion at different pressure.

Fig. 6 .
Fig. 6.(a) Overall hydrocracking performance, (b) conversion, and (c) hybrid model result for the 5 groups analysis at different values of H 2 : Feed ratio.

Fig. 7 .
Fig. 7. Results for the multi-objective optimisation of the hydrocracking process by hybrid surrogate model and NSGA-II.

Fig. 8 .
Fig. 8. Optimal parameters for temperature, pressure, H 2 : feed ratio and WHSV correspond to the Pareto front.

Fig. 9 .
Fig. 9.The yield and selectivity at different conversion under the optimal operating conditions.

Table 2
Properties of adopted operating condition.
X.Y.Tai et al.

Table 3
Operating parameters for the optimisation using NSGA-II.