Machine learning meets continuous flow chemistry: Automated optimization towards the Pareto front of multiple objectives

The trade-off between multiple objectives is identified without arbitrary weighting factors, but via true multi-objective optimization.


Introduction
Robotic automated chemistry development is the future of chemistry and chemical manufacturing: methods using robotics and machine learning are increasingly applied to discovering new chemical transformations [1], synthesizing organic compounds [2], and optimizing multiple process parameters [3][4][5]. Optimizing chemical reactions is highly challenging, since response surfaces are often non-linear and many objective functions, such as reaction yield, process cost, impurity levels and environmental impact, need to be considered simultaneously [6]. The problem of optimizing the impurity profile is of particular significance for the pharmaceutical industry. The ability to perform efficient and automated multi-objective optimization represents a step-change advance in developing novel chemical processes. However, in the optimization community, multi-objective black-box optimization with expensive-to-evaluate objective functions (expensive in the cost and time of conducting an experiment), which covers most problems of interest to the chemistry community, belongs to the class of 'orphan' problems, with very few advanced algorithms available. This paper demonstrates for the first time the use of true multi-objective machine learning methods for the self-optimization of two exemplar chemical reactions with competing economic and environmental objectives. We demonstrate that both objectives can be optimized simultaneously and that a set of optimal solutions corresponding to the trade-off between reactor productivity and environmental impact can be identified. Furthermore, the problem of minimizing product impurities was included, which has not been addressed in previous self-optimizations [7][8][9][10][11][12].
Single-objective optimization algorithms, such as simplex [13,14] and SNOBFIT [15], have been successfully employed for the optimization of chemical reactions [16][17][18][19][20][21][22][23]. However, it is important to consider multiple performance criteria when developing a chemical process. For example, Moore and Jensen observed low conversion at the self-optimized conditions corresponding to optimal productivity for a Paal-Knorr reaction, which yielded an overall sub-optimal process. This was resolved by adding a penalty term for low conversion to their objective function [24].
Combining multiple competing objectives into a single function is a common remedy. This was demonstrated by Houben et al. for the multi-target optimization of an emulsion polymerization process using a machine learning algorithm [25,26]. However, the a priori determination of adequate weights for these objectives is difficult. For example, Fitzpatrick et al. combined throughput, conversion and consumption into a single objective function, which led to skewed results [27].
These examples highlight two major problems with the scalarization of multiple performance criteria: (i) quantitative a priori knowledge is needed, which requires additional experiments; (ii) only one optimal result is obtained, which depends on the chosen objective function and does not reveal the complete trade-off between multiple performance criteria, i.e., their Pareto front (Fig. 1).
As economic and environmental objectives are generally competing, it is impossible to find one point where both objectives are at their optimal values [6]. Rather, the solution of a multi-objective optimization problem is a set of non-dominated points where one objective cannot be improved without having a detrimental effect on the other. This set is called a Pareto front (Fig. 1) [28]. The goal of this study is to explore the complete Pareto front of a reaction system and not only a single compromise point. This requires the simultaneous optimization of multiple objectives. Multiple (conflicting) objectives are encountered in many chemical engineering applications, e.g., conversion and selectivity in a chemical reaction [29]. Their simultaneous optimization using multi-objective optimization techniques has also been reported in numerous literature examples [30][31][32][33][34][35]. The most commonly used solution strategies are the parametric approach, the epsilon-constraint method, and genetic algorithms such as NSGA-II [36]. However, these methods are not well suited to an automated chemical reaction system, because they require many function evaluations and, in some cases, derivative information that is not (analytically) available.
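The notion of a non-dominated set described above can be made concrete with a short sketch. The function names and the five data points below are hypothetical and chosen only for illustration; both objectives are assumed to be minimized, as in Fig. 1.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if point a dominates point b (all objectives minimized).

    a dominates b when it is no worse in every objective and strictly
    better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Return the non-dominated subset of points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (A, B) outcomes of five experiments, both to be minimized.
experiments = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 2.0)]
front = pareto_front(experiments)
print(front)
```

In this toy data set, (3.0, 4.0) is dominated by (2.0, 3.0) and (5.0, 2.0) by (4.0, 1.0), so three points remain on the front. The pairwise comparison scales quadratically with the number of points, which is acceptable for the tens of experiments typical of a self-optimization.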
Bayesian optimization methods are derivative-free global stochastic optimization methods that are particularly well suited to expensive-to-evaluate problems. They have successfully been used to optimize expensive-to-evaluate computer simulations in many disciplines [35][37][38][39]. To achieve this, Bayesian optimization algorithms train Gaussian process (GP) surrogate models on the available data and identify new samples based on the predictions and uncertainty of the surrogates.
There exist a few multi-objective Bayesian optimization algorithms that aim to approximate a Pareto front, including Thompson Sampling Efficient Multi-Objective (TS-EMO) [40][41][42], ParEGO [43] and expected hypervolume improvement (EHI) [44]. The quality of a Pareto front can be quantified by its hypervolume, i.e., the area spanned by the Pareto front and a reference point in the 2-dimensional case. Data efficiency in this context is given by the hypervolume obtained in a limited number of function evaluations. In this work, we use TS-EMO, which has been shown to have comparable or better data efficiency than both EHI and ParEGO. Further, TS-EMO has performed favorably compared to the commonly used genetic algorithm NSGA-II on a set of mathematical test functions for a given budget [40][41][42]. The TS-EMO algorithm [40][41][42] has recently been applied to the optimization of a process flowsheet, combining targets of low cost and low carbon emissions over the life cycle [35]. An open-source implementation of TS-EMO is available on GitHub [42].
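For two objectives, the hypervolume mentioned above reduces to an area that can be computed with a simple sweep. The following is a minimal sketch, assuming both objectives are minimized and that the reference point is worse than every point of interest; the function name and the example values are illustrative and are not taken from the TS-EMO implementation.

```python
def hypervolume_2d(front, reference):
    """Area dominated by `front` and bounded by `reference`.

    Both objectives are minimized. Points that do not improve on the
    reference point in both objectives contribute nothing.
    """
    ref_a, ref_b = reference
    # Sort by the first objective so the sweep moves left to right.
    pts = sorted(p for p in front if p[0] < ref_a and p[1] < ref_b)
    area = 0.0
    best_b = ref_b  # lowest second-objective value seen so far
    for a, b in pts:
        if b < best_b:
            # New horizontal slab between the previous best_b and b.
            area += (ref_a - a) * (best_b - b)
            best_b = b
    return area


# Hypothetical Pareto front and reference point.
example_front = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
example_area = hypervolume_2d(example_front, reference=(6.0, 6.0))
print(example_area)
```

Adding a dominated point such as (3.0, 4.0) leaves the area unchanged, whereas a new non-dominated point increases it, which is why the hypervolume is a convenient single number for comparing fronts obtained with the same experimental budget.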

Results and discussion
Herein, the recently developed TS-EMO algorithm is combined with an automated continuous reaction system. A small dataset is collected using Latin hypercube (LHC) sampling to initialize the TS-EMO algorithm [45,46]. Within the algorithm, individual GP surrogate models are trained to approximate the unknown response surfaces of the objectives [46,47]. The GPs are non-parametric regression models that can be understood as infinite-dimensional generalizations of multivariate Gaussian distributions [46]. The TS-EMO algorithm randomly samples functions from those GPs using spectral sampling. Then, a multi-objective genetic algorithm is called within TS-EMO and identifies the Pareto front of the random samples. Finally, TS-EMO identifies a set of experiments from that Pareto front (of the random GP samples), which aim to improve the hypervolume of the actual Pareto front (of the experiments conducted). After conducting the suggested experiments, the GPs are updated and the process is repeated iteratively for a desired number of experiments. Within the algorithm, the randomness of sampling naturally accounts for the exploration and exploitation trade-off desired in Bayesian optimization.
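The initialization step can be illustrated with a minimal Latin hypercube sampler. This sketch uses only the Python standard library and is not the sampler used in the authors' MATLAB program; the function name, the seed and the variable bounds are placeholders and do not correspond to the limits in Table 1.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Draw n_samples points so that each variable is stratified.

    For every variable, the range is split into n_samples equal strata,
    one value is drawn inside each stratum, and the strata are shuffled
    independently per variable.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(unit)
        columns.append([low + u * (high - low) for u in unit])
    # Transpose columns (per variable) into rows (per experiment).
    return [list(row) for row in zip(*columns)]


# Placeholder bounds for four variables (hypothetical values).
placeholder_bounds = [(0.5, 2.0), (1.0, 5.0), (0.1, 0.5), (30.0, 120.0)]
initial_design = latin_hypercube(20, placeholder_bounds, seed=0)
print(len(initial_design), len(initial_design[0]))
```

Each of the 20 rows is one initial experiment. Because every variable has exactly one value per stratum, the design covers each range evenly even with few points, which is the property that makes it a reasonable starting set for training the GP surrogates.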
The TS-EMO algorithm was incorporated into the automated flow reactor (Scheme 1) and evaluated using two case studies: (i) the SNAr reaction between 2,4-difluoronitrobenzene 1 and morpholine 2 to form the desired ortho product 3 and the undesired para-4 and bis adduct 5 (Scheme 2) [47]; (ii) the N-benzylation of α-methylbenzylamine 6 with benzyl bromide 7 to form the desired 2° amine 8 and the undesired 3° amine 9 (Scheme 3) [48]. In both cases, the product composition was determined by on-line HPLC and the data were used as inputs for the TS-EMO algorithm.

Fig. 1. An example of a system with two competing minimization performance criteria A and B. It is infeasible to find the utopian point where both A and B are at their optimal values. The points on the Pareto front are non-dominated solutions, as A or B cannot be improved without having a detrimental effect on the other.
To find environmentally acceptable and economic operating conditions for the synthesis of ortho-3, we aimed to simultaneously maximize the space-time yield (STY) and minimize the E-factor of the reaction [Eq. (1)], where the STY is a measure of reactor productivity [Eq. (2)] and the E-factor is defined as the ratio of the mass of waste to the mass of product [Eq. (3)] [49].
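The two quantities can be sketched numerically as follows. The numbered equations are not reproduced in this text, so the sketch relies on the general definitions only: STY as mass of product per unit reactor volume and time, and E-factor as mass of waste per mass of product. Treating waste as everything fed to the reactor minus the product is an assumption made here for illustration, and all function names and numbers are hypothetical.

```python
def space_time_yield(product_mass_kg, reactor_volume_m3, time_h):
    """Mass of product per reactor volume and time, in kg m^-3 h^-1."""
    return product_mass_kg / (reactor_volume_m3 * time_h)


def e_factor(total_input_mass_kg, product_mass_kg):
    """Mass of waste per mass of product (dimensionless).

    Assumption: waste is all material fed to the reactor (reagents and
    solvent) that does not end up as the desired product.
    """
    waste_mass_kg = total_input_mass_kg - product_mass_kg
    return waste_mass_kg / product_mass_kg


# Hypothetical run: 2 g of product from 5 g of total feed,
# collected from a 1 mL reactor over 0.5 h.
sty = space_time_yield(0.002, 1e-6, 0.5)
ef = e_factor(0.005, 0.002)
print(round(sty), round(ef, 2))
```

The example shows why the two objectives compete: raising throughput typically raises STY, while any excess reagent or additional solvent used to get there adds directly to the waste term of the E-factor.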
It is important to note that the product composition, and the resulting downstream processes (work-up etc.), will have a significant impact on the STY and E-factor of a process. However, such considerations were beyond the scope of this work, as non-reactive unit operations were not included in the optimizations.
The objectives were natural log-transformed, as this is known to enhance response-surface-based optimization [50]. Due to the log-transformation, the distances in the Pareto front of the algorithm are log-scaled and hence the algorithm favors a log-spaced Pareto front. The optimization was conducted with respect to four variables: residence time (t_res), morpholine 2 equivalents, concentration of 1 and temperature (Table 1). The results of the optimization are shown in Fig. 2.
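The effect of the log-transformation can be illustrated with a short sketch. It assumes that maximizing STY is cast as minimizing its negative logarithm, so that both entries of the objective vector are minimized; the function name is hypothetical. The two input pairs are the extreme Pareto points reported for the first case study in the results that follow.

```python
import math


def log_objectives(sty, e_factor):
    """Map (STY, E-factor) to a vector in which both entries are minimized.

    Assumption: STY maximization is expressed as minimizing -ln(STY).
    """
    return (-math.log(sty), math.log(e_factor))


# Extreme points reported for case study one (STY in kg m^-3 h^-1).
best_sty_point = log_objectives(13120.0, 1.6)
best_e_point = log_objectives(3650.0, 0.2)

# On a log scale, equal ratios correspond to equal distances.
step_low = math.log(10.0) - math.log(1.0)
step_high = math.log(1000.0) - math.log(100.0)
print(best_sty_point, best_e_point)
```

Neither transformed point is better in both entries, which is the numerical expression of the trade-off. The equal step sizes for a tenfold change at low and high values show why the algorithm tends to space Pareto points evenly in ratio rather than in absolute terms.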
Herein, the automated setup was started in the evening and run overnight. The algorithm was terminated manually in the morning under the criterion that a dense front of at least 20 experimental Pareto data points had been collected. The initial LHC size was 20, and its results lay in the region of solutions corresponding to high E-factors and low STYs. Nevertheless, the subsequent 48 experiments designed by the TS-EMO algorithm rapidly converged to a dense Pareto front consisting of 26 points. The optimal STY was 13,120 kg m⁻³ h⁻¹ with an E-factor of 1.6. Conversely, the optimal E-factor was 0.2 with an STY of 3650 kg m⁻³ h⁻¹. The data therefore show the inherent trade-off between STY and E-factor. The Pareto front can be divided into two sections. In the left section, where the gradient is shallow, the STY can be significantly increased with a relatively small effect on the E-factor. This corresponds to decreasing t_res at the lower limit of morpholine equivalents. The STY can be further improved by increasing the morpholine equivalents at the lower t_res limit. However, this results …

Table 1. Optimization limits for the self-optimizations. a Optimization parameters directly input in terms of flow rates and ratios. 7:6 is related to the benzyl bromide 7 equivalents and solvent:6 is related to the concentration of 6.

As the direct alkylation of amines with alkyl halides is prone to byproduct formation via over-alkylation [48], we chose the N-benzylation of 1° amine 6 as a second case study. N,N-diisopropylethylamine (DIPEA) was selected as the base for this reaction to suppress the formation of the quaternary ammonium salt [51]. Thus, we aimed to simultaneously maximize the STY of 2° amine 8 and minimize the yield of the 3° amine 9 impurity [Eq. (4)].

$$\text{minimize}\ \left[-\ln(\mathrm{STY}),\ \ln(\%\ \text{impurity})\right] \qquad (4)$$
As previously, the optimization was conducted with respect to four variables: 6 flow rate, 7:6 ratio, solvent:6 ratio and temperature (Table 1). The results of the optimization are shown in Fig. 3. Again, the experimental system was run autonomously overnight and was manually terminated in the morning using the same termination criterion as previously.
The results from the initial 20 LHC experiments were better distributed in the objective plane than in the first case study. Of the 58 experiments designed by the TS-EMO algorithm, 20 points formed a dense Pareto front. The optimal STY was 331 kg m⁻³ h⁻¹ with an impurity yield of 10.0%. Conversely, the optimal impurity yield was 2.2% with an STY of 142 kg m⁻³ h⁻¹. The data therefore show the inherent trade-off between STY and % impurity, similar to that observed between STY and E-factor for the SNAr reaction. As in case study one, the STY can initially be increased with a relatively small effect on the % impurity. This corresponds to increasing the concentration of 6 at the lower temperature limits. Any further increase in STY is achieved by increasing the temperature, which results in a substantial increase in the % impurity (operating conditions for each result are provided in the ESI). It should be noted that no reduction in reactor performance was observed throughout the course of either optimization, as indicated by the low variability in the results between experiments with similar reaction conditions.
In both case studies, multi-objective optimization successfully identified the target trade-off curve. However, it should be noted that although the proposed algorithm searches globally, stochastic methods cannot give any guarantee that the global Pareto front is approximated to a given tolerance within any finite number of iterations. The main advantage of the Pareto front is that the information it contains can be utilized for process design. For example, it may be beneficial to accept slightly higher impurities in one reaction step if it achieves a significantly higher STY that more than offsets the additional downstream processing costs. In contrast, constrained single-objective optimizations, such as those used by Reizman [19] and Baumgartner [52], identify only one solution point and reveal nothing about the shape of the Pareto front. In addition, single-objective optimization may find points that are optimal with respect to one objective but that are still dominated by the Pareto front. For instance, Fig. 2 shows several points with a low E-factor and different STY. A single-objective optimization with respect to E-factor cannot distinguish between those points, but the proposed approach identifies points that improve STY without worsening E-factor.
The surrogate models of the underlying objectives include hyperparameters, provided by the TS-EMO algorithm, that give qualitative information about the relevance of the input variables. This is referred to as automatic relevance determination [46]. The hyperparameters for both case studies are shown in Table 2. The hyperparameters θ_i correspond to the input variables, where lower values indicate a greater contribution to the objective. In the SNAr case study, the temperature and concentration are significantly more relevant for E-factor than for STY, whereas the residence time and morpholine equivalents are relevant to both objectives. This is consistent with the Pareto optimal points, where the residence time and morpholine equivalents are the decisive variables in determining the trade-off between STY and E-factor. In the N-benzylation case study, the 6 flow rate and the 7:6 ratio are more relevant for the % impurity than for the STY. In contrast to the first case study, the temperature is relevant to both objectives. This is consistent with the Pareto optimal points, where temperature is the decisive variable in determining the trade-off between % impurity and STY.
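Why a lower θ_i signals a more relevant input can be seen from a kernel with one length scale per variable. The sketch below assumes a squared-exponential kernel in which θ_i acts as the length scale of input i; this choice is made purely for illustration and is not a statement about the kernel or the hyperparameter scaling used in TS-EMO. All names and values are hypothetical.

```python
import math


def ard_se_kernel(x, y, length_scales, signal_var=1.0):
    """Squared-exponential covariance with one length scale per input.

    A small length scale means the covariance decays quickly when that
    input changes, i.e. the modelled objective is sensitive to it.
    """
    scaled_sq_dist = sum(
        ((a - b) / ell) ** 2 for a, b, ell in zip(x, y, length_scales)
    )
    return signal_var * math.exp(-0.5 * scaled_sq_dist)


# Two inputs: the first with a short length scale, the second with a long one.
theta = (0.5, 5.0)
origin = (0.0, 0.0)
k_move_first = ard_se_kernel(origin, (1.0, 0.0), theta)
k_move_second = ard_se_kernel(origin, (0.0, 1.0), theta)
print(round(k_move_first, 3), round(k_move_second, 3))
```

A unit step in the first input reduces the covariance far more than the same step in the second, so predictions decorrelate quickly along the first input. Comparing fitted length scales across inputs therefore gives the qualitative relevance ranking discussed above, provided the inputs are scaled comparably.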
Furthermore, the σ_n^2 hyperparameters correspond to the noise of the system. The low values observed for the systems indicate high-quality and consistent data. As a result, we were able to generate precise GP surrogate models of the data, which can be used to predict the response of additional experiments. In the ESI, we show that the GP surrogate models can be further optimized to provide a denser Pareto front. This is useful for the optimization of processes involving high-value reagents, where the number of actual experiments is limited by cost or availability.

Conclusion
In conclusion, we have demonstrated the application of a machine learning global multi-objective optimization algorithm for the self-optimization of reaction conditions. Two case studies using exemplar reactions have been presented, and the proposed setup was capable of simultaneously optimizing productivity (STY) and environmental impact (E-factor) or % impurity. The four-parameter optimizations efficiently converged to dense Pareto fronts within 68 and 78 experiments, respectively. These revealed the complete trade-off between the objectives, which is valuable information when identifying a good compromise between multiple performance criteria. The developed approach is suitable for any robotic optimization procedure with continuous optimization variables and is readily extended to more than two simultaneous objectives. The use of Gaussian process models provides additional knowledge about the nature of interactions within the system, i.e., the contribution of input variables to the objective functions, as well as numerical characterization of the quality of the experiments.

Conflicts of Interest
There are no conflicts to declare.

Scheme 1. Reactor set-up for case studies. Reagents were pumped using JASCO PU980 pumps (P) and were mixed in Swagelok tee-pieces. A Polar Bear Plus Flow Synthesiser was used for heating and cooling of the tubular reactor. Aliquots of the reaction mixture were delivered to the HPLC mobile phase using a VICI Valco 4-port sample loop (SL). The reaction was maintained under fixed back pressure using an Upchurch Scientific back pressure regulator (BPR). PTFE tubing (1/16″ OD, 1/32″ ID) provided by Polyflon was used throughout the reactor. Swagelok unions and fittings were used throughout, apart from the sample loop (VICI) and BPR (Upchurch). Quantitative analysis was performed on an Agilent 1100 series HPLC instrument. The automated reactor was controlled by a custom-written MATLAB program, within which the TS-EMO algorithm was implemented. See ESI for more experimental details.

Fig. 2. Results of the four-parameter multi-objective self-optimization of the SNAr reaction (case study one). The initial LHC size was 20. The TS-EMO algorithm conducted 48 additional experiments, 26 of which formed a dense Pareto front highlighting the trade-off between the STY and E-factor. Two LHC points were omitted for clarity: STY = 370 kg m⁻³ h⁻¹, E-factor = 5.15 and STY = 500 kg m⁻³ h⁻¹, E-factor = 7.07.

Fig. 3. Results of the four-parameter multi-objective self-optimization of the N-benzylation (case study two). The initial LHC size was 20. The TS-EMO algorithm conducted 58 additional experiments, 20 of which formed a dense Pareto front highlighting the trade-off between the STY and impurity yield.

Table 2. Hyperparameters of GP surrogate models. Lower values of θ_i indicate a greater contribution to the objective.