Automated pH Adjustment Driven by Robotic Workflows and Active Machine Learning

Buffer solutions have tremendous importance in biological systems and in formulated products. Whilst the pH response upon acid/base addition to a mixture containing a single buffer can be described by the Henderson-Hasselbalch equation, modelling the pH response for multi-buffered poly-protic systems after acid/base addition, a common task in all chemical laboratories and many industrial plants, is a challenge. Combining predictive modelling and experimental pH adjustment, we present an active machine learning (ML)-driven closed-loop optimization strategy for automating small scale batch pH adjustment relevant for complex samples (e.g., formulated products in the chemical industry). Several ML models were compared on a generated dataset of binary-buffered poly-protic systems and it was found that Gaussian processes (GP) served as


Introduction
The process itself is often very time-intensive due to complex proton partitioning equilibria and represents a challenging control problem resulting from the intrinsic non-linearity of the pH value. [7] Additionally, buffer chemicals (weak acids or bases that can donate or accept protons), often used in formulations to maintain the pH within a narrow margin upon acid/base addition, complicate the process of pH adjustment.
While single-buffered systems can be described using the Henderson-Hasselbalch equation, developing models for multiple poly-protic buffers (e.g., phosphate and citrate) remains an ongoing challenge. [8,9] Commercial and literature-reported pH adjustment strategies are typically based on either proportional-integral-derivative (PID) control or model predictive control (MPC); both come with limitations. The PID control strategy continuously calculates the deviation of the measured value from the target value and applies a correction based on a proportional, integral or derivative correction strategy. [10,11] While no chemical information except the continuously measured pH is needed for PID, a loss of information must be accepted, since insights into the chemical system cannot be carried over into future pH adjustments, as opposed to model-based strategies. More recent pH adjustment approaches are based on MPC, e.g., Altinten et al. describe a generalized predictive control for continuous flow pH adjustment, [16] Helmy et al. relied on multi-linear regression [17] and Alkamil et al. used a fuzzy artificial neural network (ANN) that outperformed a PID control. [18] Others have also expanded on using ML-based MPC strategies for this same purpose. [19,20] A major challenge associated with MPC-based pH adjustment is operating in a low data regime, which is of particular interest for high-throughput small-scale pH adjustment, as opposed to continuous pH adjustment. Aside from automated strategies, pH adjustment is also often conducted manually at the R&D stage, which is time-consuming, requiring approximately five to seven minutes per sample (Table 1: entries 7, 10, 13).
The modern digitalization of research facilities allows the relatively fast and easy accumulation of experimental data, which can be used to accelerate subsequent workflows by employing transfer learning (TL). TL is the method of pretraining ML models on one task and subsequently using the trained model for a similar prediction task. [21] Process chemists often modify the composition of formulated products (e.g., liquid laundry detergents) to fine-tune product properties (e.g., viscosity). While small modifications do not change the overall composition greatly, the pH response (titration curve) does change, and thus the sample requires a new titration strategy every time.
Herein, we aim to employ a data-driven strategy for pH adjustment, benefitting from active learning and robotic facilities for experimental evaluation. We compare different surrogate models, machine-readable data representations and initialization strategies for the development of an active ML-based pH adjustment strategy for multi-buffered poly-protic mixtures. By employing TL, benefiting from previously generated data, we aim to demonstrate this novel strategy for pH adjustment and keep the process efficient, even under an extreme low data regime.
Our approach for active ML-driven closed-loop optimization is shown in Figure 1a. A chosen ML model is initially trained within a low data regime (here three datapoints) and used for predicting the unknown (ground truth) full titration curve (Figure 1b). Subsequently, conditions towards the target pH are selected using a custom acquisition function. In active learning and Bayesian optimization the acquisition function is typically a trade-off between exploration to reduce model uncertainty and exploitation towards the target value. For the pH adjustment we choose a purely exploitative approach by selecting the minimizer of the difference between the model predictions and the target pH as the next experimental condition (Figure 1c). This is possible as the pH curve is monotonic, hence the algorithm will converge to the target pH. Until the target pH has been reached, the dataset is continuously updated and the model is retrained for the next iteration. We coupled our active ML-guided pH adjustment approach with a liquid handling robot and successfully adjusted a set of chemically different binary buffered mixtures. After a comparison of different surrogate models, we identified Gaussian processes (GP) as the best performing model. Moreover, we managed to boost the efficiency of the process by utilizing TL strategies, thus decreasing the required iterations of pH adjustment.
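The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' control code: `true_ph` is a synthetic monotonic curve standing in for the robot's pH reading, the GP posterior mean is written out by hand with fixed (not optimized) RBF kernel hyperparameters, the three initial volumes are fixed rather than random, and candidate volumes come from a hypothetical 0.1 mL grid. Negative volumes denote acid and positive volumes denote base, following the convention of Figure 1.

```python
import math


def true_ph(volume_ml):
    """Synthetic monotonic titration curve (assumption; stands in for a measurement)."""
    return 7.0 + 4.0 * math.tanh(volume_ml / 2.0)


def rbf(a, b, length=1.0, variance=4.0):
    """RBF kernel with fixed, illustrative hyperparameters."""
    return variance * math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gp_mean(xs, ys, queries, noise=1e-4):
    """GP posterior mean at the query points (constant prior mean = mean of ys)."""
    mean_y = sum(ys) / len(ys)
    gram = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
            for i, a in enumerate(xs)]
    alpha = solve(gram, [y - mean_y for y in ys])
    return [mean_y + sum(rbf(q, x) * a for x, a in zip(xs, alpha)) for q in queries]


def adjust_ph(target=6.0, tol=0.2, initial=(-5.0, 0.0, 5.0)):
    """Purely exploitative closed loop: always measure the predicted minimizer."""
    grid = [round(-5.0 + 0.1 * i, 1) for i in range(101)]
    xs = list(initial)
    ys = [true_ph(v) for v in xs]
    iterations = 0
    while abs(ys[-1] - target) > tol:
        candidates = [v for v in grid if v not in xs]  # never repeat a measurement
        if not candidates:
            break
        preds = gp_mean(xs, ys, candidates)
        nxt = min(zip(candidates, preds), key=lambda cp: abs(cp[1] - target))[0]
        xs.append(nxt)               # "dose" the selected volume
        ys.append(true_ph(nxt))      # "measure" the resulting pH
        iterations += 1
    return xs[-1], ys[-1], iterations


final_volume, final_ph, n_iterations = adjust_ph()
```

Because already-measured volumes are excluded from the candidate list, the sketch cannot stall on a repeated suggestion; in a real workflow the model would be retrained by a library GP with optimized hyperparameters at each iteration.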

Active ML-driven Closed-loop Optimization
The predictive model performance is subsequently evaluated on a held-out test dataset: the model predictions are compared to the true values and the deviation is typically quantified via the root mean squared error (RMSE). In the case of pH adjustment this refers to the amount of acid or base to achieve a target pH. Algorithm 1 illustrates the control code used for automated pH adjustment.
To assess the performance of different surrogate models, particularly within a low data regime, and to deliver promising predictions, we conducted a comparative study between several models. Four commonly used ML models were chosen to understand their respective benefits and limitations: linear regression, random forest (RF), [32] Gaussian process (GP) [33] and artificial neural networks (ANN). [34] Hyperparameters for each model were optimized a priori; see SI (Section 3) for more detail.

Robotic Platform
Based on the need to generate training data, as well as to demonstrate the active ML-based closed-loop pH adjustment process, we developed a robotic platform capable of mixing buffer solutions, measuring the pH value and automatically conducting pH adjustment (Figure 2). Here, the X/Y/Z labels refer to buffer stock solutions that can be pumped into glass vials (24 x 15 mL) positioned on the robotic wheel, which acts as an autosampler. On subsequent positions of the wheel, pH measurement and addition of acid/base can be conducted. After each pH adjustment process, the electrode is cleaned with deionized (DI) water to avoid cross-contamination between the samples. The technical design of the bespoke robotic platform was based on previous studies. [35,36] We utilized FLab, a Python-based library, for facilitating communication between the motors, pumps, the pH electrode and the implementation of the ML-based optimization algorithm. [37] See SI (Figure S1) for more detailed images and information on the robotic platform.

Closed-loop Optimization pH-Adjustment
The performance of pH adjustment can vary significantly, depending on the complexity of the system response towards the addition of a titrating agent. To demonstrate the broad applicability of the pH adjustment strategy, we tested our approach on a variety of different chemical systems. Eighteen experimentally generated datasets of binary buffered mixtures, containing the acid/base volume addition as the input and the measured pH value as the output, were used; see Table 1. Based on this existing experimental data, simulated closed-loop optimization was conducted. The majority of the datapoints were held out and only a randomly selected batch of datapoints was used for initializing the model. The strategy (Figure 1) was applied, and the target pH was set to pH 6, with an acceptable deviation of ± 0.2 pH units. For mixture 6 the initial pH of the sample was already within the target pH margin, so the objective was set to pH 8. This workflow was conducted 10 times for each dataset and the mean and standard deviation of the number of iterations needed to achieve the target pH were calculated. A broad comparison of the 18 buffer systems and the choice of the ML model was conducted to assess the required number of iterations to conduct pH adjustment (Figure 3c). Chemical systems with multiple protons tend to have more linear areas, whereas a system comprised of two single chemicals (e.g., ammonium, acetate) tends to be less smooth; see SI Figure S2.
In addition to the number of protons, the pKa values are also important as they indicate the location of the inflection point that will likely influence the system response of the binary mixture. Systems containing two poly-protic buffers (e.g., citrate and phosphate) tend to require fewer iterations compared to systems containing monoprotic buffers such as ammonium or acetate. Given the variety of the tested buffer chemicals, we believe that a wide variety of buffer systems can be represented using the dataset, i.e., the strategy should be applicable to samples containing other pH-sensitive chemicals which are not directly represented in this study.
Linear regression clearly does not fit the datapoints well due to the non-linear pH response, but it was included as a reference. It requires a high number of iterations for systems containing many poly-protic components, e.g., citrate; see Figure 3a,b. Overall, most of the systems could be adjusted within 3-4 iterations using three datapoints to initialize the optimization, thus giving a total of 6-7 required steps. On average, RF required 3.4 ± 0.3 iterations, ANN required 5.6 ± 1.0 iterations and GP required 3.1 ± 0.6 iterations. Here and in the following, the reported values refer to the mean and the error of the mean value of 10 single iterations; see SI Eqn. S2. Our analysis shows that using the GP model gives the best results, with the lowest number of iterations within the optimization loop.
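The "mean ± error of the mean" values quoted above can be reproduced from repeated runs as sketched below. SI Eqn. S2 is not reproduced in this text, so the sketch assumes the usual standard error of the mean (sample standard deviation divided by the square root of the number of runs); the function name and the example counts are hypothetical.

```python
import math
import statistics


def mean_and_sem(iteration_counts):
    """Return (mean, standard error of the mean) for repeated runs.

    Assumes SEM = s / sqrt(n) with the sample standard deviation s;
    the paper's exact definition (SI Eqn. S2) is not shown here.
    """
    n = len(iteration_counts)
    mean = statistics.mean(iteration_counts)
    sem = statistics.stdev(iteration_counts) / math.sqrt(n)
    return mean, sem


# Made-up iteration counts from two repeated closed-loop runs.
example_mean, example_sem = mean_and_sem([2, 4])
```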

Featurization Effects
Representing chemical compounds in a machine-readable format is considered a challenge in chemoinformatics due to its effect on different surrogate models and, thus, their predictive performance. [39] Previous literature has led to ambiguous outcomes on whether the addition of chemical information within low data regimes, such as the initialization of active-ML search strategies, is beneficial. [31,40] To learn more about featurization effects on this specific application, we compared two input feature sets. The large feature set contains information on the components' concentrations, component pKa values, the number of protons a buffer can accept/donate and the initial pH value of the buffer mixture (prior to any acid/base addition).
The small feature set contains only information on the components' concentrations but no chemical insights.
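The two feature sets can be pictured as two ways of turning the same sample into a numeric vector. The sketch below is illustrative only: the `Buffer` class, the feature ordering, the zero-padding of pKa values to a fixed length and the example concentrations are all assumptions, and the pKa values are approximate literature values. The titrant volume is included in both vectors because it is the model input that changes during the adjustment.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Buffer:
    """Hypothetical container for one buffer component."""
    name: str
    concentration_m: float
    pka_values: Tuple[float, ...]


def small_features(buffers: Sequence[Buffer], titrant_volume_ml: float) -> List[float]:
    """Concentrations only, plus the titrant volume (no chemical information)."""
    return [b.concentration_m for b in buffers] + [titrant_volume_ml]


def large_features(buffers: Sequence[Buffer], titrant_volume_ml: float,
                   initial_ph: float, max_pka: int = 3) -> List[float]:
    """Adds pKa values, proton count per buffer and the initial pH."""
    feats = small_features(buffers, titrant_volume_ml)
    for b in buffers:
        pkas = list(b.pka_values)[:max_pka]
        feats.extend(pkas + [0.0] * (max_pka - len(pkas)))  # zero-padding is an assumption
        feats.append(float(len(b.pka_values)))              # number of protons
    feats.append(initial_ph)
    return feats


# Illustrative binary mixture; concentrations and initial pH are placeholders.
mixture = [
    Buffer("acetate", 0.05, (4.76,)),
    Buffer("phosphate", 0.05, (2.15, 7.20, 12.35)),
]
x_small = small_features(mixture, titrant_volume_ml=0.5)
x_large = large_features(mixture, titrant_volume_ml=0.5, initial_ph=5.4)
```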
As one can see in Figure 4, the performance obtained with the two feature sets varies only minimally across the set of 18 systems. On average, using the large feature set resulted in 3.1 ± 0.6 iterations and the small feature set in 3.2 ± 0.6 iterations. While the GP performance without chemical information might seem surprising, it must be noted that additional features increase the number of model parameters that need to be learned, as shown in other previously reported active ML studies by Pomberger et al. [40] The model initialization for each single system was conducted with 5% of the training data instead of a consistent number of datapoints. While the number of initialization datapoints varies, the focus is on the relative comparison of the different feature sets.

Figure 4. Illustration of the required active learning iterations using GP to reach target pH for a set of buffer systems using two feature sets. Error bars represent the error on the mean value. See Table 1 for information on the indexed buffer system positions.
As a result of the very similar outcome of the experiments (addition or exclusion of chemical information), it can be assumed that the strategy can be applied to chemical systems in a generic manner, specifically without exact knowledge of the chemical composition or chemical structure, a challenge faced when, e.g., working with confidential industrial data.
Due to the slightly better performance, all further experiments were conducted using the large feature set within this study.

Variation of the Number of Datapoints for Model Initialization
The choice of the number of datapoints (obtained via random selection) for initializing the closed-loop cycle impacts the preliminary surrogate model's prediction performance. While more initial datapoints could be considered advantageous for training more accurate surrogate models, using fewer datapoints accelerates the overall adjustment process and might allow the subsequent datapoints to be chosen selectively based on the model's prediction instead of initial random allocation. Within this case study we aim to identify this effect by comparing a GP initialized with two, three and four random datapoints.
We investigated initialization datasets of different sizes for all 18 binary buffer systems; see Figure 5. When analyzing the results, we want to directly compare the total number of datapoints (i.e., pH measurements) required to obtain the target pH, that is, the sum of the number of datapoints within the initialization dataset and the number of datapoints obtained during the experimental iterations. Overall, using only two initial datapoints resulted in the fastest method, requiring on average 5.8 ± 0.6 total pH measurements, followed by 6.3 ± 0.6 and 6.8 ± 0.5 pH measurements for three and four initial datapoints, respectively. When initializing a model with two datapoints, the subsequent two datapoints are chosen selectively, as opposed to using four random datapoints for initialization. The results indicate that the selective choice of the active ML strategy is beneficial over random datapoint allocation, even though the preliminary model is trained on only two datapoints.
Figure 5. Illustration of the effect of variation in the number of initialization datapoints on the total number of pH measurements necessary, using the large feature set and GP model. The deviation represents the calculated error on the mean value. See Table 1 for information on the indexed buffer system positions.

Transfer Learning-Accelerated Closed-Loop Optimization
Harvesting existing data to facilitate knowledge transfer was explored to determine whether preliminary models gain a better understanding of the system response to acid/base additions, thereby accelerating the process of pH adjustment. In detail, we investigated whether prior knowledge of the pH response of single components may accelerate closed-loop pH adjustment of binary buffered mixtures. For example, information on the pH response of ammonium and acetate was provided when conducting the pH adjustment of an ammonium-acetate sample. The titration information of the pure single-component buffer chemicals was combined with the initialization data and used for training the initial model. As shown in Figure 6, the observable trend is that the addition of prior information improves the optimization performance.
Overall, using the GP alone without any prior information (just the initialization data) required 3.1 ± 0.6 iteration cycles, whereas, when implementing prior information on the single components, the number of iterations could be decreased to 2.2 ± 0.4. Particularly challenging chemical systems, such as ammonium-acetate, could be adjusted in significantly fewer iterations.
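One simple way to picture this form of knowledge transfer is data pooling: the titration rows of the pure components are stacked on top of the few initialization points of the mixture before the surrogate model is fitted. The sketch below assumes a hypothetical row layout of (concentration A, concentration B, titrant volume, pH), with the absent component set to zero concentration in the single-component runs; all numbers are placeholders, not measured data.

```python
from typing import List, Sequence, Tuple

# Hypothetical row layout: (conc_a_M, conc_b_M, titrant_volume_ml, measured_pH)
Row = Tuple[float, float, float, float]


def pool_training_data(prior_runs: Sequence[Sequence[Row]],
                       init_points: Sequence[Row]) -> Tuple[List[List[float]], List[float]]:
    """Stack prior single-component titrations with the mixture's initial points."""
    rows = [row for run in prior_runs for row in run] + list(init_points)
    features = [list(row[:3]) for row in rows]
    targets = [row[3] for row in rows]
    return features, targets


# Placeholder prior data for two pure components (made-up values).
ammonium_run = [(0.05, 0.0, 0.0, 5.1), (0.05, 0.0, 1.0, 9.0)]
acetate_run = [(0.0, 0.05, 0.0, 8.2), (0.0, 0.05, -1.0, 4.8)]
# Single initialization point of the binary mixture (made-up value).
mixture_init = [(0.05, 0.05, 0.0, 7.0)]

X_train, y_train = pool_training_data([ammonium_run, acetate_run], mixture_init)
```

The pooled `X_train` and `y_train` would then be passed to the surrogate model in place of the initialization data alone.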

Real-Time Automated pH Adjustment
After separately developing a strategy for automating the experimental workflow via a robotic platform and an algorithmic strategy for controlling the addition of acid/base, we then aimed to merge both efforts. Using FLab, the control code of the liquid handling robot allows direct interaction with the algorithmic pH adjustment strategy: the measured data is directly used for ML surrogate model training. The results of the subsequent decision making (next conditions to evaluate experimentally) are passed to the liquid handling robot. The adjustment process and decision making can be monitored in real time, as shown in Figure 1b.
While previous experiments were initiated with two to four randomly selected datapoints, we now initiated the pH adjustment process with a single datapoint, aiming to decrease the overall number of required experimental observations. After the initial pH measurement (volume of added acid/base = 0), the selected volume of titrant is added and data acquisition commences. Figure 7 illustrates the results of the automated pH adjustment, representing the average of three single experimental evaluations. The plot indicates clear differences between the various buffered systems, ranging from two iterations (citrate-phosphate) to eight iterations (acetate-citrate). To demonstrate the performance of our approach for a chemically extremely complex equilibrium system and the feasibility of using the GP to model the data, we conducted successful pH adjustment of a sample containing up to four buffer chemicals. For a mixture of citrate, phosphate, ammonium and acetate the target pH 6 was achieved within 3.7 ± 0.4 iterations, thus demonstrating the versatility of the presented data-driven strategy.
Overall, 4.7 ± 0.4 iterations were required to adjust the sample mixtures to the target pH 6.

Conclusions
Within this study, we present a method to adjust the pH of several multi-buffered poly-protic solutions to aid chemical laboratories dealing with formulation chemistry. A set target pH can be achieved via an iterative workflow in a fully automated manner, using a robotic platform informed by an active machine learning-based optimization strategy.
Specifically, a Gaussian process was used to predict the titration curves of several mixtures and guide the pH adjustment towards a set target pH. Chemical inputs were featurized with increasing levels of chemical information, which delivered only marginally better efficiency. This can be regarded as advantageous since it allows this approach to be implemented for systems without the requirement of molecular information, which is particularly beneficial when dealing with confidential industrial formulation samples or when the composition of the sample has not yet been characterized in detail. In an attempt to balance the number of initially randomly chosen datapoints against selectively chosen datapoints, it was observed that the overall sample efficiency improved when using fewer initial datapoints. Finally, the strategy was demonstrated within a real experimental study with chemical systems containing up to four buffers, connecting the optimization algorithm and a robotic platform for conducting sample preparation and fully autonomous pH adjustment.
The developed workflow can be particularly beneficial for small scale high-throughput pH adjustment experiments as required by R&D facilities in formulation chemistry and may incentivize data accumulation and management for pH adjustment processes.Moreover, we see a great potential of this technique in the age of personalized cosmetics and medicine as well as all other small batch formulation processes.

Materials
Unless mentioned otherwise, all solvents and chemicals were purchased from commercial suppliers and were used as received. Compound names follow IUPAC nomenclature.
For initial manual experiments a Metrohm 716 DMS Titrino was used and was calibrated via three-point calibration using buffers of pH 4, 7 and 10. For the automated pH measurement studies the pH meter (VWR Model 662-1767) was calibrated with the same buffers.

Machine Learning Models
This section details the identification of suitable hyperparameters of the surrogate models.
As opposed to traditional hyperparameter tuning, where the objective is to find the parameters that deliver a low prediction error, we aimed to decrease the average number of iterations to reach the target pH within the active ML-driven closed-loop optimization. The presented values are the mean (Eqn. 1) and standard deviation of 10 single experiments, to allow for generalizability.
We trained ML models on the experimental titration data of the pure chemicals (e.g., ammonium and KH2PO4) and evaluated ML predictions for the binary mixture (for which experimental data existed), thus performing an extrapolative prediction. The large feature set, which includes chemical information, was used. Figure S2 illustrates true vs predicted titration curves: the predictions capture the trend but often clearly miss the ground truth. While we initially attempted to develop a purely predictive model for extrapolation across different systems, we understood the underlying challenges and limitations of accurately modelling the pH of multi-buffered mixtures within extrapolative predictions. Eventually, we changed our strategy from a purely predictive approach to an iterative strategy, involving repeated sampling and model training.
Table S3. Summary of the results of the activation function experiment for the ANN.
The activation function Tanh gave the best results. However, it performed very poorly during the benchmark in combination with all optimized parameters. Therefore, we decided to use the second-best activation function, ELU, which delivered a better outcome.
Due to computational expense, we stopped at 40 neurons per layer; this value also delivered a suitable performance.
According to the table above, it seems that 2 hidden layers gave the best result. However, in combination with the optimized value for the number of neurons per layer (40 neurons) the accuracy decreased significantly. For this reason, we decided to increase to 3 hidden layers, which gave better results.

Random Forest
We implemented the random forest surrogate model using the package scikit-learn, version 0.23.0; this version was used for all subsequent modelling. Based on preliminary insights, a suitable number of estimators was found to be 400. All other parameters were kept at their defaults.
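A minimal sketch of this configuration is shown below, using scikit-learn's `RandomForestRegressor` with 400 estimators. The training data are made-up placeholder values, the single-feature input (titrant volume only) is a simplification, and `random_state=0` is added here purely for reproducibility; it is not stated in the text.

```python
from sklearn.ensemble import RandomForestRegressor

# Placeholder titration data: titrant volume in mL -> measured pH (made-up values).
volumes = [[-2.0], [-1.0], [0.0], [1.0], [2.0]]
ph_values = [3.9, 4.6, 5.2, 6.8, 8.1]

# 400 trees as reported; random_state is an assumption added for reproducibility.
model = RandomForestRegressor(n_estimators=400, random_state=0)
model.fit(volumes, ph_values)

# Predict the pH at volumes that were not part of the training data.
query = [[-1.5], [0.5], [1.5]]
predictions = model.predict(query)
```

Because each tree predicts an average of training targets, the forest's predictions are piece-wise constant and cannot leave the range of the observed pH values.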

Figure 1. An overview of the ML-driven pH adjustment strategy. (a) Design of the closed-loop optimization toward pH adjustment. (b) Illustration of ML model predictions and (c) decision making using the acquisition function, see minimum at 4. The numbers represent the order of the observations, and the red font color represents the datapoint to be acquired in the next iteration. Both acid and base volume addition are represented on the x axis, where the negative values account for acid volumes and the positive values account for base volumes.

Figure 2. Schematic (a) and image (b) of the robotic pH adjustment platform. X/Y/Z indicate the stock solutions of buffer chemicals. For simplification not all 24 vials have been drawn on the robotic wheel. See SI for a detailed labelled explanation of all components.

Figure 3. Illustration of the active ML pH adjustment. (a) Insights into prediction performance of four models after four observations (buffer system 1) and comparison to the ground truth. (b) Optimization trajectory towards the target pH 6 (buffer system 2) using an ANN surrogate model. (c) Comparison of the required iterations to reach target pH using four different ML models for 18 different binary buffered systems. The features include the pKa values as well as the initial pH values. The error bars represent the error on the mean value. See Table 1 for indexed buffer system positions.

Figure 6. Comparison of active ML-driven pH adjustment using GP and the full feature set with and without the implementation of prior information. The error bars represent the error on the mean value. Model initialization was conducted with 5% of the training data. See Table 1 for information on the indexed buffer system positions.

Figure 7. Results of the experimental case study using the robotic platform and the developed active ML closed-loop algorithm to conduct automated pH adjustment of unknown buffered systems. For phos-am-ac 1:1:1 the target pH was set to 8 since the initial sample already yielded approximately a pH of 6. The error bars represent the error on the mean value, see SI Eqn. S2. Abbreviations: am: ammonium, phos: KH2PO4, ac: acetate, ci: citrate.
Applying transfer learning to the optimization cycle significantly boosted the performance, thus highlighting the main advantage of ML-driven pH adjustment over PID-controlled or manual pH adjustment. Since it is common for samples in high-throughput formulation preparation to differ in only one or a few parameters in their compositions, learning from previous pH adjustments and transferring the obtained system knowledge into a new pH adjustment process has been quantitatively shown to benefit the overall workflow.

Figure S1. Robotic platform used for conducting automatic pH adjustment. (a) Overall view of the system (b) Detailed view of the pH electrode (c) Detailed view of the washing position (d) Detailed view of the dosing position and pH electrode.

Figure S2. Comparison of predicted vs true pH curves of given systems using base titration data. Each datapoint represents a single measurement/prediction; the difference in datapoint density is due to the experimental workflow.

Figure S5. Comparison of different activation functions for the ANN. 10 single iterations, error bars are reported according to Eqn. 2.

Figure S6. Comparison of different neuron numbers per layer for the ANN. 10 single iterations, error bars are reported according to Eqn. 2.

Figure S9. Comparison of different numbers of estimators for the random forest model. 10 single iterations, error bars are reported according to Eqn. 2.

Table 1. List of buffers used in this study.
Featurization included the concentration of both buffer chemicals, the volume of acid/base as well as chemical information such as pKa values, the number of protons per buffer and the initial pH value. Figure 3a illustrates the varying prediction performance of the four chosen surrogate models, using four observations for training. By comparing the single surrogate model predictions against the ground truth it becomes visible that linear regression delivered the worst fit, as expected, whereas GP delivered the best performance. Moreover, the characteristic piece-wise constant predictions arising from the decision-tree-based model architecture of the RF are visible, see Loh. [38] Figure 3b illustrates the optimization trajectory of the ANN (buffer system 2) towards the target pH 6, i.e., how the model conducts sampling of experimental datapoints to find the target pH 6. It is visible that the algorithm initially requires approximately two iterations to explore the response and then starts to exploit towards the objective.
Error bars are reported as the error on the mean value, see Eqn. 2. We started with an externally generated pH dataset containing acid and base titration information of 18 binary mixtures, giving a dataset of 1956 single datapoints (single pH measurements). Due to the density of datapoints, a simple random split of the data into training and test data would not reveal whether the ML model is able to perform useful predictions for realistic applications, such as those with an extremely low amount of data. Thus, we selectively chose which data are present in the test and training partitions and designed the preliminary experiments as extrapolative prediction tasks.

Table S4. Summary of the results of the neuron number experiment for the ANN.