CamOptimus: a tool for exploiting complex adaptive evolution to optimize experiments and processes in biotechnology

Multiple interacting factors affect the performance of engineered biological systems in synthetic biology projects. The complexity of these biological systems means that experimental design should often be treated as a multiparametric optimization problem. However, the available methodologies are either impractical, due to a combinatorial explosion in the number of experiments to be performed, or are inaccessible to most experimentalists due to the lack of publicly available, user-friendly software. Although evolutionary algorithms may be employed as alternative approaches to optimize experimental design, the lack of simple-to-use software again restricts their use to specialist practitioners. In addition, the lack of subsidiary approaches to further investigate critical factors and their interactions prevents the full analysis and exploitation of the biotechnological system. We have addressed these problems and, here, provide a simple‐to‐use and freely available graphical user interface to empower a broad range of experimental biologists to employ complex evolutionary algorithms to optimize their experimental designs. Our approach exploits a Genetic Algorithm to discover the subspace containing the optimal combination of parameters, and Symbolic Regression to construct a model to evaluate the sensitivity of the experiment to each parameter under investigation. We demonstrate the utility of this method using an example in which the culture conditions for the microbial production of a bioactive human protein are optimized. CamOptimus is available through: (https://doi.org/10.17863/CAM.10257).

growth and towards the production of the recombinant protein once its synthesis had been induced. This fermentation regime suggested that four main biological objectives should be selected for our system: (i) maximising cell density prior to induction of recombinant protein synthesis, (ii) maximising the biological activity of the recombinant enzyme at the end of the fermentation, (iii) maximising the culture's specific productivity, and (iv) minimising any further increase in cell density during the induction phase.

flask experiments revealed that HuLy activity was lost from batch cultures whose pH was not controlled throughout the course of the fermentation, and so pH was included as another factor to be optimised.

The maximal and the minimal concentrations of medium components previously reported in the literature were used to set the allowable ranges for each factor [4,5,7-9]. Although pH values between 5 and 6 have been used in most studies, K. phaffii can grow over a wide pH range, from 3 to 7 [10]. The pH of the culture can affect proteolytic activity, secretion, and protein production [11]; therefore, we kept the pH range as wide as possible in order to identify the optimum value for HuLy production.
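As an illustration only, a factor's allowable range can be represented as a pair of bounds and divided into candidate levels for a search. The sketch below is not part of CamOptimus: the `Factor` class, the even spacing, and the choice of five levels are hypothetical assumptions, and only the pH bounds of 3 and 7 come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Factor:
    """Hypothetical container for one culture condition and its allowable range."""
    name: str
    low: float
    high: float

    def levels(self, n_levels: int) -> List[float]:
        """Return n_levels evenly spaced values spanning [low, high]."""
        if n_levels < 2:
            raise ValueError("need at least two levels to span a range")
        step = (self.high - self.low) / (n_levels - 1)
        return [self.low + i * step for i in range(n_levels)]

    def contains(self, value: float) -> bool:
        """Check whether a proposed setting lies inside the allowable range."""
        return self.low <= value <= self.high


# pH bounds taken from the text (3 to 7); five levels is an assumed, illustrative choice.
ph = Factor("pH", 3.0, 7.0)
ph_levels = ph.levels(5)
```

Keeping the bounds separate from the number of levels means the same range can be searched at a coarser or finer resolution without redefining the factor.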

Investigation of the effect of the citrate/phosphate buffer on culture performance

The effect of the addition of buffer components to maintain constant pH on the growth and production characteristics of the culture was investigated at different stages of the analysis.

Preliminary analysis involved growing the microbial culture in shake flasks, where the culture pH was maintained constant in buffered conditions, or was allowed to decrease in non-buffered defined medium. The protein activity diminished in cultures where the pH was not kept constant, thus demonstrating the necessity for pH control. For the operating conditions investigated in this study, the working pH was in the range of 3-7.

Citrate/phosphate buffer, with reported functionality in the pH range of 2.6-7 [12], was selected as a suitable option for our purposes, since its constituents did not pose a threat of toxicity for the culture. Since citric acid and dibasic sodium phosphate were used to prepare the buffer, possible interference due to buffer components acting as macronutrients was investigated to rule out such complications. Varying the concentration of citric acid (5-fold) to account for the possible variation it would display in the pH range of 2.6-7 (<5-fold) did not yield a change in growth or protein activity. In order to investigate the effect of the phosphate group acting as a macronutrient, a similar experiment was carried out as described for testing the effect of the citrate component of the buffer. Furthermore, in order to test the phosphate group specificity of the analysis, dibasic sodium phosphate was replaced by sodium sulphate, to provide equivalent Na molarities. Growth and protein activity always remained within 10% of one another and of the reference study, regardless of the concentrations employed within the limits that were of interest for the purpose of this study (Table A). Therefore the citrate/phosphate buffer was considered as an

The slopes of the models indicating the existence of possible trends were negligibly small.

Fine-tuning the global optimum by population profiling

We carried out fine-tuning of the optimised environmental conditions that we had obtained at the end of the 3rd "generation" in the GA study by employing a population profiling methodology to determine whether the performance of the system could be improved even further. We investigated the change in how many individuals assumed each level over the 3 "generations" for each one of the 9 factors (Figure 2). We calculated the percent occupancy of the levels in the better-performing, i.e. the "fitter" fraction of the "population" in the last "generation", which corresponded to the half of the "population" with the highest "fitness" scores, and employed this as a footprint for identifying the optimal levels for each factor (Figure C). The footprints of methanol, sorbitol and pH in the 3rd "generation", as well as their convergence profiles through the course of the GA search, indicated that the levels that these factors assumed converged towards unique values at 6.75 g/L, 7.60 g/L and 6.74, respectively (Figure 2(a-c) and Figure C(a-c)). In the case of ammonium, potassium and glycerol, although the convergence profiles displayed bi-modal behaviour, the footprints in the 3rd "generation" indicated a pronounced convergence towards unique levels at 6.55 g/L,