A Two-Level Factorial Design for Screening Factors that Influence the Growth of Bacillus sp. Strain UPM2021n isolated from a Mangrove Sediment on Acrylamide

Acrylamide, a neurotoxicant, is an emerging pollutant of global importance. As a bioremediation strategy, the microbial degradation of acrylamide has attracted gradual but steadily increasing attention worldwide. A two-level factorial design was used to screen for parameters that significantly affect the growth on acrylamide of an acrylamide-degrading bacterium tentatively identified as Bacillus sp. strain UPM2021n. Five independent factors were screened: pH, temperature, incubation time, acrylamide concentration and glucose concentration. A total of 32 experiments with three replicate centre points were carried out. The design identified pH and incubation time as significant contributors (p<0.05) to the growth of this bacterium on acrylamide. The contributing factors were analysed using ANOVA, a Pareto chart, a perturbation plot and other diagnostic plots. The diagnostic plots (half-normal, Cook's distance, residuals vs runs, leverage vs runs, Box-Cox, DFFITS and DFBETAS) all supported the conclusions of the two-level factorial analysis. The exception was two potential outliers, which indicate that the experiment should either be repeated using blocks or be reanalysed with the potential outliers removed. The significant factors identified in this study fall well within the ranges reported for many acrylamide-degrading microorganisms. They will be further optimized using response surface methodology (RSM) in future work.


INTRODUCTION
When carbohydrate-rich foods are cooked at high temperature, a chemical process known as the Maillard reaction may take place. This reaction can form acrylamide, a molecule capable of causing cancer as well as damage to the nervous system. The Maillard reaction occurs when reducing sugars and amino acids (notably asparagine) react in suitable proportions. It is the initial step in a series of processes that ultimately lead to the production of acrylamide [1]. Acrylamide may also be produced from a variety of other carbonyl compounds [2]. In Sweden and Norway, cattle and fish died as a direct result of acrylamide contamination of nearby streams. The most common application of acrylamide is the production of polyacrylamide (PAM). PAM is used in the manufacture of adhesives, plastics and printed materials, as well as in drinking-water treatment. As of 2005, commercial polyacrylamides were frequently contaminated with the toxic acrylamide monomer. Because these products are widely used and commercially available, this contamination has had a significant impact on the food supply chain. The herbicide Roundup may contain polyacrylamide at a concentration of thirty percent, which contributes to the contamination of agricultural soil with acrylamide. Addressing this problem requires the remediation of environmental acrylamide by biological processes [3].
Acrylamide is highly water-soluble. This allows it to be absorbed through the skin, the lungs and the digestive system, and even to cross the placental barrier. Occupational exposure to acrylamide can be estimated by measuring the quantity of acrylamide adducts in haemoglobin. In one study, 41 employees of an acrylamide manufacturing plant showed neurotoxicity linked to this haemoglobin-adduct biomarker. In that Chinese acrylamide factory, the elevated adduct levels indicated that employees had been exposed to exceptionally high amounts of acrylamide [4]. In Japan, several cases of acute acrylamide poisoning have been attributed to contamination of the water supply. Igisu et al. [5] reported that a well polluted by a 2.5-metre-deep grouting operation contained up to 400 mg acrylamide/L. Five people who drank this water developed symptoms such as truncal ataxia and disorientation.
Acrylamide enters the body through inhalation of polluted air or through ingestion of contaminated food or drink. It is absorbed across the mucous membranes of the lungs and the digestive system, or through the skin, and is excreted once it has been metabolized [6][7][8]. Its presence in, and distribution through, biological fluids facilitates its toxic effects. Acrylamide is rapidly metabolized and eliminated after exposure, but its high reactivity toward proteins still poses a risk to the general public and to workers. This has motivated researchers to develop ways of removing acrylamide, particularly from polluted soils. However, remediating acrylamide in soils is challenging, if not impossible, because of the complex soil matrix. Microbial degradation of acrylamide is attractive because microbial metabolism, particularly under aerobic conditions, allows the complete conversion of acrylamide to harmless water and carbon dioxide.
The Plackett-Burman (PB) design is a prominent screening approach used to identify key factors early in experimentation, when comprehensive knowledge of the system is typically lacking. It was developed in 1946 by the statisticians Robin L. Plackett and J. P. Burman, after whom it is named. Its goal is to identify active variables with the fewest feasible experiments. In a Plackett-Burman design, two-factor interactions can be confounded with main effects, so these designs should be used only when there is little or no potential for two-way interactions. The PB design is helpful for detecting large main effects in two-level experiments with more than four factors. However, it does not test whether the effect of one factor depends on another, and because it is the smallest design, too little data is collected to estimate such interactions. Because it accounts for interactions among factors, the two-level factorial design is superior to the PB method at the screening step. It gives a more accurate estimate of the optimal conditions and quantifies the interactions between significant culture factors. Numerous screening studies in the literature have benefited from two-level factorial designs [21][22][23][24][25][26]. Here we describe the use of a two-level factorial design to screen for significant factors that influence the growth of Bacillus sp. strain UPM2021n on acrylamide.
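The difference in aliasing described above can be checked numerically. The minimal sketch below is an illustration, not a procedure used in this study. It builds the standard 12-run Plackett-Burman design from its generator row and a 2^5 full factorial. For each design, it computes the correlation between each two-factor interaction column and each main-effect column not involved in that interaction. The function names are hypothetical. Only the first five Plackett-Burman columns are used, to match the five factors screened here.

```python
from itertools import combinations, product

# Standard generator row of the 12-run Plackett-Burman design (Plackett & Burman, 1946).
PB12_GENERATOR = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]


def plackett_burman_12():
    """12 runs x 11 columns: cyclic shifts of the generator plus a final row of -1s."""
    n = len(PB12_GENERATOR)
    rows = [[PB12_GENERATOR[(j - i) % n] for j in range(n)] for i in range(n)]
    rows.append([-1] * n)
    return rows


def full_factorial(k):
    """All 2**k runs of a two-level full factorial in coded (-1/+1) units."""
    return [list(run) for run in product([-1, 1], repeat=k)]


def interaction_main_correlations(design, k):
    """Correlation of each two-factor interaction column (a*b) with each main-effect
    column c not involved in it, using the first k columns. Because all columns are
    balanced +/-1 and mutually orthogonal, the correlation is the dot product / n."""
    n = len(design)
    corr = {}
    for a, b in combinations(range(k), 2):
        for c in range(k):
            if c not in (a, b):
                corr[(a, b, c)] = sum(r[a] * r[b] * r[c] for r in design) / n
    return corr
```

In the 12-run Plackett-Burman design, every interaction is correlated at ±1/3 with each main effect it does not involve. A large interaction would therefore bias those main-effect estimates. In the 2^5 full factorial, every one of these correlations is exactly zero.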

MATERIALS AND METHODS
All chemical reagents were of analytical grade and were used without further purification. Unless otherwise noted, all experiments were carried out in triplicate.

Growth and maintenance of the acrylamide-degrading bacterium
Bacillus sp. strain UPM2021n was isolated in 2021 from mangrove sediment on the bank of the Juru River near its mouth, Penang, Malaysia [27]. The results of a one-factor-at-a-time (OFAT) characterization of this bacterium are published elsewhere. The bacterium was maintained on minimal salts medium (MSM) agar. The agar was supplemented with 1% (w/v) glucose as the carbon source and 0.5 g/L acrylamide as the sole nitrogen source. Cultures were incubated at 150 rpm for 72 h at 25 ℃ in an incubator shaker (Certomat R, USA). The MSM used for growth contained the following:
- 0.5 g/L acrylamide as the sole nitrogen source
- 10 g/L glucose as the carbon source
- 0.5 g/L MgSO4·7H2O
- 6.8 g/L KH2PO4 (buffering species and phosphorus source)
- 0.005 g/L FeSO4·H2O
- 0.1 mL of trace elements [3]

The phosphate in the medium acts as a buffer, maintaining the pH within the range 5.8 to 7.8.
Acrylamide, the sole nitrogen source, was filter-sterilised using PTFE syringe filters with a pore size of 0.45 µm. To determine bacterial numbers, 1 mL samples were serially diluted in sterile tap water and plated on nutrient agar.
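The conversion from plate counts to the log CFU/mL response follows standard serial-dilution arithmetic. The minimal sketch below uses a plated volume of 0.1 mL and a 10^-6 dilution purely as hypothetical illustrations. The plated volume and the dilutions actually counted are not stated in this section.

```python
import math


def log_cfu_per_ml(colonies, dilution_factor, volume_plated_ml):
    """log10 CFU/mL in the original sample, from one countable plate.

    colonies: colonies counted on the plate
    dilution_factor: total fold dilution of the plated tube (1e6 for a 10^-6 dilution)
    volume_plated_ml: volume spread on the plate in mL (hypothetical 0.1 mL in the example)
    """
    if colonies <= 0:
        raise ValueError("a countable plate needs at least one colony")
    return math.log10(colonies * dilution_factor / volume_plated_ml)
```

For example, 57 colonies on a plate from the 10^-6 dilution with 0.1 mL plated gives 5.7 × 10^8 CFU/mL. This corresponds to about 8.76 log CFU/mL, which lies inside the 5.53 to 9.76 log CFU/mL range reported in the results.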

Screening of significant parameters using a two-level factorial design
The two-level factorial design is one of the best screening methods for determining the relative importance of several factors, such as pH, temperature, and acrylamide and glucose concentrations. We carried out a 2^5 full factorial design in which each of the five factors was set at a low level, coded -1, and a high level, coded +1. The response was bacterial growth, measured as log CFU/mL. The tests were planned and carried out in the sequence presented in Table 1. The experiment consisted of 32 runs at the two coded levels, performed in randomized order, plus three replicate centre points. Every experiment was carried out twice, and the results of both sets are shown together with their means. To determine which parameters are significantly more important than the others, the data were analysed using Design-Expert 7.0 (Stat-Ease, Inc.; trial version).
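The coded run list described above can be generated programmatically. The following minimal sketch builds the 2^5 full factorial (32 runs at coded levels -1 and +1) plus three centre points (coded 0). It then shuffles the run order and maps coded levels to actual settings. The function names are hypothetical. Columns follow the A to E labels used in the results: pH, temperature, acrylamide, glucose and incubation time. The pH range of 6.0 to 8.0 in the example is also hypothetical, since the actual levels are given in Table 1.

```python
import itertools
import random


def two_level_design(n_factors, n_centre=3, seed=None):
    """Coded 2**k full factorial (-1/+1) plus centre points (all 0), in random run order."""
    runs = [list(run) for run in itertools.product([-1, 1], repeat=n_factors)]
    runs += [[0] * n_factors for _ in range(n_centre)]
    random.Random(seed).shuffle(runs)  # randomized run order, reproducible via seed
    return runs


def coded_to_actual(code, low, high):
    """Map a coded level (-1, 0 or +1) to the actual setting between low and high."""
    centre = (low + high) / 2
    half_range = (high - low) / 2
    return centre + code * half_range
```

Calling two_level_design(5, seed=1) returns 35 runs in which every factor column is balanced. With a hypothetical pH range of 6.0 to 8.0, coded_to_actual(-1, 6.0, 8.0) gives 6.0 and a centre point maps to 7.0.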

Statistical Analysis
Experiments were carried out in triplicate unless stated otherwise, and values are means ± standard deviation of triplicates. Multiple means were compared using one-way analysis of variance (ANOVA) with Tukey's post hoc test, and two means were compared using Student's t-test. A p-value < 0.05 was considered significant. Where appropriate, values are reported to three decimal places.
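As a minimal sketch of the summary statistics named here, the functions below compute two quantities using only Python's standard statistics module. The first is the mean ± sample standard deviation of replicates. The second is the pooled-variance Student's t statistic for two means. Converting t to a p-value requires the t distribution, for example via scipy.stats.ttest_ind, which performs the pooled test by default. That step, and Tukey's test, are not reproduced here. The function names and the replicate values in the example are hypothetical.

```python
import math
import statistics


def mean_sd(values):
    """Mean and sample standard deviation (n - 1 denominator) of replicate values."""
    return statistics.mean(values), statistics.stdev(values)


def student_t(x, y):
    """Pooled-variance (Student's) two-sample t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * statistics.variance(x)
              + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled * (1 / nx + 1 / ny))  # standard error of the difference
    return (statistics.mean(x) - statistics.mean(y)) / se, nx + ny - 2
```

For hypothetical triplicates of 8.1, 8.3 and 8.2 log CFU/mL versus 7.6, 7.8 and 7.7 log CFU/mL, the means are 8.2 ± 0.1 and 7.7 ± 0.1. The resulting t is about 6.12 with 4 degrees of freedom.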

Two-level factorial design for screening the operational factors
In the factor screening study, five operational parameters (pH, temperature, incubation time, acrylamide concentration and glucose concentration) were examined using a regular two-level factorial design. Within the investigated ranges, bacterial growth ranged from 5.53 to 9.76 log CFU/mL. Table 2 shows the design plan. It includes the actual values of the variables used in each run, the experimental and predicted response values, and the residuals. The statistical significance of the model was evaluated by analysis of variance (ANOVA) using the F-test and its associated p-value (Table 3). The model was highly significant, with an F value of 26.63 and a p-value < 0.0001. Its reliability was verified by the coefficient of determination (R2 = 0.786, reasonably close to unity) and the adjusted coefficient of determination (adjusted R2 = 0.7565). The adjusted value indicates that, after adjusting for the number of model terms, the model explains about 75.65 percent of the total variance in the response. The adequate precision was 12.9273. A ratio greater than 4 is generally considered desirable, so this indicates an adequate signal-to-noise ratio and means the model can be used to navigate the design space. The significance of individual model terms was verified by p-values < 0.05. In this case, A-pH, C-Acrylamide and E-Incubation time were significant, and no interaction term was significant. Using a two-factor interaction (2FI) model, the predicted bacterial growth can be expressed in terms of coded and actual factors (Table 4). Table 5 lists the estimated coefficients of the investigated factors together with the associated diagnostics, including standard errors, confidence limits and variance inflation factors (VIF).
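For an orthogonal coded design such as this one, the model coefficients and the fit statistics of the kind reported in Table 3 can be reproduced with elementary sums. The sketch below illustrates the arithmetic under stated assumptions; it is not the Design-Expert computation.
- It assumes a two-factor interaction model whose columns are mutually orthogonal and sum to zero. This holds for a full 2^5 factorial with centre points.
- It pools the centre points into the residual rather than testing curvature separately.
- It assumes adequate precision is the range of the predicted values divided by sqrt(p × MSE / n).

All function names are hypothetical.

```python
import itertools
import math


def factorial_with_centres(k, n_centre=3):
    """Coded 2**k full factorial runs followed by n_centre centre points."""
    runs = [list(r) for r in itertools.product([-1, 1], repeat=k)]
    return runs + [[0] * k for _ in range(n_centre)]


def column(design, term):
    """Model column for a term given as factor indices, e.g. (0,) -> A, (0, 4) -> AE."""
    return [math.prod(run[i] for i in term) for run in design]


def fit_orthogonal(design, y, terms):
    """Least squares, valid only when every model column is orthogonal to the others
    and sums to zero: intercept = mean(y), b_t = sum(x * y) / sum(x * x)."""
    intercept = sum(y) / len(y)
    coefs = {}
    for t in terms:
        x = column(design, t)
        coefs[t] = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    return intercept, coefs


def predict(design, intercept, coefs):
    """Predicted response for every run of the design."""
    cols = {t: column(design, t) for t in coefs}
    return [intercept + sum(b * cols[t][i] for t, b in coefs.items())
            for i in range(len(design))]


def fit_statistics(y, y_hat, n_terms):
    """R^2, adjusted R^2, model F value and (assumed definition) adequate precision."""
    n = len(y)
    y_bar = sum(y) / n
    ss_tot = sum((v - y_bar) ** 2 for v in y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, y_hat))
    df_res = n - n_terms - 1
    mse = ss_res / df_res
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - mse / (ss_tot / (n - 1))
    f_value = ((ss_tot - ss_res) / n_terms) / mse
    # range of predictions / sqrt(average prediction variance p * MSE / n), p incl. intercept
    adeq_precision = (max(y_hat) - min(y_hat)) / math.sqrt((n_terms + 1) * mse / n)
    return {"R2": r2, "adj_R2": adj_r2, "F": f_value, "adeq_precision": adeq_precision}
```

Consider synthetic data generated as y = 7 + 0.6A - 0.3C + 0.9E + 0.2AE plus a small BD term that the model omits. For such data, fit_orthogonal recovers the four coefficients exactly and fit_statistics returns an R2 just below 1. With the real responses, these quantities would be compared against Table 3.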
Among the selected factors, only incubation time and pH have positive coefficients, with incubation time having the larger value. Both parameters therefore have a beneficial effect on the growth of this bacterium on acrylamide, with incubation time having the greater influence. By contrast, the coefficient estimate for acrylamide concentration is negative. This suggests that acrylamide concentrations above the optimum are detrimental to the growth of this bacterium. The variance inflation factor (VIF) measures how much a lack of orthogonality in the design inflates the variance of a given model coefficient. Specifically, the standard error of a model coefficient in a non-orthogonal design exceeds that of the same coefficient in an orthogonal design by a factor equal to the square root of the VIF.
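The relationship between the VIF and standard-error inflation can be shown for the simplest case, a model with two predictors. There, the VIF of either term is 1 / (1 - r^2), where r is the correlation between the two columns. With more terms, r^2 would be replaced by the R^2 from regressing one term on all the others. That general case is not implemented in this minimal sketch. The function names and example columns are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_terms(x1, x2):
    """VIF of either term in a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


def se_inflation(vif):
    """Factor by which a coefficient's standard error exceeds its orthogonal-design value."""
    return math.sqrt(vif)
```

Two orthogonal coded columns, such as [-1, -1, 1, 1] and [-1, 1, -1, 1], give a VIF of exactly 1, as for every term in a full factorial. The correlated pair [-1, -1, 1, 1] and [-1, 1, 1, 1] gives r^2 = 1/3 and hence a VIF of 1.5. The standard errors are then inflated by a factor of about 1.22.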
In general, a VIF of 1 is ideal, since it indicates that the coefficient is orthogonal to the other model terms (i.e., their correlation coefficient is 0). VIFs greater than 10 raise concern. VIFs greater than 100 indicate that coefficients are poorly estimated owing to multicollinearity, and VIFs greater than 1000 indicate severe collinearity. The VIF was found to be 1, indicating that the regression analysis is free of multicollinearity [28][29][30]. Of the five screened parameters, only three were major influential factors in the two-level factorial analysis. This is evident from the Pareto chart (Fig. 1), which was constructed to assess the statistical significance of each coefficient. The Pareto chart uses two limit lines to classify the t-value of each effect. These are the Bonferroni limit (t-value of effect: 3.467) and the t-limit (t-value of effect: 2.043). The coefficients fall into three categories. The first category contains coefficients with a t-value of effect above the Bonferroni limit, which are regarded as most significant. These were A-pH, E-Incubation time and the interaction AE (pH × incubation time).
The second category contains coefficients with a t-value of effect between the Bonferroni line and the t-limit line, which are regarded as likely to be significant. C-Acrylamide was the only factor in this range. The third category contains coefficients with a t-value of effect below the t-limit line. These are statistically insignificant and could be removed from the analysis. All remaining factors, including B-Temperature and D-Glucose, fell into this category. These findings agree with the coefficient estimates.
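The three-way classification used in the Pareto chart is a simple threshold rule on the absolute t-value of each effect. In this minimal sketch, the t-value of each effect is assumed to be its coefficient divided by its standard error. The default limits are the values quoted above (2.043 and 3.467). The function name, the effect names and the t-values in the example are hypothetical, not those of Fig. 1.

```python
def classify_effects(t_values, t_limit=2.043, bonferroni_limit=3.467):
    """Sort effects into the three Pareto-chart categories by |t-value of effect|."""
    categories = {"significant": [], "likely significant": [], "not significant": []}
    # largest |t| first, as the bars appear on a Pareto chart
    for name, t in sorted(t_values.items(), key=lambda kv: abs(kv[1]), reverse=True):
        if abs(t) > bonferroni_limit:
            categories["significant"].append(name)
        elif abs(t) > t_limit:
            categories["likely significant"].append(name)
        else:
            categories["not significant"].append(name)
    return categories
```

Consider hypothetical t-values of 6.3 for E, 5.1 for A, 3.9 for AE, -2.6 for C, -1.1 for D and 0.8 for B. These sort into E, A and AE (significant); C (likely significant); and D and B (not significant). This mirrors the pattern described above. The sign of an effect does not matter for the classification, only its magnitude.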
Acrylamide concentration, pH and incubation time were the main parameters contributing to the growth of this bacterium on acrylamide. Numerous OFAT-based studies have likewise identified these parameters as crucial for high growth of microorganisms on acrylamide. This work used acrylamide concentrations well within the range known to be tolerated by most acrylamide-degrading microorganisms. Concentrations greater than 1000 mg/L are normally harmful to such microorganisms [11][12][13][14][15][16][17][18][19][20][31][32][33][34][35][36][37]. The perturbation plot is useful for comparing the influence of all design variables at a single point in the design space. One variable is varied over its range while the others are held constant, and the resulting response is plotted. The plot (Fig. 2) shows the comparative effect of all operational parameters at a particular point in the design space. Factors A-pH and E-incubation time had the steepest slopes. The perturbation plot also reveals interaction between the factors, which the Plackett-Burman screening method would not be able to detect [38][39][40][41]. A half-normal probability plot of the residuals was generated and evaluated (Fig. 3) to verify that the normality assumption holds. Two internally studentized residuals at the extremes were exceptions. All others lay within ±2 and along the straight line, suggesting that no transformation of the response is required. As shown in Fig. 4, the plot of actual experimental results against the values predicted by the model indicates a strong match. The Box-Cox plot (Fig. 5) offers helpful guidance for choosing an appropriate power-law transformation based on the value of lambda. The 95% confidence interval for lambda includes 1, the value corresponding to the current, untransformed model. Therefore, no further transformation of the observed response is advised. The leverage vs run plot shown in Fig. 6 reveals that all values fall within the usual range of 0 to 1.
Leverage indicates the potential of a design point to influence the model fit. Leverage values range from 0 to 1. A point with leverage approaching 1 is considered "bad": if that point carries a problem such as an unanticipated error, the error has a large impact on the model. According to the plot of leverage vs runs, no data point lies above the average leverage threshold. Points above this threshold could influence at least one model parameter. The plot of Cook's distances (Fig. 7) measures how strongly each experimental run acts as a response outlier. Cook's distances cannot be negative, and the larger the value, the more influential the observation. Many researchers consider an observation influential when its Cook's distance exceeds three times the mean Cook's distance of the dataset. All Cook's distances here were below 1, and the diagnostics do not indicate the need for any transformation. The plot of residuals against run number (Fig. 8) shows two potential outliers, runs 16 and 27, with residual values of 1.99 and 1.26, respectively. These can also be seen in the half-normal probability plot described above. The plot shows no sign of serial correlation, suggesting that the residuals are random [21,22,25,42,43]. Influential points are not necessarily a problem, but observations flagged as highly influential should be followed up. A high value on an influence measure may indicate, for example, an error during data entry. It may also indicate an observation that is clearly atypical of the population of interest and should therefore be excluded from the analysis.
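The influence diagnostics discussed above can be computed without refitting the model, together with DFFITS, which is described in the next paragraph. This minimal sketch makes the following assumptions:
- The design is orthogonal and coded, so the leverage of run i is 1/n plus, for each model term, x_it^2 / sum_r x_rt^2.
- Cook's distance is D_i = r_i^2 h_ii / (p (1 - h_ii)), where r_i is the internally studentized residual.
- An observation is flagged using the three-times-the-mean rule quoted above.
- The DFFITS cut-off is the common value 2 sqrt(p/n).

Function names are hypothetical, and p counts the intercept.

```python
import itertools
import math


def factorial_with_centres(k, n_centre=3):
    """Coded 2**k full factorial runs followed by n_centre centre points."""
    runs = [list(r) for r in itertools.product([-1, 1], repeat=k)]
    return runs + [[0] * k for _ in range(n_centre)]


def leverages(design, terms):
    """Hat-matrix diagonal h_ii, valid only when all model columns are mutually
    orthogonal and sum to zero (true for a coded factorial with centre points)."""
    n = len(design)
    cols = {t: [math.prod(run[i] for i in t) for run in design] for t in terms}
    norms = {t: sum(v * v for v in c) for t, c in cols.items()}
    return [1 / n + sum(cols[t][i] ** 2 / norms[t] for t in terms) for i in range(n)]


def cooks_distance(residuals, h, n_params):
    """Cook's D_i = r_i^2 * h_ii / (p * (1 - h_ii)), r_i internally studentized."""
    n = len(residuals)
    mse = sum(e * e for e in residuals) / (n - n_params)
    out = []
    for e, hi in zip(residuals, h):
        r = e / math.sqrt(mse * (1 - hi))
        out.append(r * r * hi / (n_params * (1 - hi)))
    return out


def flag_influential(cooks_d, multiple=3.0):
    """Indices whose Cook's D exceeds `multiple` times the mean Cook's D."""
    threshold = multiple * sum(cooks_d) / len(cooks_d)
    return [i for i, d in enumerate(cooks_d) if d > threshold]


def dffits(residuals, h, n_params):
    """DFFITS_i = t_i * sqrt(h_ii / (1 - h_ii)), t_i externally studentized."""
    n = len(residuals)
    mse = sum(e * e for e in residuals) / (n - n_params)
    out = []
    for e, hi in zip(residuals, h):
        # residual variance with observation i deleted, obtained without refitting
        s2_i = ((n - n_params) * mse - e * e / (1 - hi)) / (n - n_params - 1)
        t_i = e / math.sqrt(s2_i * (1 - hi))
        out.append(t_i * math.sqrt(hi / (1 - hi)))
    return out


def dffits_cutoff(n_obs, n_params):
    """Common size-adjusted DFFITS cut-off, 2 * sqrt(p / n)."""
    return 2 * math.sqrt(n_params / n_obs)
```

Consider a 35-run design (32 factorial runs plus 3 centre points) fitted with A, C, E and AE. Here p = 5, each factorial run has leverage 1/35 + 4/32 and each centre point has leverage 1/35. The leverages sum to p, as they must for any least-squares fit. DFFITS computed this way matches the leave-one-out definition exactly, so no model needs to be refitted.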
During model fitting, one or more sufficiently influential data points can distort the coefficient estimates and muddle the interpretation of the model. Traditionally, potential outliers were assessed before running a linear regression by inspecting histograms and scatterplots. Both approaches are subjective and give little indication of how much influence each potential outlier has on the results. This led to the development of quantitative measures such as DFFITS and DFBETAS. DFFITS assesses how much each individual observation affects its own predicted value, and it can be related to Cook's distance. Unlike Cook's distance, DFFITS can take either positive or negative values. A value of 0 means that the point lies exactly on the regression line. Mathematically, DFFITS is the difference between the predicted value for an observation with and without that observation in the fit, scaled by its standard error. Equivalently, DFFITS is the externally studentized residual (t_i) multiplied by sqrt(h_ii/(1 - h_ii)), where h_ii is the leverage of observation i. High-leverage points therefore amplify it and low-leverage points reduce it [41,44,45]. DFBETAS measures the analogous change in each individual regression coefficient. The DFBETAS values (Fig. 9) and DFFITS values (Fig. 10) were within the cut-off values. The exceptions were the two potential outliers discussed above, which exceeded the threshold.
To reiterate, in fundamental research the planning of experiments frequently takes an "intuitive" approach. Biological experiments have traditionally been conducted one factor at a time (OFAT): all factors are held constant except the one under investigation, and its effect on the output is analysed.
The OFAT strategy can reveal significant main effects in biological research, but interactions between factors can lead to incorrect conclusions. Because biological processes are intricate, many input factors must be controlled to obtain optimal results. Experimental results can be noisy, and large amounts of potentially interesting data may be generated. In such situations, statistically based experimental design allows the selection of data points to be tuned so that the relevant information obtained is maximized. Design of experiments (DOE) considers several factors that are thought to affect process output. The design ultimately selected is the feasible design expected to yield the most information.
This criterion is frequently based on the precision of the fitted model's estimates of the input variables or of its predictions of the output variable. In most cases, the relationships between factors are complicated. Numerous process-optimization studies have used OFAT to improve the response. However, understanding the interactions between factors is important for optimizing increasingly complicated processes. In an OFAT strategy, one axis is optimized first, followed by the other. If the starting point of the investigation happens to be well chosen, the global maximum of the output variable may be found. However, there is a significant possibility that the search will terminate at a local maximum or pseudo-optimum. This is why RSM most often gives better results than OFAT [46][47][48][49][50].
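The pseudo-optimum problem can be reproduced with a toy response that has a strong interaction between two factors. The response function and grid below are entirely hypothetical and do not model bacterial growth. The maximum lies on a diagonal ridge at (1, 1). A single OFAT pass starting from (0, 0) stops at (0, 0), because no single-factor move improves the response. An exhaustive search over the same grid finds the true maximum.

```python
from itertools import product


def response(x1, x2):
    """Hypothetical response with a diagonal ridge (strong x1-x2 interaction)."""
    return 10 - 10 * (x1 - x2) ** 2 - (x1 + x2 - 2) ** 2


def ofat(f, levels, start):
    """One OFAT pass: optimize x1 with x2 fixed, then x2 with the new x1 fixed."""
    x1, x2 = start
    x1 = max(levels, key=lambda a: f(a, x2))
    x2 = max(levels, key=lambda b: f(x1, b))
    return x1, x2


def grid_search(f, levels):
    """Evaluate every combination of levels and return the best one."""
    return max(product(levels, repeat=2), key=lambda p: f(*p))
```

Here, ofat(response, range(5), (0, 0)) returns (0, 0) with a response of 6. By contrast, grid_search(response, range(5)) returns (1, 1) with a response of 10. A factorial or response-surface design captures this x1-x2 interaction directly instead of searching along one axis at a time.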

CONCLUSION
A two-level factorial design was used to screen five independent factors influencing the growth of a bacterium on the toxicant acrylamide. The factors were pH, temperature, incubation time, acrylamide concentration and glucose concentration. The design successfully identified pH and incubation time as important contributing parameters, which can be further optimized using RSM in future work. The contributing factors were analysed using ANOVA, a Pareto chart, a perturbation plot and other diagnostic plots. The diagnostic plots (half-normal, Cook's distance, residuals vs runs, leverage vs runs, Box-Cox, DFFITS and DFBETAS) all supported the conclusions of the two-level factorial analysis. The exception was two potential outliers, which indicate that the experiment should either be repeated using blocks or be reanalysed with the potential outliers removed. This study used a pH range well within the range reported to be optimal for most acrylamide-degrading microorganisms. Most of these microorganisms grow well under near-neutral conditions, and the results obtained here conform to this trend in the published literature. The significance of incubation time was also expected, since longer incubation allows more growth. Incubation times of two to five days for optimized growth have been reported for many acrylamide-degrading microorganisms.