Modelling Biogas Fermentation from Anaerobic Digestion: Potato Starch Processing Wastewater Treated Within an Up flow Anaerobic Sludge Blanket

Herein, a modeling approach to predict biogas yield within a mesophilic (35 ± 1°C) upflow anaerobic sludge blanket (UASB) reactor treating potato starch processing wastewater (PSPW) for pollutant removal was conducted. HRTs and seven anaerobic process-related parameters viz; chemical oxygen demand (COD), ammonium (NH4 +), alkalinity, total Kjeldahl Nitrogen, total phosphorus, volatile fatty acids (VFAs) and pH with average concentration of 4028.91, 110.09, 4944.67, 510.47, 45.20, 534.44 mg/L and 7.09, respectively, were used as input variables (x) to develop stochastic models for predicting biogas yield from the anaerobic digestion of PSPW. Based on the prediction accuracy of the models, it was established that, prediction of biogas yield from the UASB with the combination of COD, NH4 + and HRT, or COD, NH4 +, HRT and VFAs as input variables proved more efficient as opposed to HRT, alkalinity, total Kjeldahl Nitrogen, total phosphorus and pH. Highest coefficient of determination (R2) observed was 97.29%, suggesting the efficiency of the models in making predictions. The developed models efficiencies concluded that the models could be employed to control the dynamic anaerobic process within UASBs since prediction of biogas obtained in the UASB agreed with the experimental result. using several leachate parameters [17]. In another study, Ozkaya et al. presented a neural network model for predicting the methane fraction in landfill gas originated from field-scale landfill bioreactors [18]. These models had provided more detail insight into the biological mechanism in anaerobic digestion [4]. Deterministic models also provide a good insight into the mechanism of biological relationships, but fluctuation of kinetic parameters and wastewater characteristics normally results in a laborious calibration, comprehensive computer analysis as well as laboratory work [14]. The International Water Association (IWA) Anaerobic Digestion Model No.1 (ADM1), a typical deterministic model, has been successfully used for modeling the whole anaerobic digestion process [19]. However, its mathematical complexity associated with extreme analytical difficulty of measuring kinetic parameters turns out to be laborious and time consuming [20,21]. On the other hand, stochastic based non-linear multiple regression model is preferably easy to handle as well as capable to estimate the relation between variables and numerical parameters [22,23]. The advantage of a regression based model compared to other models such as neural networks is its ability to write down relationships and to relate with underlying processes, whereas neural networks only produce an approximation that is opaque. Citation: Antwi P, Li J, Shi E, Boadi PO, Ayivi F (2017) Modelling Biogas Fermentation from Anaerobic Digestion: Potato Starch Processing Wastewater Treated Within an Up flow Anaerobic Sludge Blanket. J Bioremediat Biodegrad 8: 388. doi: 10.4172/2155-6199.1000388 Volume 8 • Issue 2 • 1000388 Page 2 of 9 J Bioremediat Biodegrad, an open access journal ISSN:2155-6199 Volatile fatty acids (VFAs) were measured by a gas chromatograph (SP6890, Shandong Lunan Instrument Factory, China) equipped with a 30 m capillary column (Stabilwax-DA, i.d. 0.32 mm, 11054, Restek) and a flame ionization detector (FID). The operational temperatures of the injection port, oven and detector were 210°C, 180°C, and 210°C, respectively. Nitrogen gas was used as the carrier gas, with a 0.75 MPa column head pressure. The split ratio was 1:50. Liquid sample of 1 mL collected from the top most sampling port was centrifuged at 13000 rpm for 3 min, and 0.5 mL of the supernatant was pipetted and acidified with 25% H3PO4 and then 1 μL of the final solution was injected. The VFAs were measured in terms of CH3COOH. A 0.5 ml of biogas was sampled from the headspace of the reactor to determine CH4 and CO2 fractions. Fraction of CH4 was analyzed by another gas chromatograph (SP-6800A, Shandong Lunan Instrument Factory, China) equipped with a thermal conductivity detector (TCD) and a 2 m stainless column packed with Porapak Q (60/80 mesh). Temperatures of the injector, column and the TCD were 80°C, 50°C and 80°C, respectively. Data preparation and correlation analysis The experimental data was used as an open database connectivity data source for the regression analysis. MINITAB (version 17) and Sigmaplot (version 13) statistical computing environment were used to carry out Pearson's correlation analysis. The few irregular biogas yield data were omitted prior to further analysis. Significances were corrected to avoid multiple comparisons [25]. A probability (p-value) less than 0.05 was used to determine the statistical significance of the regression coefficients during the correlation analysis and the prediction.


Introduction
Potato is one of the most valuable food crop grown in many countries [1]. It has been reported that, a considerable proportion of the potato cultivated globally consumed through starch processing which subsequently generates tons of wastewater that goes to pollute water bodies [1][2][3]. Wastewater of raw potato processed into starch are classified as complex wastewater [4,5], and its concentration of chemical oxygen demand (COD), total suspended solid (TSS) and volatile suspended solid of (VSS) can yield concentrations of 50000, 9700 and 9500 mg/L, respectively [6]. Arhoun et al. argued that recovering valuable resource such as bioenergy (biogas) from such wastewater to supplement energy needs will be beneficial to humans and society at large [6,7]. Anaerobic digestion has severally been reported as a successful bioprocess treating various organic wastewaters and subsequently generating biogas [7][8][9][10][11][12]. However, the biological mechanism of anaerobic digestion is not well understood due to the complexity of the bacterial community structure and bioconversion [13]. Hu et al. asserted that process modeling is a good tool for predicting and describing the performance of biological processes [13]. Other reports also confirmed that process modeling based on previously acquired data is one technical route to enhancing the performance of anaerobic processes. These process models are often developed [14,15]. Nonetheless, modeling of anaerobic digestion is quite challenging and tough because performance of anaerobic systems is complex and varies considerably with influent characteristics and operational conditions [16]. Some predictive models have been developed in the past decades for biogas estimation during anaerobic treatment processes. For instance, a regression analysis model for estimating biogas generated in a landfill leachate treatment process was developed by Akaya et al.
using several leachate parameters [17]. In another study, Ozkaya et al. presented a neural network model for predicting the methane fraction in landfill gas originated from field-scale landfill bioreactors [18]. These models had provided more detail insight into the biological mechanism in anaerobic digestion [4].
Deterministic models also provide a good insight into the mechanism of biological relationships, but fluctuation of kinetic parameters and wastewater characteristics normally results in a laborious calibration, comprehensive computer analysis as well as laboratory work [14]. The International Water Association (IWA) Anaerobic Digestion Model No.1 (ADM1), a typical deterministic model, has been successfully used for modeling the whole anaerobic digestion process [19]. However, its mathematical complexity associated with extreme analytical difficulty of measuring kinetic parameters turns out to be laborious and time consuming [20,21]. On the other hand, stochastic based non-linear multiple regression model is preferably easy to handle as well as capable to estimate the relation between variables and numerical parameters [22,23]. The advantage of a regression based model compared to other models such as neural networks is its ability to write down relationships and to relate with underlying processes, whereas neural networks only produce an approximation that is opaque. Volatile fatty acids (VFAs) were measured by a gas chromatograph (SP6890, Shandong Lunan Instrument Factory, China) equipped with a 30 m capillary column (Stabilwax-DA, i.d. 0.32 mm, 11054, Restek) and a flame ionization detector (FID). The operational temperatures of the injection port, oven and detector were 210°C, 180°C, and 210°C, respectively. Nitrogen gas was used as the carrier gas, with a 0.75 MPa column head pressure. The split ratio was 1:50. Liquid sample of 1 mL collected from the top most sampling port was centrifuged at 13000 rpm for 3 min, and 0.5 mL of the supernatant was pipetted and acidified with 25% H 3 PO 4 and then 1 µL of the final solution was injected. The VFAs were measured in terms of CH 3 COOH.
A 0.5 ml of biogas was sampled from the headspace of the reactor to determine CH 4 and CO 2 fractions. Fraction of CH 4 was analyzed by another gas chromatograph (SP-6800A, Shandong Lunan Instrument Factory, China) equipped with a thermal conductivity detector (TCD) and a 2 m stainless column packed with Porapak Q (60/80 mesh). Temperatures of the injector, column and the TCD were 80°C, 50°C and 80°C, respectively.

Data preparation and correlation analysis
The experimental data was used as an open database connectivity data source for the regression analysis. MINITAB (version 17) and Sigmaplot (version 13) statistical computing environment were used to carry out Pearson's correlation analysis. The few irregular biogas yield data were omitted prior to further analysis. Significances were corrected to avoid multiple comparisons [25]. A probability (p-value) less than 0.05 was used to determine the statistical significance of the regression coefficients during the correlation analysis and the prediction.

Model description
The general form of the models used in this study is expressed as Eq.1 [26][27][28][29][30]. The output variable y, written as a function of k, has input variables (x 1 , x 2 , …, x k ) and a random error term ε that is In this study, the evaluation, feasibility and efficiency of biogas production from anaerobic digestion of PSPW within an upflow anaerobic sludge blanket (UASB) reactor was conducted. Thereafter, dynamic multiple non-linear regression models were developed for the timely prediction of biogas yield from the UASB. The proposed models could identify the most influential parameter(s) that could guide the control and operation of anaerobic systems. Based on residual analysis and diagnostic statistics, the best fit models were identified and their output performance compared with that of experimental data. The activated sludge for inoculation of the reactor was collected from a local municipal wastewater treatment plant operating in an anaerobic-anoxic-oxic (A 2 /O) condition and were characterized as 11.5 g/L mixed liquor suspended solids (MLSS) and 5.6 g/L mixed liquor volatile suspended solids (MLVSS).

Experimental setup and reactor operation
The experiments were conducted in an UASB that was constructed with Plexiglas ( Figure 1). The reactor was 120 cm high with an internal diameter of 10 cm. There were a conical bottom of about 0.4 L and a gas-solid-liquid separator at the upper part. The total volume and the effective working volume was 8.8 and 7 L, respectively. Five sampling ports at 25 cm interval from each other were allocated along the vertical height of the reactor. The first sampling port was 1 cm above the conical bottom whiles the topmost port was 3 cm below the reactor's head.
The UASB was operated under mesophillic condition (35 ± 1°C) and the heating source was obtained from a heat conducting wire wound around the stem. The heat conducting wire was connected to a temperature controller. The diluted raw PSPW was fed to the reactor by a peristaltic pump (BT100-2J, Langer Instruments, United Kingdom). The evolved biogas was collected by the gas-solid-liquid separator and was measured daily using the wet gas meters (Model LML-1, Changchun Filter Co., Ltd., China). The reactor was started up with an inoculum of 3.52 g/L MLVSS and a hydraulic retention time (HRT) of 48 h was kept during the first 49 days. The reactor was continuously operated with a decreased HRT of 24 h since the 50 th day.

Analytical methods
The influent and the effluent of the reactor were sampled daily for the analysis of COD, ALK (in terms of CaCO 3 ), TKN, NH 4 + and TP in accordance with the Standard Methods for the Examination of Water and Wastewater, APHA [24]. pH was determined using a DELTA 320 (Mettler Toledo, USA). added to make the model probabilistic rather than deterministic. In particular, the value of the coefficient β i determines the contribution of the independent variable x i , given that the other (k−1) independent variables are held constant. β 0 represents the y-intercept. The coefficients β 0 , β 1 , …, β k are usually unknown since they represents population parameters.
The input variables can represent higher-order terms for quantitative predictors. Since most statistical tests are reliant on assumptions about the variables used [31], 5 assumptions were considered in this study which included linearity, independence among errors, nonmulticollinearity, homoscedasticity and non-autocorrelation [32]. The first-order model (Eq.2), second-order model (Eq.3) and complete second-order model (Eq.4) were used in the model development. These models comprised two or more independent variables in different combinations and interactions among the input variables and the estimated unknown coefficients (β 0 , β 1 , …, β k ) [26,33].
where β 0 is the y-intercept of (k+1); β 1 is change in y for one unit increase in For Eq.3 and Eq.4, β 1 and β 2 cause surface shift along the x 1 and x 2 axes, β 3 controls the rate of twist in the ruled surface, β 4 and β 5 controls the type of surface and the rates of curvature, respectively.

Selection of input and output variables for model fitting
Seven process related parameters obtained from the feed, together with the HRT, were used as input variables (x n ) [34]. The seven parameters were influent COD, pH, NH 4 + , ALK, TKN, VFAs and TP. Variables which did not correlate with biogas yield significantly (p >0.05) were omitted [23]. A stepwise iterative variance inflation factor (VIF) analysis as described by Mac Nally was used to evaluate all of the input variables and the VIF values less than 5 were considered [26]. Thus, the developed models would contain fewer but consistent variables.
With the detected biogas yield (BgY) in the UASB as the output variable (Y), descriptive statistics of the model variables are given in Table 1. It was noticed that some input variables fluctuated remarkably. Akaya et al., however, argued that difference in values of input parameters is a preferable positive indicator in arriving at positive results for a general biogas prediction model [17].

Evaluation and selection of the models
All of the input variables were used to obtain the regression coefficients Five unique model equations with multiple input variables in various combinations and interactions were developed and selected based on Goodness-of-Fit [35]. Nine regression coefficients, i.e., β 0 , β 1 , …, β 8 , were considered and its respective values were estimated and used in the models. The optimum models were selected based on the following statistical performance criterion: standard error of the estimate (SEE) [36] (Eq.9), sum of squared residuals (SSR) (Eq.8), coefficient of multiple determination (R 2 ) (Eq.6), adjusted coefficient of multiple determination (Adj-R 2 ) [37] (Eq.7), VIF (Eq.10), Durbin-Watson statistics (DWS) (Eq.11) and p-value [15] (Eq.12).
where, Y o , Y p and Y denotes experimental data, predicted values and arithmetic mean of the observed data; n and m is the number of data points and parameters in the regression model, respectively; k is the number of independent regressors excluding the constant term; î l e yi y = − , and y i and ˆl y were, respectively, the observed and predicted values of the response variable for individual i; TS is random variable associated with the assumed distribution; ts is the test statistics calculated from sample, and cdf is the cumulative density function of the assumed distribution.

UASB performance
Performance of the UASB treating PSPW at 35 ± 1°C with HRTs of 48 h and 24 h by stages was presented in Figure 2 and Table 2. With an average influent COD of 3799 mg/L and an average organic loading rate (OLR) of 1.50 kgCOD/m 3 ·d for HRT 48 h, the effluent COD averaged 267 mg/L with a removal ranged from 83.5% to 92.0% was obtained in the reactor (Figure 2a). As the influent COD was increased to about 4185 mg/L along with the shortened HRT of 24 h, the COD removal ranged from 90% and 94.5% with an effluent COD of about 280 mg/L, though the OLR had been increased to about 4.23 kg COD/m 3 ·d. The higher COD removal at HRT 24 h resulted in an increase in biogas yield in the UASB. As shown in Figure 2b, the influent and effluent pH ranged from 5.35-8.05 (mean pH 7.00) and 7.35-8.86 (mean pH 8.00) for HRT 48 h and 24 h, respectively. The illustration in Figure  2c depicted biogas yield that ranged from 3.4 to 9.6 L/d obtained at HRT 48 h, while 11.3 to 17.4 L/d in HRT 24 h. The methane fraction throughout the performance of the reactor ranged from 56.2% and 84.5%.
Throughout the operation of the UASB, observed pH in both HRTs were almost similar in value even though a remarkable difference in ALK was observed in the reactor. Figure 2d indicated that no observable difference in NH 4 + concentration was found when the reactor was operated at HRT 48 h or 24 h, with an influent and effluent concentration averaged 109 and 241 mg/L, respectively. The average influent and effluent ALK at HRT 48 h were 6010 and 10948 mg/L, while that of 3592 and 8638 mg/L for HRT 24 h, respectively (Figure 2e). The feasible pH and ALK enhanced the acetogenesis and methanogenesis in the reactor, resulting in the few VFAs observed in the effluent [38].
The average influent and effluent TKN at HRT 48 h were found to be 466 and 307, respectively (Figure 2f). With the shortened HRT 24 h, the influent and effluent TKN were increased to about 518 and 507, respectively. Within the 112 days' operation, the UASB showed no TP removal with the same concentration of about 45 mg/L in both influent and effluent (Figure 2f).

Correlations between output and input variables
Correlation analysis was performed during the data preparation to identify the potential input variables to build the model. The results as shown as Table 3 showed that influent COD, pH, NH 4 + , ALK, TKN, VFA, TP and HRT had remarkable influence on the biogas yield in the UASB. The eight variables correlated with biogas yield were therefore used as input and output variables in the models. Observably, NH 4 + was the only variable included in all model types (Eq.13 to Eq.17), but it has seldom been used in predictive models before [23,39].
The final structure of the model equations expressed in Eq.18 and Eq.19 were rewritten and given in Eq.20 and Eq.21, respectively.
Accordingly, independent variables x 1 , x 2 , x 7 and x 8 were used in M3 and M4 as shown in Eq.18 and Eq.19. Table 4 showed the results of the diagnostics statistics and performance criterion. Obviously, COD, NH 4 + , VFAs and HRT were more useful than ALK, TKN, TP and pH in model M3 (Eq.15) and M4 (Eq.16). Conversely, NH 4 + , ALK, TKN, TP and VFAs were more important in M1 (Eq.13). On the other hand, COD, NH 4 + , TKN and HRT related better in M2 (Eq.14). Model M5 (Eq.17) engaged all eight variables as illustrated in Table 4.
Based on the diagnostics statistics, p-values associated with the variables in model M1, M2, M3 and M4 were statistically significant (Table 4). However, the entire variable set in M5 recorded high p-value. In particular, pH, TKN, ALK and TP used in M5 were 0.75, 0.74, 0.8 and 0.9 respectively, which were extremely > 0.05. This phenomenon was evident to conclude that M5 was not a good model to be considered by all standards although its R 2 yielded 97.30%. On the contrary, the multiple coefficient of determination (R 2 ) for M3 and M4 were 97.29% and 96.99%, respectively, with only 2.71% and 3.01% of the total variations not explained by both models in predicting biogas yield. For M1 and M2, about 13.46% and 3.85% of the variation existing among dependent variables were respectively not explained by these models, suggesting biogas yield predicted by M1 and M2 were unfit to the experimental data.
As shown in Table 4, the VIF for HRT and TP as regressors in M5 were 10.07 and 9.61 respectively. These huge values violated the assumptions specified in this study. In clarity, the VIF values confirmed that HRT and TP were highly correlated multicollinearity. Similarly, in M1, interaction variable of x 2 x 7 (NH 4 + and VFAs) exhibited some multicollinearity as a value of 6.26 was estimated as VIF. In terms of   The correlation analysis demonstrated that some input variables correlated significantly with the biogas yield (r = 0.31 to 0.98) whiles others correlated poorly with r ranging from 0.01 to 0.18. Similarly, influent pH (x 3 ), ALK (x 4 ), TKN (x 5 ) and TP (x 6 ) were non-significant as their p values were relatively high (p>0.05). All other variables had a p<0.001 or <0.05 (Table 3). Among the input variables, COD (x 1 ), x 5 , x 6 , and the product terms of COD and TKN (x 1 x 5 ), TKN and TP (x 5 x 6 ) and COD (x 1 2 ) were positively correlated with the biogas yield. On the other hand, influent NH 4 + (x 2 ), pH (x 3 ), x 4 , total VFAs (x 7 ), HRT (x 8 ) and the interaction terms x 1 x 2 , x 1 x 8 , x 2 x 5 , x 2 x 4 , x 2 x 6 , x 2 x 7 , x 6 x 7 , and x 2 2 were all negatively correlated with biogas yield.

Variable importance and model validation
There were 8 variable predictors (x 1 , x 2 , x 3 , x 4 , x 5 , x 6 , x 7 , x 8 ) in the proposed 5 models (Eq.13 to Eq.17), and the entire sets of explanatory variables for the 5 models differed partially. Among the 8 variable predictors, NH4+ was noticed as a common predictor. Regression coefficients determined from the multiple regression analysis were

Further analysis and applicability of selected optimum models
Base on the results shown in Table 4, M3 (Eq.15) and M4 (Eq.16) had been identified as the best models among the 5 proposed models in predicting biogas yield of the UASB. Residual analysis was carried out to determine the adequacy of the models and their compliance with the assumptions of regression. The normal probability plots for M3 ( Figure 3a) and M4 (Figure 3e) showed some minimum deviations of data points from the straight line at the extremes. The less visible pattern observed on the plot of standardized residuals against the fitted (predicted) values (Figure 3b and 3f), verified the assumption that the residuals are randomly distributed and has constant variance.
In terms of residual distribution with histogram, M3 ( Figure 3c) and M4 (Figure 3g) were relatively well distributed with no trace of outliers. The standardized residuals for M3 and M4 are shown in Figure  3d and 3h, respectively. No obvious increasing or decreasing, cyclical or sudden shift of the data points was observed. The residuals versus order plot verified the assumption that the residuals were independent from one another. The linear regression analysis of the MnLRM output and the corresponding experimental data had a relatively good cohesion (Figure 4). The head-to-head comparisons of predicted data versus experimental data illustrated that model M3 (Figure 5a) and M4 (Figure 5b) were in perfect agreement, indicating their effectiveness in making predictions from the UASB treating PSPW.
Above all, the developed model M3 and M4 could serve as a valuable and practical management tool that could support the control of anaerobic wastewater treatment processes for biogas generation. All parameters used in the model development could be obtained from experimental observations or by rapid measurements. Nevertheless, to ensure reliable performances, the introduced model(s) could be fitted with large dataset to offer a significant improvement in prediction accuracy. Therefore, future research should target at collecting prolonged time-series data to improve the model performance and to minimize effects, errors or/and possible unrealistic predictions.

Conclusion
The UASB was feasible and efficient in treating potato starch processing wastewater. With an average organic loading rate (OLR) of 1.50 kg COD/m 3 ·d, COD removal efficiency ranging from 83.5% to 92.0% was obtained when HRT was 48 h. As the influent COD was increased to about 4185 mg/L along with the shortened HRT of 24 h, the COD removal reached 94.5%, although organic loading rate (OLR) had been increased to about 4.23 kg COD/m 3 ·d. The higher COD removal at HRT 24 h resulted in an increase in biogas yield in the UASB. Biogas yield at HRT 48 h ranged from 3.4 to 9.6 L/d, whiles 11.3 to 17.4 L/d were observed at HRT 24 h. The methane fraction throughout the performance of the reactor reached 84.5%. No signs of acidity were encountered in the UASB as effluent pH observed ranged from 7.35-8.86 (mean pH 8.00) for both HRT of 48 h and 24 h.
To predict the biogas yield in the UASB treating potato starch processing wastewater (PSPW), the dynamic relationship among PSPW parameters, reactor operational parameters and the biogas yield were modeled based on MnLR model and validated with residuals analysis. Among the 5 developed models, M3 and M4 were identified as the optimum ones due to their superior predictive performance on biogas yield. The R 2 emerged from M3 and M4 were 97.29% and 96.99%, respectively. COD, NH 4 + , VFAs and HRT were the most useful and favourable predictive parameters compared to ALK, TKN, TP and pH. Both model M3 and M4 turned out to be a good tool for predicting biogas yield in UASBs. These models can also contribute to the understanding of the factors that influence anaerobic processes, and subsequently be used as a guide to control the processes to enhance biogas yield.