Differences in the ASM model caused by data structure

Mathematical modeling has made the design and operation of municipal sewage treatment plants much simpler and more efficient. The ASM model family can simulate the operation of existing or planned facilities satisfactorily. The basic problem in Poland is that plant monitoring provides too little data for simulations. This leads to unstable model results and to difficulties in calibration and validation. The aim of this article is to determine how the amount of data and its completeness affect the quality of simulations performed with an ASM model. The study object is a sewage treatment plant located in Chicago, USA. It operates with activated sludge technology and has regular monitoring of raw and treated wastewater quality. For modeling, a variant of the ASM model built into the BioWin 5.2 software was used.


Introduction
Wastewater treatment plants (WWTPs) are large and complicated systems. To design a well-operated and economically reasonable treatment plant, a mathematical model is necessary [1]. Rational operation likewise requires decisions based on mathematical models. Activated sludge models (ASM) and their later extensions are among the most widely used in the design, operation, and analysis of wastewater treatment plants [2]. ASM models contain equations for biological and chemical reactions, interactions between bacteria, and nitrogen and phosphorus transformations [3]. The problem is that such a complicated system needs a large amount of data. A precise measurement campaign takes a long time and is expensive. There are guidelines on how many measurements should be taken, ranging from a week at hourly intervals to months at daily intervals [4]. However, under Polish conditions such requirements are difficult to meet.
The aim of this work is to verify how the size of the dataset affects the model results. Data for the study were collected from the Calumet WWTP in Chicago. In the first stage of the study, the whole treatment plant was scaled to a single line with three aeration tanks, to simplify the model and make it visually clearer. In the second stage, seven different operating variants were created by changing the return sludge ratio. The aim of this stage was to verify how the modelling period affects the calculation results under different operating parameters. The third stage of the study was to verify how shortened data affect the final output results. Nine new datasets were created from the raw initial dataset, which contains measurements over 365 days: three sets each with 50, 100, and 200 random measurement days. The dynamic simulation was then run with each of these datasets.
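The construction of the nine shortened datasets can be sketched as follows. This is only an illustrative Python stand-in, not the procedure actually used in the study: the function name `draw_subsets`, the fixed seed, sampling without replacement, and the restoring of chronological order are all assumptions.

```python
import random


def draw_subsets(n_days=365, sizes=(50, 100, 200), repeats=3, seed=42):
    """Draw random subsets of measurement days.

    Returns a dict mapping (size, repeat) to a sorted list of day numbers.
    All defaults except n_days and sizes are illustrative assumptions.
    """
    rng = random.Random(seed)          # fixed seed only for reproducibility
    days = range(1, n_days + 1)        # day 1 .. day 365 of the raw dataset
    subsets = {}
    for size in sizes:
        for rep in range(1, repeats + 1):
            # Sample without replacement, then restore chronological order
            # so the subset can still be used as a time series.
            subsets[(size, rep)] = sorted(rng.sample(days, size))
    return subsets


subsets = draw_subsets()
for (size, rep), chosen in sorted(subsets.items()):
    print(f"{size:>3} days, set {rep}: first days {chosen[:5]}")
```

Each subset keeps the original day numbers, so the corresponding influent records can be looked up in the full 365-day dataset before a dynamic run.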

Description of the Chicago Calumet WWTP
The wastewater treatment plant (WWTP) investigated in this study is located in the city of Chicago, Illinois, USA. The Calumet Water Reclamation Plant (CWRP) on the South Side is the oldest of the seven water treatment facilities in the Chicago area. In operation since 1922, it serves a population of more than 1 million people. The Calumet plant is designed for an average influent dry-weather flow rate of 1.34 million m³ d⁻¹; the design maximum flow is 1.63 million m³ d⁻¹. The plant is composed of three parallel lines called AB, C, and E1E2. The primary treatment units are 6 screens, 8 grit tanks, and 12 primary clarifiers (46.5 m diameter, 4.65 m side wall depth). The main treatment line is divided into three routes. The anoxic and anaerobic zones are not assigned. Sludge return is external, from the secondary clarifier to the beginning of the aeration tanks. The last stage of the treatment process is disinfection using chlorine [5]. For the simulation study, only routine operational data from the Calumet WWTP were used; no additional measurements were performed.
Table 1. Descriptive statistics of raw wastewater from the Calumet WWTP.

Simulation environment and model configuration
In this work, simulations were carried out in BioWin® software version 5.2 developed by Envirosim® Canada. BioWin is a user-friendly platform with a visual style (flow layout design). BioWin uses the ASDM (Activated Sludge Digestion Model) for calculations. At present, BioWin is one of the three main software packages for modeling activated sludge in wastewater treatment plants, the other two being GPS-X® by Hydromantis® and Mike WEST® by DHI.
All simulations were carried out with standard equations and coefficients. No additional models were combined with the ASDM model. Because of the plant's large scale (48 aeration tanks, 52 secondary clarifiers), some simplifications were made. The whole treatment plant system and the sewage flow were scaled to one technological line, which contains one grit chamber (volume: 1570 m³, inert suspended solids capture: 65%), three primary settling tanks, and three aeration chambers: the first with the dimensions of line AB, the second with those of line C, and the third with those of line E1E2. The flow was split proportionally: 40% each to the first and third reactors, and 20% to the second reactor. This kind of simplification makes the model much more usable and does not change the calculation results. The aeration parameters were constant; the dissolved oxygen setpoint was 2 mgO2 dm⁻³. The return sludge ratio from the secondary clarifiers was varied from 0.5 to 2.5. All equation coefficients and fractions were left at their default values. The calculation interval was set to one day and the time period to 365 days; both steady-state and dynamic (with seed values) simulations were performed. The Calumet WWTP configuration in the BioWin software is presented in Fig. 1. The blue line is wastewater, the dotted blue line is the return sludge, and the green lines are sludge.
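The flow split and the return sludge variants described above amount to simple arithmetic, sketched here in Python. The helper names `split_flow` and `ras_flow`, the example influent flow of 1000 m³ d⁻¹, and the definition of the return sludge ratio as return flow divided by influent flow are assumptions made for illustration; they are not taken from the BioWin configuration.

```python
# Flow fractions sent to the three aeration chambers (from the text).
SPLIT = {"AB": 0.40, "C": 0.20, "E1E2": 0.40}

# Return sludge ratios used in the seven operating variants (from the text).
RAS_RATIOS = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 2.5]


def split_flow(q_in, split=SPLIT):
    """Distribute the influent flow q_in (m3/d) between the reactors."""
    if abs(sum(split.values()) - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    return {name: q_in * fraction for name, fraction in split.items()}


def ras_flow(q_in, ratio):
    """Return sludge flow (m3/d), assuming ratio = Q_return / Q_influent."""
    return q_in * ratio


q_example = 1000.0  # hypothetical scaled influent flow, m3/d
print(split_flow(q_example))
for ratio in RAS_RATIOS:
    print(f"ratio {ratio}: return flow {ras_flow(q_example, ratio):.0f} m3/d")
```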

Statistical methods
A significance level of α = 0.05 was used in all analyses. CRAN R [6] software with the RStudio GUI [7] was used for all analyses. The random data sets were prepared with the sample function. Goodness-of-fit (GOF) measures between the observed and simulated values were calculated with the hydroGOF R package [8]. Five statistics were used: MAE, RMSE, PBIAS, d, and KGE.

MAE, Mean Absolute Error is a model evaluation metric used with regression models.
A smaller value indicates better model performance.
RMSE, Root Mean Square Error gives the standard deviation of the model prediction error.
A smaller value indicates better model performance.
PBIAS, Percent BIAS, measures the average tendency of the simulated values to be larger or smaller than the observed values. The optimal value is 0.0.
d, Index of Agreement, was developed as a standardized measure of the degree of model prediction error and varies between 0 and 1. A value of 1 indicates a perfect match, and 0 indicates no agreement at all. The authors present d as quite flexible, which makes it applicable to a wide range of model-performance problems [9].
KGE, Kling-Gupta Efficiency, is a variant (decomposition) of the Nash-Sutcliffe Efficiency (NSE), which analyzes the importance of components such as correlation, bias, and variability. A value closer to 1 indicates better model performance [10].
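For reference, the five statistics are commonly defined as shown below. The exact variants applied in the study are an assumption here, in particular the 2009 form of KGE. S_i and O_i denote simulated and observed values, n is the number of paired values, and r is the linear correlation coefficient between the two series.

```latex
\begin{align}
\mathrm{MAE}   &= \frac{1}{n}\sum_{i=1}^{n}\lvert S_i - O_i\rvert \\
\mathrm{RMSE}  &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(S_i - O_i\right)^2} \\
\mathrm{PBIAS} &= 100\,\frac{\sum_{i=1}^{n}\left(S_i - O_i\right)}{\sum_{i=1}^{n} O_i} \\
d              &= 1 - \frac{\sum_{i=1}^{n}\left(O_i - S_i\right)^2}
                          {\sum_{i=1}^{n}\left(\lvert S_i - \mu_O\rvert + \lvert O_i - \mu_O\rvert\right)^2} \\
\mathrm{KGE}   &= 1 - \sqrt{\left(r - 1\right)^2
                          + \left(\frac{\sigma_S}{\sigma_O} - 1\right)^2
                          + \left(\frac{\mu_S}{\mu_O} - 1\right)^2}
\end{align}
```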
In the KGE definition, µS and σS are the mean and standard deviation of the simulated values, while µO and σO are the mean and standard deviation of the observed values.
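The five statistics can also be computed directly from paired series of simulated and observed values. The sketch below is a plain-Python illustration based on the commonly used definitions; it is not the hydroGOF code, the function names are hypothetical, the KGE form is assumed to be the 2009 version, and the example numbers are invented solely to exercise the functions.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def mae(sim, obs):
    """Mean absolute error."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)


def rmse(sim, obs):
    """Root mean square error."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def pbias(sim, obs):
    """Percent bias; positive values mean overestimation."""
    return 100.0 * sum(s - o for s, o in zip(sim, obs)) / sum(obs)


def index_of_agreement(sim, obs):
    """Index of agreement d, between 0 and 1."""
    mu_o = _mean(obs)
    num = sum((o - s) ** 2 for s, o in zip(sim, obs))
    den = sum((abs(s - mu_o) + abs(o - mu_o)) ** 2 for s, o in zip(sim, obs))
    return 1.0 - num / den


def kge(sim, obs):
    """Kling-Gupta efficiency (assumed 2009 form)."""
    mu_s, mu_o = _mean(sim), _mean(obs)
    ss = sum((s - mu_s) ** 2 for s in sim)
    so = sum((o - mu_o) ** 2 for o in obs)
    cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs))
    r = cov / math.sqrt(ss * so)   # linear correlation
    alpha = math.sqrt(ss / so)     # variability ratio sigma_S / sigma_O
    beta = mu_s / mu_o             # bias ratio mu_S / mu_O
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Invented example values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
for name, fn in [("MAE", mae), ("RMSE", rmse), ("PBIAS", pbias),
                 ("d", index_of_agreement), ("KGE", kge)]:
    print(name, round(fn(simulated, observed), 3))
```

The functions assume two series of equal length with no missing values, an observed series that is not constant, and a nonzero observed mean.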

Results
Due to the scarcity of output data, the goodness of fit was tested for four parameters: total nitrogen, TN [mgN dm⁻³]; total phosphorus, TP [mgP dm⁻³]; total Kjeldahl nitrogen, TKN [mgN dm⁻³]; and pH [-]. Statistics of the actual treated wastewater are given in Table 2. To see how the operating parameters change the model results, the return sludge ratio was changed in every trial. The ratio took the values 0.5, 0.8, 1, 1.2, 1.5, 2.0, and 2.5. Fit statistics between the observed and simulated data are presented in Table 3. For TN, TP, and pH there were no great differences between the return sludge ratio variants. Only the TKN values changed with the ratio. The statistics for ratios 1.5 and 2.0 are nearly the same and better than those of the other variants. Overall, there are no significant differences between the results, although the 1.5 ratio gave the best results, so this configuration was used in the model trials with shortened random data.

Conclusions
The models created were "quick" ones, built without a special monitoring plan or changes to the monitoring program. No special calibration or fractionation was done. The simulation period (365 days) and data interval (1 day) were quite large for an ASM model. Even so, the results from this model are satisfactory. The first conclusion is that, for this particular data structure, parameters such as pH, TP, or TN do not change significantly under the influence of a particular treatment process setting.
Only Kjeldahl nitrogen was sensitive to changes in the return sludge ratio. The second conclusion concerns pH values: normally pH is a difficult parameter to model [11], but in a situation where the range is small (in Calumet it is from pH to 7.4), BioWin simulates pH values quite easily. The third conclusion is that a larger dataset gives better simulation results: with a one-day interval and a one-year period, the differences between the 365-day and 200-day versions are not large, but the 50-day version gives the poorest results.
The authors thank the Metropolitan Water Reclamation District of Greater Chicago (MWRD) for sharing the data of the Calumet WWTP.
Line AB contains 22 conventional one-pass aeration tanks (size 128 m L x 10.35 m W x 4.65 m D), 16 secondary clarifiers (27.3 m L x 27.3 m W x 3.6 m D), and 8 radial secondary clarifiers (26 m diameter, 3.6 m side wall depth). Line C contains 6 conventional aeration tanks (79.5 m L x 10.2 m W x 4.5 m D) and 8 secondary clarifiers (33 m diameter, 3.9 m side wall depth). Line E1E2 contains 20 conventional one-pass aeration tanks (size 128 m L x 10.35 m W x 4.65 m D) and 10 secondary clarifiers (45 m diameter, 4.5 m side wall depth).

Table 2. Descriptive statistics of treated wastewater (output) from the Calumet WWTP.

Table 3. Statistics of goodness-of-fit between ratio variants.