A comprehensive artificial neural network model for gasification process prediction

• Artificial Neural Network model has been used to predict 10 key gasification outputs.
• Model suitable across a wide range of feedstock, reactor, and gasifying agent options.
• The importance of categorical predictor variables was demonstrated.
• Optimisation routine successfully identified the best network structure.


Introduction
The provision of sustainable energy and the transition to more sustainable waste treatment methods are two major challenges of the 21st century. In 2018, the World Bank estimated an annual production of 2.01 billion tonnes of municipal solid waste (MSW), a third of which is not managed in environmentally safe ways [1]. Due to climate change concerns, non-fossil fuel alternatives for the generation of energy are of greater importance than ever. Biomass, as an abundant resource, has shown great potential to aid with this transition in both developed and developing countries [2].
Thermochemical processes, such as gasification, represent a suitable way to tackle both problems by recovering energy from biomass and waste feedstocks [3]. Gasification converts a carbonaceous feedstock into a raw gas, often referred to as syngas, composed of N₂, H₂, CO, CO₂, CH₄, and C₂Hₙ, and a solid product often referred to as biochar.
Whilst the gasification of various biomass and waste feedstocks has been studied extensively, the technology's wider environmental, economic, and social impacts need to be better understood to allow for scale-up and efficient implementation. For this, accurate process models are essential. In the past, thermodynamic equilibrium, kinetic, and computational fluid dynamics (CFD) models have been used widely and successfully for this purpose. A more recent alternative has been the use of machine learning (ML) models, such as artificial neural networks (ANNs), which can effectively map highly non-linear input-output relationships and thus have the potential to predict gasification outputs more accurately than conventional models [4].
The use of ML within the field of bioenergy has only recently started gaining momentum. ML methods have been shown to be an effective tool to optimise thermochemical processes [5]. Prediction models can also ensure the generation of products with desired properties by identifying suitable operating conditions [6,7]. Other opportunities for innovation through ML include, among others, on-line process control and waste-to-energy network design. For instance, image data has successfully been used to quantify the major gas species concentrations in a gasifier in real time [8]. Robust data-driven optimisation models can aid with finding optimal waste-to-energy network designs in the light of uncertainties such as varying feedstock prices [9].

A neural network can refer to many different models. In this field a simple multilayer perceptron structure has generally been employed, which is made up of neurons arranged in an input and output layer and one or more hidden layers [4,10-15]. These existing studies showed that ANNs can predict gasification outputs with high accuracy. For example, George et al. trained an ANN model on lab-scale data from a bubbling fluidised bed gasifier [11]. Their model predicted the syngas production from local biomass with high accuracy, with a regression coefficient of R = 0.987 and a mean squared error of MSE = 0.71. Baruah et al. developed an ANN model to predict the H₂, CO, CO₂, and CH₄ yields from a downdraft gasifier [12]. The model predicted the outputs excellently, with coefficients of determination of R² > 0.98 and an average relative error of 2.65% when compared to experimental data.
However, existing studies were generally narrow in scope and used small data sets for model training. This means the developed models are often only applicable to a few select feedstocks [11]. Additionally, existing models are unable to predict the influences of different operational choices, such as the reactor design and gasifying agent, on the gasification process. Combined with the fact that many models do not predict all important outputs, this leads to limitations regarding the applicability of existing models towards optimal system and process design. For example, Serrano and Castelló modelled the tar production from the gasification of woody biomass with good prediction accuracy (R² > 0.97) using an ANN [10]. They achieved this by collating a data set of 120 samples from literature. They reasoned that only samples with the same meta-data (e.g. bubbling fluidised bed reactor, reactor size between 50 and 300 mm, and operation at atmospheric pressure) could be added to the model. Shenbagaraj et al. developed a model to predict a wider range of outputs (syngas composition and yield) with a high accuracy of R² > 0.97 [15]. However, the developed model is specific to supercritical water gasification for a single feedstock type (food waste). The narrow scope of their work meant that only a small data set of 40 samples could be collected from literature. Li et al. trained a gradient boosting model on a larger data set of 295 data points, but their model is only suitable for hydrothermal gasification of three different waste feedstocks [16]. Similarly, Kargbo et al. developed a model for a specific reactor design and feedstock combination. Their model was trained on a small data set of 40 samples obtained from a two-stage gasifier using waste wood [17]. It has previously been identified that none of the existing ML-based gasification models are applicable across a wide range of feedstocks, gasifying agents, and reactor designs [4].
Hence, this work aims to significantly improve on existing studies by training a feedforward ANN on a large data set covering a broad range of feedstocks and operational conditions. For this, a highly generalisable model was developed. To the best of our knowledge, it is the first model applicable to a wide range of feedstock types (woody biomass, herbaceous biomass, plastics, municipal solid waste, and sewage sludge), gasifying agents (air, steam, oxygen, and combinations of these), and reactor options (fixed bed and fluidised bed). The model's generalisation capability has been maximised by the novel use of categorical data so that the model is highly practical for general gasification process design. Categorical variables (e.g. gasifying agent, reactor type, and bed material) allow factors that are critical to gasification, but which have been omitted by previous models, to be captured. To further improve the model's practicality for general process design, it was formulated to predict a wide array of outputs that are essential for further modelling such as life cycle sustainability assessment (LCSA), which studies a system's environmental, economic, and social impacts. This in turn allows policy makers and investors to make optimal decisions by knowing the wider impacts of the system. The choice of ANN input features has been optimised. Furthermore, a novel optimisation superstructure to systematically compare and identify the best performing network structures and hyperparameter choices was employed to improve on previous studies which employed trial and error methods.

Data collection and description of the data set
A data set of 312 samples was collected from literature to train the proposed ANN model. To ensure a model with high generalisation capability, a wide range of feedstocks, gasifier types, and operating conditions were considered. The procedures for collecting data from literature were: (i) parameters were converted to the basis used in the data set (e.g. ultimate composition data given on a wet basis (wb) would be converted to a dry ash-free basis (daf) to fit in with the rest of the data set); (ii) the feedstock's lower heating value (LHV) was calculated from the feedstock's higher heating value (HHV) where necessary [47]; (iii) the quoted particle size refers to the lowest dimension of the particle (e.g. a pellet of 2 × 2 × 10 mm would be quoted as 2 mm); (iv) all higher order hydrocarbons (C₂Hₙ) in the syngas were treated as C₂H₄, as C₂H₄ is generally the dominant species [39,48]; (v) cold gas efficiency (CGE) and carbon conversion efficiency (CCE) were calculated where necessary (as defined in Appendix A). As a consequence of these conversions and inconsistencies in the reported data, some efficiencies of > 100% were obtained. The complete data set may be found in Appendix B.
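Step (ii) above is commonly performed by subtracting the latent heat of the water formed from fuel-bound hydrogen (and any free moisture) from the HHV. The sketch below illustrates this standard approximation; it is an assumption that it matches the correlation actually applied per [47], the example values are illustrative only, and Python stands in for the MATLAB environment used in this work.

```python
def lhv_from_hhv(hhv_mj_per_kg, hydrogen_pct, moisture_pct=0.0):
    """Estimate LHV from HHV by subtracting the heat of vaporisation
    (2.442 MJ/kg at 25 degC) of the water formed from fuel-bound
    hydrogen (8.936 kg H2O per kg H) plus any free moisture.
    Both percentages must be on the same mass basis as the HHV."""
    water = 8.936 * hydrogen_pct / 100.0 + moisture_pct / 100.0
    return hhv_mj_per_kg - 2.442 * water

# Illustrative woody-biomass values: HHV = 19.8 MJ/kg daf, H = 6.1 % daf
print(round(lhv_from_hhv(19.8, 6.1), 2))  # 18.47
```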
Tables 1 and 2 summarise the predictor and target variables collected for the database. In total 18 continuous and 8 categorical predictors were collected as part of the database. Three sets of predictors (i.e. ALL, OPT, and -CAT) were considered in this work to assess the effect of different parameter combinations on model training. Predictor set ALL uses all predictors for model training, except for predictors with a significant number of missing values (i.e. the feedstock's lignocellulosic composition (cellulose, hemicellulose, and lignin) and residence time). For predictor set OPT a range of additional parameters were excluded from the data set used for model training: (i) feedstock type and feedstock shape were excluded as it was assumed that the same information was already captured by the continuous feedstock composition data and feedstock particle size data, respectively; (ii) volatile matter content and fixed carbon content were excluded as they are considered dependent variables which are related to other compositional parameters [12]; (iii) feedstock LHV, feedstock oxygen content, and feedstock nitrogen content were excluded due to their strong correlations with other input parameters as identified in Section 2.3. Predictor set -CAT is used to assess the effect of removing categorical variables from the model. Hence set -CAT is a copy of set OPT but excludes all categorical variables and the remaining set only contains continuous variables. The three different predictor combinations considered in this work are summarised in Table 3.
The target variables used in the model are all variables listed in Table 2, except for the process' CGE and CCE. These were not considered as target variables during modelling, as they often could be estimated based on other predictor and target variables. In other words, they were considered functions of some of the other predicted parameters.

Encoding of categorical variables
As previously mentioned, the use of a range of categorical variables to improve the ANN's prediction accuracy is a major point of novelty of this work. Unlike some other ML methods, such as random forest, an ANN cannot directly ingest categorical variables [49]. Thus, it is necessary to develop methods through which the categorical data can be represented by numerical values.
To the best of the authors' knowledge, only Serrano et al. previously made use of categorical variables to improve a gasification model [13]. In their study the bed material of a fluidised bed gasifier was considered a categorical variable. The authors encoded this as an ordinal variable. However, the values of the variable follow no intrinsic order or scale and, therefore, using an encoding method such as one-hot encoding would seem more appropriate. Thus, here we used one-hot encoding for our nominal variables. A categorical variable which can take on n distinct categorical values is transformed into n binary variables, where values of 1 and 0 indicate the presence and absence of the variable, respectively. This means a categorical input variable with five categories will require five input neurons. An example of this is shown in Appendix C for the variable gasifying agent. The following variables were encoded using this method: feedstock type, feedstock shape, gasifying agent, reactor type, and bed material.
All categorical variables that can only take on two values (e.g. the process' operation mode (batch/continuous), system scale (lab/pilot), and catalyst use (present/not present)) were encoded as binary values. This means the variable can take on a value of 0 or 1 (-1 and 1 after normalisation). In the current database all data came from lab or pilot scale systems and, therefore, the system scale was encoded using this method. One-hot encoding or ordinal encoding could be employed if data from e.g. industrial scale plants were to be added. Ordinal encoding simply encodes each label as an integer, retaining the feature's order. Many papers which were used to compile the presented database simply stated whether the studied system was of lab/bench or pilot scale. However, some papers quoted the system's power output instead. In this case, a system was considered of lab scale for power ratings of < 100 kW e or < 150 kW th . A pilot scale system was defined as < 1 MW e or < 1.5 MW th .
It is obvious that the addition of a catalyst can strongly affect the gasification process and its products [20,41,50,51]. However, it can be hard to quantify the exact effect of a catalyst numerically. Hence the authors chose a simplified model which encodes the use of a catalyst as a binary variable, as previously explained (i.e. 0 for no catalyst present and 1 for catalyst present). This way the network can draw useful information and learn from whether a catalyst is present during the process.
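The encoding choices described above can be sketched as follows. The category list for the gasifying agent is a hypothetical example (the actual categories are listed in Appendix C), and Python is used purely for illustration.

```python
# Hypothetical category list; the actual one is given in Appendix C.
GASIFYING_AGENTS = ["air", "steam", "oxygen", "air+steam", "steam+oxygen"]

def one_hot(value, categories):
    """One-hot encode a nominal variable: one binary indicator per category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if c == value else 0 for c in categories]

def encode_binary(value, positive):
    """Encode a two-level variable (e.g. catalyst present/absent) as 1/0."""
    return 1 if value == positive else 0

# A sample gasified with steam and with a catalyst present:
features = one_hot("steam", GASIFYING_AGENTS) + [encode_binary("present", "present")]
print(features)  # [0, 1, 0, 0, 0, 1]
```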

Data normalisation
Data normalisation is an essential prerequisite for ANN modelling because it speeds up learning and ensures that all input features lie within a common range. Eq. (1) has been used to normalise the data set between −1 and 1:

x_norm = l + (u − l)(x − x_min)/(x_max − x_min)    (1)

where x is the observed value and x_norm is the normalised value. x_max and x_min represent the maximum and minimum values of the parameter, respectively. l and u are the lower and upper bounds between which the data is to be normalised. In this work, a lower and upper bound of −1 and 1 were selected.
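A minimal sketch of Eq. (1), together with its inverse (needed to map normalised network outputs back to physical units); the temperature range used in the example is illustrative, and Python stands in for the MATLAB implementation.

```python
def normalise(x, x_min, x_max, l=-1.0, u=1.0):
    """Min-max scale x from [x_min, x_max] into [l, u] (Eq. (1))."""
    return l + (u - l) * (x - x_min) / (x_max - x_min)

def denormalise(x_norm, x_min, x_max, l=-1.0, u=1.0):
    """Invert Eq. (1) to recover a value in its original units."""
    return x_min + (x_norm - l) * (x_max - x_min) / (u - l)

# Illustrative: 850 degC in an assumed 600-1100 degC range maps to 0.0
print(normalise(850.0, 600.0, 1100.0))
```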

Mean substitution of missing values
As the database used for model training was collected from 29 different studies, it is unavoidable that there are inconsistencies in the format and types of data reported, and some studies do not quote all parameters collected as part of the database. However, missing values can lead to issues during model training. For instance, the authors noticed a large loss of target-prediction pairs when a significant number of missing values was present in the training data set. This means missing predictor data can result in the ANN being unable to make a prediction. In turn, this led to fewer comparisons between the network's predicted value and the experimental target value, which caused issues regarding the reproducibility of results. For this reason, the mean substitution method was employed for all predictors before model training, whereby a variable's missing values are replaced with the variable's mean value [52]. As the testing stage is the key stage of interest to evaluate the model's performance, this method was not employed for target variables and thus only affected model training.
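The mean substitution step can be sketched as below, assuming missing entries are marked as NaN (illustrative Python; the original pipeline was implemented in MATLAB).

```python
import math

def mean_substitute(column):
    """Replace missing entries (NaN) in a predictor column with the
    mean of the observed values in that column."""
    observed = [v for v in column if not math.isnan(v)]
    mean = sum(observed) / len(observed)
    return [mean if math.isnan(v) else v for v in column]

# A predictor column with one missing value:
print(mean_substitute([700.0, float("nan"), 800.0, 900.0]))  # [700.0, 800.0, 800.0, 900.0]
```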

Preliminary data analysis
Some initial analysis of the raw data was conducted by calculating the mean and standard deviation (SD) of variables as reported in Tables 1 and 2. Then Spearman's correlation coefficients (SCC) were calculated for the continuous predictor data to measure the monotonic relationship between data pairs. SCC was employed instead of the more commonly used Pearson correlation coefficient as the data was not known to follow a Gaussian distribution and SCC is less affected by outliers. Eq. (2) was used to calculate SCC:

SCC = Σ (R(x_i) − R̄(x))(R(y_i) − R̄(y)) / sqrt( Σ (R(x_i) − R̄(x))² · Σ (R(y_i) − R̄(y))² )    (2)

where the sums run over i = 1, …, N, N is the sample size, R(x_i) and R(y_i) are the ranks of individual samples of the two tested variables, and R̄(x) and R̄(y) indicate the mean ranks of the two variables. A SCC of 0 indicates that two parameters are not correlated; the closer to ±1, the stronger the monotonic relationship. In this work, |SCC| ≥ 0.6 was used to indicate a strong relationship, which would lead to the removal of one of the parameters before model training. Fig. 1 shows the SCC matrix of all continuous predictors, except for parameters with a significant number of samples missing. The analysis of SCC led to the subsequent removal of the feedstock's nitrogen content due to a significant correlation with the feedstock's sulphur and ash contents. Additionally, the feedstock's LHV and oxygen content were removed due to their correlations with the feedstock's carbon content. Highly correlated features are removed before training as they contain similar information, and retaining both generally reduces model performance by adding noise to the model [53].
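Eq. (2) is equivalent to taking the Pearson correlation of the rank vectors. A minimal Python sketch, using fractional ranks for ties as is standard for SCC:

```python
def ranks(values):
    """Fractional ranks (1-based); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the rank vectors (Eq. (2))."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# A strictly monotonic (though non-linear) relationship gives SCC = 1
x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(spearman(x, [v ** 3 for v in x]))
```

Any predictor pair with |SCC| ≥ 0.6 would then trigger the removal of one of the two features before training.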

Artificial neural network background
Conceptually, an ANN is a non-linear model which can fit any given function, making it a universal function approximator [54]. It can be used to predict an arbitrary number of response variables given one or more explanatory variables. The type of ANN utilised in this work is often referred to as a feed-forward multilayer perceptron (MLP). An MLP is made up of an input and output layer and one or more hidden layers. Each layer contains nodes with weighted connections which allow for the transfer of information from one layer to the next.
Training is generally achieved through the backpropagation of errors [55]. This means that, through an iterative process, the optimum weights which minimise the prediction error for an independent test data set are found. Over the years various training algorithms have been developed. In this work, Levenberg-Marquardt backpropagation (LM) and Bayesian regularisation backpropagation (BR), which are implemented as part of the MATLAB deep learning toolbox, were considered [56-58]. The output of a fitted ANN is given by Eq. (3):

y_k = f_o( Σ_h w_kh · f_h( Σ_i w_hi · x_i ) )    (3)

where y_k is the network's estimation of the response variable [59]. It is calculated from the sums of products of the weights w with the i input variables x over the hidden nodes h, transferred by the activation functions f_o and f_h for the output and hidden nodes, respectively [60].
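The forward pass described here can be sketched as below for a single hidden layer. Bias terms are included because they are part of a standard MLP even though they are not written out in the text; the weights are illustrative, and Python stands in for the MATLAB toolbox actually used.

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out,
                f_h=math.tanh, f_o=lambda v: v):
    """Eq. (3): y_k = f_o( sum_h w_kh * f_h( sum_i w_hi * x_i ) ),
    here with bias terms added to each weighted sum."""
    hidden = [f_h(b + sum(w * xi for w, xi in zip(row, x)))
              for row, b in zip(w_hidden, b_hidden)]
    return [f_o(b + sum(w * hj for w, hj in zip(row, hidden)))
            for row, b in zip(w_out, b_out)]

# 2 inputs -> 2 TANSIG hidden nodes -> 1 linear (PURELIN) output
y = mlp_forward([0.5, -0.5],
                w_hidden=[[1.0, 0.0], [0.0, 1.0]], b_hidden=[0.0, 0.0],
                w_out=[[1.0, 1.0]], b_out=[0.0])
print(y)  # symmetric weights give tanh(0.5) + tanh(-0.5), i.e. ~0
```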

Artificial neural network optimisation and performance assessment
To find the best possible network configuration, five essential hyperparameters affecting the performance of the ANN were varied. The optimised parameters were: (i) number of neurons in 1st hidden layer, (ii) number of neurons in 2nd hidden layer, (iii) data split for test set, (iv) training function, (v) hidden layer transfer function. Initial tests resulted in the use of the parameter ranges and options indicated in Table 4. A test data split of 15% has frequently been employed in literature, hence a range from 10 to 20% was selected for optimisation purposes [4]. By initially considering various training functions, LM and BR were found to be the two preferred options which were further considered in the optimisation routine. Furthermore, initial tests allowed for the elimination of all but the hyperbolic tangent sigmoid (TANSIG) and logistic sigmoid (LOGSIG) function as hidden layer transfer function options. The step sizes (given in brackets in Table 4) for the numerical parameters (i)-(iii) were selected by weighing required accuracy against computational demands. Other parameters, such as the output layer's transfer function and the network's performance function were kept constant throughout and are also given in Table 4. Finally, 5-fold cross validation was used to minimise the risk of overfitting and improve the confidence in the results.
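The optimisation routine amounts to enumerating the Cartesian product of the hyperparameter options and training a 5-fold cross validated model for each combination. The grid below is a hypothetical stand-in for Table 4 (the true ranges and step sizes are given there), sketched in Python:

```python
from itertools import product

# Hypothetical grid; the actual ranges/steps are those listed in Table 4.
grid = {
    "n_hidden_1": list(range(4, 33, 4)),   # neurons in 1st hidden layer
    "n_hidden_2": list(range(0, 17, 4)),   # 0 = single hidden layer
    "test_split": [0.10, 0.125, 0.15, 0.175, 0.20],
    "train_fn":  ["LM", "BR"],
    "transfer":  ["TANSIG", "LOGSIG"],
}

def cartesian(grid):
    """Yield every hyperparameter combination (the 'Cartesian matrix')."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

combos = list(cartesian(grid))
print(len(combos))  # 800 candidate models for this assumed grid
```

Each combination would then be trained and 5-fold cross validated, and the best configuration selected by its error metrics.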
The model's prediction performance was measured by considering the root mean square error (RMSE), the coefficient of determination (R²), and the adjusted coefficient of determination (R²_adj). R²_adj is a modified version of R² which accounts for the number of predictors in the model. Whilst R² generally increases with the addition of new predictors, despite potentially not adding any explanatory power, this is not the case for R²_adj. Thus, by using R²_adj an increase in R² due to the addition of unnecessary predictors is avoided, allowing for a fairer comparison between models trained with different input features. Eqs. (4)-(6) describe the three performance indicators:

RMSE = sqrt( (1/N) Σ (y_o,i − y_p,i)² )    (4)

R² = 1 − Σ (y_o,i − y_p,i)² / Σ (y_o,i − ȳ_o)²    (5)

R²_adj = 1 − (1 − R²)(N − 1)/(N − k − 1)    (6)

where the sums run over i = 1, …, N, N is the number of samples, and y_o,i and y_p,i are the observed and predicted values, respectively. ȳ_o and k represent the mean of the observed values and the number of explanatory variables in the model (i.e. the number of predictor variables used for model training), respectively.
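Eqs. (4)-(6) translate directly into code; a minimal Python sketch:

```python
def rmse(y_obs, y_pred):
    """Root mean square error, Eq. (4)."""
    n = len(y_obs)
    return (sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / n) ** 0.5

def r2(y_obs, y_pred):
    """Coefficient of determination, Eq. (5)."""
    mean_o = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_o) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot

def r2_adj(y_obs, y_pred, k):
    """Adjusted R^2, Eq. (6); k = number of predictor variables."""
    n = len(y_obs)
    return 1.0 - (1.0 - r2(y_obs, y_pred)) * (n - 1) / (n - k - 1)
```

Because R²_adj penalises each additional predictor, it allows models trained with different numbers of input features (such as sets ALL, OPT, and -CAT) to be compared fairly.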

Workflow summary
This study's workflow is illustrated by Fig. 2. The figure shows how the various steps and procedures introduced throughout Section 2 are linked to each other. Continuous and categorical variables were initially pretreated separately before being combined again after the normalisation step. At this point the data was ready for model training. Data was fed into the optimisation and cross validation stage of this work, which is shown by the dashed box in the bottom half of the figure. Here the Cartesian matrix contains all possible hyperparameter combinations introduced in Section 2.4.2. Models were trained and cross validated for each of these combinations until the code had iterated through all combinations. Finally, the trained models' performance was evaluated by considering their RMSE, R², and R²_adj to find the best performing model. All data analysis and modelling work was implemented in the MATLAB programming environment [61].

Fig. 2. Flowchart of the procedure employed for the model development.

Preliminary data analysis
The raw data's range, mean, and SD were calculated and are shown by Tables 1 and 2. Some of the key findings are highlighted here. When considering the syngas composition, it was found that the nitrogen content showed the highest variation of all variables with a SD of 25.76 vol% db. This can be explained by the use of various gasifying agents which result in syngas with vastly different nitrogen contents. The syngas LHV is similarly varied for the same reason. For instance, using air as a gasifying agent produces a syngas of a lower LHV, as compared to steam or oxygen, due to a strong dilution by nitrogen.
Studying the gasifiers' operational conditions reveals a high variation in temperatures, with a SD of 80.88 °C. This was expected as temperature is one of the key factors widely varied in experimental studies. The ER, on the other hand, showed very little variation with a SD of 0.11. Many studies considered an ER of 0.30, which was also found to be the parameter's mean value.
Looking at the feedstocks' compositional data, the particle size showed some outliers (70 mm); however, most of the samples were < 10 mm, which led to a small SD of 6.71 mm. When studying the feedstocks' elemental composition, the carbon contents were found to vary considerably, with a SD of 9.23% daf. The high SD can be explained by the large range of different feedstocks collected as part of the database. Whilst plastic waste was found to have carbon contents of up to 86.03% daf, most herbaceous biomass feedstocks have carbon contents < 50% daf.
The number of samples collected for a given variable is another important factor to consider. ML relies on the availability of large amounts of data, and models tend to have a harder time learning the relationships affecting a variable when fewer samples are available. For instance, gasification is mainly considered a technology to produce syngas. Hence, many authors did not report the process' char yields, as shown by the relatively small number of samples (87 in total) collected for this variable. This in turn was found to diminish the model's prediction accuracy for this variable, as further discussed in Section 3.2.

Fig. 3 shows the five-fold cross validated results of the best performing model structures trained on the three different data sets described in Table 3. Results are shown in normalised form as they are aggregated across all output variables. The best performing model trained with all predictors collected as part of the database (set ALL) performed well, with RMSE = 0.1283, R² = 0.9329, and R²_adj = 0.9199. This is clearly illustrated by the fact that the data points in Fig. 3 (a) are tightly distributed around the dotted perfect fit line with unity slope. Using an optimised selection of predictors (set OPT) yielded a model with a similarly high prediction accuracy and low error (RMSE = 0.1307, R² = 0.9310, and R²_adj = 0.9254). Generally, a model with more predictors will spuriously achieve a higher R² than a model with fewer predictors. However, when considering R²_adj instead, where excessive predictors are penalised, the model trained on set OPT was found to be the best performing model. The removal of all categorical variables (set -CAT), as shown by Fig. 3 (c), led to a significant drop in model accuracy, as illustrated by the wider dispersion of data points from the perfect fit line, an increased RMSE of 0.2126, and decreased R² of 0.8016 and R²_adj of 0.7959.
This highlights the importance of incorporating key categorical variables to improve the model's accuracy. In conclusion, these findings indicate that the use of a well optimised combination of predictors, including meaningful categorical variables, can significantly increase the prediction accuracy of an ANN model. A summary of these findings is presented in Table 5.

Artificial neural network model optimisation
Based on these findings, set OPT was considered the most suitable combination of predictors for model training; the model's optimised structure is illustrated by Fig. 4. Hence, models trained on this data set were further evaluated. When considering the next best performing network structures, it becomes clear that various network structures and hyperparameter choices can result in a well performing model. For example, a model trained with a double hidden layer structure of 24 and 15 neurons utilising the LM training function and a model trained with the BR training function and a single hidden layer of 16 neurons resulted in a predictive performance similar to the best performing model, with R² > 0.92 and RMSE < 0.14. This agrees with existing literature, where both BR and LM were found to be suitable training functions for the ANN modelling of various thermochemical processes [13,62-64]. Furthermore, various data splits for testing and validation ranging from 12.5 to 17.5% were found to produce acceptable results. Finally, the TANSIG hidden layer transfer function was generally found to be the best choice for the trained models.
A limitation of this work is the use of lab and pilot scale data only for model development. At the time of development, no industrial scale data has been available to train the model on. It remains uncertain how well the model performs on unseen industrial scale data. However, through the novel use of categorical variables, the presented approach could easily be adapted were industrial scale data to become available as highlighted in Section 2.2.1.
In general, adding a 2nd hidden layer was not shown to improve the model's prediction accuracy. This agrees with the universal approximation theorem, which states that a single hidden layer ANN is sufficient to represent any given function arbitrarily well [54]. Thus, the increased complexity from adding a 2nd hidden layer cannot be justified. When it comes to the choice of training function, it was found that the BR function generally resulted in a better performing model than LM. This was particularly the case for single hidden layer network structures, albeit at the cost of an increased computational demand.

Fig. 5 shows the prediction accuracy for all ten individual target outputs for the best performing model structure trained with set OPT. The model was found to predict some outputs with greater accuracy than others, as highlighted by the wider dispersion of data points for some parameters. The performance indicators for the individual model outputs are summarised in Table 6. It is clear that some factors (e.g. the process' syngas yield and H₂ content) can be predicted with higher accuracy than others (e.g. char yield and syngas C₂Hₙ content). The highest R² of 0.9791 and lowest RMSE of 0.0731 were obtained for the syngas N₂ content and tar yield, respectively. Conversely, the lowest R² of 0.5202 and highest RMSE of 0.3020 were obtained for the syngas C₂Hₙ content and char yield, respectively. Understanding and accounting for the uncertainty in certain predictions is essential for further system analysis such as LCSA.
When considering R²_adj, only a minor decrease from R² was found for the outputs which the model predicted well. However, the two outputs which the model predicted poorly, the syngas C₂Hₙ content and char yield, showed a significant decrease: the former decreased by 13% and the latter by 21%. This decrease and poor performance can be explained by the significantly smaller sample sizes available for training these two features compared to the others. For instance, only 87 samples could be collected for the process' char yield, whereas the data set contained 312 samples for the excellently predicted features syngas LHV and H₂ content. Adding additional samples for these outputs would likely allow the model to better learn the relationships affecting the outputs, leading to an increase in prediction accuracy.
As previously mentioned, one of the key aims of this study is to create a highly generalisable model which is suitable across a wide range of feedstocks and gasifier design choices. For this reason, a like-for-like comparison between existing studies and this work is not always possible or sensible due to vastly different scopes and aims. Existing studies often developed gasifier-specific models or models specific to a select few feedstocks [65-67]. For instance, Kargbo et al. optimised a two-stage gasification process using an ANN [17]. Their model was found to generally predict most gasification outputs excellently with R² = 0.99. However, the tar production was predicted less well, with R² of 0.74-0.82. Whilst their model generally performs better than the model presented in this study, it is only applicable to that specific reactor design and feedstock combination.

The developed model could be used for a variety of applications. Its power to make predictions over a wide range of feedstock types, gasifying agents, and reactor options makes it particularly suitable for modelling and designing optimal gasification systems and process conditions. Since the model can predict more gasification outputs than existing models, it has the potential to simplify further modelling and analysis relying on these outputs. For instance, techno-economic analysis or LCSA could be streamlined by using the predictions made by the model, which can be further combined with multi-objective optimisation to decide on optimal gasification process conditions.

Conclusions
A highly generalisable artificial neural network (ANN) model of excellent prediction accuracy (R² = 0.9310 and RMSE = 0.1307) has been developed to model the gasification of biomass and waste. The developed model is the first model able to predict gasification outputs over a wide range of feedstock, reactor, and gasifying agent options. Thus, investors and policy makers can quickly and conveniently compare a multitude of system options and their performance. This in turn reduces the need for costly and time-consuming experimental studies. Since the model predicts many key gasification outputs, the suggested approach has the potential to facilitate the development of integrated gasification design models by combining the developed model with e.g. life cycle sustainability assessment (LCSA), cost-benefit analysis, and multi-objective optimisation.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.