Analysis of the Correlation between Combustion Products in Biomass Thermal Power Plant Using Association Rule Mining

The biomass combustion process is inevitably accompanied by the emission of pollutant gases. This paper gives a comprehensive analysis of the external variables and combustion products of a biomass plant. The analyzed data were collected from an 18 MWt boiler in the combined heat and power plant Sremska Mitrovica over a period of four months. The correlations between the recorded data were determined using a methodology based on association rule mining. The results of the study can be further used for the reduction of harmful combustion products, as well as for the optimization of the operation process.


Introduction
Climate change is driven by both natural processes and human activity, the latter mostly related to energy production and industry [1]. Furthermore, constant population growth implies an increase in energy consumption [2][3][4]. The combustion of fossil fuels has a harmful effect on the environment, primarily due to CO2 emissions and air pollution, which can lead to significant climate change in the future. The common goal of energy policy is to maintain the reliability of energy supply, which today depends mostly on the import, price and availability of fossil fuels. Additionally, the effect of greenhouse gases has gained public and political attention, which further increases the importance of energy supply [5]. The limited reserves of fossil fuels impose the need for new approaches and technologies. For these reasons, renewable energy sources (RES) have been attracting attention over the past 30 years [4][5][6]. Besides energy, renewables are also used to obtain chemicals and fuels. The use of lignocellulosic biomass has increased in recent years, and it has become an important sector for the chemical industry [7,8]. Globally, the use of RES systems is one of the strategies for mitigating CO2 emissions, known in the literature as conventional mitigation efforts (CME). Besides CME, two other approaches are used to mitigate climate change. The second strategy, negative emissions technologies, includes bioenergy carbon capture and storage, soil carbon sequestration, etc. The third strategy, radiative forcing geoengineering, which is still largely theoretical, focuses on stabilizing or reducing the temperature [1,9]. [...] the combustion process by-products. The correlation between these parameters has, to the best of the author's knowledge, not been sufficiently researched.
In this paper, the results of long-term measurement of the biomass combustion process and its products, together with external variables, for the 18 MWt boiler in the CHP plant "TE-TO Sremska Mitrovica" (CHP Sremska Mitrovica) are analyzed. The collected data were processed, and association rule patterns were extracted from the processed data. Some dependencies among the recorded parameters were determined, thus creating a knowledge base for the optimization of the biomass combustion process.

Overview of CHP Sremska Mitrovica
The Republic of Serbia aims to utilize locally available energy sources, such as biomass, which represents a large potential in the country. In addition, in accordance with European Directive 2009/28/EC, Serbia has committed to providing at least 20% of its total final electricity consumption from RES by 2020 [30]. A significant contribution to these efforts has been made by CHP Sremska Mitrovica [31], located in the municipality of Sremska Mitrovica, AP Vojvodina. The facility generates energy in a modern cogeneration process, which enables fuel savings of up to 25% compared to the separate generation of heat and electricity. CHP Sremska Mitrovica, parts of which are shown in Figure 1, was initially designed as an industrial heat plant supplying the industrial complex and the city of Sremska Mitrovica with heat. Since the closure of the industrial complex in 2008, CHP Sremska Mitrovica has operated primarily as a heating plant for the city, as well as a backup source of electricity for PU Elektroprivreda Srbije. To make heat production more economical, a hot water boiler plant was built in 2012. Since then, CHP Sremska Mitrovica has used biomass (sunflower seed shells) instead of natural gas as fuel for city heating in this newly installed boiler. The installed boiler has a power of 18 MWt and is the first and currently the only high-power boiler in Serbia that uses biomass. For now, sunflower seed shells are used, but the boiler construction can be adapted to other types of biomass. With biomass as fuel, the price of heat production is about 20% lower than it was with natural gas. The total technical capacity of CHP Sremska Mitrovica comprises two steam boilers, two auxiliary steam boilers, one 15 MWt hot water gas boiler and one 18 MWt hot water boiler for biomass-based heat production. Furthermore, it contains a thermal turbine with a nominal electrical power of 32 MW and a technological steam flow of 150 t/h.
Energies 2020, 13, x FOR PEER REVIEW

The simplified scheme of the CHP "TE-TO Sremska Mitrovica" is shown in Figure 2, with symbols that are in accordance with the EN ISO 1062 and EN 62424 standards. Furthermore, it is necessary to monitor the temperature in certain parts of the combustion process. The temperature in the cyclone pre-firebox is 279 °C, while the flame temperature ranges from 1100 to 1200 °C. The outlet temperature of water from the biomass boiler is 79 °C. That water is mixed with return water from the city in a ratio of 2:1, so the temperature in the outlet line, which leads to the city, is 64 °C. The temperature of the return water from the city is approximately 49 °C.

Biomass Characteristics Used in the Respective CHP
The efficient combustion process of biofuel implies that almost all of the organic material is transformed into CO2 and H2O [32]. Therefore, only a small amount of residue, mostly inorganic material, is formed. On the other hand, in the case of inefficient combustion, which is almost always the case in practice, a significant amount of powdery substances arises as a result of incomplete combustion of fuel, which significantly increases the emission of these pollutants. According to the above explanations, the incomplete combustion process may occur due to inadequate mixing of fuel and oxygen in the combustion chamber, an insufficient amount of oxygen in the furnace, low combustion temperature, or insufficient fuel residence in the combustion zone. Based on these facts, it can be concluded that the high efficiency of boiler plants is in close correlation with the reduction of pollutant emission [33]. Numerous studies show that the construction of the combustion chamber, control of the process and appropriate selection of biomass fuel can increase the efficiency of the combustion process and thus considerably reduce the amount of pollutant emission [34,35].
The boiler in the respective CHP is designed as a diaphragm boiler with a cyclone pre-firebox. A gas burner placed at the head of the pre-firebox is used to heat the pre-firebox and to create conditions for the sunflower seed shells to burn. When the desired temperature is reached, the sunflower seed shells are fed into the pre-firebox and the gas burner is extinguished. The sunflower seed shells start burning in a vortex motion in the cyclone pre-firebox and continue burning in the boiler firebox. The flue gases from the boiler firebox pass through the pipe network at the exit of the boiler and turn into a descending convective tract containing five packages of water heaters designed as serpentine tubes in in-line and staggered arrangements. The boiler is operated with an underpressure in the firebox. Funnels placed under the firebox and the convective tract collect the remaining ash. Ash from the funnels falls onto screw conveyors, which carry it to containers outside the boiler room, from where it is transported to the landfill. The connection between funnel and container is gas-tight.

Prior to the analysis of the interaction of all parameters, it is necessary to analyze the content of the biomass sample used, in order to determine all relevant constituents. Sunflower seed shells were used as the research material in this study. Therefore, the physicochemical analysis of the sunflower seed shells used in CHP Sremska Mitrovica was performed at the Mining Institute Ltd. Belgrade, in the solid fuel laboratory. The delivered sample was prepared according to the SRPS B.H9.003:1983 standard and submitted to the institute. The physicochemical analysis of sunflower seed shells, which consists of technical and elementary analysis, is performed once a month on one sample. In this paper, the technical analysis of the sunflower seed shell sample is presented, which includes the analysis of moisture content, amount of ash, sulfur, coke, carbon, heat of combustion and fuel analysis. Each analysis was performed for three scenarios: with total moisture, without moisture, and without moisture and ash. The total moisture is the amount of moisture contained in the biomass before the sample is prepared for analysis. The tests were performed in accordance with the appropriate standards, which include the indirect gravimetric method, the Eschka method and the bomb calorimeter method, among others. One technical analysis of a sunflower seed shell sample is shown in Table 1. Volatile matter, with a share of 67.47-77.19%, includes gases such as light hydrocarbons, carbon monoxide, carbon dioxide, hydrogen and moisture [36]. From the fuel analysis shown, it can be concluded that the carbon, nitrogen and oxygen contents have the highest values. Specifically, carbon content varies from 45.07% to 51.56%, while nitrogen and oxygen range from 37.22% to 42.58%. They are followed by hydrogen with a share of 5.5% and combustible sulfur with 0.01%. It is important to emphasize that this is the analysis of one sunflower seed shell sample and that these numbers can vary.

Data Analysis
Association rule mining was used as the methodology for data analysis. The goal of the data analysis process is to yield rules, i.e., data patterns in the form X ⇒ Y. A rule X ⇒ Y holds true for a given record if the record contains all the items from both X and Y. Rules were generated against the given dataset using the implementation of the Apriori algorithm from the arules package for R. Each rule was evaluated based on its statistical significance. Significance was measured by creating a contingency table for each candidate rule and applying a statistical test.
Fisher's exact test was used as the statistical test instead of the χ² test because of its better properties when there are very small values in the contingency table.
Over a period of four months, data were collected at CHP Sremska Mitrovica. From the available daily reports for the four months, as shown in Figure 3, a Microsoft Excel spreadsheet file with hourly measurements was created for each month. In addition to measurement date (Date) and time (Hour), the records in these files contain values for twelve variables: external temperature in °C (ExtTemperature), heat production in the biomass hot water boiler in MWht (Heat), consumed biomass in t (Biomass), SO2 [...]
The collected data were processed before analysis, and the processed data were mined for patterns in the form of association rules. Data processing was performed in two steps. In the first step, data were extracted from the four source spreadsheet files, cleaned and transformed. The resulting data were saved in a single comma-separated values (CSV) file to facilitate additional processing and analysis. Except for the date and time variables, variable values in the spreadsheet files are predominantly numeric. However, some entries contain character values as well. These values had to be replaced so that analyses could be performed on purely numeric variables. The relevant character values comprise "W", which denotes that the equipment was under maintenance during measurement; "-", which denotes that the equipment was shut down; "X", which denotes that the corresponding variable was not measured on that particular day; and "S", which denotes that the analyzer was not functioning correctly. Records containing "W", "-" or "S" as a variable value were omitted from the final CSV file. Each "X" value was replaced with a null (empty) value. A numeric identifier variable (Id) was generated and added to the data. Based on relevant domain knowledge, additional actions were performed. Values more than three standard deviations from the mean were discarded, so that likely erroneous boiler measurements were excluded.
Within records in which the powdery matter concentration is above 1.2 mg/m³, all CO values above 2000 mg/Nm³ were set to null. The values for O2 and CO2 concentration were both set to null if their sum exceeded 35 vol. % in a single record. Additionally, for each variable, all outlier values were set to null. A value was considered an outlier if it was not within ±3 standard deviations of the mean of the corresponding variable.
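The cleaning rules above can be illustrated with a minimal Python sketch. It is not the authors' processing code: the record layout (one dictionary per hourly record), the keys "Powdery", "CO", "O2" and "CO2", and the use of the sample standard deviation are assumptions made only for this illustration.

```python
from statistics import mean, stdev

def apply_domain_rules(record):
    """Return a copy of one hourly record with implausible values set to None.

    The keys "Powdery", "CO", "O2" and "CO2" are assumed names.
    """
    r = dict(record)
    # CO above 2000 mg/Nm3 is nulled when powdery matter exceeds 1.2 mg/m3.
    if r.get("Powdery") is not None and r.get("CO") is not None:
        if r["Powdery"] > 1.2 and r["CO"] > 2000:
            r["CO"] = None
    # O2 and CO2 are both nulled when their sum exceeds 35 vol. %.
    if r.get("O2") is not None and r.get("CO2") is not None:
        if r["O2"] + r["CO2"] > 35:
            r["O2"] = None
            r["CO2"] = None
    return r

def null_outliers(records, variable, k=3.0):
    """Set values of `variable` farther than k standard deviations from the mean to None."""
    values = [r[variable] for r in records if r.get(variable) is not None]
    if len(values) < 2:
        return [dict(r) for r in records]
    m, s = mean(values), stdev(values)  # sample standard deviation (assumption)
    cleaned = []
    for r in records:
        r = dict(r)
        v = r.get(variable)
        if v is not None and abs(v - m) > k * s:
            r[variable] = None
        cleaned.append(r)
    return cleaned
```

Both functions return copies, so the raw records stay available for inspection after cleaning.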
In the second step of data processing, the data from the CSV file were further processed for the purpose of association rule mining using the R environment for statistical computing [37]. All records that had at least one null value were removed. The variables Id, Date and Hour were removed, as well as the SO2 variable, since it predominantly contained zero values. The remaining eleven variables were replaced by their discretized versions, whose names were formed by adding a "Level" suffix. The discretization process was performed using the classInt package for R [38] and the version of Jenks' algorithm included therein. The number of classes for each variable was chosen between two and three based on a visual analysis of the corresponding variable histograms. Two classes were formed for the variables ExtTemperature, Heat, Biomass, CO, HF

Association Rule Mining
Association rules are patterns that may be expressed in the form X ⇒ Y, where both X, i.e., the left-hand side (LHS), and Y, i.e., the right-hand side (RHS), are item sets, on which additional restrictions may be placed [39,40]. In the present study, an item is a pair consisting of a variable and one of its values. In general, a rule X ⇒ Y holds true for a given record if both item sets X and Y hold true for the record. An item set S holds true for a record if all the items from S are represented in the record. An item is represented in a record if the record value for the variable from the item matches the value from the item.
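These definitions can be expressed as a short Python sketch. The representation is an assumption made for illustration only: a record is a dictionary from variable names to discretized levels and an item is a (variable, value) pair. The example values are hypothetical and are not taken from the dataset.

```python
def item_set_holds(item_set, record):
    """True if every (variable, value) item in item_set is represented in the record."""
    return all(var in record and record[var] == val for var, val in item_set)

def rule_holds(lhs, rhs, record):
    """A rule X => Y holds for a record if both item sets hold for it."""
    return item_set_holds(lhs, record) and item_set_holds(rhs, record)

# Hypothetical record and rule, for illustration only.
record = {"Heat_Level": 2, "Biomass_Level": 2, "Powdery_Level": 2}
lhs = {("Heat_Level", 2), ("Biomass_Level", 2)}
rhs = {("Powdery_Level", 2)}
print(rule_holds(lhs, rhs, record))
```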
In practice, association rule mining, i.e., the discovery of association rules in data, generally results in a very large number of association rules. The ordering and selection of potentially relevant association rules may be based on one or more interestingness measures [41], of which support [35], confidence [39] and lift [42] are prominently used. In a discussion on the usage of statistical tests when examining patterns discovered in data, the holdout-evaluation approach was proposed [43]. In this approach, patterns are first generated from the exploratory subset of the available data and then evaluated on the holdout subset of the available data using statistical tests with a correction for multiple testing. The main steps of the holdout-evaluation approach were followed in the present study. The processed dataset was split into two almost equally sized subsets: an exploratory data subset and a holdout data subset. Association rules were generated from the exploratory data subset as candidate rules for the next stage, in which the candidate rules were evaluated on the holdout data subset.

Generation of Candidate Rules
Association rules were mined from the exploratory data subset using the implementation of the Apriori algorithm [39,40] from the arules package for R [44]. The following parameter settings were used for the algorithm: minimum support = 0.05, minimum confidence = 0.55 and 2 ≤ number of items ≤ 4. The algorithm was run four times and each time a different target group of association rules was mined. The four target groups of association rules were:
Group A: Rules whose RHS item set contains only the item NOx_Level = 3
Group B: Rules whose RHS item set contains only the item CO_Level = 2
Group C: Rules whose RHS item set contains only the item Powdery_Level = 2
Group D: Rules whose RHS item set contains only the item Powdery_Level = 1
These four RHS item sets were selected by domain experts as pairs of a variable and its value that could appear in rule patterns relevant for the present study. Each group of association rules was pruned separately. This was generally conducted according to the overall pruning idea presented in [41], but lift was used instead of confidence to measure the interestingness of association rules. The pruning was performed in the following manner. If, for some rule P, there was some rule Q that encompassed all the items of rule P and at least one additional item, and had the same or lower lift when compared to rule P, then rule Q was considered redundant and was removed from its group. For each pruned group of association rules, only the 20 association rules with the highest confidence values were kept as candidate rules, while all the other association rules were removed.
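The pruning and selection steps can be sketched in Python as follows. This illustrates the stated redundancy criterion only and is not the arules-based code used in the study. The Rule container and the example rules with their confidence and lift values are hypothetical, and CO2_Level is an assumed variable name.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    lhs: frozenset      # items on the left-hand side
    rhs: frozenset      # items on the right-hand side
    confidence: float
    lift: float

def prune_redundant(rules):
    """Drop each rule Q for which a rule P exists whose items are a proper
    subset of Q's items and whose lift is at least as high as Q's."""
    kept = []
    for q in rules:
        q_items = q.lhs | q.rhs
        redundant = any(
            (p.lhs | p.rhs) < q_items and q.lift <= p.lift
            for p in rules
        )
        if not redundant:
            kept.append(q)
    return kept

def top_by_confidence(rules, n=20):
    """Keep the n rules with the highest confidence."""
    return sorted(rules, key=lambda r: r.confidence, reverse=True)[:n]

# Hypothetical rules from one group (same RHS), for illustration only.
rhs = frozenset({("Powdery_Level", 2)})
p = Rule(frozenset({("CO2_Level", 2)}), rhs, 0.8, 1.5)
q = Rule(frozenset({("CO2_Level", 2), ("Heat_Level", 2)}), rhs, 0.9, 1.4)
s = Rule(frozenset({("CO2_Level", 2), ("Biomass_Level", 2)}), rhs, 0.7, 1.9)
candidates = top_by_confidence(prune_redundant([p, q, s]), n=20)
```

In this example q is removed because it extends p without increasing lift, whereas s is kept because its extra item raises the lift.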

Evaluation of Candidate Rules
The candidate rules from all the groups were evaluated on the holdout data subset. For each candidate rule, values of support, confidence and lift were calculated for the holdout data subset. The significance of each candidate rule was measured using a statistical test of significance. This was performed in the manner described in [45], which included creating a contingency table for each candidate rule and applying the statistical test. However, instead of using the χ² test as in [43], Fisher's exact test was used because of its better properties when there are very small values in the contingency table [42,46]. The significance was checked with respect to the following three significance levels: 0.05, 0.01 and 0.001. Because multiple significance tests had to be conducted, the standard Bonferroni procedure was applied as a correction for multiple testing, i.e., each of the three significance levels was divided by the total number of candidate rules and the resulting values were used as thresholds.
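The evaluation step can be illustrated with a minimal Python sketch using only the standard library. The source does not state whether a one-sided or two-sided test was used, so the one-sided ("greater") variant below is an assumption, as are the function names and the use of "less than or equal" when comparing against the corrected thresholds.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    a: records where LHS and RHS hold, b: LHS holds but RHS does not,
    c: RHS holds but LHS does not,     d: neither holds.
    Returns the hypergeometric probability of observing a or more co-occurrences.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, col1)

def bonferroni_level(p_value, n_tests, levels=(0.05, 0.01, 0.001)):
    """Return the strictest significance level met after Bonferroni correction,
    or None if the rule is not significant at any of the levels."""
    for level in sorted(levels):
        if p_value <= level / n_tests:
            return level
    return None
```

For example, with 45 candidate rules the corrected threshold for the 0.05 level is 0.05 / 45, so a rule with a p-value of 0.0005 would be reported as significant at the 0.05 level only.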

Results
The processed dataset with 11 discretized variables contained 703 records. Value ranges that match the levels of the discretized variables, as well as the standard deviation, are given in Table 2. The exploratory data subset contained 351 records, while the holdout data subset contained 352 records. There were 45 candidate rules: 5 from Group A, 20 from Group C and 20 from Group D. These candidate rules are listed in Table 3, together with their support (Supp), confidence (Conf) and lift (Lift) and the result of statistical testing (Test), all calculated on the holdout data subset. The Test column indicates the level of significance of each candidate rule, as determined by the evaluation procedure described in Section 3.1. Support of a rule is the number of records for which the whole rule holds true divided by the total number of considered records. Confidence of a rule is the number of records for which the whole rule holds true divided by the number of records for which the LHS of the rule holds true. During association rule mining, the minimum value of the support was set to 0.05 and the minimum value of the confidence to 0.55. Lift of a rule is the product of the number of records for which the whole rule holds true and the total number of considered records, divided by the product of the number of records for which the LHS of the rule holds true and the number of records for which the RHS of the rule holds true. In this study, support, confidence and lift were used as descriptive measures of rule interestingness. A lift above 1 may indicate a positive relationship between the LHS and the RHS of a rule.
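The three measures defined above follow directly from four record counts, as the following Python sketch shows. The function and parameter names are illustrative. Apart from the holdout size of 352 records, the counts in the example are hypothetical and do not come from Table 3.

```python
def rule_measures(n_total, n_lhs, n_rhs, n_both):
    """Compute support, confidence and lift of a rule from record counts.

    n_total: number of considered records
    n_lhs:   records for which the LHS holds
    n_rhs:   records for which the RHS holds
    n_both:  records for which the whole rule holds
    """
    if min(n_total, n_lhs, n_rhs) <= 0:
        raise ValueError("n_total, n_lhs and n_rhs must be positive")
    support = n_both / n_total
    confidence = n_both / n_lhs
    lift = (n_both * n_total) / (n_lhs * n_rhs)
    return support, confidence, lift

# Hypothetical counts on a holdout subset of 352 records.
support, confidence, lift = rule_measures(n_total=352, n_lhs=100, n_rhs=176, n_both=80)
print(round(support, 3), round(confidence, 3), round(lift, 3))
```

With these counts the RHS holds in half of all records but in 80% of the records where the LHS holds, which gives a lift of 1.6.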

Conclusions
The paper presents a methodology for investigating the interdependence of the biomass boiler operation parameters, the external temperature and the products of the biomass combustion process. The presented methodology, which uses association rule mining, was applied to a specific CHP plant based on monitoring of the thermal energy production process, all relevant parameters of this process, as well as the accompanying biomass combustion products, over a period of four months. During the data analysis, SO2 was not taken into account, since its measured values were predominantly zero. Furthermore, an increased level of CO occurred only in individual cases, and the methodology was not able to determine the dependence of the increased concentration of CO on other parameters.
The increased concentration of CO in certain operation modes can be avoided by optimizing boiler operation, controlling the moisture content, and continuously controlling the combustion process. At an increased level of the powdery content, CO and NOx, a low level of CO is observed. A high level of NOx is correlated with the external temperature and a high concentration of CO2 on one side, and with a high concentration of powdery matter on the other. To reduce the NOx concentration, it is necessary to apply one of the pre-treatment methods before biomass combustion. A high level of powdery matter has a very high correlation with a number of parameters. A low level of powdery matter is connected with many other parameters, but the level of correlation is much lower than for the high level in Group C. The obtained correlations can be used in the automation of biomass boiler plant operation in order to reduce the amount of harmful combustion products. Furthermore, the developed methodology represents a universal tool that can be applied to all plants that burn biomass. The model presented in this paper can help in understanding the process of biomass combustion in boiler plants, which can contribute to better boiler design at the design stage as well as to optimal management in the exploitation phase.