Expert System for Stable Power Generation Prediction in Microbial Fuel Cell

Expert systems are interactive and reliable computer-based decision-making systems that use both facts and heuristics to solve complex decision-making problems. Generally, cyclic voltammetry (CV) experiments are run for an arbitrary number of cycles to obtain stable power production. However, few algorithms or models currently exist for predicting the stability criterion of power generation in microbial fuel cells. For stability analysis of the CV profiles of medicinal herbs, an expert system driven by the augmented K-means clustering algorithm is proposed. Our approach requires a dataset that contains voltage-current relationships from CV experiments on the related subjects (plants/herbs). This new approach uses feature engineering and augmented K-means clustering to determine the cycle number beyond which the CV curve stabilizes. With this approach, we obtain an excellent estimate of the number of CV cycles required to reach a stable voltage versus current curve. Moreover, this expert system would reduce the time and money spent on running superfluous CV experiment cycles. Thus, it would streamline the production of bacterial fuel cells using the CV of medicinal herbs.


Introduction
As technologists worldwide focus on zero-emission power generation, fuel cell (FC) technology paves the way. Like a battery, a fuel cell generates electricity through an electrochemical reaction. Unlike a battery, a fuel cell requires a continuous supply of fuel and oxygen to sustain the reaction. A significant difference between fuel cells and other electrochemical power generation devices is that an FC can be used only after system integration. A fuel cell unit consists of a stack of individual cells. Each cell in the stack has one positive electrode (cathode), one negative electrode (anode), a solid or liquid electrolyte that carries the appropriate ions from one electrode to the other, and a catalyst that accelerates the reactions at the electrodes. Fuel cells are classified by parameters such as temperature range, type of electrolyte, and the physical state of the fuel. Because fuel cells have many different operational variables, classifying them is difficult. Based on fuel type, they are classified as hydrogen-oxygen, fossil, alkaline, alcohol and hydrocarbon fuel cells. Based on how the reactants are passed and react in the cell, they are classified as primary or secondary fuel cells. In the former, the reactants are added only once and the by-products are discarded, whereas in the latter, the reactants are passed several times because they are regenerated by various methods. Polymer electrolyte membrane, solid oxide, direct methanol, proton exchange membrane, molten carbonate, phosphoric acid and microbial fuel cells are commonly used in the market. Fuel cell technologies, from Humphry Davy's fuel cells to hydrogen fuel cells, focus mainly on low environmental impact with high efficiency. Thus, they can replace fossil fuel systems, whose fuel is formed from the decomposition of buried dead organisms, and eliminate the associated pollution.
Fuel cells are more effective than gas engines. They are suited to toys, vehicles, commercial buildings such as medical health centres, educational institutions and financial corporations, residential facilities, military applications (battlefield power) and large industrial corporations. Fuel cells have an extensive assortment of applications, ranging from robust backup power systems to power for mobile transport such as electric buses, trains, and heavy chemical transport trucks.
Moreover, the productive use of hydrogen fuel cells will reduce greenhouse gas emissions. As fuel cells do not require gas or oil, non-oil-producing nations will benefit by reducing their expenditure. Microbial fuel cells (MFCs) use bacteria as the catalyst to oxidize organic and inorganic matter; the synergistic interactions of immobilized cell populations significantly attenuate the electron transfer resistance between the biofilm and the solid electrode, allowing adequate power generation. The taxonomy of the microbial fuel cell is depicted in Fig. 1. MFC research requires a basic understanding of fields ranging from microbiology, electrochemistry and material engineering to environmental sciences. The detailed mechanisms that influence bioelectricity generation in MFCs remain to be deciphered. Therefore, for system optimization, most of their functionality requires more profound analysis. Early MFCs used a mediator, a chemical that transfers electrons from the bacteria to the anode; such cells are termed mediated MFCs. In the 1970s, non-mediated (mediator-less) MFCs were designed, in which the bacteria transfer electrons directly to the anode through their redox proteins [1]. As an MFC generates electricity through a biological electrochemical reaction driven by microorganisms, it reduces atmospheric carbon dioxide and environmental contamination. MFCs are the most green and sustainable option for wastewater treatment, as the microbes are well acclimatized to sewage. The bacteria consume the sludge present in the wastewater, thus saving the cost otherwise incurred in handling it. In this work, we therefore predict the power generation of biological fuel cells using feature engineering and clustering techniques.
Feature engineering (FE) is used to obtain the features required for processing the data in our dataset and predicting the results. It creates a new feature set from the existing features. Because FE helps isolate and highlight the critical information so that the algorithm focuses on the most significant aspects, data scientists often rely on it to improve model performance. Clustering is used to group data by similarity, so that similar data points fall in one group and dissimilar points in another. This capability enables data reduction, which in turn improves prediction. K-means, an unsupervised learning algorithm, is used for clustering. We chose it because it works well on non-categorical data such as ours.
Conventional fuel cells have many system-specific deficiencies, such as methanol cross-over, CO2 evolution, and anode kinetics, that must be weighed against other power generation techniques such as internal combustion engines [2]. The conventional polymer electrolyte membrane fuel cell (PEMFC) uses noble metal electrodes, which incur a considerable cost [3,4]. Alkaline fuel cells eliminate CO2 from fuel and oxidants by the use of potassium hydroxide. Because of their biodegradable fuel and mild operating conditions, microbial fuel cells have been found to be an appropriate, environmentally friendly fit for low-power applications such as bioelectricity, biosensors, wastewater treatment and bio-hydrogen production.
Moreover, in this work, augmented K-means clustering-based modelling is used for systems-driven stability analysis of cyclic voltammetry (CV) profiles of medicinal herb extracts (Citrus reticulate, Syzygium aromaticum, Lonicera japonica, Zingiber officinale). The literature [28] suggests that these reactions do not follow the Nernst equation (which gives the cell potential under non-standard conditions). Instead, they seem to follow dynamics of their own that are not easily comprehensible. It is therefore challenging to determine the stability of the voltage versus current relationship in these CV experiments. However, research in this field has observed that this almost erratic relationship between voltage and current becomes more stable after a certain number of experiment cycles. Each cycle starts at +1.5 V, sweeps down to -1.5 V (the intermediate point), and then returns to +1.5 V (the endpoint). Usually, CV experiments are run for an arbitrary number of cycles to obtain stable power production, and few algorithms exist for predicting the stability criterion of power generation in microbial fuel cells. Therefore, in this work, the model phase-plane portraits of current output versus applied voltage over the CV scan cycles were generated with the augmented K-means clustering machine learning approach, and a novel augmented K-means clustering-based model was developed to predict the stabilization time during power generation in the microbial fuel cell. In particular, since green sustainability is a top priority for resource utilization globally, methods for analyzing the characteristics of natural bioresources with diverse chemical compositions should be developed for further applications.
Thus, this first-attempt clustering-based model provides a valid protocol for deciphering complicated herbal features for bioresource exploration. Cyclic voltammetry was adopted to explore the electrochemical characteristics of the medicinal herbs (Citrus reticulate, Syzygium aromaticum, Lonicera japonica, Zingiber officinale), as suggested by Chen et al. [38]. Further, in electro-analytical studies, it is well known that CV is deployed to extract information on the thermodynamics of reduction-oxidation processes, the kinetics of heterogeneous electron-transfer reactions, and coupled chemical reactions or adsorption processes. Nevertheless, statistical and machine learning models are rarely used to represent CV outcomes and make judicious forecasts from these CV profiles. Moreover, the theoretical concept of CV focuses on the electric current generated at the cathode potential under the applied voltage (considered as the required factor).
The concentration of electrochemically active species produced and consumed in the reactions is directly proportional to the reductive and oxidative peak currents. Hence, it becomes essential to apply several electrical potential cycles continuously over time to ensure that stable CV profiles are obtained from adequate original information. Although this is a trying, resource- and time-consuming activity, it is significant for CV analysis to obtain exact measures that accurately predict the electric current versus applied voltage. For this reason, this research devises a feature engineering-driven, augmented K-means clustering-based ML system for modelling the current acquired from the CV curves. Further, this work also establishes pragmatic criteria for the stability characteristics. The voltage (E) versus current (I) graphs of the medicinal herbs are shown in Figs. 2-5. Serial cyclic voltammetric (CV) analysis of a test sample (e.g., a herbal extract) electrochemically simulates repeated electron-donating and electron-withdrawing processes and thereby reflects the sequential responses of electrochemical reduction and oxidation. As Figs. 2-5 indicate, the CV responses tended to decay gradually, revealing that the electrochemically active and unstable species were irreversibly reduced and/or oxidized. Such response-attenuating characteristics are often exhibited during electrochemical testing of herbal extracts. Plant extracts contain abundant chemical species with oxidative and reductive potentials at different levels (e.g., polyphenolic antioxidants). That is why Figs. 2-5 exhibit such electrochemically unbalanced and irreversible outcomes in CV analysis. However, after several CV testing cycles (ca. 80 cycles), the electrochemically reactive species present in the herbal extracts had nearly all reacted or been consumed.
That is, the CV profiles converged asymptotically toward a more electrochemically stable state, suggesting that only reversibly electroactive species (e.g., electron shuttles or electrochemical "catalysts") were still stably maintained in the test sample.

3D Visualization of the Medicinal Herbs CV Data
The graphs portrayed in Fig. 6 give a clear picture of how the current-voltage relationship varies at various cyclic voltammetry stages. Depth, breadth and height are the parameters that can be perceived from the 3D visualization of the medicinal herb data. Further, this 3D visualization helps in understanding the qualitative and quantitative details of the object. Besides, the three-phase procedure of scene, geometry, and rendering aids in visualizing the 3D information. Moreover, as the size of the datasets increases, the need for such data visualization tools becomes apparent. The information generated by this 3D visualization can also be observed from any angle.

Feature Engineering
As the dataset taken does not provide meaningful categories, feature engineering offers the solution to produce significant deductions. The following six new features were generated to test on the available medicinal herb dataset:
1. The cumulative voltage variance
2. The cumulative current variance
3. The product of the cumulative voltage variance and the cumulative current variance
4. The cumulative variance of the cumulative voltage variance
5. The cumulative variance of the cumulative current variance
6. The cumulative variance of the product of the cumulative voltage variance and the cumulative current variance
The significance of these features is that they give the approximate trend of changes in the voltage-current pairs. This was quite evident from the values of the newly generated features.
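The six features listed above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the text does not define "cumulative variance", so it is taken here to mean the running population variance of all samples up to the current index, and the function names `cumulative_variance` and `engineer_features` are hypothetical.

```python
import numpy as np

def cumulative_variance(x):
    """Running population variance of x[0..i] for every index i.

    Assumption: 'cumulative variance' means the variance of all samples
    seen so far; the paper does not state its exact definition.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(1, x.size + 1)
    mean = np.cumsum(x) / n
    mean_sq = np.cumsum(x * x) / n
    # Clip tiny negative values caused by floating-point cancellation.
    return np.maximum(mean_sq - mean * mean, 0.0)

def engineer_features(voltage, current):
    """Return an (n_samples, 6) matrix with the six engineered features."""
    cv_v = cumulative_variance(voltage)   # 1. cumulative voltage variance
    cv_i = cumulative_variance(current)   # 2. cumulative current variance
    prod = cv_v * cv_i                    # 3. product of 1 and 2
    return np.column_stack([
        cv_v,
        cv_i,
        prod,
        cumulative_variance(cv_v),        # 4. cumulative variance of 1
        cumulative_variance(cv_i),        # 5. cumulative variance of 2
        cumulative_variance(prod),        # 6. cumulative variance of 3
    ])
```

If the features are instead computed per CV cycle, the same helper could be applied to per-cycle summaries rather than to raw samples; the choice would depend on how the dataset is organized.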

Proposed Methodology
Multiple features and their permutations were tried. Since the features were handcrafted, there is a solid relationship between them, and it becomes relatively easy to find the decision boundary by observing the trends; in fact, our algorithms also supported these trends. Before any clustering, feature scaling was applied. The augmented K-means clustering approach applied here makes use of the L2 norm. Euclidean distance is the shortest distance between two points in an N-dimensional space, also known as Euclidean space. It is used as a standard metric to measure the similarity between two data points. It is also referred to as the Euclidean norm, Euclidean metric, L2 norm, L2 metric and Pythagorean metric.
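The scaling and distance steps described above might look as follows. The text does not say which scaling method was used, so z-score standardization is assumed here; `standardize` and `euclidean` are hypothetical helper names.

```python
import numpy as np

def standardize(X):
    """Z-score scaling per feature column (assumed; the paper does not
    name its scaling method)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant columns unscaled
    return (X - mu) / sigma

def euclidean(a, b):
    """L2 (Euclidean) distance between two points of equal dimension."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.linalg.norm(diff))
```

Scaling matters here because the L2 norm sums squared differences across features, so a feature with a much larger numeric range would otherwise dominate the distance.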
The K-means algorithm is an iterative algorithm that partitions non-categorical data into K clusters (groups of similar data items). It iterates until all data items are grouped into distinct non-overlapping clusters (without duplication). K-means clustering is very popular in cluster analysis, where it partitions n input data points into K clusters, with K fixed a priori. Each data point is assigned to the cluster with the nearest mean value. This is how the data is partitioned into Voronoi cells. Each cluster has one cluster head. Once all the data points have been placed into the K clusters, the positions of the cluster heads are recalculated. The classical K-means algorithm can be sensitive to the initial cluster centres, and distinct initial centres might have a significant influence on the final clustering outcome. For this reason, the algorithm is considered highly volatile. To overcome this limitation, we introduce the augmented K-means clustering algorithm, as shown in Fig. 7.
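For reference, the classical procedure described above can be sketched as below. This is not the augmented algorithm of Fig. 7, whose details are not given in the text; to illustrate the sensitivity to initial centres, the sketch uses a common remedy (several random restarts, keeping the lowest within-cluster sum of squares), which is an assumption of this example. The function name `kmeans` and its parameters are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_init=10, max_iter=100, seed=0):
    """Classical K-means (Lloyd's iterations) with random restarts.

    Returns (labels, centres, inertia) of the best restart, where inertia
    is the within-cluster sum of squared L2 distances.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # Initial cluster heads: k distinct data points chosen at random.
        centres = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
            labels = dists.argmin(axis=1)  # assign to nearest mean
            new_centres = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                for j in range(k)
            ])
            if np.allclose(new_centres, centres):
                break
            centres = new_centres
        # Final assignment and cost for the converged centres.
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        inertia = float((dists[np.arange(len(X)), labels] ** 2).sum())
        if best is None or inertia < best[2]:
            best = (labels, centres, inertia)
    return best
```

With K = 2, the two resulting groups could be read as the unstable and stable regimes of the CV cycles, provided the input rows are the scaled engineered features.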

Experimental Results
The augmented K-means clustering machine learning approach is deployed in this research to generate the model phase-plane sketches of current output versus applied voltage over the CV profile scan cycles. To devise appropriate power generation and storage systems, we need to investigate the unstable condition of these profiles. Hence, this work establishes the augmented K-means clustering machine learning approach to predict the time the microbial fuel cell takes to generate stable power (Fig. 8).
Moreover, from the slope in Fig. 8, we can predict the time (cycle number) for stable power generation. Under the supervision of a subject expert, we also conducted experiments to validate the results obtained with the augmented K-means clustering machine learning approach. These results might help future technologists develop progressive energy generation, management and storage units and equipment. We conclude from the experimental outcomes and their analysis that the augmented K-means clustering machine learning model can be used to predict or determine the stability criterion (cycle and voltage) in CV profiles for emerging bio-energy application paradigms.

Selection of Optimal Value of K
In this work, the optimal value of K is determined using the Gap Statistic algorithm [39,40]. The fundamental notion of the Gap Statistic algorithm is to generate reference measurements via Monte Carlo experiments [41] and then compute the L2 norm between every pair of measurements within each class. The L2 norm optimizes the mean cost, which is often used as a performance measure. The solution is more likely to be unique, and the non-sparseness of the L2 norm can improve prediction performance. The clustering outcomes on the generated reference normal distribution are then compared with those on the data to determine the optimal value of K [42]. Fig. 9 portrays the selection of the optimal value of K using the Gap Statistic algorithm. It can be observed from this graph that the optimum value of K is two in this scenario.
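A minimal sketch of a gap-statistic selection of K follows. It is an illustration rather than the exact procedure of [39-42]: the reference data are drawn uniformly over the bounding box of the data (the standard form of the method), whereas the text mentions a normal reference distribution, and the plain K-means helper stands in for the augmented algorithm. The names `gap_statistic`, `_kmeans_inertia`, `k_max` and `n_refs` are hypothetical.

```python
import numpy as np

def _kmeans_inertia(X, k, rng, n_init=5, max_iter=100):
    """Lowest within-cluster sum of squared L2 distances over n_init restarts."""
    best = np.inf
    for _ in range(n_init):
        centres = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            d = np.linalg.norm(X[:, None] - centres[None], axis=2)
            labels = d.argmin(axis=1)
            new = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                for j in range(k)
            ])
            if np.allclose(new, centres):
                break
            centres = new
        d = np.linalg.norm(X[:, None] - centres[None], axis=2)
        best = min(best, float((d.min(axis=1) ** 2).sum()))
    return best

def gap_statistic(X, k_max=5, n_refs=20, seed=0):
    """Return (chosen K, list of Gap(k) for k = 1..k_max)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(_kmeans_inertia(X, k, rng))
        # Monte Carlo reference sets (assumed uniform over the bounding box).
        ref_log_w = np.array([
            np.log(_kmeans_inertia(rng.uniform(lo, hi, size=X.shape), k, rng))
            for _ in range(n_refs)
        ])
        gaps.append(float(ref_log_w.mean() - log_w))
        s.append(float(ref_log_w.std() * np.sqrt(1.0 + 1.0 / n_refs)))
    # Smallest k such that Gap(k) >= Gap(k+1) - s(k+1).
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps
    return k_max, gaps
```

A call such as `gap_statistic(features, k_max=6)` returns the selected K together with the gap curve, which corresponds to the kind of plot shown in Fig. 9. The sketch assumes more distinct data points than `k_max`, since a zero within-cluster cost would make the logarithm undefined.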

Discussion
The plot of cycle number versus variance in current versus variance in variance of current is shown in Fig. 8 for all four medicinal herbs (Citrus reticulate, Syzygium aromaticum, Lonicera japonica, Zingiber officinale). We experimented with every possible permutation of the available features, and this particular combination produced the best results. Since our dataset is of high dimensionality, it is arduous to plot everything on a scatter plot. Our experiments show that the optimum decision boundary lies approximately near the 45-55 mark. So now we know that for these four types of herbs, the optimum number of cycles for the stability criterion is around the 54-60 mark. Furthermore, the cost of the last 40-46 cycles can be saved by this prediction. Tabs. 1-4 present the stable time-period (cycle number) criterion for the four medicinal herbs (Citrus reticulate, Syzygium aromaticum, Lonicera japonica, Zingiber officinale) under the optimal K-value (K = 2). These tables show that the proposed augmented K-means clustering method combined with the Gap Statistic K-value selection algorithm performs better than all other compared combinations of algorithms.
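One way to turn K = 2 cluster labels into a stability cycle number is sketched below. The rule used here is an assumption made for illustration, not the paper's stated criterion: the stable cluster is taken to be the one containing the final cycle, and the stability cycle is the first cycle after which every later cycle stays in that cluster. The function name `stable_cycle` is hypothetical.

```python
import numpy as np

def stable_cycle(labels, first_cycle=1):
    """First cycle number from which all cycles share the final cycle's label.

    labels: one cluster label per CV cycle, in cycle order.
    first_cycle: cycle number of labels[0] (1 for 1-based numbering).
    """
    labels = np.asarray(labels)
    final = labels[-1]  # assumed to mark the stable cluster
    differs = np.flatnonzero(labels != final)
    if differs.size == 0:
        return first_cycle  # every cycle is already in the stable cluster
    # The cycle right after the last one that falls outside the stable cluster.
    return int(differs[-1]) + 1 + first_cycle
```

For example, the labels `[0, 0, 1, 0, 1, 1, 1]` give cycle 5: cycle 4 is the last one outside the final cluster, so cycles 5 to 7 form the stable run. The number of cycles that could be skipped is then the total number of cycles minus this value.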

Conclusion and Future Work
The generation of stable power in microbial fuel cells is an arduous task, and for this reason the practical applications of such cells are limited. In this research, we highlight the problem of modelling the CV profiles of medicinal herbs (Citrus reticulate, Syzygium aromaticum, Lonicera japonica, Zingiber officinale). The work aimed to provide a credible prediction of the responding current under different given conditions. The present study proposed an augmented K-means clustering machine learning model for predicting the stability criterion (time-period) during the power generation process. The proposed expert system based on the K-means clustering machine learning model determines the stable CV cycle and time period for the medicinal herbs. Further, it determines the exact CV cycle at which the stable response is reached, which could provide researchers and scientists with a confident interpretation of bioelectrochemical significance. In our case, we have developed a model to study and predict the serial results of the CV process precisely. This model can be applied to any CV data, but its parameters set bounds through the relevant minimum and maximum input values. This type of modelling can solve these practical problems and reduce experimental effort while improving accuracy. The combination of electrochemical CV tests and the augmented K-means clustering machine learning model proved to be a practical and efficient approach for both fundamental understanding and quantification of the characteristics of medicinal herbs. The results and analysis confirmed that subject professionals could employ the augmented K-means clustering expert system to estimate the stability criterion, leading to the exploration of novel resources in the bio-energy field. Besides, this work delivers a typical model of logical thinking for developing prototype electrochemical instruments.
In future work, the authors plan to develop more sophisticated methods, such as genetic programming, to include micro- and nanostructure analysis of herbs, and to develop fusion-based models to obtain deeper CV insights.