Enhancement of Neural Network Based Multi Agent System for Classification and Regression in Energy System

Extreme Learning Machine (ELM) avoids the iterative weight-adjustment procedure of conventional training by randomly assigning the hidden neurons and analytically determining the output weights. In this paper, the basic ELM neural network was enhanced with a simplified network structure to improve regression performance. Next, a hybrid system integrating the ELM neural network and MAS models was proposed to solve pattern classification problems. A MAS model was then designed with a novel trust measurement method to combine ELM neural networks. First, ELM was hybridized with the Single Input Rule Module (SIRM-ELM). In SIRM-ELM, each rule is connected to only a single input; the rules are the hidden neurons of ELM, and each represents a single-input fuzzy rule. Results showed that SIRM-ELM outperformed the Support Vector Machine and the traditional ELM. Second, an extreme learning machine based multi-agent system (ELM-MAS) was designed to improve ELM's capability. Its first layer consists of one or more ELMs, each acting as an individual agent, whereas its second layer consists of a single ELM acting as the parent agent. Lastly, the Certified Belief in Strength (CBS) method was applied to the ELM neural network to form ELM-MAS-CBS, using the reputation and strength of individual agents as the trust measurement. Gathering the strong elements associated with the ELM agents formed the trust management that allowed CBS to improve the performance of the MAS. Both developed models were evaluated in power generation applications, and their test accuracy on circulating water systems was comparable to that of other algorithms. In short, the developed models were verified on benchmark datasets and applied to power generation, with satisfactory results.


I. INTRODUCTION
Feedforward Neural Networks (FNNs) are the most common type of Artificial Neural Network (ANN) used to recognize patterns. They can handle non-linear as well as noisy data (e.g., data gathered from real environments). Unfortunately, FNNs learn slowly because (i) slow error back-propagation (BP) and other gradient-based learning algorithms [1]-[3] are employed to train them, and (ii) all of their parameters are tuned iteratively by these gradient-based algorithms. Two widely used FNNs for pattern recognition are the Multilayer Perceptron (MLP) [4], [5] and the Radial Basis Function (RBF) network [4], [6]. The MLP network applies a non-linear transformation built from a combination of the sigmoid functions of its hidden neurons. The RBF network solves problems by combining non-linear semi-parametric functions, for instance, the Gaussian kernel function. Nevertheless, the number of hidden neurons of an MLP and the number of kernel functions of an RBF network must be predefined by trial-and-error or cross-validation, which can be a lengthy process. Training is lengthened further because the network must be trained repeatedly on the dataset until it fits.
The associate editor coordinating the review of this manuscript and approving it for publication was Junxiu Liu.
Much work has been done to improve BP algorithms and avoid local minima, through better selection of the activation function, dynamic variation of the momentum and learning rate, and modification of the cost function. Simple Adaptive Momentum (SAM) can improve the convergence rate of BP [12]: the momentum coefficient is scaled according to the similarity between the weight changes in the previous and current iterations, with lower computational overhead than conventional BP. In 2008, Mitchell et al. adapted the momentum coefficient differently by accounting for the weights in the Multilayer Perceptron (MLP); this approach was reported to outperform SAM. In 2011, Gradient Descent BP (GDAM) was proposed to further improve efficiency [13].
Despite these improved versions of BP, having to adjust the weights repeatedly and iteratively during training to model a specific learning task accurately remains a challenge. To circumvent this [14]-[16], Huang et al. (2006a) proposed a new learning algorithm, the Extreme Learning Machine (ELM). ELM trains a single hidden layer feedforward neural network (SLFN) without iterative weight adjustment: the hidden neurons are selected randomly and the output weights of the SLFN are calculated analytically. ELM has been reported to achieve excellent generalization performance at exceptionally fast learning speeds.
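The two ingredients just described, random hidden neurons and analytically computed output weights, can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the sigmoid activation, the uniform range [-1, 1] for the random parameters, and the function names `elm_train` and `elm_predict` are assumptions made for the example.

```python
import numpy as np


def elm_train(X, T, n_hidden, seed=0):
    """Train a single-hidden-layer ELM: random hidden layer, analytic output weights."""
    rng = np.random.default_rng(seed)
    # Hidden-neuron parameters are drawn once at random and never tuned.
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))                   # sigmoid hidden-layer outputs
    # Output weights: least-squares solution via the Moore-Penrose pseudo-inverse.
    beta = np.linalg.pinv(H) @ T
    return A, b, beta


def elm_predict(X, A, b, beta):
    """Apply the trained ELM to new inputs."""
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return H @ beta


# Usage: fit y = sin(x) on [-3, 3] with 50 hidden neurons.
X = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
T = np.sin(X)
A, b, beta = elm_train(X, T, n_hidden=50)
print(np.sqrt(np.mean((elm_predict(X, A, b, beta) - T) ** 2)))  # training RMSE
```

The only "training" is one pseudo-inverse, which is why no learning rate, momentum, or epoch count appears anywhere in the sketch.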
Beyond accurate classification of input samples, most users prefer that the output include an estimate of the classification strength. Knowing how likely a prediction is to be correct allows users to judge whether they can rely on it and to make informed decisions [17]-[19].
Intelligent agents are regarded as a computational intelligence paradigm. Multi-Agent Systems (MAS) are widely applicable, for example in decision support [20], navigation [21], industrial steel processing [22], and power systems [23]. In this paper, MAS was studied as an approach to ensemble ELM networks: each ELM acts as an individual agent, and the MAS structure combines the predictions of all ELMs to create a well-performing classification system.
In MAS, a number of models can be used to describe agent links, such as the decision ladder model of Rasmussen, Pejtersen, and Goodstein (1994) and the Belief, Desire, and Intention (BDI) model of Bratman (1987). In this paper, two models, the Trust, Negotiation, and Communication (TNC) model of Haider, Tweedale, Urlings, and Jain (2006) and the model of Tweedale and Cutler (2006), were examined to develop a MAS model. The primary element of the TNC model is trust measurement. Because trust is subjective, methods for computing trust are needed to render it an objective quantity.
The main objective was to design, develop, and enhance ELM-based neural network models that capitalize on the advantages of ELM while avoiding its inherent limitations. The sub-objectives of this work were: (Case 1) enhancement of the existing ELM neural network to achieve regression performance with a simplified network structure; (Case 2) proposal of a hybrid system integrating the ELM neural network and MAS models for pattern classification; and (Case 3) design of a MAS model with a novel trust measurement method to combine ELM neural networks. In this paper, the standard ELM is hybridized with a simplified network structure and with MAS models; these novel models aim to achieve better results.

A. PROPOSED WORK
To achieve the main objective, the existing ELM neural network was enhanced with a simplified network structure to achieve regression performance. Next, a hybrid system integrating the ELM neural network and MAS models was proposed to solve pattern classification problems. A MAS model was then designed with a novel trust measurement method to combine ELM neural networks. Each case was tested on an application to assess its capability.

A. (CASE 1) EXTREME LEARNING MACHINE WITH SINGLE INPUT RULE MODULE (SIRM-ELM)
This section proposes a novel ELM-based model for regression problems that hybridizes ELM with the Single Input Rule Module (SIRM), denoted SIRM-ELM. In SIRM-ELM, each rule is connected to only a single input, and each hidden neuron of ELM represents a single-input fuzzy rule. Hence, the number of hidden neurons of ELM determines the number of fuzzy rules. In conventional fuzzy inference with "if-then" rules, all input items are assigned to the antecedent part and all output items to the consequent part. The major difficulty is that the number of fuzzy rules keeps increasing until the system and the arrangement of the rules become complicated [24]. Therefore, Yubazaki [25]-[31] developed the SIRM connected type fuzzy inference method, which combines the outputs of the single-input rule modules. The SIRM method has been applied to anti-swing and positioning control of overhead traveling cranes [25], stabilization control of inverted pendulum systems [27]-[29], control of first- and second-order lag systems with dead time [26], [31], and non-linear function identification [31], among others, with good results [24].
It is assumed that the system has n input sources and one output source, although it can be extended to multiple output sources. The basic SIRM form for n input sources is given in Equation (1), in which each SIRM corresponds independently to one of the n input sources:
$$\text{SIRM-}i: \left\{ R_i^j: \text{if } x_i = A_i^j \text{ then } u_i = f_i^j \right\}_{j=1}^{m_i}, \quad i = 1, 2, \ldots, n \qquad (1)$$
Here, SIRM-i refers to the ith input source, $R_i^j$ is the jth rule in SIRM-i, $x_i$ is the ith input variable in the antecedent part, $u_i$ is the variable in the consequent part of SIRM-i, $f_i^j$ is the consequent value of rule $R_i^j$, and $m_i$ is the number of rules in SIRM-i. Fig. 1 shows the structure of SIRM-ELM; the training steps are given below, and the variables and parameters are defined in Fig. 1.
Step 1: Randomly set the input weights $a_i^j$ and biases $b_i^j$ (for $i = 1, 2, \ldots, N$ and $j = 1, 2, 3$) of the hidden neurons. Note that $a_i^j$ and $b_i^j$ are the parameters of the SIRM membership function $A_i^j$. The weights are generated as $\alpha D - \omega$, where $D$ is a uniformly distributed random number between 0 and 1, and $\alpha$ and $\omega$ are parameters. By default, $\alpha = 2$ and $\omega = 1$, so $a_i^j$ and $b_i^j$ lie in the range −1 to +1.
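Step 1 can be sketched with the Python standard library. This is an illustrative sketch only: the function name `init_sirm_params` and the `[input][membership function]` list layout are hypothetical choices, not taken from the paper; only the rule $\alpha D - \omega$ with the defaults $\alpha = 2$, $\omega = 1$ comes from the text.

```python
import random


def init_sirm_params(n_inputs, n_mf=3, alpha=2.0, omega=1.0, seed=None):
    """Draw every a_i^j and b_i^j as alpha * D - omega, with D ~ Uniform[0, 1).

    Returns two nested lists indexed as [input i][membership function j].
    """
    rng = random.Random(seed)

    def draw():
        return alpha * rng.random() - omega

    a = [[draw() for _ in range(n_mf)] for _ in range(n_inputs)]
    b = [[draw() for _ in range(n_mf)] for _ in range(n_inputs)]
    return a, b


a, b = init_sirm_params(n_inputs=4, seed=42)
# With the defaults alpha = 2, omega = 1, every value lies in [-1, 1).
```

Changing `alpha` and `omega` rescales and shifts the range, which is how the tuned ranges reported later can be obtained without touching the rest of the procedure.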
Step 2: For each training pair $(x_{pi}, t_p)$, where $x_{pi}$ is the ith feature of the pth training pair and $t_p$ is the target output ($p = 1, 2, \ldots, P$), determine the hidden layer output matrix $H$ using the membership function $\mu(x_{pi}, A_i^j)$, denoted $\mu_{pi}^j$ for simplicity.
Step 3: Compute the output weights $\beta$. Because $H$ is in general not a square matrix, its ordinary inverse does not exist. To circumvent this problem, the Moore-Penrose pseudo-inverse $H^\dagger$ is used, and the output weights are obtained as $\beta = H^\dagger T$, where $T = [t_1, \ldots, t_P]^T$ is the target output matrix.
Step 4: After the output weights of SIRM-ELM are calculated, predictions for a set of new, unlabeled samples $z_q$ can be computed, where $\lambda$ is the membership function, $h$ is the hidden layer, $y$ is the prediction output, $q = 1, 2, \ldots, Q$, and $Q$ is the number of test samples.
Step 5: After computing the output of SIRM-ELM for the testing samples, determine the root mean squared error
$$\mathrm{RMSE} = \sqrt{\frac{1}{Q} \sum_{q=1}^{Q} (y_q - d_q)^2}$$
where $y_q$ and $d_q$ are the predicted and actual outputs for $z_q$. SIRM-ELM was then applied to the NOx emission of a power generation plant.
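Steps 1 to 5 can be condensed into the NumPy sketch below. It is a minimal illustration under stated assumptions, not the authors' code: the membership function is assumed to be a sigmoid of a single input, $\mu = 1/(1 + \exp(-(a_i^j x_i + b_i^j)))$, in line with the sigmoid hidden neurons used in the experiments; $J$ membership functions per input give $N \cdot J$ single-input rules (hidden neurons); and all function names are hypothetical.

```python
import numpy as np


def sirm_hidden(X, a, b):
    """Hidden-layer matrix H: one column per single-input rule (input i, MF j).

    Assumed membership function: mu = 1 / (1 + exp(-(a_i^j * x_i + b_i^j))).
    X has shape (P, N); a and b have shape (N, J); H has shape (P, N * J).
    """
    Z = X[:, :, None] * a[None, :, :] + b[None, :, :]  # (P, N, J): each rule sees one input
    return (1.0 / (1.0 + np.exp(-Z))).reshape(X.shape[0], -1)


def sirm_elm_fit(X, t, n_mf=3, alpha=2.0, omega=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n_inputs = X.shape[1]
    a = alpha * rng.random((n_inputs, n_mf)) - omega   # Step 1: alpha * D - omega
    b = alpha * rng.random((n_inputs, n_mf)) - omega
    H = sirm_hidden(X, a, b)                           # Step 2
    beta = np.linalg.pinv(H) @ t                       # Step 3: beta = H^+ T
    return a, b, beta


def sirm_elm_predict(Z, a, b, beta):
    return sirm_hidden(Z, a, b) @ beta                 # Step 4


def rmse(y, d):
    y, d = np.asarray(y, dtype=float), np.asarray(d, dtype=float)
    return float(np.sqrt(np.mean((y - d) ** 2)))       # Step 5


# Usage on a toy additive target with two inputs.
rng = np.random.default_rng(3)
X = rng.random((120, 2))
t = 0.5 * X[:, 0] + 0.3 * X[:, 1]
a, b, beta = sirm_elm_fit(X, t)
print(rmse(sirm_elm_predict(X, a, b, beta), t))
```

Because each hidden neuron sees only one input, the number of hidden neurons grows linearly with the number of inputs ($N \cdot J$), which is the simplification the SIRM structure offers over rules whose antecedents combine all inputs.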

1) REAL-WORLD APPLICATION: NOx EMISSION OF POWER GENERATION PLANT
Nitrogen occurs naturally in the atmosphere as an inert gas; the air contains about 78% N2 by volume. NOx refers to nitrogen oxides, mainly nitrogen monoxide (also known as nitric oxide, NO) and nitrogen dioxide (NO2). Other members of the family include N2O, N2O4, and N2O5.
Atmospheric NOx has direct and indirect effects on human health and on ecosystems, i.e., animals and plants. NOx reacts with water, oxygen, and other chemicals to form smog and acidic pollutants, which lead to acid rain. In turn, acid rain, together with dry deposition and cloud, can damage cars and buildings. NOx is mainly released during the combustion of fossil fuels such as coal, oil, and natural gas. According to a European Environment Agency (EEA) technical report (1990-2013), 21% of NOx emissions in the European Union, approximately 1,600 kilotonnes, came from energy production and distribution. Moreover, power generation capacity was expected to grow by 18.7 gigawatts (GW) over 2016-2018 owing to the price and availability of natural gas. Hence, predicting NOx emission is vital for the power generation sector.
As a real-world application in this study, the NOx emission of an open cycle gas turbine in a power generation plant located at Port Dickson, Malaysia, was investigated [32]. The objective was to develop a neural network model to predict NOx emission. There were 150 input attributes taken from the plant parameters, such as gas turbine load, temperature, and pressure. The target output was the quantity of NOx (in ppm) emitted from the gas turbine.

B. (CASE 2) HYBRIDIZING ELM-MAS (EXTREME LEARNING MACHINE AND MULTI AGENT SYSTEMS)
Extreme Learning Machine (ELM) has been well recognized as an effective learning algorithm with better generalization and faster learning speed than conventional learning methods [33]-[39]. In addition, ELM is well known for its universal approximation capability with randomly generated input weights and biases [40]. In essence, only the weights linking the hidden and output layers are learned, while the input weights of the hidden neurons are assigned randomly.
VOLUME 8, 2020
According to Huang et al. [41], [42], ELM is highly efficient and tends to reach the global optimum, in contrast to the conventional feedforward neural network (CFNN). Moreover, for commonly used activation functions, ELM can attain the best generalization bound of a CFNN in which every parameter is learned [86]. In terms of efficiency and generalization, ELM has outperformed the traditional FNN [33]-[39]. ELM has also been applied in many fields, including chemical processes [43], hyperspectral images [44], action recognition [45], biomedical analysis [46], [47], power systems [48], and system modeling [49], [50].
A current focus of ELM research is to combine the independent predictions of several ELMs into an optimal output using ensemble models [51]-[55]. This approach has been adopted in Multi Agent Systems (MAS) in particular [56]. MAS has attracted much attention in recent years and has been applied successfully in many sectors, including health care [57]-[59], e-Commerce [56], [60]-[62], military support [63], [64], knowledge management [65]-[68], decision support [69]-[72], and control systems [73]-[76]. Fig. 2 shows the common structure of a MAS, in which the bottom layer consists of a group of ELMs acting as individual agents. The outputs of these individual agents are delivered to the corresponding parent agent, which acts as the final combination module.
Meta-learning is commonly used to combine the outputs of various learners; it can be interpreted as learning from the knowledge acquired by one or more learners [77], [78]. In such a model, several ELMs act as the hidden neurons, and a meta-learner learns from the outputs of these hidden neurons. Experimental results and theoretical analysis on benchmark regression and artificial datasets, trained with several ELMs, showed good performance at the cost of a lower computational rate [79]. Meta-ELM [79] is thus a special ELM design in which ELMs serve as hidden neurons. ELM-MAS, however, was designed from a different perspective. In this section, ELM-MAS has two layers of complete ELMs: the first layer consists of one or more ELMs, each regarded as an individual agent, and the second layer consists of a single ELM acting as the parent agent. This two-layer arrangement of the proposed ELM-MAS therefore resembles the classic MAS shown in Fig. 2.
As shown in Fig. 3, depending on the type of activation function used, an ELM can be either an additive feedforward network or an RBF network. An ELM with $L$ hidden neurons is trained on $N$ samples $(x_j, t_j)$, where $x_j \in \mathbb{R}^M$ is the input vector ($M$ is the number of input attributes) and $t_j \in \mathbb{R}^C$ is the target output vector ($C$ is the number of classes). In this case, five ELMs acted as individual agents, each with different random input weights. As shown in Fig. 4, the output of each ELM$_k$ ($k = 1, 2, \ldots, 5$) in response to $x_j$ is
$$y_j^k = \sum_{i=1}^{L} \beta_i^k \, G(a_i^k, b_i^k, x_j)$$
where $a_i^k$ and $b_i^k$ are the input weights and bias of the ith hidden neuron, $\beta_i^k$ is its output weight, and $G(a_i^k, b_i^k, x_j)$ is the output of the ith hidden neuron for the input vector $x_j$.
Equations (10) and (11) define $G(a_i^k, b_i^k, x_j)$ for the additive sigmoid hidden neuron and the RBF hidden neuron, respectively.
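The two equations are not reproduced in this text. Assuming the standard forms used in the ELM literature (Huang et al.), the additive sigmoid neuron and the Gaussian RBF neuron would read as below; this is a reconstruction for the reader's convenience, and the paper's own Equations (10) and (11) may differ in notation. For the RBF neuron, $a_i^k$ plays the role of the centre and $b_i^k$ of the impact factor.

```latex
\begin{aligned}
G(a_i^k, b_i^k, x_j) &= \frac{1}{1 + \exp\left(-(a_i^k \cdot x_j + b_i^k)\right)} && \text{(additive sigmoid)} \\
G(a_i^k, b_i^k, x_j) &= \exp\left(-b_i^k \,\lVert x_j - a_i^k \rVert^2\right) && \text{(RBF)}
\end{aligned}
```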
The training procedure is as follows.
Stage 1: Randomly assign the input weights $a_i^k$ and biases $b_i^k$ for $k = 1, 2, \ldots, 5$ and $i = 1, \ldots, L$.
Stage 2: Compute the hidden layer output matrix $H^k$ of ELM$_k$ ($k = 1, 2, \ldots, 5$):
$$H^k = \begin{bmatrix} G(a_1^k, b_1^k, x_1) & \cdots & G(a_L^k, b_L^k, x_1) \\ \vdots & \ddots & \vdots \\ G(a_1^k, b_1^k, x_N) & \cdots & G(a_L^k, b_L^k, x_N) \end{bmatrix}_{N \times L}$$
Stage 3: Calculate the output weights $\beta^k$ of ELM$_k$. Because $H^k$ is in general not a square matrix, its ordinary inverse does not exist. The Moore-Penrose pseudo-inverse is therefore used instead:
$$\beta^k = (H^k)^\dagger T$$
where $T = [t_1, \ldots, t_N]^T$ contains the target output vectors.
Stage 4: After the output weights of ELM$_k$ are calculated, compute the outputs of ELM$_k$ for the training samples.
Stage 5: Randomly assign the input weights of the parent ELM, $q_i$ and $p_i$ ($i = 1, \ldots, L_1$), where $L_1$ is the number of hidden neurons of the parent ELM.
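Stage 6: Compute the hidden layer output matrix $S$ of the parent ELM:
$$S = \begin{bmatrix} G(q_1, p_1, w_1) & \cdots & G(q_{L_1}, p_{L_1}, w_1) \\ \vdots & \ddots & \vdots \\ G(q_1, p_1, w_N) & \cdots & G(q_{L_1}, p_{L_1}, w_N) \end{bmatrix}_{N \times L_1}$$
where $w_j$ is the combined output of ELM$_k$ ($k = 1, 2, \ldots, 5$) in response to $x_j$, i.e., $w_j = [y_j^1\ y_j^2\ y_j^3\ y_j^4\ y_j^5]$, with $y_j^k \in \mathbb{R}^C$ and $w_j \in \mathbb{R}^{5C}$.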

Stage 7: Use the outputs of ELM$_k$ to compute the output weights $\alpha$ of the parent ELM:
$$\alpha = S^\dagger T$$
where $T = [t_1, \ldots, t_N]^T$ contains the corresponding target output vectors.
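Stages 1 to 7, together with the prediction step that follows them, amount to stacking: five base ELMs are trained independently, and their concatenated outputs $w_j$ become the inputs of a parent ELM. A minimal NumPy sketch is given below; the sigmoid activation, the [-1, 1] sampling range, the default sizes `L = 20` and `L1 = 10`, and all function names are assumptions for illustration, not values from the paper.

```python
import numpy as np


def _hidden(X, W, b):
    """Sigmoid hidden-layer outputs for all samples and neurons."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def _fit_elm(X, T, n_hidden, rng):
    """Random input weights/biases, output weights via the pseudo-inverse."""
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    beta = np.linalg.pinv(_hidden(X, W, b)) @ T
    return W, b, beta


def _run_elm(X, model):
    W, b, beta = model
    return _hidden(X, W, b) @ beta


def elm_mas_fit(X, T, n_agents=5, L=20, L1=10, seed=0):
    rng = np.random.default_rng(seed)
    # Stages 1-4: each individual agent ELM_k gets its own random input weights.
    agents = [_fit_elm(X, T, L, rng) for _ in range(n_agents)]
    # w_j = [y_j^1 ... y_j^K]: concatenated agent outputs, shape (N, K * C).
    W_comb = np.hstack([_run_elm(X, m) for m in agents])
    # Stages 5-7: parent ELM with L1 hidden neurons, alpha = S^+ T.
    parent = _fit_elm(W_comb, T, L1, rng)
    return agents, parent


def elm_mas_predict(Z, agents, parent):
    # v = [y^1 ... y^K] for the unknown inputs, then the parent ELM's output y.
    v = np.hstack([_run_elm(Z, m) for m in agents])
    return _run_elm(v, parent)


# Usage: two well-separated 2-D classes with one-hot targets (C = 2).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, (50, 2)), rng.normal(2.0, 0.5, (50, 2))])
labels = np.repeat([0, 1], 50)
T = np.eye(2)[labels]
agents, parent = elm_mas_fit(X, T)
pred = elm_mas_predict(X, agents, parent).argmax(axis=1)
print((pred == labels).mean())
```

With one-hot targets, taking `argmax` over the parent ELM's output gives the predicted class.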
Once all samples have been trained with Stages 1 to 7, the ELM-MAS can be used to classify an unknown input vector $z$ based on $a^k$, $b^k$, $\beta^k$, $p$, $q$, and $\alpha$, where $h^k$ and $y^k$ are the hidden layer and output layer of ELM$_k$, $v = [y^1\ y^2\ y^3\ y^4\ y^5]$ is the combined output of ELM$_k$ in response to $z$, and $s$ and $y$ are the hidden layer and the final output of the validation, respectively. In addition to Equations (10) and (11), other activation functions were also used in this case. Meta-ELM [77] and the proposed ELM-MAS have comparable structures; nevertheless, they differ.

Gas turbine monitoring and control are frequently provided by Energy Management Systems [83]. These computer-based systems are commonly used in energy control centers [84], [85]. During steady-state operation, gas turbine application software and other analysis software are integrated into the Energy Management Systems to examine and forecast the behavior of gas turbines [86]. Although this software is a powerful tool, its ability to help operating engineers make the best decisions is limited when unexpected or unplanned modes of operation occur. In most cases, abnormal operating modes are triggered by network faults, frequency deviations, or active and reactive power imbalances, and they can lead to an unintended complete or partial system shutdown [86]. Consequently, experienced operation engineers must make the decisions needed to restore the gas turbine in these emergency situations. The knowledge of experienced operation engineers and conventional application software are therefore both essential for balancing active and reactive power, diagnosing network faults efficiently, and restoring the network [86].
Hence, efficient and fast techniques for forecasting abnormal system behavior are essential. Malaysia has experienced several large-scale blackouts in past years [84], [85]. In 2005, several gas turbine plants tripped consecutively and unintentionally, followed by a frequency drop of about 1.5 Hz, which led to a total loss of 5,760 MW. Some studies were therefore conducted to examine how combined cycle power plants respond to frequency drops [87], [88]. The gas turbine models developed by Rowen [89] and Mello et al. [90] were used to replicate real-world plants and to determine their responses to frequency variations. However, no detailed analysis has yet studied the behavior of plant variables during frequency drops.
The governor model GAST represents the essential dynamic characteristics of industrial gas turbines driving generators connected to electric power systems. Speed deviations from nominal are assumed to be small (approximately five percent). As shown in Fig. 6, GAST contains a forward path with the governor time constant $T_1$, the combustion chamber time constant $T_2$, and a load-limiting feedback path. The constant $K_T$ sets the gain of the load-limit ($A_T$) feedback path, and $T_3$ is the time constant of the exhaust gas measuring system. Finally, the load limit is sensitive to the turbine exhaust temperature.

The reputation and strength of the ELMs (individual agents of MAS) are used by the CBS method to develop the trust measurement. With this method, the strong elements linked to the individual ELM agents are gathered to form the trust measurement, which improves the MAS.
As reported in [53], a trust measurement based on rejection and recognition accuracy rates was proposed. Two groups were used: the first consisted of three modified Fuzzy Min-Max (FMM) agents, and the second of three modified Fuzzy ARTMAP (FAM) agents. The model reported better performance than other approaches. Another trust measurement approach, based on the Bayesian formalism, was proposed with an FMM-based MAS [91]. In that model, FMM acted as the learning agent in the MAS, followed by combination with the Bayesian formalism to obtain the trust measurement. The results of that model showed improvement over other approaches [91].
Certified Belief in Strength (CBS) is a more recent MAS model for trust measurement, based on the reputation and strength of individual FMM-based agents [91]. Trust is thus the strong element related to the FMM agents that enables the CBS technique to improve the performance of the MAS during training. The results showed improved accuracy rates compared with the individual agents [91].
Therefore, an extended version of the CBS method using an ELM (Extreme Learning Machine) based MAS (Multi Agent System) is proposed, hereafter designated ELM-MAS-CBS. The Multi-Agent Classifier System with Certified Belief in Strength (MACS-CBS) used FMM agents with several hyperboxes; in the proposed model, a "team" concept is employed with individual ELM-based agents.
As shown in Fig. 7, the ELM-MAS-CBS model consists of three levels. The bottom level contains several individual ELM-based agents, and the middle level contains teams of ELM-based agents. In this new approach, the CBS technique is applied to the individual ELM-based agents. The final decision is made at the top level, where the Manager selects the output from the team with the highest CBS trust. In this section, the number of agents in a team was set to 5 (K = 5) and the number of teams to 3 (T = 3). The structure of an ELM-based agent is shown in Fig. 8.
The training and validation stages are detailed below.
Stage 1: Randomly assign the input weights $a_i^{tk}$ and biases $b_i^{tk}$. In the training process, the indices run over $i = 1, \ldots, L$ (where $L$ is the number of hidden neurons of ELM), $k = 1, 2, \ldots, K$, and $t = 1, \ldots, T$.
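Stage 2: Calculate the hidden layer output matrix $H^{tk}$ of ELM$_{tk}$ as follows:
$$H^{tk} = \begin{bmatrix} G(a_1^{tk}, b_1^{tk}, x_1) & \cdots & G(a_L^{tk}, b_L^{tk}, x_1) \\ \vdots & \ddots & \vdots \\ G(a_1^{tk}, b_1^{tk}, x_N) & \cdots & G(a_L^{tk}, b_L^{tk}, x_N) \end{bmatrix}_{N \times L}$$
where $x_j$ is the input vector, $N$ is the number of training samples, and $G$ is the activation function.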
Stage 3: Compute the output weights $\beta^{tk}$ of ELM$_{tk}$ using
$$\beta^{tk} = (H^{tk})^\dagger T$$
where $T = [t_1, \ldots, t_N]^T$ contains the respective target output vectors.
Stage 4: Compute the outputs of ELM$_{tk}$.
Stage 5: Calculate the accuracy rate $A_{tk}$ of ELM$_{tk}$ from $N_{tk}$, the number of samples correctly classified by ELM$_{tk}$.
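Stage 5 reduces to counting correct predictions. Assuming $A_{tk} = N_{tk} / N$ (the exact form of the paper's accuracy equation is not reproduced here), a plain-Python sketch over $T$ teams of $K$ agents could look as follows; the function name `accuracy_rates` and the nested-list layout are hypothetical.

```python
def accuracy_rates(team_predictions, targets):
    """Return A[t][k] = N_tk / N for every agent k of every team t.

    team_predictions[t][k] is the list of class labels predicted by ELM_tk.
    """
    n = len(targets)
    return [
        [sum(p == y for p, y in zip(agent, targets)) / n for agent in team]
        for team in team_predictions
    ]


targets = [0, 1, 1, 0]
preds = [
    [[0, 1, 1, 0], [0, 0, 1, 0]],  # team 1: two agents
    [[1, 1, 1, 0], [1, 0, 0, 1]],  # team 2: two agents
]
A = accuracy_rates(preds, targets)
# A == [[1.0, 0.75], [0.75, 0.0]]
```

The example uses two agents per team for brevity; the section itself uses K = 5 agents in each of T = 3 teams.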

Stage 6: Calculate the output of ELM$_{tk}$ based on Equation (28) using the validation samples.
Stage 7: Set the initial bid coefficient $C_{bid}$ to 0.01 [91] and the initial CBS strength of every team to 100 ($S = [100\ 100\ 100]$) [91]. The team bid is proportional to the strength [92]:
$$B_t = C_{bid} \, S_t \qquad (30)$$
Stage 8: Calculate the trust element $C_t$ using the validation samples, as shown in Equation (31). Determine $C_k$ using Equation (29) to find the accuracy rate of the agents in each team. Then select the ELM with the highest accuracy rate in each team (designated ELM$_{tw}$, where $w$ denotes the winner of the team), insert it into Equation (31) to identify its team's trust element, and pass the result to the top level, the manager layer.
Stage 9: Following [91], the bid of Equation (30) acts as the penalty and reward used to revise the strength with Equation (32), where $R$ is the reward and $P$ is the penalty: a correct prediction is rewarded and an incorrect prediction is penalized.
Stage 10: After $S_t$ is revised, $B_t$ and $A_{tk}$ are also revised using Equations (30) and (29), respectively.
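Stages 7 to 10 describe a bid-and-strength bookkeeping scheme. The sketch below assumes, for illustration only, that the team bid is $B_t = C_{bid} S_t$ (bid proportional to strength, as stated in Stage 7) and that this bid is used as both the reward $R$ and the penalty $P$; the exact strength-update equation in [91] may differ, and `cbs_update` is a hypothetical name.

```python
def cbs_update(strengths, team, correct, c_bid=0.01):
    """Return team strengths after one validation sample (hypothetical rule).

    Bid: B_t = c_bid * S_t. Assumption: the bid serves as the reward R for a
    correct prediction and as the penalty P for an incorrect one.
    """
    updated = list(strengths)
    bid = c_bid * updated[team]
    updated[team] = updated[team] + bid if correct else updated[team] - bid
    return updated


S = [100.0, 100.0, 100.0]  # initial strength of the three teams (Stage 7)
S = cbs_update(S, team=0, correct=True)
S = cbs_update(S, team=1, correct=False)
print(S)  # [101.0, 99.0, 100.0]
```

Because the bid scales with the current strength, a team that keeps winning gains (and risks) more per sample, so its reputation accumulates multiplicatively rather than by fixed steps.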
The ELM-MAS-CBS can then be used to predict a newly arrived, unknown input vector $z$ once all samples have been trained using Stages 1 to 10.
Stage 11: Load $b_i^{tk}$, $a_i^{tk}$, $A_{tk}$, $\beta^{tk}$, $S_t$, and $C_t$ from the completed training process (Stages 1 to 10). The indices run over $k = 1, 2, \ldots, K$, $i = 1, \ldots, L$, and $t = 1, \ldots, T$ in all stages and equations of the validation process.
Stage 12: Calculate the hidden layer output $h^{tk}$ of ELM$_{tk}$ for $z$.
Stage 13: Compute the outputs of ELM$_{tk}$.
Stage 14: Select the highest accuracy rate in each team (designated $A_{tU}$), and then compute the trust elements of the teams using Equation (35).
$$A_{tU} = \max_k A_{tk}, \quad U = \arg\max_k A_{tk} \qquad (36)$$
Stage 15: Determine the highest $C_t$ over all teams (designated $C_V$), where $V$ is the winning team.
The procedures of the training and validation phases are summarized in the flowcharts in Fig. 9.
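Stages 14 and 15 pick a winning agent in each team and then a winning team. Because the paper's trust equations are not reproduced here, the sketch below assumes a hypothetical trust element $C_t = A_{tU} \cdot S_t / \sum S$, i.e., the team's best accuracy weighted by its normalized CBS strength; `manager_decision` is likewise a hypothetical name.

```python
def manager_decision(acc, strengths, agent_preds):
    """Pick the final prediction for one unknown input z.

    acc[t][k]: accuracy A_tk; strengths[t]: CBS strength S_t;
    agent_preds[t][k]: prediction of ELM_tk for z.
    Hypothetical trust element: C_t = A_tU * S_t / sum(S).
    """
    total = sum(strengths)
    candidates = []
    for t, team_acc in enumerate(acc):
        u = max(range(len(team_acc)), key=lambda k: team_acc[k])  # U = argmax_k A_tk
        trust = team_acc[u] * strengths[t] / total                 # C_t
        candidates.append((trust, t, u))
    _, v, u = max(candidates)                                      # V = argmax_t C_t
    return agent_preds[v][u], v


acc = [[0.80, 0.90], [0.95, 0.70], [0.60, 0.99]]
preds = [[0, 1], [1, 0], [2, 3]]
print(manager_decision(acc, [100, 100, 100], preds))  # (3, 2)
print(manager_decision(acc, [100, 120, 90], preds))   # (1, 1)
```

With equal strengths the most accurate agent wins outright; once strengths diverge, a team with a slightly weaker best agent but a better track record can take over the final decision.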
ELM-MAS-CBS was applied to the circulating water system (CWS) and GAST governor datasets for power generation.

1) REAL-WORLD APPLICATION: CIRCULATING WATER SYSTEMS
The circulating water system dataset was described in the Case 2 section. Beyond the hybridization of ELM and MAS described there, this section explores how applying Certified Belief in Strength to the ELM neural networks, i.e., the individual agents of MAS, enhances the capability of ELM-MAS on the CWS dataset. The trust measurement is based on the strength and reputation of every agent: the strong elements associated with the ELM agents are gathered to form the trust management, which lets CBS enhance the capability of the MAS.

2) REAL-WORLD APPLICATION: GAST GOVERNOR
The GAST governor dataset was described in the Case 2, part 2 section.

A. (CASE 1) EXTREME LEARNING MACHINE WITH SINGLE INPUT RULE MODULE (SIRM-ELM)
The applicability of the SIRM-ELM model was investigated. Four benchmark regression datasets, namely Abalone, Balloon, Strike, and Space-ga, were obtained from the UCI machine learning repository for the performance evaluation of SIRM-ELM. Only the additive sigmoid hidden neuron (SigAct) was used in the analysis. Table 1 gives the computer and software specifications used for all analyses in this paper, and Table 2 gives the specifications of the datasets. In all experiments, the four benchmark regression datasets were divided into training and validation samples using the train-validation-test technique suggested in the literature [35].
The number of membership functions per input attribute was tested for 1, 2, and 3 (i.e., $j = 1, 2, 3$) on all regression datasets. The RMSE was obtained using the default range of $a_i^j$ and $b_i^j$ for all rules (i.e., $i = 1, 2, \ldots, 3N$). Note that in SIRM-ELM, the number of fuzzy rules equals the number of hidden neurons of ELM. For each dataset, the experiment was repeated 50 times with random $a_i^j$ and $b_i^j$, and the mean results are reported. The results of the proposed SIRM-ELM were also compared with those of other methods. As Table 3 shows, SIRM-ELM achieves a lower RMSE than OS-ELM [19], SVM [19], and ELM [1].

1) REAL-WORLD APPLICATION: NOx EMISSION OF POWER GENERATION PLANT
A total of 3,405 data samples were collected for training and testing SIRM-ELM: 2,270 for training and the remaining 1,135 for testing (Table 4). The experiment was repeated fifty times on the testing dataset, and the mean results were recorded. The number of membership functions per input attribute was tested for 1, 2, and 3 (i.e., $j = 1, 2, 3$), with the results given in Table 5. The results in Table 5 were obtained with the default settings of $a_i^j$ and $b_i^j$ (Step 1). With the number of membership functions per input attribute set to 1, $a_i^j$ and $b_i^j$ were then tuned over different ranges to obtain the lowest RMSE; all tuning results are shown in Table 6. In the ELM experiment, two-thirds of the data samples were used for training and the remaining one-third for validation to determine the most suitable number of neurons L (parent ELM). For the sigmoid activation function, training and validation started with L = 50 units, which was then increased in increments of 50 units. Table 7 shows the details and results of the testing process with the sigmoid activation function. The best RMSE obtained was 0.027086.
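The validation-based choice of L described above can be sketched as a simple grid search. Assumptions: a sequential (unshuffled) two-thirds/one-third split, a sigmoid ELM with parameters drawn from [-1, 1], and an upper limit `L_max` that the text does not specify; the function name `select_hidden_neurons` is hypothetical. For the 3,405 NOx samples, `(2 * 3405) // 3` gives 2,270, the training-set size listed in Table 4.

```python
import numpy as np


def select_hidden_neurons(X, t, L_start=50, L_step=50, L_max=300, seed=0):
    """Grid-search L on a 2/3 training, 1/3 validation split (validation RMSE)."""
    rng = np.random.default_rng(seed)
    n_train = (2 * len(X)) // 3
    X_tr, t_tr, X_va, t_va = X[:n_train], t[:n_train], X[n_train:], t[n_train:]
    results = {}
    for L in range(L_start, L_max + 1, L_step):
        W = rng.uniform(-1.0, 1.0, size=(X.shape[1], L))
        b = rng.uniform(-1.0, 1.0, size=L)

        def hidden(Z):
            return 1.0 / (1.0 + np.exp(-(Z @ W + b)))  # sigmoid activation

        beta = np.linalg.pinv(hidden(X_tr)) @ t_tr
        err = hidden(X_va) @ beta - t_va
        results[L] = float(np.sqrt(np.mean(err ** 2)))
    best_L = min(results, key=results.get)  # lowest validation RMSE
    return best_L, results


rng = np.random.default_rng(0)
X = rng.random((300, 3))
t = X @ np.array([1.0, 2.0, 3.0])
best_L, results = select_hidden_neurons(X, t)
print(best_L, results[best_L])
```

The same loop with `L_start=10` and `L_step=10` reproduces the schedule used later for the parent ELM's $L_1$.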
In essence, this section presented the Extreme Learning Machine with Single Input Rule Module (SIRM-ELM), a new approach within the ELM framework. Adopting the Single Input Rule Module in the ELM hidden layer can be a good alternative to the commonly used sigmoid activation function (SigAct). SIRM-ELM was tested with sigmoid hidden neurons on the benchmark regression datasets Abalone, Balloon, Strike, and Space-ga, and the results in Table 3 showed that the proposed model outperformed OS-ELM [19], SVM [19], and ELM [1].
Given the promising benchmark results, SIRM-ELM was then applied to NOx emission prediction in a power generation plant.

B. (CASE 2) HYBRIDIZING ELM-MAS (EXTREME LEARNING MACHINE AND MULTI AGENT SYSTEMS)
In this section, the performance of ELM-MAS was tested on two benchmark datasets, Image Segmentation and Satellite Image, described in Table 8 [79]. Following the model proposed by Liang [79], the number of hidden neurons $L$ of each individual agent ELM$_k$ was fixed at 180 for Image Segmentation and 400 for Satellite Image. Two-thirds of the training samples were used for training, and the remaining one-third were used to determine the most suitable number of hidden neurons $L_1$ of the parent ELM through a validation process. For each activation function of ELM-MAS, training and validation started with $L_1$ = 10 units, which was then increased in increments of 10 units.
The experiment on the testing datasets was repeated for fifty rounds, and the mean results were recorded. As an example, Table 9 summarizes the validation and training processes for the sigmoid activation function; the number of hidden neurons with the best validation result was chosen for evaluating ELM-MAS. Table 10 reports the test accuracy, training time (seconds) and number of hidden neurons of ELM-MAS for the different activation functions. The best results in Table 10 are 89.96% for Satellite Image with the Laplace Act. function and 95.39% for Image Segmentation with the Laplace Basis function.
ELM-MAS was also compared with other ELM variants, such as ELM [93] and ensemble ELM [52]. As shown in Table 11, the test accuracy rates of ELM-MAS were comparable to those of ELM (Sigmoid) and ELM (RBF).
VOLUME 8, 2020
ELM-MAS was then applied to the circulating water systems (CWS) and the GAST governor of a power generation system.

1) REAL-WORLD APPLICATION: CIRCULATING WATER SYSTEMS
As listed in Table 12, a total of 2,500 data samples were collected and divided into training, validation and testing sets [94]. Before testing, ELM-MAS was trained and validated to determine the optimum number of hidden neurons.
The experiment on the testing dataset was repeated for 50 runs, and the mean results were recorded. The test accuracy results are shown in Table 13; the highest test accuracy of ELM-MAS, 96.96%, was achieved with the Laplacian activation function. The proposed model, trained with the Laplacian activation function, was also compared with other classifiers, including SVM [94] and FAM [95]. As shown in Table 14, the test accuracy rate of ELM-MAS was comparable to those of SVM [94] and FAM [95]. SVM achieved the highest result, owing to the complexity of its neurons.

2) REAL-WORLD APPLICATION: GAST GOVERNOR
For a gas turbine under standard operation, all training data were collected at the output of the GAST block, i.e., the mechanical power $P_{mech}$ [96]. As listed in Table 15, a total of 630 data samples were collected for the seven input features of the GAST model, and these features were varied within their operating ranges [97]. The datasets shown in Table 16 were pre-assigned to training, validation and test sets. The experiment on the testing dataset was repeated for fifty rounds, and the mean results were recorded. Table 17 reports the number of hidden neurons, the test accuracy and the training time (seconds) of ELM-MAS for all activation functions on GAST. The highest test accuracy rate in Table 17, 76.79%, was obtained with the Laplace Basis activation function, owing to the low complexity of its neurons. For comparison, SVM achieved 77.68%.

In summary, this section described a new model with two layers of ELMs, called ELM-MAS. ELM-MAS was validated on two benchmark datasets (Satellite Image and Image Segmentation), and its results were comparable to those of ELM (Sigmoid) and ELM (RBF). The model was also assessed on a power generation system, covering the governor model (GAST) and the circulating water systems (CWS). So far, the results show that ELM-MAS performs comparably to other algorithms on CWS.
Although the results obtained from the power generation applications and benchmark studies were encouraging, further research in other fields is needed to validate ELM-MAS.

In this section, the capability of ELM-MAS-CBS was tested on three benchmark datasets: Wine, Pima Indians Diabetes (PID) and Iris. Each team consisted of five ELM-based agents ($N_1 = 5$), and the number of teams was set to three ($T = 3$). Only the sigmoid and RBF activation functions were used in this experiment. Table 18 describes the datasets [54]. The experiments were run in MATLAB (version 2010) on a personal computer with an Intel(R) Core(TM) i7 2.9 GHz CPU and 8 GB of RAM.
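The trust-based combination of agents can be illustrated with a deliberately simplified sketch. This section does not restate the CBS equations, so the rules below are assumptions rather than the authors' method: an agent's certified belief is taken as the product of its reputation and its strength (for example, its confidence in the predicted class), each team nominates its most trusted agent, the final label comes from the nominee with the highest belief, and reputation is updated with a hypothetical moving average of correctness. Only the structure of three teams of five agents follows the experimental setup.

```python
from typing import List, Tuple

# (predicted label, strength, reputation) for one agent; strength and reputation lie in [0, 1].
Agent = Tuple[str, float, float]


def certified_belief(agent: Agent) -> float:
    """Assumed trust measure: strength multiplied by reputation."""
    _, strength, reputation = agent
    return strength * reputation


def cbs_decision(teams: List[List[Agent]]) -> str:
    """Each team nominates its most trusted agent; the most trusted nominee decides the label."""
    nominees = [max(team, key=certified_belief) for team in teams]
    return max(nominees, key=certified_belief)[0]


def update_reputation(reputation: float, correct: bool, rate: float = 0.1) -> float:
    """Hypothetical update: move reputation towards 1 after a correct prediction, towards 0 otherwise."""
    return reputation + rate * ((1.0 if correct else 0.0) - reputation)


# Three teams (T = 3) of five agents each; labels and numbers are made up for illustration.
teams = [
    [("A", 0.9, 0.5), ("B", 0.6, 0.9), ("A", 0.7, 0.6), ("C", 0.5, 0.5), ("B", 0.4, 0.8)],
    [("A", 0.8, 0.8), ("C", 0.7, 0.7), ("A", 0.6, 0.5), ("B", 0.3, 0.9), ("C", 0.5, 0.6)],
    [("C", 0.95, 0.6), ("A", 0.5, 0.7), ("B", 0.6, 0.6), ("C", 0.4, 0.9), ("A", 0.2, 0.9)],
]
print(cbs_decision(teams))  # A
```

In this toy example, the first team's most confident agent (strength 0.9) is not nominated because its reputation is only 0.5, which shows how reputation tempers raw strength in a trust measure of this kind.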
Following [54], the three benchmark datasets were evaluated using a train-validation-test procedure. The experiment on the testing datasets was repeated for 50 rounds, and the mean outcomes were recorded. The Wine dataset was evaluated with tenfold cross-validation: it was divided into ten subsets, of which one was used for validation, eight for training and the remaining one for testing. For Iris, all data samples were used for training (10% for validation and 90% for training) as well as for testing. For PID, 20% of the samples were used in a validation process to determine the most appropriate number of hidden neurons L, while 60% were used for training. All experiments were repeated 10 times.
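The tenfold protocol used for Wine can be sketched as below. The interleaved assignment of samples to subsets and the choice of the next subset in the rotation as the validation subset are assumptions; the text only states that one subset is used for validation, eight for training and one for testing.

```python
def tenfold_splits(n_samples, k=10):
    """Yield (train, validation, test) index lists for k rotations.

    In each rotation one subset is held out for testing, the next subset is used
    for validation, and the remaining k - 2 subsets (eight when k = 10) are used for training.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for r in range(k):
        v = (r + 1) % k
        train = [idx for j, fold in enumerate(folds) if j not in (r, v) for idx in fold]
        yield train, folds[v], folds[r]


splits = list(tenfold_splits(178))  # the UCI Wine dataset has 178 samples
```

Over the ten rotations every sample is tested exactly once, so averaging the ten test accuracies gives the cross-validated estimate.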
Two types of activation function were used for each benchmark dataset: the sigmoid activation function (SigAct) and the radial basis function (RBFun). Table 19 reports the test accuracy rates with SigAct for Iris, Wine and PID, and Fig. 10 shows the test accuracy rates with RBFun for the same three datasets. The number of hidden neurons L with the best test accuracy rate in Table 19 and Fig. 10 was selected for evaluating ELM-MAS-CBS. The experiment on the testing datasets was repeated for 50 rounds, and the mean outcomes were recorded.
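For reference, the two hidden-node types compared here can be written as follows. The additive sigmoid node $1/(1 + \exp(-(w \cdot x + b)))$ and the Gaussian RBF node $\exp(-b\,\lVert x - a\rVert^2)$ are the standard ELM forms; treating them as the exact SigAct and RBFun used in the experiments is an assumption, and the function names are hypothetical.

```python
import math


def sig_act(x, w, b):
    """Additive sigmoid hidden node: 1 / (1 + exp(-(w . x + b)))."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))


def rbf_fun(x, centre, impact):
    """Gaussian RBF hidden node: exp(-impact * ||x - centre||^2), with impact > 0."""
    return math.exp(-impact * sum((xi - ci) ** 2 for xi, ci in zip(x, centre)))


x = [0.2, 0.4]
print(sig_act(x, w=[1.0, -0.5], b=0.0))  # 0.5, since w . x + b = 0
print(rbf_fun(x, centre=[0.2, 0.4], impact=3.0))  # 1.0 at the centre
```

The sigmoid node responds to a projection of the whole input, whereas the RBF node responds to distance from a centre, which is why the two can favour different datasets.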
Table 20 summarizes the number of hidden neurons and the test accuracy of ELM-MAS-CBS for both activation functions on the benchmark datasets. RBFun achieves higher test accuracy rates than SigAct.
ELM-MAS-CBS was also compared with other approaches. MACS-CBS achieved the highest accuracy on Iris, whereas ELM-MAS-CBS (RBFun) achieved the highest accuracy on PID and Wine. Overall, Fig. 11 shows that the test accuracy rates of ELM-MAS-CBS were comparable to those of the other approaches.
ELM-MAS-CBS was then applied to the CWS and the GAST governor of the power generation system.

1) REAL-WORLD APPLICATION: CIRCULATING WATER SYSTEMS
The test was conducted after ELM-MAS-CBS had been trained and validated to find the ideal number of hidden neurons. The test accuracy results are recorded in Table 21; the highest test accuracy, 96.92%, was obtained by training with the radial basis activation function. As shown in Table 21, ELM-MAS-CBS was comparable to other classifiers such as SVM [94] and FAM [95]. However, owing to the complexity of the hidden neurons in ELM, the test accuracy of ELM-MAS-CBS is lower than that of ELM-MAS.
2) REAL-WORLD APPLICATION: GAST GOVERNOR
Table 22 summarizes the test accuracy and the number of hidden neurons of ELM-MAS-CBS for the different activation functions on GAST. The best test accuracy rate, 83.04%, was obtained with the sigmoid activation function. Comparing Table 17 and Table 22, ELM-MAS-CBS achieved better test accuracy than ELM-MAS.

In summary, this section presented an improved version of the ELM-MAS model with certified belief in strength (ELM-MAS-CBS). The proposed model was validated on Wine, Pima Indians Diabetes (PID) and Iris, and its results were comparable to those of ELM (Sigmoid) and ELM (RBF). Moreover, ELM-MAS-CBS was applied to the governor (GAST) and the circulating water systems (CWS) of the power generation system. The results showed that ELM-MAS-CBS was comparable (if not superior) to other approaches on CWS.

Although the results were encouraging, further research applying ELM-MAS-CBS to other fields is needed to validate it further.

IV. CONCLUSION
This paper presented a framework of ELM (Extreme Learning Machine) with Single Input Rule Module (SIRM-ELM), a new model with two levels of ELMs (ELM-MAS), and an improved version of the ELM-MAS model with certified belief in strength (ELM-MAS-CBS). All the proposed models were validated on benchmark datasets. The number of hidden neurons was determined by trial and error [79], [98], which required a tuning process to find a reasonable value. The experimental results demonstrated that SIRM-ELM outperformed OS-ELM [19], SVM [19] and ELM [1], as shown in Table 3. For CWS (circulating water systems), the test accuracy rates of ELM-MAS were comparable (if not superior) to those of other algorithms. Lastly, comparing Table 17 and Table 22, ELM-MAS-CBS achieved better test accuracy than ELM-MAS on the GAST governor. Most importantly, the newly developed hybrids of ELM with MAS and with SIRM are comparable (if not superior) to other algorithms.
Although the results from the power generation applications and benchmark studies were encouraging, further research in other fields should be conducted to validate SIRM-ELM, ELM-MAS and ELM-MAS-CBS further.