Binary Bitwise Artificial Bee Colony as Feature Selection Optimization Approach within Taguchi’s T-Method

Department of Mechanical Engineering, Universiti Tenaga Nasional, Kajang 43000, Selangor, Malaysia; Razak Faculty of Technology and Informatics, Universiti Teknologi Malaysia, Jalan Sultan Yahya Petra, Kuala Lumpur 54100, Malaysia; Institute of Engineering Mathematics, Universiti Malaysia Perlis, Kampus Pauh Putra, Arau 02600, Perlis, Malaysia; Faculty of Manufacturing and Mechatronic Engineering Technology, Universiti Malaysia Pahang, Pekan 26600, Malaysia


Introduction
Taguchi's T-Method, which was explicitly developed for predictive analysis, is one of the Mahalanobis Taguchi System's (MTS) variants that has been increasingly used by researchers and industrial practitioners in Japan and other countries. Taguchi's T-Method was proposed for multivariate estimation to predict the integrated estimated output value. In the 1980s, Dr. Genichi Taguchi developed the Mahalanobis Taguchi System (MTS) as a pattern recognition technique that blends Mahalanobis Distance (MD) theory and Taguchi Robust Engineering concept to systematically and effectively classify and predict data in a multidimensional environment [1][2][3][4][5][6].
MTS establishes a multivariate measurement scale that distinguishes a normal or healthy observation from an abnormal or unhealthy observation and integrates it with the concepts of signal-to-noise ratio (SNR) and orthogonal array (OA). Following the introduction of the MT-Method as a classification technique, which has so far gained much attention among scholars [7][8][9][10][11][12][13][14], Taguchi's T-Method was established using the same integration principles. The unit-space concept, the duplicate signal-to-noise ratio (SNR) adaptation as a weighting factor, zero-proportional theory, and OA as the feature selection optimization are the main elements that have been adopted in reinforcing the robustness of Taguchi's T-Method.
One of the significant advantages of Taguchi's T-Method is its ability to predict even with limited sample data. In multiple regression analysis, the sample size has to be higher than the number of variables. This limitation does not apply to Taguchi's T-Method. Additionally, Taguchi's T-Method is not directly influenced by multicollinearity since each variable is regressed individually [2,15,16]. Based on the number of papers published in the literature, studies on Taguchi's T-Method have been moving since 2012 towards optimizing parameters and feature selection rather than application alone [17][18][19][20]. This increasing pattern indicates that a variety of enhanced approaches to parameter and feature selection optimization are available that can be further explored and incorporated into Taguchi's T-Method as a hybridization or integration element.

Taguchi's T-Method for the Feature Selection Optimization Problem
In MTS, the orthogonal array (OA) is a feature selection search mechanism shared across the series of MTS methods, including Taguchi's T-Method, which follow standard procedures but vary in how their objective function is determined. The OA element within MTS has been debated and is believed to be insufficient as it offers a suboptimal solution [21,22]. Most concerns about OA relate to its restriction on the combinations of features that can be assessed and evaluated in the search for optimality, as it relies on a fixed scheme [20,23]. The authors of [24] argued that the fixed combination in OA is not optimal since the results may vary significantly if the column-to-column information is rearranged [6]. In [25], the authors agreed with the authors of [24] after testing 1000 random variable-to-column assignments. Issues in OA have been highlighted as well by [26,27], especially the fact that the OA design has a limitation in handling higher-order interactions between variables, which might lead to inconsistency in identifying the significant variables [24,25,27,28,29]. Therefore, developing a hybrid methodology for better accuracy is a preferred solution to these concerns, and this is the primary motivation of this research.
The OA element in the MTS classification approaches has been continuously improved using numerous machine learning algorithms. However, enhancing the OA element within Taguchi's T-Method as a prediction tool is still at an initial stage. In [30], the authors applied a stepwise forward and backward selection procedure for this purpose, which showed an increase in accuracy in many of the cases studied. The author of [31] suggested a Binary Artificial Bee Colony (BABC) algorithm, and the findings revealed that T-Method + BABC worked better than T-Method + OA in the particular case study conducted. The most recent reported study [32] specifically addressed OA's downside and suggested Binary Particle Swarm Optimization (BPSO), which showed an increase in accuracy for specific case studies. The published literature on OA improvement in Taguchi's T-Method does not address the generalization aspect thoroughly and focuses on somewhat limited case studies. This study expands the previous research in [31,32] by proposing another variant of binary ABC, the Binary Bitwise ABC algorithm, with a proper generalization aspect added to it, namely the application of bootstrap cross-validation.

Taguchi's T-Method.
Regression analysis aims to construct a mathematical model that describes and explains the relationship between variables for prediction or a study of causal relationships [33]. Taguchi's T-Method, which is driven by similar purposes, was built to forecast the unknown value of the output variable concerning the established value of the input variables by statistically evaluating the relevant correlation and functional relationship between those variables through a specific developed linear regression model to compute the integrated estimate output value.
The integrated estimate output model in Taguchi's T-Method contains additional elements that differentiate it from standard linear regression: (1) a zero-point proportional term, (2) an inverse regression model, (3) the unit-space concept, and (4) SNR weighting. All these elements are embedded in the existing Taguchi's T-Method model described by [34] to generate the integrated estimate model, as shown in equation (1). Taguchi's T-Method also uses the ordinary least squares approach, common in linear regression, to calculate the proportional coefficient, $\beta$. Equations (2)-(7) govern the inclusion of the dynamic SNR as a weighting factor for each feature within the model [35]: the total variation, $S_{Tj} = Z_{1j}^2 + Z_{2j}^2 + \dots + Z_{lj}^2$ for $j = 1, 2, \dots, D$ (where $D$ is the number of features); the variation of the proportional term, $S_{\beta j}$; and the error variation, $S_{ej} = S_{Tj} - S_{\beta j}$. A feature with a higher SNR contributes more to the overall model estimation.
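The per-feature weighting can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the text above gives only the total and error variations, so the expressions for the effective divider r, the proportional coefficient β, the proportional-term variation, and the SNR η are assumed here to follow the commonly cited T-Method definitions. The function names and the toy data are hypothetical.

```python
def item_beta_eta(z, m):
    """Proportional coefficient and SNR weight for one feature.

    z: normalized feature values Z_1j..Z_lj of the signal data
    m: normalized output values M_1..M_l of the signal data
    Assumed definitions (not taken from the text above):
      r = sum(M_i^2), beta = sum(M_i * Z_ij) / r,
      S_beta = sum(M_i * Z_ij)^2 / r, V_e = S_e / (l - 1),
      eta = (S_beta - V_e) / (r * V_e) when S_beta > V_e, else 0.
    """
    l = len(m)
    r = sum(mi * mi for mi in m)
    cross = sum(mi * zi for mi, zi in zip(m, z))
    s_t = sum(zi * zi for zi in z)          # total variation S_Tj
    s_beta = cross * cross / r              # variation of proportional term
    s_e = s_t - s_beta                      # error variation S_ej
    v_e = s_e / (l - 1)                     # error variance
    if v_e <= 0:
        raise ValueError("perfect fit is not handled in this sketch")
    beta = cross / r
    eta = (s_beta - v_e) / (r * v_e) if s_beta > v_e else 0.0
    return beta, eta


def integrated_estimate(row, betas, etas):
    """SNR-weighted average of the per-feature inverse estimates X_j / beta_j."""
    num = sum(eta * x / beta for x, beta, eta in zip(row, betas, etas) if eta > 0)
    return num / sum(etas)


# Toy signal data (made up): feature 1 is roughly 2 * M, feature 2 is unrelated.
m = [-2.0, -1.0, 1.0, 2.0]
z1 = [-4.1, -1.9, 2.0, 4.0]
z2 = [1.0, -1.0, -1.0, 1.0]

beta1, eta1 = item_beta_eta(z1, m)
beta2, eta2 = item_beta_eta(z2, m)
estimate = integrated_estimate([4.02, 0.5], [beta1, beta2], [eta1, eta2])
```

In this toy case the unrelated feature receives a zero weight, so the estimate is driven entirely by the first feature.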
The integrated estimate SNR (dB) is computed from the result obtained using equation (1). The integrated estimate SNR, $\eta$ (dB), is a performance measure for evaluating the relative importance of the input variables to the output variable. To further increase the model accuracy, optimizing the selection of features is considered a value-added approach within Taguchi's T-Method. Equations (8)-(13), which begin with the total variation, are used for calculating the SNR (dB) for feature selection optimization. The relative importance of the features is evaluated using the two-level orthogonal array (OA). An OA with a predetermined combination of "use" and "not use" of features allows the integrated estimate SNR (dB) to be compared under each setting. Table 1 shows an example of the $L_{12}$ orthogonal array, in which Level 1 indicates that the variable will be used and Level 2 indicates that the variable will not be used during the simulation study. The relative importance of a feature is evaluated by computing the new integrated estimate SNR (dB) when the feature is not used in the computation and observing the increase or deterioration of the value. A higher integrated estimate SNR (dB) value is preferred, and the combination of input variables that yields the optimal integrated estimate SNR (dB) is selected as the optimal combination.
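The OA evaluation described above can be illustrated with a short Python sketch. It rests on two assumptions that are not taken from the text: the integrated estimate SNR (dB) is taken in the commonly cited form 10 log10(((S_β − V_e)/r)/V_e), and the small L4 array stands in for the L12 array of Table 1. The SNR values per run are made up, and all function names are hypothetical.

```python
import math


def integrated_snr_db(m, m_hat):
    """Integrated estimate SNR in dB between true outputs m and estimates m_hat.

    Assumed form: 10 * log10(((S_beta - V_e) / r) / V_e),
    valid only when S_beta > V_e > 0.
    """
    l = len(m)
    r = sum(mi * mi for mi in m)                     # effective divider
    lin = sum(mi * hi for mi, hi in zip(m, m_hat))   # linear form
    s_t = sum(hi * hi for hi in m_hat)               # total variation
    s_beta = lin * lin / r                           # proportional-term variation
    v_e = (s_t - s_beta) / (l - 1)                   # error variance
    return 10 * math.log10(((s_beta - v_e) / r) / v_e)


def level_gains(oa, snr_db):
    """Mean SNR (dB) at Level 1 ("use") minus mean at Level 2 ("not use")."""
    gains = []
    for col in range(len(oa[0])):
        used = [s for row, s in zip(oa, snr_db) if row[col] == 1]
        unused = [s for row, s in zip(oa, snr_db) if row[col] == 2]
        gains.append(sum(used) / len(used) - sum(unused) / len(unused))
    return gains


# A closer estimate should give a higher SNR (dB) than a noisy one.
m = [-2.0, -1.0, 1.0, 2.0]
snr_good = integrated_snr_db(m, [-2.1, -0.9, 1.0, 2.0])
snr_poor = integrated_snr_db(m, [-1.0, -2.0, 2.5, 0.5])

# Two-level L4 array for three features, with made-up SNR (dB) values per run.
L4 = [[1, 1, 1], [1, 2, 2], [2, 1, 2], [2, 2, 1]]
snr_runs = [12.0, 10.0, 7.0, 9.0]
gains = level_gains(L4, snr_runs)
selected = [j for j, g in enumerate(gains) if g > 0]  # features worth keeping
```

A positive gain means the SNR (dB) is higher on average when the feature is used, so that feature is kept in this sketch.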

Binary Bitwise Artificial Bee Colony (BitABC) into Taguchi's T-Method for Feature Selection Optimization.
This research's binary approach is similar to the orthogonal array (OA) concept in the existing Taguchi's T-Method. The Binary ABC was explicitly developed for the feature selection optimization process by changing the information of each food source update to the discrete-binary data type, "1" or "0." The primary food sources (Xi) are randomly initialized according to the bee population size (NP/2 = N) and the total number of features (D) using discrete-binary data (1 or 0). The primary objective function, which is to maximize the SNR (dB) value, is then computed. The best SNR (dB) is stored as Global_max and its binary combination as Global_para. The employed bees continue searching for a better food source by making a small change based on their memory of nearby information, creating a new source. The objective function, the SNR (dB) value, is then computed and compared with that of the primary source. The higher SNR (dB) value is memorized, while the lower one is forgotten; if the previous SNR (dB) value is higher than that of the new candidate, the previous value remains. This decision process is called greedy selection. The employed bees then share the information on the new position with the onlooker bees in the dance area once they return to the hive. The onlooker bees evaluate the new position and choose which food source's information to emphasize, relying on the calculated probability rate. The onlooker bees modify the position if the criteria are fulfilled, and the SNR (dB) value is recalculated and updated following the greedy selection criteria. An employed bee that cannot improve its position within the defined limit abandons its food source and becomes a scout bee. A scout bee randomly searches for a new food source in the area surrounding its hive. The cycle is repeated until the maximum number of cycles is reached, and Global_max and Global_para are updated accordingly.
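The cycle described above can be sketched as a compact Python loop. This is an illustrative skeleton under stated assumptions, not the algorithm of Figure 1: the neighbor move is a plain single-bit flip rather than the bitwise update, the onlooker probability is a simple shifted roulette wheel, and the objective is a made-up stand-in for the SNR (dB). All names other than Global_max and Global_para (here `global_max`, `global_para`) are hypothetical.

```python
import random


def binary_abc(objective, n_features, n_sources=5, limit=5, max_cycles=30, seed=0):
    """Maximize objective(bits) over binary vectors with a basic ABC cycle."""
    rng = random.Random(seed)
    sources = [[rng.randint(0, 1) for _ in range(n_features)]
               for _ in range(n_sources)]
    fitness = [objective(s) for s in sources]
    trials = [0] * n_sources
    best = max(range(n_sources), key=lambda i: fitness[i])
    global_max, global_para = fitness[best], sources[best][:]

    def try_neighbor(i):
        # Assumed neighbor move: flip one random bit of source i.
        cand = sources[i][:]
        cand[rng.randrange(n_features)] ^= 1
        f = objective(cand)
        if f > fitness[i]:                  # greedy selection
            sources[i], fitness[i], trials[i] = cand, f, 0
        else:
            trials[i] += 1

    for _ in range(max_cycles):
        for i in range(n_sources):          # employed bee phase
            try_neighbor(i)
        low = min(fitness)                  # onlooker phase: better sources
        weights = [f - low + 1e-9 for f in fitness]   # are chosen more often
        for _ in range(n_sources):
            try_neighbor(rng.choices(range(n_sources), weights=weights)[0])
        stale = max(range(n_sources), key=lambda i: trials[i])
        if trials[stale] > limit:           # scout phase: abandon and restart
            sources[stale] = [rng.randint(0, 1) for _ in range(n_features)]
            fitness[stale] = objective(sources[stale])
            trials[stale] = 0
        best = max(range(n_sources), key=lambda i: fitness[i])
        if fitness[best] > global_max:      # remember the best combination
            global_max, global_para = fitness[best], sources[best][:]
    return global_max, global_para


# Made-up objective: negative Hamming distance to a hidden "ideal" feature mask.
target = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_objective(bits):
    return -sum(b != t for b, t in zip(bits, target))


global_max, global_para = binary_abc(toy_objective, n_features=len(target))
```

In the T-Method setting, `toy_objective` would be replaced by a function that returns the integrated estimate SNR (dB) for the feature subset encoded by the bit vector.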
The method used by the bees (employed and onlooker) to search within their neighborhood for a new food source with a higher nectar amount follows the approach introduced by Jia, Duan, and Khan [36], called binary Bitwise ABC (BitABC). Bitwise operators are often used where an image is transformed into binary numbers and represented as a series of 0s and 1s on the computer [36]. However, only the logic operator results are adopted in the study conducted by Jia, Duan, and Khan [36], as they share the characteristics of the binary space (0 and 1). The bitwise operators (∧, &, and |) used to describe the trajectory of the food source within this study are illustrated in Table 2 and equations (14) and (15).
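A bitwise neighbor move can be sketched in Python as follows. Because equations (14) and (15) are not reproduced here, the update below is only an assumed form that combines the three operators (XOR, AND, OR) on a food source, a randomly chosen neighbor, and a random binary vector; it should not be read as the exact rule of [36]. The function name and the `phi` parameter are hypothetical.

```python
import random


def bitwise_candidate(x_i, x_k, rng=None, phi=None):
    """Candidate food source built from x_i and a neighbor x_k.

    Assumed update per bit: v = x_i XOR (phi AND (x_i OR x_k)),
    where phi is a random bit unless a fixed vector is supplied.
    """
    if phi is None:
        rng = rng or random.Random()
        phi = [rng.randint(0, 1) for _ in x_i]
    return [a ^ (p & (a | b)) for a, b, p in zip(x_i, x_k, phi)]


x_i = [0, 0, 1, 1]
x_k = [0, 1, 0, 1]

# With phi all zeros the source is left unchanged.
unchanged = bitwise_candidate(x_i, x_k, phi=[0, 0, 0, 0])
# With phi all ones every bit is updated from the OR of the two sources.
updated = bitwise_candidate(x_i, x_k, phi=[1, 1, 1, 1])
# With a random phi only some bits are updated.
random_move = bitwise_candidate(x_i, x_k, rng=random.Random(1))
```

Because every operand is 0 or 1, the result always stays in the binary space, which is the property the text attributes to the logic operators.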

Data Preparation and Selection.
The optimum features are selected based on the total number of "use" items ("1") recorded for each feature across all runs. The combination of "use" items ("1") at each run represents the combination of features that contributed to the optimum SNR (dB) value over the maximum number of cycles. To demonstrate the proposed algorithm's stability and consistency, 20 independent runs were performed on the 70% training dataset, and features selected more than 10 times (more than 50% of the runs) are chosen as the optimum features. The optimum features are then validated on the remaining 30% validation dataset. During the training phase, the 70% training dataset undergoes bootstrap cross-validation, which splits the data into training and test sets of 63.2% and 36.8%, respectively, with 1000 bootstrap cycles. The risk of overfitting is considered and monitored accordingly within this study.
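The selection rule and the bootstrap split can be sketched in Python. The sketch assumes that the 63.2% and 36.8% figures correspond to the usual bootstrap behavior, where sampling n items with replacement leaves about 63.2% of the distinct samples in the training set and the rest out of bag; the text does not state how the split was implemented. The function names and the example masks are hypothetical.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; unseen indices form the test set."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


def stable_features(run_masks):
    """Indices of features marked "1" in more than half of the runs."""
    n_runs = len(run_masks)
    counts = [sum(col) for col in zip(*run_masks)]
    return [j for j, c in enumerate(counts) if c > n_runs / 2]


rng = random.Random(0)
in_bag, out_of_bag = bootstrap_split(1000, rng)
unique_fraction = len(set(in_bag)) / 1000  # about 0.632 on average

# Made-up best masks from four runs over three features.
masks = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 0, 1]]
chosen = stable_features(masks)
```

With 20 runs, the same rule keeps a feature only if it appears in at least 11 of the best combinations.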
For better comparison, the outcome of Bitwise ABC's optimum features has been compared with another metaheuristic algorithm variant called Probability Binary Particle Swarm Optimization (PBPSO) [32], as well as with the existing Taguchi's T-Method with full features and Taguchi's T-Method with optimal features provided by OA analysis [35]. To assess the suggested algorithm, several simulations were performed on eight real-world multivariate datasets for prediction and regression. Six of the eight datasets were obtained from the University of California at Irvine (UCI) Machine Learning Benchmark Repository [37]. The other two datasets were taken from actual case studies.
Both BitABC and PBPSO are configured with the parameters listed in Table 3. All the optimization algorithms within this study were implemented in Matlab R2018a. The programs were run on a 64-bit Sony VAIO VPCCA notebook with an Intel i5 (2.3 GHz) processor, 4 GB of RAM, and 212 GB of data storage. The pseudocode of the proposed BitABC algorithm into Taguchi's T-Method is shown in Figure 1.

Performance Measure.
Prediction is an iterative method that involves model creation followed by performance evaluation, repeating the cycle until a satisfactory solution is found. Throughout this study, two performance criteria are used to evaluate the developed algorithm: the prediction accuracy and the convergence rate on the training, testing, and validation datasets.
Prediction accuracy can be measured using the mean absolute error (MAE), root mean squared error (RMSE), mean absolute percentage error (MAPE), and several others. In practice, the accuracy of a regression prediction model must be estimated over training and validation sets that are independent of one another. In this study, after the optimum features have been identified, the integrated estimate value, $M_{\text{predicted}}$, is calculated as indicated by equation (1). The MAE formula is applied for the prediction model accuracy, as shown in equation (16). The MAPE measure has also been applied in this study to provide the final improvement percentage of the optimal approach over the existing Taguchi's T-Method that uses full features, as shown by equation (17).
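The two accuracy measures can be illustrated with a brief Python sketch. The MAE follows its standard definition; the improvement percentage is an assumed reading of equation (17), taken here as the relative reduction in MAE against the full-feature T-Method. The function names and numbers are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted outputs."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def improvement_percent(mae_full, mae_optimized):
    """Assumed form: relative MAE reduction versus the full-feature model, in %."""
    return (mae_full - mae_optimized) / mae_full * 100


# Made-up values for illustration only.
mae = mean_absolute_error([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
gain = improvement_percent(2.0, 1.5)
```

A positive percentage means the optimized feature set predicts more accurately than the full-feature model, and a negative one means it predicts worse.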

Results and Discussion
The feature selection analysis findings are discussed for the respective case studies presented in this research using the integrated estimate model defined by equation (1). Besides the MAE results and their SD values, the discussion is also guided by several other performance measures, such as the convergence plot of the SNR (dB) value as the objective function and the MAE for the training and testing phase. Table 4 and Figure 2 illustrate the performance analysis for the heating load case study. Researchers often use this dataset with several other techniques that rely on regression analysis [38,39]. Similar procedures were applied to the remaining seven datasets used within this study. The explanation of the heating load case study gives a general idea of how the other case studies are analyzed in terms of their MAE trend for both training and testing, as well as the SNR (dB) convergence plot. The validation phase summarizes all the case studies considered within this research. To describe more explicitly how each outcome reflects the overall prediction analysis, the effects of the SNR (dB) and MAE for training and testing are illustrated by the convergence plots shown in Figures 1(a) and 1(b). The result reveals that T Method-BitABC is the most optimum approach, with the highest SNR (dB) value compared to T Method-PBPSO, T Method, and T Method-OA. This trend is aligned with the MAE trend for the training and testing phase, with T Method-BitABC achieving better prediction accuracy, that is, a lower MAE value, than T Method-PBPSO.
As seen in Table 5, the validation phase results show the performance of the trained model on the validation dataset for the case studies with more than 30 samples (large datasets), while Table 6 summarizes the case studies with fewer than 30 samples (small datasets). Table 5 indicates that T Method-BitABC and T Method-PBPSO yield the same MAE performance. This is possibly due to the similar optimal feature selections obtained from the training and testing phase. The improvement percentages range from 13.99% to 32.86% across three case studies (Abalone, Heating, and Cooling). For the Body fat and Concrete Compressive Strength cases, Taguchi's T-Method remains the best of the approaches compared, while T Method-OA is the best for the Auto MPG case study, with a 45.71% improvement over Taguchi's T-Method. The trend for the small sample case studies is somewhat different: the results for T Method-BitABC and T Method-PBPSO differ from each other. T Method-BitABC provides better performance for the JD dataset, with a 9.07% improvement over Taguchi's T-Method, while T Method-OA provides the best result for the Chiller dataset, with a 9.54% improvement over Taguchi's T-Method. The analysis results show explicitly how well the T Method-BitABC approach performs in several case studies. A few findings implied by these results could be investigated further and are summarized as follows:

The adoption of BitABC into the existing Taguchi's T-Method in place of the OA is found not suitable for the body fat case study. Body fat is a case study with a normal distribution trend and has a more stable output performance than other cases [40]. The adoption of feature selection optimization does not provide a better trend on this type of data since the combination of features is already appropriate for the model.

Mathematical Problems in Engineering

Figure 1: The pseudocode of the proposed BitABC algorithm into Taguchi's T-Method. Annotations accompanying the pseudocode:
Enumerate unit-space, η, β, r, l as fixed values in evaluating the objective function (snr_est) and ...
A random value between 0 and 1 is generated for an onlooker bee to compare with the calculated probability, Nomfit value, of a food source.
This food source (i) is selected by the onlooker bee, and steps 8 to 12 are followed for the new food source.
Keep the best solution between the current and candidate solutions, and update the Xi position.
Update the Globe_max value (the maximum SNR (dB)) and Global_para (the best among the Xi positions with maximum SNR (dB)).
If the counter value of a food source is the maximum among those of the food sources and exceeds the limit, a new food source for (ind) is created by a scout following the uniform distribution [0,1].
Note: Mean Absolute Error (MAE) is calculated at each cycle to see the accuracy of Global_para in improving the model accuracy, while optimum_para is tested using validation data.
The Concrete Compressive Strength dataset shows how the quality of the data within each analysis affects the analysis result. By considering the randomness and variation effects within datasets, it is possible to obtain slightly different trends. From the result in Table 5, the slightly different trend between T Method, T Method-BitABC, and T Method-PBPSO shows that the proposed algorithm should provide a better deal since just ... exploration and exploitation search mechanism within the ABC algorithm might be the main reason for this trend since small sample data are susceptible to variation. The bootstrap, adopted as the cross-validation element, helps in reducing the risk of overfitting across the training, testing, and validation datasets.

Conclusion
The adoption of BitABC into Taguchi's T-Method in place of the OA is shown to be feasible in this study. The analysis shows that in 4 out of 8 case studies, BitABC adoption provides better performance than the existing Taguchi's T-Method. The other case studies vary with minimal MAE differences and yield fewer significant features to be considered. Even though the trends for BitABC and PBPSO are similar for the large datasets, the small data samples show that BitABC provides much better prediction results. Merging BitABC into the current Taguchi's T-Method optimization technique to increase the SNR (dB) and the prediction accuracy of the integrated estimate model was indeed practical. Further development studies should also focus on improving the robustness of the parameter estimates to ensure that the established integrated estimate output model is reliable, especially for small sample data analysis.

Data Availability
Data are available within the repository of the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.