Classification of coal deposited epoxy micro-nanocomposites by adopting machine learning techniques to LIBS analysis

Epoxy micro-nanocomposite specimens incorporating 66 wt% of silica micro fillers and 0.7 wt% of ion trapping particles as nano fillers are coated with four different variants of coal. The conductivity of the coal deposited samples is observed to be in direct correlation with the percentage carbon content present in the coal samples. The epoxy micro-nanocomposite specimens coated with the different variants of coal were successfully classified using laser induced breakdown spectroscopy (LIBS) assisted by various machine learning techniques. Classification through the logistic regression model (LRM) gave the highest training and testing accuracy, 100% and 98% respectively, among the machine learning methods compared.


Introduction
Polymeric insulating materials, particularly epoxy nanocomposites, are gaining popularity as insulating materials in power apparatus such as rotating machines and dry type transformers, and also as spacers in gas-insulated substations (GIS) [1]. Recent studies have investigated the behaviour of epoxy resin reinforced with nano sized particles such as silica, alumina, titania and boron nitride. These composites exhibited enhanced properties such as high surface flashover voltage, high volume resistivity and low dielectric loss [2,3]. Epoxy resins reinforced with 60 wt% of micro-fillers are extensively utilised in industrial applications. Higher thermal conductivity along with enhanced breakdown strength can be achieved by introducing a small wt% of nanofillers into epoxy micro composites [4]. Silica micro-fillers added to epoxy composite insulators result in a reduced coefficient of thermal expansion and better mechanical properties [5]. IXEPLAS® is a zirconium phosphate-modified hydrotalcite compound. Zirconium phosphate is well known as a refractory material with excellent ion trapping capacity and good oxidation resistance [6]. As a result, it can be employed as a filler particle to improve the reliability of epoxy-based insulation structures. In addition, the effects of IXEPLAS® addition after exposure to various harsh conditions must be explored.
Polymeric insulators are extremely vulnerable to contamination induced by environmental factors, human activities and industrial emissions from the surrounding ambience [7][8][9]. These factors tend to reduce the performance of the insulator and result in early degradation of the insulating material [10]. In general, industrial pollution consists of gases and solid particles produced by combustion operations. Due to gravity and electrostatic forces, suspended carbon particles (soot) may be adsorbed onto the HV insulator surface when combustion products are transported by wind [11]. Because of the ionized atmosphere surrounding the insulator, the soot particle layer on HV insulators becomes conductive, causing a distortion in the electric field distribution. The deposits of these free carbon particles can affect the flashover activity of polymeric insulators [11]. Deposition of coal on insulators is one of the major issues in locations such as coal mining areas, thermal power plants and brick kilns, where it degrades the surface properties of the insulating material. Douar et al explored the effect of a non-uniform pollution layer on the flashover voltage of insulators and reported that the flashover voltage decreased linearly with increasing width of the pollution layer on the insulator surface [12]. According to the literature, moisture plays a significant role in the formation of a pollutant layer on insulation structures, which enhances pollution severity [11]. Therefore, to obtain reliable insulation structures, it is critical to investigate the impact of pollution severity over the insulator surface.
Various laser spectroscopic tools such as laser induced breakdown spectroscopy (LIBS), laser absorption spectroscopy (LAS) and laser induced fluorescence (LIF) have recently gained popularity [13,14]. Of these, LIBS is widely applied in a variety of fields due to its robustness, simplicity, minimal sample destruction, standoff capability and broadband detection [15][16][17][18]. LIBS is a qualitative and quantitative approach for determining the elemental composition of materials [19][20][21]. The LIBS technique is also capable of identifying and classifying polymers and other organic samples [22][23][24][25]. Using LIBS, the elemental composition of various contaminants and their compounds on insulating materials may be easily determined [26]. Wang et al employed LIBS analysis to detect pollutants on high-voltage transmission line insulators and discovered a linear relationship between element concentrations and the normalized intensities of their LIBS spectral peaks [27]. The LIBS technique has been employed by several researchers for coal analysis [28,29]. LIBS assisted by various machine learning (ML) techniques has become a popular tool for material classification as well as regression [30][31][32][33]. Peng et al classified substances such as coal, municipal sludge and biomass, and found that when K-means clustering and support vector machine (SVM) work together, the accuracy of the hybrid classification model is above 98 percent [34]. Using K-nearest neighbours (KNN) and SVM classifiers, Li et al employed a multivariate statistical technique paired with LIBS to classify soft tissues and found that the accuracy was over 99.83 percent [35].
Also, Rzecki et al employed several classifiers based on neural network methods such as generalized regression neural network (GRNN), probabilistic neural network (PNN) and multi-layer perceptron (MLP), and classifiers based on other ML methods such as decision trees (DT), SVM, KNN and random forest (RF), to classify paper-ink samples [36]. Considering the above points, the current work attempts to classify the coal deposited epoxy micro-nanocomposite samples with respect to the type of coal, using LIBS analysis assisted by ML techniques such as SVM, KNN, MLP and the logistic regression model (LRM). In addition, these ML techniques are compared in terms of classification accuracy on the training as well as the testing data.
With these inputs, experimental research followed by analysis has been carried out (a) to determine the conductivity of coal deposited epoxy micro-nanocomposites, (b) to perform elemental analysis from the LIBS spectral peaks of the test specimens and (c) to employ machine learning techniques such as KNN, MLP, LRM and SVM on LIBS spectral data for classifying the coal deposited epoxy micro-nanocomposite specimens with respect to their conductivity values.

Experimental studies

2.1. Sample preparation details
Base epoxy resin was reinforced with 66 wt% of crystalline silica (SiO₂) micro filler and 0.7 wt% of IXEPLAS® (ion trapping nanofiller) to prepare the epoxy micro-nanocomposite specimen used in the current study. The average diameter of the SiO₂ fillers is 14 μm, while IXEPLAS® has a diameter of 200-500 nm. The in-situ method of surface modification is employed in the present study by mixing the base epoxy resin, SiO₂ microparticles, IXEPLAS® nanoparticles, hardener and silane coupling agent together. The samples were prepared using standard shear mixing, degassing, casting and curing procedures, in that order (figure 1) [37]. Flat-type specimens with dimensions of 30×30×2 mm³ were used for the experiments in the present study.

Coal deposition details
Four different types of coal samples were deposited on the epoxy micro-nanocomposite samples as a non-soluble pollutant. The four types of coal are termed A, B, C and D. Their percentage elemental carbon contents are 32.6±1.9, 52.8±0.2, 56.9±0.3 and 49.6±4.1, respectively [28]. Coal powder is mixed with deionized water to make a slurry, which is then uniformly coated over the surface of the test specimens to form a thin layer. The weight of the coal powder used for coating is selected such that the non-soluble deposit density (NSDD) is maintained at 1 mg cm⁻² [38]. The electrical conductivity of the slurry prepared from the powdered coal sample and deionized water was measured with a conductivity meter (Spectralab Multi para MP-8).

Figure 2 depicts the LIBS experimental setup. A Q-switched Nd³⁺:YAG laser (LAB-150-10-S2K, Quanta-Ray LAB series, Spectra Physics) was used to generate a 1064 nm beam with a pulse duration of 10 ns and a repetition rate of 10 Hz. By changing the flash lamp Q-switch time delay, the laser pulse energy was set at 40 mJ. The laser was focused with a 25 cm focal length lens at a 90° angle to the target sample, giving a spot diameter of 0.5 mm. For a laser energy of 40 mJ and a spot diameter of 0.5 mm, the fluence is 203.72 kJ/m². The optical emission is focused by a lens with a focal length of 100 cm and recorded by a spectrometer (Ocean Optics USB2000+UV-vis-ES) through an optical fibre with a core diameter of 400 μm and an NA of 0.22. The spectrometer has a full width at half maximum optical resolution of 0.3 nm. In the current work, the spectral data was evaluated in the range of 200-800 nm.
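The fluence quoted for the LIBS setup can be checked with a short calculation. The sketch below assumes that the pulse energy is spread uniformly over a circular spot (a top-hat profile, which the text does not state explicitly); the function name `laser_fluence` is illustrative only.

```python
import math


def laser_fluence(pulse_energy_j: float, spot_diameter_m: float) -> float:
    """Return the fluence in J/m^2, assuming a uniform circular spot."""
    spot_area_m2 = math.pi * (spot_diameter_m / 2.0) ** 2
    return pulse_energy_j / spot_area_m2


# Values taken from the text: 40 mJ pulse energy, 0.5 mm spot diameter.
fluence_j_per_m2 = laser_fluence(pulse_energy_j=40e-3, spot_diameter_m=0.5e-3)
print(f"{fluence_j_per_m2 / 1e3:.2f} kJ/m^2")  # prints 203.72 kJ/m^2
```

Under this assumption the result agrees with the stated 203.72 kJ/m².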

Results and discussion
3.1. Conductivity measurement
Figure 3 shows the conductivity of the contaminants for each type of coal deposited on the test specimens. Each bar in figure 3 is the average of 6 conductivity values and the error bar represents the standard deviation of the conductivity for that type of coal deposition. The conductivity of the test specimens increases with the percentage of carbon present in the coal samples. Since carbon has high conductivity, the higher the percentage of carbon in the coal sample, the higher the overall conductivity of the coal pollutant. Figure 4 presents the direct correlation between the conductivity and the percentage carbon present in the coal samples, with a correlation factor of 0.9856.

Figure 5 shows the LIBS spectra of the epoxy micro-nanocomposites recorded by ablating the sample surface with 40 mJ of laser energy. Peaks corresponding to nitrogen (N), oxygen (O), carbon (C), silicon (Si) and zirconium (Zr) are identified in the LIBS spectra of all the test specimens using the NIST database [39]. The LIBS spectral data of the epoxy micro-nanocomposite specimens coated with the 4 different types of coal is used as the input dataset for classification by four machine learning algorithms: K-nearest neighbours (KNN), multilayer perceptron (MLP), logistic regression model (LRM) and support vector machine (SVM). The open-source machine learning Python library Scikit-learn is used for analysing the LIBS spectral data in the present study.
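As an illustration of how such a dataset might be organised for Scikit-learn, the sketch below builds a table of 2048 spectral features with the coal type in the last column and splits it 75%/25%, as described for the classifiers in this section. The spectra are synthetic stand-ins generated with `make_blobs`, not the measured LIBS data. The number of spectra (80), the integer coding of the coal types, the stratified split and the random seed are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

N_FEATURES = 2048                 # spectral channels per spectrum (from the text)
COAL_TYPES = ["A", "B", "C", "D"]

# Synthetic stand-in for the measured spectra (assumption: 80 spectra in total).
spectra, coal_codes = make_blobs(
    n_samples=80, centers=len(COAL_TYPES), n_features=N_FEATURES, random_state=0
)

# One row per spectrum; the last column holds the coal type code (0..3).
table = np.column_stack([spectra, coal_codes])

features = table[:, :-1]
labels = table[:, -1].astype(int)

# 75% training / 25% test; stratification keeps all four coal types in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, stratify=labels, random_state=0
)
print(X_train.shape, X_test.shape)  # (60, 2048) (20, 2048)
```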

K-nearest neighbour (KNN)
K-nearest neighbour is one of the simplest supervised machine learning algorithms. KNN is a well-known classification technique based on the Euclidean distance between a test sample and the training samples. For a set of N samples with p features belonging to M classes, an input sample X_i is assumed, with i varying from 1 to N. X_i is represented as

$$X_i = (x_{i1}, x_{i2}, \dots, x_{ip})$$

The distance in space between any two points i and j is expressed by (1).

$$d(X_i, X_j) = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2} \qquad (1)$$

The KNN algorithm works on the distance between the given point and the training dataset, based on the nearest K points in the training sample. The nearest point to the given point is expressed by (2).

$$d(m_i, x') = \min_{j} d(x_j, x'), \quad j = 1, \dots, N \qquad (2)$$

Here m_i denotes the nearest neighbour to a random sample point x′, and the above equation takes the minimum over the distances in space between the points x_j and x′ (according to equation (1)). The random sample point x′ belongs to the actual class m and is assigned the predicted class m′. The class of the given point is selected as the most frequently occurring class in the set of nearest neighbouring points. The nearest neighbour rule gives the prediction m′ for the given point; the prediction is right if m′ = m, where m is the class of the given point. If m′ ≠ m, the predicted class is incorrect. In the present study, the coal deposited samples were distinguished successfully on the basis of conductivity using KNN applied to the LIBS data. The input dataset has been divided into two sets, a training dataset (75%) and a test dataset (25%). The input and output parameters are fed to the KNN classifier with the hyperparameter values that give maximum classification accuracy. Figure 6 represents the confusion matrix, which gives a better visualization of the KNN classification of the coal deposited epoxy micro-nanocomposite samples. The percentage classification accuracy (on a scale of 0%-100%) for each individual sample is depicted in the confusion matrices. An accuracy of 90% in training, 80% in testing and 87.5% on the overall dataset has been obtained from the KNN classifier.
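The nearest neighbour rule described above can be sketched in a few lines of NumPy: compute the Euclidean distance from the query point to every training point, keep the K closest, and take a majority vote. This is a minimal illustration on a toy two-feature dataset, not the Scikit-learn classifier used in the study. The function names, the toy points and K = 3 are assumptions chosen for the example.

```python
from collections import Counter

import numpy as np


def euclidean_distances(train_points: np.ndarray, query_point: np.ndarray) -> np.ndarray:
    """Euclidean distance from the query point to every training point."""
    return np.sqrt(np.sum((train_points - query_point) ** 2, axis=1))


def knn_predict(train_points, train_labels, query_point, k=3):
    """Predict the class of query_point by majority vote among its k nearest points."""
    distances = euclidean_distances(train_points, query_point)
    nearest_indices = np.argsort(distances)[:k]
    votes = Counter(train_labels[i] for i in nearest_indices)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well separated classes with two features each.
train_points = np.array(
    [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
)
train_labels = np.array(["A", "A", "A", "B", "B", "B"])

predicted = knn_predict(train_points, train_labels, np.array([0.15, 0.1]), k=3)
print(predicted)  # prints A
```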

Multilayer perceptron (MLP) classifier
A multilayer perceptron (MLP) belongs to the class of feedforward artificial neural networks (ANN). MLP utilizes a supervised learning technique called backpropagation for training. It consists of at least three layers of nodes: an input layer, a hidden layer and an output layer. Its multiple layers and non-linear activation distinguish the MLP from a linear perceptron and allow it to separate data that is not linearly separable. The MLP classifier in the sklearn.neural_network module uses an underlying neural network to perform the classification, mapping the input dataset onto a set of appropriate outputs. The nodes of the layers are neurons that use nonlinear activation functions to classify the inputs into the various categories. The given dataset of coal deposited test samples with four different types of coal (Type A, B, C and D) is split into a training dataset (75%) and a test dataset (25%). The features of the training and test data are scaled using StandardScaler to ensure uniformity among the features. The MLP classifier in sklearn.neural_network is called with appropriate hyperparameter values, such as a hidden layer size of 100 and the rectified linear unit (ReLU) activation function. The weight optimization is controlled by the solver parameter. The solver used here is 'adam', a stochastic gradient-based optimizer. The model is fit to the training and test data input features and the output classes are suitably predicted. Figure 7 represents the confusion matrix of the training dataset and the test dataset, obtained by MLP classification. Classification through the MLP technique gave a training accuracy of 100%, a testing accuracy of 94% and an overall accuracy of 98.5%.
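A minimal sketch of an MLP classifier with the hyperparameters named in the text (one hidden layer of 100 units, ReLU activation, Adam solver) is given below. It runs on synthetic stand-in data from `make_blobs`, not on the measured spectra. The sample count, `max_iter`, the random seeds and the choice to fit the scaler on the training split only (via a pipeline) are assumptions of this example, not details confirmed by the study.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the LIBS spectra (NOT the measured data).
X, y_codes = make_blobs(n_samples=80, centers=4, n_features=2048, random_state=0)
y = np.array(["A", "B", "C", "D"])[y_codes]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling followed by the MLP; the pipeline fits the scaler on training data only.
mlp_model = make_pipeline(
    StandardScaler(),
    MLPClassifier(
        hidden_layer_sizes=(100,),
        activation="relu",
        solver="adam",
        max_iter=500,       # assumption: not stated in the text
        random_state=0,
    ),
)
mlp_model.fit(X_train, y_train)

train_accuracy = mlp_model.score(X_train, y_train)
test_accuracy = mlp_model.score(X_test, y_test)
print(f"train accuracy: {train_accuracy:.2f}, test accuracy: {test_accuracy:.2f}")
```

The accuracies printed here describe only the synthetic data and say nothing about the values reported in figure 7.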

Logistic regression model (LRM)
The logistic regression technique is used to estimate the probability that an instance belongs to a particular class. If the estimated probability is greater than 50%, the model predicts that the instance belongs to that class (called the positive class, labelled '1'); otherwise it predicts that it does not (i.e., it belongs to the negative class, labelled '0'). The logistic regression model computes a weighted sum of the input features plus a bias term, but instead of outputting the result directly as the linear regression model does, it outputs the logistic of this result. The logistic function used is generally the sigmoid function, which outputs a probability value between 0 and 1. For an input x, probability value p and output y, the predictions of logistic regression are as follows, where w and b denote the feature weights and the bias term:

$$p = \sigma(w^{T}x + b) = \frac{1}{1 + e^{-(w^{T}x + b)}}, \qquad y = \begin{cases} 1, & p > 0.5 \\ 0, & \text{otherwise} \end{cases}$$

The given dataset of coal deposited test samples with four different types of coal (Type A, B, C and D) is split into a training dataset (75%) and a test dataset (25%). The dataset consists of 2048 features with varying levels of importance, with the last column defining the type of coal deposit. The logistic regression model (LRM) is imported from the sklearn.linear_model module. An LRM is built and fit to the training data. The test data is then used to predict the outcomes of the classification model. Figure 8 represents the confusion matrix of the training dataset and the test dataset, obtained by LRM. Classification by the LRM gave accuracies of 100% in training and 98% in testing, resulting in an overall accuracy of 99.5%.
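The sigmoid mapping and a four-class logistic regression fit can be sketched as follows. The data are synthetic stand-ins from `make_blobs`, not the measured spectra, and the sample count, `max_iter` and random seeds are assumptions. With more than two classes, recent Scikit-learn versions fit a multinomial (softmax) model by default, so `predict_proba` returns one probability per coal type rather than a single sigmoid output.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split


def sigmoid(z):
    """Logistic function: maps any real value to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


print(sigmoid(0.0))  # prints 0.5, the decision threshold of the binary case

# Synthetic stand-in for the LIBS spectra (NOT the measured data).
COAL_TYPES = ["A", "B", "C", "D"]
X, y_codes = make_blobs(n_samples=80, centers=4, n_features=2048, random_state=0)
y = np.array(COAL_TYPES)[y_codes]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

lrm_model = LogisticRegression(max_iter=1000)  # assumption: max_iter not stated
lrm_model.fit(X_train, y_train)

test_predictions = lrm_model.predict(X_test)
class_probabilities = lrm_model.predict_proba(X_test)   # one column per coal type
test_confusion = confusion_matrix(y_test, test_predictions, labels=COAL_TYPES)
test_accuracy = lrm_model.score(X_test, y_test)
print(test_confusion)
```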

Support vector machine (SVM)
Support vector machines are among the most dynamic prediction methods and are based on statistical learning approaches. SVM maps the training examples to points in space so as to maximize the width of the gap between the two classes. New examples are then mapped into that same space and predicted to belong to a class based on which side of the gap they fall. Apart from performing linear classification, SVM can also efficiently perform non-linear classification using the kernel hyperparameter, wherein the inputs are mapped into polynomial and high-dimensional feature spaces. The hypothesis function for the SVM classifier can be defined as follows, where w and b denote the normal vector and offset of the separating hyperplane:

$$h(x) = \begin{cases} +1, & w^{T}x + b \ge 0 \\ -1, & w^{T}x + b < 0 \end{cases}$$

A point above or on the hyperplane is classified as class +1, and a point below the hyperplane is classified as class −1. The given dataset of coal deposited test samples with four different types of coal (Type A, B, C and D) is split into a training dataset (75%) and a test dataset (25%). The dataset consists of 2048 features with varying levels of importance, with the last column defining the type of coal deposit. The SVC class, which implements the support vector machine classifier, is called from the scikit-learn library. Since the data behave more like a non-linear classification problem, one way to tackle it is to add features computed using a similarity function that measures how much each instance resembles a particular landmark. The similarity function used here is the Gaussian radial basis function (RBF). The RBF kernel SVM model is built using the pipeline feature, which also incorporates StandardScaler to scale all the features and maintain uniformity among them. This model is fit to the training dataset and then used to predict the output on the test data. Figure 9 represents the confusion matrix of the training dataset and the test dataset, obtained by SVM.
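A pipeline of the kind described above, feature scaling followed by an RBF kernel SVM, might look like the sketch below. It uses synthetic stand-in data from `make_blobs` rather than the measured spectra, and leaves the SVC hyperparameters at the Scikit-learn defaults (`C=1.0`, `gamma="scale"`); the sample count, these settings and the random seeds are assumptions, since the text does not list the values used.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the LIBS spectra (NOT the measured data).
X, y_codes = make_blobs(n_samples=80, centers=4, n_features=2048, random_state=0)
y = np.array(["A", "B", "C", "D"])[y_codes]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# StandardScaler and the RBF kernel SVM chained in a single estimator.
svm_model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
svm_model.fit(X_train, y_train)

train_accuracy = svm_model.score(X_train, y_train)
test_accuracy = svm_model.score(X_test, y_test)
print(f"train accuracy: {train_accuracy:.2f}, test accuracy: {test_accuracy:.2f}")
```

Chaining the scaler and the classifier means the scaling statistics are learned from the training split alone and then reused unchanged on the test split.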
Classification through the SVM technique gave a training accuracy of 100%, a testing accuracy of 90% and an overall accuracy of 97.5%. Figure 10 depicts the classification accuracy on the training as well as the test datasets obtained through the various machine learning algorithms used in the present study. Classification through the multinomial logistic regression technique gave the highest testing accuracy, 98%, compared to the other machine learning methods. Yang et al applied machine learning techniques such as KNN and SVM to LIBS spectral data for classifying different types of iron ore samples and indicated that SVM applied to LIBS data gave higher classification accuracy than the KNN method [40]. Similar results have been observed in the present study, with the SVM technique having higher classification accuracy than the KNN technique. The neural network based MLP technique gave higher classification accuracy than both the KNN and SVM techniques. Overall, the LRM technique showed the highest classification accuracy of all the ML techniques used in the present study. Thus, the above results show that the combination of LIBS and these ML techniques can improve the classification accuracy of coal deposited epoxy micro-nanocomposites. Coal deposits on insulators that differ in elemental composition but look similar to the naked eye can be distinguished easily with the help of LIBS assisted by machine learning techniques. The accuracy and predictive capability of LIBS assisted by machine learning techniques on the contamination of insulation structures could serve as a cost-effective tool to minimize the need for large numbers of experiments on insulation structures.
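The overall accuracies quoted for the four classifiers are consistent with a weighted average of the training and testing accuracies under the 75%/25% split. That the overall figure was computed this way is an assumption, since the text does not say how it was obtained. The check below uses only the accuracy values reported in this section.

```python
TRAIN_FRACTION = 0.75
TEST_FRACTION = 0.25

# (training accuracy %, testing accuracy %) as reported in the text.
reported_accuracies = {
    "KNN": (90.0, 80.0),
    "MLP": (100.0, 94.0),
    "LRM": (100.0, 98.0),
    "SVM": (100.0, 90.0),
}


def overall_accuracy(train_accuracy: float, test_accuracy: float) -> float:
    """Sample-weighted mean of training and testing accuracy for a 75/25 split."""
    return TRAIN_FRACTION * train_accuracy + TEST_FRACTION * test_accuracy


overall = {
    name: overall_accuracy(train, test)
    for name, (train, test) in reported_accuracies.items()
}
best_method = max(overall, key=overall.get)

print(overall)      # {'KNN': 87.5, 'MLP': 98.5, 'LRM': 99.5, 'SVM': 97.5}
print(best_method)  # LRM
```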

Conclusions
The important conclusions drawn from the current experimental studies and analysis are the following:
• The conductivity of the coal deposited samples is found to be in direct correlation with the percentage carbon content present in the coal samples, with a correlation factor of 0.9856.
• The epoxy micro-nanocomposites coated with different variants of coal were successfully classified by using LIBS assisted by various machine learning techniques such as KNN, MLP, LRM and SVM.
• Classification through the LRM technique gave the highest training and testing accuracy, 100% and 98% respectively, when compared to the other machine learning methods used in the present study.
Therefore, the above results indicate that LIBS assisted by machine learning algorithms is a practical solution for precise classification and rapid analysis of contaminated insulating structures, which could further improve the utilization efficiency of these insulation structures in the power system network.

Figure 10. Classification accuracies with training, testing and overall dataset obtained through various machine learning algorithms.

Data availability statement
The data generated and/or analysed during the current study are not publicly available for legal/ethical reasons but are available from the corresponding author on reasonable request.