State Recognition of Mill Load Based on Improved K-means Clustering Algorithm

In this work, a new method for identifying the load of a ball mill based on an improved K-means clustering algorithm is proposed. The response signals of the ball mill's external devices are closely related to its load during operation, so suitable load-related parameters are selected from them. The number of clusters is determined from the data density, and the initial clustering centers are determined with a genetic algorithm. The dimensionality of the input features is reduced with kernel principal component analysis (KPCA), and the final clustering centers are obtained by iterating in the reduced-dimensional space. Tests on a benchmark dataset show that the improved K-means method is more accurate and stable than the compared methods. The case-analysis results show that the mill load status can be clearly identified from the displacement of the cluster centers calculated by this method. This research offers a new reference for mill load identification from the viewpoint of multi-source information fusion.


INTRODUCTION
The ball mill is key equipment in concentration plants for ore beneficiation. Statistics show that about 5% of the electricity generated in China each year is consumed by ore grinding, that more than 50% of steel balls are consumed by the mineral processing industry, and that ball mills account for 50% to 60% of the energy consumed in the entire beneficiation process [1,2]. Ball mill load directly affects the production efficiency and energy consumption of beneficiation equipment and systems. Therefore, identifying the load status during operation, and keeping the ball mill at its ideal load through timely operation and control, can substantially improve production efficiency, operational safety, and energy conservation.
MMEAT 2020 Journal of Physics: Conference Series 1676 (2020) 012203 IOP Publishing doi: 10.1088/1742-6596/1676/1/012203
Ball mill load is complicated and difficult to identify directly; it can only be characterized indirectly through certain key variables. Because ball mill operation involves many external devices, such as the machine base, the motor, the classification mechanism, and the transmission mechanism, the load can only be characterized indirectly through the response signals of these devices. Using a parallel RBF kernel function or a neural network, reference [3] proposed measuring the ball mill load by identifying a set of parameters including coal supply, hot air flow, recirculated air flow, coal mill outlet temperature, coal mill inlet and outlet pressure difference, and coal mill current. References [4,5] established a soft-sensor model with a multi-sensor information fusion algorithm to predict the internal load parameters of the ball mill from its vibration, grinding sound, and power signals. Reference [6] measured the inlet and outlet pressure difference, hot air flow, inlet air negative pressure, outlet pulp temperature, recirculated air flow, and feed volume of a double-inlet double-outlet ball mill, and combined them with vibration, vibration sound, and useful power information to build a BP neural network prediction model, which can identify the ball mill load to a certain extent. However, this method did not analyze the correlation among the external features of the mill; in addition, it requires a large number of data samples and a long training time, which makes it difficult to apply in practice. Reference [7] used external feature data such as inlet and outlet negative pressure, inlet pressure difference, mill power, and vibration, together with signal data including ore supply, hot air flow, and circulating air flow. It divided the parameters according to their credibility and used Dempster-Shafer evidence theory to measure the load. This method solved many difficult problems, but Dempster-Shafer evidence theory requires a large number of experimental samples, especially prior data, and the time-varying nature of the ball mill load makes the method difficult to implement and limits its accuracy.
To find a more effective and practical method for identifying ball mill load, this paper analyzes the "one-stage" grinding process, selects 7 related parameters, reduces their dimension with KPCA, and analyzes them with a K-means algorithm improved by density and a genetic algorithm. The weights of the parameters are then calculated by the entropy weight method and used as the abscissa, and changes in the ball mill load status are identified from the displacement of the cluster centers in this data space, providing a foundation for online monitoring of ball mill load.

2.1. Ball mill load classification
Ball mill load is the instantaneous total load inside the cylinder, comprising the grinding media pre-filled in the cylinder, the newly supplied ore and water, and the slurry [8]. Because the ball mill operates in a closed state for long periods and its internal environment changes dynamically, specific load parameters such as the material-to-cylinder ratio and the slurry concentration cannot be described accurately and quantitatively. During production, the amounts of ore and water are usually set according to the experience of professional operators to ensure safe and stable operation. Therefore, based on operator experience, this paper divides the ball mill load into three states: underload, ideal load, and overload.
When the mill is underloaded, the water, steel balls, and ore inside fill about 20% to 30% of the ball mill volume; its production capacity is not fully used, which wastes resources. Under ideal load, about 30% to 40% of the mill is filled and the mill works at its optimal load: its hourly throughput increases and energy waste is reduced. When the ball mill is overloaded, the material inside occupies about 40% or more of the volume and the mill is prone to blockage, which not only lowers production but may also endanger the safe operation of the ball mill and cause economic losses.
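The three states amount to a threshold rule on the fill ratio. A minimal Python sketch, assuming hard cut-offs at 30% and 40% (the text gives only approximate ranges, so both the boundaries and the function name `classify_fill_ratio` are illustrative assumptions):

```python
def classify_fill_ratio(fill_ratio: float) -> str:
    """Map the fraction of the cylinder volume occupied by water, steel balls
    and ore to a load state. The 0.30 / 0.40 cut-offs are assumed values."""
    if not 0.0 <= fill_ratio <= 1.0:
        raise ValueError("fill_ratio must be a fraction between 0 and 1")
    if fill_ratio < 0.30:
        return "underload"
    if fill_ratio < 0.40:
        return "ideal load"
    return "overload"


for ratio in (0.25, 0.35, 0.45):
    print(f"{ratio:.0%} -> {classify_fill_ratio(ratio)}")
```

Because the fill ratio cannot be measured directly while the mill is running, a rule like this only formalizes the labels; the method in this paper infers the state indirectly from external parameters.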

Grinding process
The conventional "one-stage" grinding process [9] is shown in Figure 1; it includes ball mills, cyclones, sand pump pools, and slurry pumps.
As shown in Fig. 1, the crushed ore is conveyed by belt to the inlet of the ball mill, and a certain amount of water is added. The mixture of ore and water then enters the cylinder, where, driven by the high-power motor, it is lifted together with the pre-loaded steel balls and then dropped. The ore is ground by impact and friction, and the fully ground slurry flows from the ball mill outlet into the sand pump pool, where supplemental water is added to adjust the concentration. The slurry pump delivers the slurry from the sand pump pool to the cyclone inlet, where it is classified by centrifugal force and gravity. Slurry that meets the fineness requirement leaves through the overflow outlet and enters the flotation process; the rest is returned to the ball mill inlet for regrinding.

Ball mill load related parameters
Differences in the load status of the ball mill change the grinding efficiency, which in turn changes the parameters measured by the instruments in the grinding circuit. Therefore, these parameters reflect the load status to a certain extent. This paper used historical FCS data from the plant as the original dataset, from which 7 load-related parameters were selected; they are described in Table 1.
Figure 1 The technology flow diagram of grinding process

BALL MILL LOAD IDENTIFICATION METHOD BASED ON IMPROVED K-MEANS
Seven parameters of an MQYΦ5.5×8.5m ball mill operating at the industrial site were collected, as listed in Table 1. Fig. 2 shows how these parameters evolved over a single day in September 2018. Because the grinding process is complex and time-varying and the parameters are coupled with one another, the rules underlying their changes are hard to discern directly from Fig. 2. To extract the potentially useful information, this paper established a clustering method that processes the parameters once the ball mill reaches dynamic equilibrium and distinguishes the load status from the clustering results.

Kernel Principal Component Analysis (KPCA)
The 7 load-related parameters have different units; some are nonlinear, and some overlap or are even redundant. To characterize the load status effectively while reducing computation, this paper applies KPCA, which maps the data into a high-dimensional feature space through a kernel function and then performs PCA in that space to reduce the dimension of the dataset [10].
The load-related parameter dataset is denoted $X=[X_1, X_2, \dots, X_7]^T$, where $X_i$ is the curve of n samples of the i-th parameter recorded within 10 minutes while the ball mill is in dynamic equilibrium. Let $\varphi(x_i)$ be the mapping of the data $x_i$ into the high-dimensional feature space F. Projecting $\varphi(x_i)$ onto the eigenvectors $v_m$ in F gives the principal component values

$$ L_{im} = \langle v_m, \varphi(x_i) \rangle = \sum_{j=1}^{7} \alpha_j^{m} K(x_j, x_i), \qquad m = 1, 2, \dots, k $$

where $\alpha^{m}$ is the eigenvector (feature vector) of the kernel matrix associated with $v_m$, K is the kernel function, and the number of retained components k is determined by the principal component contribution ratio.
Motor power reflects the energy consumption of the ball mill. Within a certain range, motor power increases as the load rises, but once the load reaches a critical point, further increases in load cause motor power to decline.
Figure 2 The changes of relevant parameters in one day
Using KPCA, a 7×k matrix $L=[L_1, L_2, \dots, L_7]^T \in \mathbb{R}^{7\times k}$ is obtained, which retains the useful information of the original data while eliminating redundant information.
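The KPCA step can be sketched in Python with NumPy as follows. This is a generic kernel PCA under stated assumptions, not the authors' code: the RBF kernel is parameterized by $2\sigma^2$ (default 10000, the value used in the case study), the kernel matrix is centred in feature space, and k is the smallest number of components whose cumulative eigenvalue share reaches the contribution ratio (default 0.99). The names `rbf_kernel` and `kpca` are hypothetical, and parameters with different units are assumed to have been scaled beforehand.

```python
import numpy as np


def rbf_kernel(A: np.ndarray, B: np.ndarray, two_sigma_sq: float) -> np.ndarray:
    """RBF kernel exp(-||a - b||^2 / (2 sigma^2)) between the rows of A and B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / two_sigma_sq)


def kpca(X: np.ndarray, two_sigma_sq: float = 10000.0, contribution: float = 0.99) -> np.ndarray:
    """Project the rows of X (one row per observation) onto the leading k
    kernel principal components, with k chosen by cumulative contribution."""
    m = X.shape[0]
    K = rbf_kernel(X, X, two_sigma_sq)
    # Centre the kernel matrix, i.e. centre the mapped data phi(x_i) in F.
    one_m = np.full((m, m), 1.0 / m)
    Kc = K - one_m @ K - K @ one_m + one_m @ K @ one_m
    # Eigen-decomposition, sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)   # remove tiny negative round-off
    eigvecs = eigvecs[:, order]
    # Smallest k whose cumulative contribution ratio reaches the threshold,
    # limited to components with a non-negligible eigenvalue.
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(ratio, contribution)) + 1
    k = min(k, int(np.count_nonzero(eigvals > 1e-12 * eigvals[0])))
    # Scale alpha so each feature-space eigenvector v has unit norm.
    alphas = eigvecs[:, :k] / np.sqrt(eigvals[:k])
    return Kc @ alphas                              # shape (m, k)


# Synthetic stand-in for one data set: 7 parameter curves of 60 samples each.
rng = np.random.default_rng(0)
X = rng.normal(size=(7, 60))
L = kpca(X)
print(L.shape)
```

Treating each parameter curve as one observation, as in the text, gives a 7×k result, matching the shape of L above.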

Improved K-means algorithm
The K-means algorithm partitions the observations into several clusters, such that data within the same cluster are highly similar while data in different clusters are less similar. The method first randomly selects K data objects as the initial clustering centers, computes the distance from every other object to each clustering center, and assigns each object to the nearest cluster. In each new iteration, it recomputes the mean of each cluster from the previous assignment and uses it as the new clustering center, repeating until the criterion function converges or a predefined iteration limit is reached. The disadvantages of the K-means algorithm are its sensitivity to noise and to the choice of initial clustering centers, and the need to fix the number of clusters in advance [11,12]. In this paper, the algorithm was improved using density and a genetic algorithm to obtain stable and reliable clustering results.
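For reference, the traditional K-means iteration described above (random initial centres, nearest-centre assignment, mean update until the centres stop moving) can be sketched as follows. The names `kmeans` and `_assign` are illustrative, and letting an empty cluster keep its previous centre is one common convention among several.

```python
import numpy as np


def _assign(X, centres):
    """Index of the nearest centre (Euclidean distance) for every row of X."""
    return np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2).argmin(axis=1)


def kmeans(X, k, max_iter=100, tol=1e-9, seed=None):
    """Traditional K-means; returns (centres, labels, iterations used)."""
    rng = np.random.default_rng(seed)
    # Random initial centres: k distinct data objects.
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for it in range(1, max_iter + 1):
        labels = _assign(X, centres)
        # New centre = mean of each cluster; an empty cluster keeps its old centre.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        shift = np.linalg.norm(new - centres)
        centres = new
        if shift <= tol:            # centres have stopped moving
            break
    return centres, _assign(X, centres), it


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centres, labels, n_iter = kmeans(X, 2, seed=0)
print(labels, n_iter)
```

The random draw in the first step is exactly the source of run-to-run variation that the improved algorithm targets.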
The specific steps of the improved K-means algorithm are as follows:
1) Apply the optimal partition of the hill-climbing algorithm [13] to divide the data space into s regions.
2) Calculate the average density $\bar{D}$ and the density coefficient $\lambda_i$ of each region, determine the high-density regions, and use their number as the K value, where

$$ \bar{D} = \frac{n}{s}, \qquad \lambda_i = \frac{D_i}{\bar{D}}, \qquad i = 1, 2, \dots, s $$

n is the total number of data objects, and $D_i$ is the actual density of the i-th region, i.e., the number of data objects in that region.
3) For each object, calculate its local density $\rho_i$ and its distance $\delta_i$ to the nearest object of higher local density. If $\rho_i$ is small but $\delta_i$ is very large, the object is judged to be a noise point and placed in a dataset S1; the remaining objects form another dataset S2, so that the full dataset is $S = S_1 \cup S_2$ with $S_1 \cap S_2 = \varnothing$.
4) Set the genetic algorithm parameters, such as the selection strategy, population size L, mutation probability $P_m$, crossover probability $P_c$, and number of generations MaxGen; solve for the center $c_i$ of the minimum enclosing circle of the i-th high-density region, use it as an initial clustering center, and define the fitness function as …
8) Repeat steps 5) to 8) until the clustering centers no longer change or the criterion function $J_T$ converges; otherwise, use the updated cluster means as the new iteration clustering centers and go to step 5).
9) For each object in S1, calculate its Euclidean distance to each clustering center and assign it to the nearest cluster.
10) Output the final clustering centers $c_k$.
By using density to judge K, the improved algorithm overcomes a shortcoming of the traditional K-means algorithm, namely that K is difficult to select and depends on manual experience. In addition, it uses a genetic algorithm to compute the center of the minimum enclosing circle of each high-density region, which removes the excessive dependence on the choice of initial clustering centers and the resulting instability of the algorithm. The improved method is especially suitable for planar data, in which it more readily captures the convex characteristics of the data and improves the clustering result.
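The steps above can be approximated by the following self-contained Python sketch for planar data. Several pieces are substitutions or assumptions rather than the paper's exact procedure: a regular g×g grid stands in for the hill-climbing optimal partition; touching cells with $\lambda_i$ above `lam_min` are merged into one high-density region; noise detection follows the density-peak idea with quantile thresholds `q`; the genetic algorithm minimizes the enclosing radius $\max_j \lVert c - p_j \rVert$, an assumed fitness; and the loop standing in for steps 5) to 8) is assumed to be the ordinary K-means assignment and mean update on S2. All function names and default values are hypothetical.

```python
import numpy as np
from collections import deque


def dense_regions(X, g=4, lam_min=1.0):
    """Steps 1-2: grid partition (stand-in for hill climbing) plus density coefficients."""
    n = len(X)
    lo, hi = X.min(axis=0), X.max(axis=0)
    cell = np.minimum(((X - lo) / (hi - lo + 1e-12) * g).astype(int), g - 1)
    counts = np.zeros((g, g), dtype=int)
    np.add.at(counts, (cell[:, 0], cell[:, 1]), 1)   # D_i for each of the s = g*g cells
    dense = counts / (n / g**2) > lam_min            # lambda_i = D_i / (n / s)
    region = -np.ones((g, g), dtype=int)
    k = 0
    for start in zip(*np.nonzero(dense)):            # merge touching dense cells
        if region[start] >= 0:
            continue
        region[start] = k
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for rr in range(max(r - 1, 0), min(r + 2, g)):
                for cc in range(max(c - 1, 0), min(c + 2, g)):
                    if dense[rr, cc] and region[rr, cc] < 0:
                        region[rr, cc] = k
                        queue.append((rr, cc))
        k += 1
    return k, region[cell[:, 0], cell[:, 1]]         # K, region id per point (-1 = sparse)


def noise_mask(X, dc, q=0.1):
    """Step 3: density-peak style rho_i and delta_i; low rho + high delta -> noise (S1)."""
    d = np.linalg.norm(X[:, None] - X[None], axis=2)
    rho = (d < dc).sum(axis=1) - 1                   # neighbours within cutoff dc
    delta = np.array([d[i, rho > rho[i]].min() if (rho > rho[i]).any() else d[i].max()
                      for i in range(len(X))])
    return (rho <= np.quantile(rho, q)) & (delta >= np.quantile(delta, 1 - q))


def ga_circle_centre(P, rng, pop=40, pm=0.1, pc=0.8, max_gen=200):
    """Step 4: real-coded GA minimising the enclosing radius max_j ||c - p_j||."""
    lo, hi = P.min(axis=0), P.max(axis=0)

    def radius(C):
        return np.linalg.norm(C[:, None] - P[None], axis=2).max(axis=1)

    C = rng.uniform(lo, hi, size=(pop, P.shape[1]))
    for _ in range(max_gen):
        fit = radius(C)
        best = C[fit.argmin()].copy()
        i, j = rng.integers(pop, size=(2, pop))      # binary tournament selection
        par = np.where((fit[i] < fit[j])[:, None], C[i], C[j])
        w = rng.uniform(size=(pop, 1))               # arithmetic crossover, prob. pc
        mate = par[rng.permutation(pop)]
        C = np.where(rng.uniform(size=(pop, 1)) < pc, w * par + (1 - w) * mate, par)
        step = 0.05 * (hi - lo + 1e-12)              # Gaussian mutation, prob. pm per gene
        C = C + (rng.uniform(size=C.shape) < pm) * rng.normal(scale=step, size=C.shape)
        C[0] = best                                  # elitism
    return C[radius(C).argmin()]


def improved_kmeans(X, g=4, lam_min=1.0, dc=None, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    K, region = dense_regions(X, g, lam_min)
    if K == 0:
        raise ValueError("no high-density region found; adjust g or lam_min")
    if dc is None:                                   # rule-of-thumb cutoff (assumption)
        dc = np.quantile(np.linalg.norm(X[:, None] - X[None], axis=2), 0.02)
    noise = noise_mask(X, dc)
    S2 = X[~noise]
    centres = np.array([ga_circle_centre(X[region == r], rng) for r in range(K)])
    for _ in range(max_iter):                        # assumed steps 5)-8): K-means on S2
        lab = np.linalg.norm(S2[:, None] - centres[None], axis=2).argmin(axis=1)
        new = np.array([S2[lab == j].mean(axis=0) if (lab == j).any() else centres[j]
                        for j in range(K)])
        if np.allclose(new, centres):
            break
        centres = new
    # Steps 9)-10): assign every object, noise points in S1 included, to its nearest centre.
    labels = np.linalg.norm(X[:, None] - centres[None], axis=2).argmin(axis=1)
    return centres, labels, noise


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.5, size=(50, 2)) for m in ((0, 0), (5, 5), (10, 0))])
centres, labels, noise = improved_kmeans(X)
print(len(centres), np.round(centres, 2))
```

On three well-separated synthetic blobs the grid density step should recover K = 3 without it being supplied; real data would need `g`, `lam_min`, and `q` tuned.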

Load identification based on improved K-means
This paper identifies the mill load status with KPCA and the improved K-means clustering, effectively fusing the related parameters under different loads. The specific steps are as follows:
1) Obtain the 7 parameters in Table 1 from the DCS of the concentration plant while the ball mill is in dynamic equilibrium, and use them as the original dataset.
2) Apply KPCA to reduce the dimension of the 7 parameters and obtain their principal component values.
3) Apply the entropy weight method to calculate the weights of the parameters:

$$ w_j = \frac{1 - E_j}{\sum_{l=1}^{7} (1 - E_l)}, \qquad j = 1, 2, \dots, 7 $$

where $E_j$ is the information entropy of the j-th parameter.
4) Establish a data space with the weight as the abscissa and the principal component value as the ordinate, and input the data into the improved K-means clustering to obtain the coordinates of the clustering centers.
5) Analyze the displacement of the clustering centers in the data space to identify the load status of the ball mill.
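The entropy weight calculation in step 3) can be sketched as follows for an n×m data matrix (for example, one 60×7 data set). The entropy follows the usual definition $E_j = -\frac{1}{\ln n}\sum_i p_{ij}\ln p_{ij}$, with $p_{ij}$ the normalized share of sample i in parameter j. The min-max normalization, the convention 0·ln 0 = 0, and the treatment of constant columns as uniform (hence zero weight) are assumptions; the function name `entropy_weights` is hypothetical.

```python
import numpy as np


def entropy_weights(X: np.ndarray) -> np.ndarray:
    """Entropy weight method for an (n samples x m parameters) matrix X:
    w_j = (1 - E_j) / sum_l (1 - E_l)."""
    n, m = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    Z = np.ones_like(X, dtype=float)                 # constant columns stay uniform
    nz = span > 0
    Z[:, nz] = (X[:, nz] - X.min(axis=0)[nz]) / span[nz]   # min-max normalisation
    P = Z / Z.sum(axis=0)                            # share p_ij of sample i in column j
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)  # 0 * ln 0 := 0
    E = -plogp.sum(axis=0) / np.log(n)               # information entropy E_j
    d = 1.0 - E                                      # degree of divergence
    return d / d.sum()


X = np.array([[1.0, 5.0, 3.0],
              [2.0, 5.0, 3.0],
              [3.0, 5.0, 9.0],
              [4.0, 5.0, 3.0]])
print(np.round(entropy_weights(X), 4))
```

A parameter whose values barely vary carries little information and receives a small weight, while one dominated by a few distinctive samples receives a large one.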

Dataset Testing
To verify the effectiveness of the proposed algorithm, the Iris dataset from the UCI repository, one of the datasets most commonly used to test clustering performance, was first used to compare the traditional K-means algorithm, the K-means algorithm based on maximum density [14], and the improved K-means algorithm of this paper. The Iris dataset contains 150 numeric samples, each with 4 attributes, divided into 3 classes: samples 1 to 50 are Iris-setosa, 51 to 100 are Iris-versicolor, and 101 to 150 are Iris-virginica.
The Iris dataset was reduced to 2 dimensions with KPCA, and the reduced data were used as test samples. Clustering accuracy (CA), clustering similarity (CS), and the iteration number I were used as indicators. Each of the three algorithms was run 10 times, and the average clustering results are shown in Table 2. CA is the percentage of samples that are clustered correctly. CS is the similarity between the results obtained by running the same clustering algorithm repeatedly on the same dataset; the higher the similarity, the more stable the algorithm. This paper used the Pearson correlation coefficient between the clustering-center coordinates of different clustering results as the similarity index. I is the number of iterations needed to reach the final result after the initial clustering centers are determined, and measures algorithm efficiency. Table 2 shows that the improved K-means algorithm achieved higher clustering accuracy and could accurately locate the initial clustering centers. It avoided the unstable accuracy and efficiency caused by the random selection of initial clustering centers in the traditional K-means algorithm and improved the similarity. Meanwhile, by exploiting the relationship between the clustering centers and the sample density, the improved algorithm greatly reduced the number of iterations and the running time, improving efficiency. In addition, the improved algorithm removed the need to determine K from manual experience: it determined K entirely from the spatial distribution of the samples through density and optimal partition, and the resulting K value agreed with the true number of classes.
These results indicate that the algorithm is valid: when prior conditions are unknown, it avoids the lack of a scientific basis and the inaccurate results that arise when K is determined by human experience.
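The two evaluation indicators can be made concrete with a short sketch. Clustering accuracy needs a mapping between cluster ids and class labels; here the best one-to-one mapping is found by brute force over permutations, which is cheap for three classes. For CS, the centres from two runs are sorted lexicographically before computing the Pearson coefficient so that matching clusters line up. This alignment rule and the function names are assumptions, since the text does not specify them.

```python
import itertools

import numpy as np


def clustering_accuracy(true_labels, pred_labels):
    """CA: share of samples correctly clustered under the best one-to-one
    mapping of cluster ids to class ids (brute force over permutations)."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    classes = np.unique(true_labels)
    clusters = np.unique(pred_labels)
    best = 0
    for perm in itertools.permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        mapped = np.array([mapping[c] for c in pred_labels])
        best = max(best, int(np.sum(mapped == true_labels)))
    return best / len(true_labels)


def centre_similarity(centres_a, centres_b):
    """CS (assumed form): Pearson correlation between the centre coordinates of
    two runs, after sorting the centres so that matching clusters line up."""
    a = np.asarray(sorted(map(tuple, centres_a))).ravel()
    b = np.asarray(sorted(map(tuple, centres_b))).ravel()
    return float(np.corrcoef(a, b)[0, 1])


print(clustering_accuracy([0, 0, 1, 1, 2, 2], [2, 2, 0, 1, 1, 1]))
print(centre_similarity([[0, 0], [1, 1]], [[1, 1], [0, 0]]))
```

Averaging these two values over the 10 runs per algorithm would give table entries of the kind reported in Table 2.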

Measured data analysis
Historical data for seven parameters (ore supply, water supply, supplementary water volume, pump pool liquid level, overflow volume, -200 mesh percentage, and motor power) were acquired over 28 days, from September 22 to October 19, 2018. Sixty data sets, each 10 min long, were selected under different load statuses and analyzed with the proposed algorithm.
Since the DCS of the concentration plant updated the data every 10 s, each 10-min data set contained 60 samples of the 7 parameters, so the original dimension of each data set was 60×7. The KPCA principal component contribution rate was set to 0.99, and the RBF kernel function was adopted with its parameter set to $2\sigma^2 = 10000$. KPCA was used to reduce the dimension of the original data; part of the results are shown in Table 3. The parameter weights obtained by the entropy weight method are shown in Table 4.
With the weight as the abscissa and the principal component value as the ordinate, the distribution of the 60 data sets in this two-dimensional space is shown in Fig. 3. The improved K-means algorithm was used to obtain the clustering center of each data set; the results are shown in Fig. 4. … 8 data, which was consistent with the fact that the ball mill was under ideal load when controlled by the operators. The second data set, with a clustering center of (0.1525, -0.0095), corresponded to 15:00 to 15:10 on September 22, when the ball mill had a normal load; (0.1525, -0.0088) was the highest clustering center, corresponding to the 55th data set and to 22:00 to 22:10 on October 10, when the ball mill was overloaded.
Load data contain many complex random factors, which makes the load status difficult to interpret and classify accurately. This paper used the improved K-means clustering algorithm to cluster the 7 load-related parameters. By comparing the clustering center of each data set with its load status, the displacement of the cluster centers can be tracked and used to identify changes in load status effectively.
Figure 4 The clustering center result of industrial data
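The displacement idea can be illustrated with the two centres reported for the second data set (normal load) and the 55th data set (overload). The helper below, a hypothetical name, simply returns the shift vector and its Euclidean length; it does not classify anything, since the text gives no decision thresholds.

```python
import numpy as np


def centre_displacement(reference, current):
    """Shift vector and Euclidean distance between two cluster centres
    (weight on the abscissa, principal-component value on the ordinate)."""
    ref = np.asarray(reference, dtype=float)
    cur = np.asarray(current, dtype=float)
    shift = cur - ref
    return shift, float(np.linalg.norm(shift))


# Centres reported in the text: data set 2 (normal load) and data set 55 (overload).
shift, dist = centre_displacement((0.1525, -0.0095), (0.1525, -0.0088))
print(shift, dist)
```

For these two centres the shift lies entirely along the ordinate (about 0.0007 upward), consistent with the overload centre being the highest one.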

LOAD STATUS IDENTIFICATION SYSTEM BASED ON LABVIEW
Based on the algorithm proposed in this paper, a LabVIEW-based load status identification system for ball mills was developed; its overall block diagram is shown in Fig. 5.
Figure 5 The block diagram of ball mill load identification
Inlet load parameters, such as ore supply and water supply, and outlet load parameters, including overflow volume and particle size, were collected in the DCS by sensors such as electronic belt scales, electromagnetic flow meters, and particle size analyzers. Field signal conversion used four Siemens I/O modules: DI, DO, AI, and AO. The AI and AO modules were connected to the field through isolation barriers, and the DI and DO modules through intermediate relays, which greatly improved the stability of the transmitted signals and their immunity to interference. The digitized parameters were acquired and transmitted to the host computer by an NI myDAQ device (National Instruments, USA). On the host computer, KPCA reduces the dimension of each load parameter, the entropy weight method computes their weights, and the corresponding load status is finally obtained.
The system identifies the ball mill load status from industrial DCS data and displays status changes. It also provides data storage, query, and alarm functions and a user-friendly human-machine interface. The status identification interface is shown in Fig. 6.
Figure 6 The identification interface of ball mill load system
6. CONCLUSION
This paper proposed identifying the load status of the ball mill from the displacement of the clustering centers of load-related parameters. The K-means algorithm was improved to make the clustering results stable and reliable: the proposed method selects K from the data density and uses a genetic algorithm to determine the initial clustering centers. Tests on the Iris dataset showed that the improved K-means algorithm greatly reduced the number of iterations and improved stability. Applied to the clustering of 7 load-related parameters, namely ore supply, water supply, supplementary water, pump pool liquid level, overflow volume, -200 mesh percentage, and motor power, the method effectively characterized changes in load status. The results could be further analyzed and applied in industry for accurate monitoring and identification of ball mill load status.