Data analysis of a paste thickener

Abstract
The solids content of slurry is typically increased in thickeners. A clean overflow and maximum solids concentration in the underflow are the general targets. The flocculant rate and underflow rate are the two independent variables that are typically used for control. The dependent variables include rake torque, underflow density, overflow turbidity, solids interface level (bed depth), solids inventory (bed pressure), solids settling rate and underflow viscosity. The research problem in question is that the outgoing paste is sometimes difficult to pump. The phenomena leading to this situation are not well known. In the worst-case scenario these phenomena cause clogging in the piping. A data analysis has been done to find the variables that affect and correlate with the pumping problem. The scope of this study covers the measurements from the feed line, thickener and underflow. The goal is to gain better understanding of the phenomena after this phase. The data analysis was done using the paste line pressure difference as a response variable and by dividing the data collected from Yara’s Siilinjärvi mill into two parts: operation areas with high and low pressure difference. The analysis is focused on Thickener 1 due to better availability of measurements. The knowledge of the variables found to influence the pressure difference can be utilized in further development.


Introduction
The modern world is highly dependent on materials. The saying "What you can't grow, you need to dig" is true, and it means that people need different kinds of materials to produce tools and necessities for their day-to-day use. As a result, mining is becoming a more and more important branch of industry. A lot of attention is being paid to making it as efficient as possible. As the amount of rock crushed increases, the amount of waste rock also increases. For example, the tailings from flotation circuits contain a lot of water and it is beneficial to reduce the water content before storing them in tailings ponds. A paste thickener can be used to lower the water content. The aim is to investigate the behaviour of the paste-thickening vessel and the variables that affect it in order to make the operation cost-efficient. Thickening is the most economical of several dewatering techniques. The thickening process normally occurs in large-diameter tanks where solid particles settle under the influence of gravity, i.e. sedimentation. Jewell et al. [1] raise a valid challenge in the debate: the operation of paste thickeners has various practical problems. One of them is a narrow operating window, as a small change in the underflow solids concentration may have a major impact on the properties (for example pumpability and flowability) of the underflow. Consequently, the control of the paste thickener can be challenging.
*Corresponding Author: Jari Ruuska: University of Oulu, Control Engineering; Email: Jari.Ruuska@oulu.fi
Riku-Pekka Nikula: University of Oulu, Control Engineering
Eemeli Ruhanen: Yara Suomi Oy, Siilinjärvi Mill
Janne Kauppi, Sakari Kauvosaari, Mika Kosonen: Outotec (Finland) Oy
Different methods are used to control the paste thickener. Tan, Bao and Bickert [2] propose a model predictive control with rake torque constraint. Chai, Li and Wang [3] and Chai et al. [4] propose an intelligent switching control which includes an underflow slurry flow-rate (USF) pre-setting unit and a fuzzy reasoning-based USF set-point compensator. The switching mechanism uses rule-based reasoning. Langlois and Cipriano [5] have introduced a dynamic simulator for the thickener which can be utilized to develop different kinds of control strategies. Remes, Aaltonen and Koivo [6] have developed a Kalman filter based soft sensor to monitor the thickener operation to reduce process disturbances. Bergh, Ojeda and Torres [7] have introduced the expert control tuning of an industrial thickener. Segovia, Concha and Sbarbaro [8] tested different thickener control strategies using a calibrated simulator, pointing out the weaknesses and giving hints for improving their performance.
The research problem in question is that the paste exiting the thickener is sometimes difficult to pump. The phenomena leading to this situation are not well known. In the worst-case scenario these phenomena cause clogging in the piping. This leads to a need for extra maintenance and even stoppages, so it would be beneficial to understand these phenomena better. In this paper, we concentrate on the data analysis of a paste thickener to find the affecting factors, utilizing a data set from an industrial thickener.

Process description and data set
The only phosphate mine in the EU is the Yara mine located in Siilinjärvi, Finland. The product of the ore extracted from the open pits is apatite concentrate, from which fertilizers and phosphoric acid are the final products. The total ore mined annually is 11 Mt and the production of apatite concentrate is 1 Mt.
The solids content of the slurry is typically raised in thickeners. A clean overflow and maximum solid concentration in the underflow are the general targets. Agglomerates of solids are formed using flocculants to increase the settling rate and improve the overflow clarity. Very high on-line operation availability is required from a thickener. Many industrial sectors use these vessels.
A typical thickener used in mineral processing is illustrated in Figure 1. The flocculant rate and underflow rate are the two independent variables which are typically used for control. The feed rate is only used in an emergency to avoid disturbances to plant production. The dependent variables include rake torque, underflow density, overflow turbidity, solids interface level (bed depth), solids inventory (bed pressure), solids settling rate and underflow viscosity [7].
The data set used for data analysis contains data from both of Yara's paste thickeners, but after discussions with plant experts the data analysis was limited to Thickener 1. The whole data set contained altogether 73 measurements from February 2017 to February 2018. The frequency of measurements was one per minute. In the actual data analysis, twenty-six variables from the thickener feed, thickener and paste line were used. Information about the performance optimization of paste thickening can be found for example in [9].
Figure 1: A typical thickener used in mineral processing [7]

Data pre-processing
The rows containing NaN (not a number) were removed. The periods when the values of the pressure difference of the paste line (i.e. response variable) did not change were removed. Four feed and three Thickener 1 measurements were delayed for eight hours because the thickener delay was estimated to be eight hours. The delay value was defined by the mill experts based on their knowledge of the process. Six data sets were formed using the booster pumps and the paste line pressure difference as dividers.
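The pre-processing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are assumed to be a NumPy array with one row per minute, so the eight-hour delay corresponds to 480 rows; the function names, the column indices and the `min_run` threshold for a "frozen" response are hypothetical; and applying the delay before the row removals (so that time alignment is preserved) is a choice made for this sketch.

```python
import numpy as np

DELAY_ROWS = 8 * 60  # eight-hour delay at one sample per minute


def apply_delay(data, delayed_cols, delay=DELAY_ROWS):
    """Shift the given columns forward in time by `delay` rows (delay > 0)."""
    out = np.array(data, dtype=float)
    for c in delayed_cols:
        out[delay:, c] = data[:-delay, c]
        out[:delay, c] = np.nan  # no earlier sample exists for these rows
    return out


def drop_nan_rows(data):
    """Remove every row that contains at least one NaN."""
    return data[~np.isnan(data).any(axis=1)]


def drop_frozen_response(data, response_col, min_run=3):
    """Remove runs of at least `min_run` identical consecutive response values."""
    y = data[:, response_col]
    change = np.flatnonzero(np.diff(y) != 0) + 1  # first index of each new run
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [len(y)]))
    keep = np.ones(len(y), dtype=bool)
    for s, e in zip(starts, ends):
        if e - s >= min_run:
            keep[s:e] = False
    return data[keep]
```

A possible call order would be `drop_frozen_response(drop_nan_rows(apply_delay(raw, feed_cols)), dp_col)`, where `raw`, `feed_cols` and `dp_col` are placeholders for the measurement matrix, the delayed columns and the pressure difference column.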
Only periods of over one day were accepted as a usage period. Usage was determined from the pump current measurement: if the current was over 50 A, the pump was deemed to be in use.
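The usage-period rule can be illustrated with a short sketch. It assumes one current sample per minute, so one day corresponds to 1440 samples; the function name and its parameters are hypothetical, and how short interruptions within a period were handled in the actual analysis is not known.

```python
def usage_periods(current_a, threshold=50.0, min_len=24 * 60):
    """Return (start, end) index pairs, end exclusive, of runs where the
    current stays above `threshold` for more than `min_len` samples."""
    periods = []
    start = None
    for i, value in enumerate(current_a):
        if value > threshold:
            if start is None:
                start = i  # a new run begins
        elif start is not None:
            if i - start > min_len:
                periods.append((start, i))
            start = None
    # Close a run that is still open at the end of the data.
    if start is not None and len(current_a) - start > min_len:
        periods.append((start, len(current_a)))
    return periods
```

For example, with `min_len=3` a run of four samples above 50 A is accepted and a run of two samples is rejected.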
Division into operation areas was made based on the pressure difference of the paste line. The histograms in Figure 2 illustrate the numbers of data points with different pressure differences. The bin size of the histogram is 0.5 bar. When the fifteen-minute average of the pressure difference is over 13 bar, the data point belongs to the high pressure difference operation area. When the average is lower, it belongs to the low pressure difference operation area. Two different booster pumps, KA7402 and KA7403, were used.
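The division rule can be sketched as below. The text does not state whether the fifteen-minute average is a moving or a block average; this sketch assumes a trailing moving average over one-minute samples, with shorter windows at the very start of the series. The function name and parameters are hypothetical.

```python
def label_operation_area(dp_bar, window=15, limit=13.0):
    """Label each sample 'high' or 'low' from the trailing moving average
    of the paste line pressure difference (bar)."""
    labels = []
    running = 0.0
    for i, value in enumerate(dp_bar):
        running += value
        if i >= window:
            running -= dp_bar[i - window]  # drop the sample leaving the window
        n = min(i + 1, window)  # partial window at the start of the series
        labels.append("high" if running / n > limit else "low")
    return labels
```

Note that an average of exactly 13 bar is labelled low here, following the wording "over 13 bar".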

Correlation
The linear Pearson correlation coefficients between input and response variables were calculated and are presented in Figure 3.
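For reference, the Pearson correlation coefficient between each input variable and the response can be computed as in the following sketch. The function name is hypothetical; `X` is assumed to hold one input variable per column and `y` the paste line pressure difference.

```python
import numpy as np


def pearson_with_response(X, y):
    """Pearson correlation of every column of X with the response y."""
    Xc = X - X.mean(axis=0)  # centre each input variable
    yc = y - y.mean()        # centre the response
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return num / den
```

Applying the same function to the rows of one operation area only, for example `pearson_with_response(X[mask], y[mask])` with a boolean `mask` for the high pressure difference area, gives the area-specific coefficients.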
The correlations in the whole operation area and the low pressure difference area are mostly close to each other. In the high pressure difference operation area the correlations are weak (close to zero). Next, the variables with a correlation higher than 0.5 were taken for closer analysis. A couple of variables are presented in Figure 4, which shows that there are two different operating conditions. This is clearest between variable 18 and the paste line pressure difference. For example, between ten and thirty bars, two different positively correlated tubes can be observed. It can be concluded that dividing the data between high pressure and low pressure difference operation areas is not enough to explain all the interactions within the process. Based on the correlations, no single variable can be found that could explain the changes within the pressure difference. From the data, two different operating conditions can be determined. Let us try to determine these operating conditions using clustering.

Clustering
The data from variable Paste line 1 and the pressure difference during the usage of booster pump KA7403 were clustered using k-means. The cosine distance was used as the distance measure. Two and three clusters were tested. The result is presented in Figure 5. The data tubes can be reasonably identified in the three-cluster case. The correlations are presented in section 3.2.3. Clustering was also done for KA7402, but the result was not as good as for KA7403.
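A minimal sketch of k-means with a cosine distance is given below. It is not the authors' implementation (the software used is not stated): it is a spherical k-means variant in which points are normalized to unit length, each point is assigned to the centre with the highest cosine similarity, and centres are updated as normalized means. The deterministic farthest-point initialization and all function names are assumptions made for this illustration. A cosine distance depends only on the direction of a point from the origin, which is one plausible reason why it can separate tubes with different slopes.

```python
import numpy as np


def init_centers(unit, k):
    """Deterministic start: first point, then the points least similar to it."""
    centers = [unit[0]]
    for _ in range(1, k):
        sim = np.max(unit @ np.array(centers).T, axis=1)
        centers.append(unit[sim.argmin()])
    return np.array(centers)


def cosine_kmeans(points, k, n_iter=100):
    """Cluster the rows of `points` by direction (cosine distance)."""
    unit = points / np.linalg.norm(points, axis=1, keepdims=True)
    centers = init_centers(unit, k)
    labels = np.zeros(len(unit), dtype=int)
    for it in range(n_iter):
        new_labels = (unit @ centers.T).argmax(axis=1)  # highest similarity
        if it > 0 and np.array_equal(new_labels, labels):
            break  # assignments no longer change
        labels = new_labels
        for j in range(k):
            members = unit[labels == j]
            if len(members):
                mean = members.mean(axis=0)
                norm = np.linalg.norm(mean)
                if norm > 0:
                    centers[j] = mean / norm
    return labels, centers
```

With two-column input (for example Paste line 1 and the pressure difference), points lying on lines of different slope through the origin end up in different clusters regardless of their magnitude.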

Correlation coefficients in clusters
Correlation coefficients for three clusters of booster pump KA7403 are presented in Figure 6. When comparing the results to those presented in Figure 3 In addition, in Figure 6 the variables of Outotec Analyzer 1 and Outotec Analyzer 2 have stronger negative correlation in cluster 1 than in cluster 3. In other words, as the values of these variables decrease, the paste line pressure difference increases more clearly in the operation area of cluster 1. Although the variables of Thickener 1-5 had stronger correlation in cluster 3 than in cluster 1, the same observation can be made for the variables of Paste line 1 and Thickener 12 too.

Multivariable linear regression
A multivariable linear regression (MLR) model is given by

$$y = Xb$$

where y is the output variable (N×1), X is the input variable matrix (N×M) and b is the vector of regression coefficients (M×1). N is the number of data points and M is the number of variables. It should be noted that the equation above is written for a single output variable. The regression coefficients are obtained as the least-squares solution given by

$$\hat{b} = (X^{T}X)^{-1}X^{T}y$$

The MLR model structure can capture the major interactions even though it is limited to linear relationships.
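The least-squares fit of an MLR model can be sketched in a few lines. The function names are hypothetical, and `np.linalg.lstsq` is used here in place of an explicit matrix inverse because it gives the same least-squares solution in a numerically more robust way. Whether the original models included an intercept is not stated; a column of ones can be appended to X if one is wanted.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares estimate of the regression coefficients b in y = Xb."""
    b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b_hat


def predict_mlr(X, b_hat):
    """Model output for the input matrix X (N x M)."""
    return X @ b_hat
```

For a well-conditioned X the result agrees with solving the normal equations, `np.linalg.solve(X.T @ X, X.T @ y)`.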

Data sets for booster pumps KA7402 and KA7403
Linear regression models for both booster pumps were created separately. Modelling was done using repeated 10-fold cross-validation and a testing data set. The data set was divided so that 20% of the data was chosen randomly for the testing data set. The rest of the data set was used in cross-validation, thus forming the training data set and internal validation data set. The model was optimized on the grounds of the internal validation data set. R² and RMSE were used as the criteria. The response was scaled between 0 and 1. Variables for the models were selected using the forward selection method. The modelling results are presented in Tables 1 and 2.
Figure 7 shows that the best model works well overall, but in some situations it estimates the pressure difference to be negative. This indicates that the variables of the model do not explain the pressure difference correctly in all situations.
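The modelling procedure can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the number of repeats, the stopping tolerance, the use of an intercept, and all function names are hypothetical, and only RMSE is used as the selection criterion here although the paper also reports R².

```python
import numpy as np


def holdout_split(n, test_frac=0.2, seed=0):
    """Randomly reserve `test_frac` of the row indices for testing."""
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(round(test_frac * n))
    return idx[n_test:], idx[:n_test]  # (training indices, test indices)


def minmax_scale(y):
    """Scale the response between 0 and 1."""
    return (y - y.min()) / (y.max() - y.min())


def cv_rmse(X, y, n_folds=10, n_repeats=3, seed=0):
    """Mean validation RMSE of a least-squares fit over repeated k-fold CV."""
    rng = np.random.default_rng(seed)  # same folds for every candidate model
    scores = []
    for _ in range(n_repeats):
        for val_idx in np.array_split(rng.permutation(len(y)), n_folds):
            train = np.ones(len(y), dtype=bool)
            train[val_idx] = False
            A = np.column_stack([np.ones(train.sum()), X[train]])
            b, *_ = np.linalg.lstsq(A, y[train], rcond=None)
            A_val = np.column_stack([np.ones(len(val_idx)), X[val_idx]])
            resid = y[val_idx] - A_val @ b
            scores.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(scores))


def forward_selection(X, y, max_vars=None, tol=1e-6):
    """Add, one at a time, the variable that lowers the CV RMSE the most."""
    remaining = list(range(X.shape[1]))
    selected, best = [], np.inf
    while remaining and (max_vars is None or len(selected) < max_vars):
        score, j = min((cv_rmse(X[:, selected + [c]], y), c) for c in remaining)
        if best - score <= tol:
            break  # no meaningful improvement
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best
```

In use, the rows returned as training indices by `holdout_split` would be passed to `forward_selection`, and the final model would then be evaluated once on the held-out test rows.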

Summary
The strongest interactions between the variables and pressure difference for booster pump KA7402 were: The process was analysed by dividing the data first into high and low pressure difference operation areas and then clustering the data from the variable of Paste line 1 and the response variable. As a summary, it can be stated that clustering has a major effect on correlation. However, there is a great risk of drawing misleading conclusions unless sufficient process knowledge is available.
When drawing scatter diagrams, it was noted that there are at least two operation areas in the process. The affecting variables were not investigated at this stage, but via modelling it can be assumed that there may be more operation areas, especially in the high pressure difference area.
The paste line pressure difference can be estimated promisingly using linear regression models, especially during usage of booster pump KA7402.
At this stage of data analysis, the analysis has been restricted to assuming that the pumping problem occurs when the paste line pressure difference is over 13 bar. This is by no means the whole truth, as often the quality is good with high pressure differences. However, when the pressure difference grows extremely high it will cause problems. Moreover, some sort of indicator would be needed to tell when there really are pumping problems. This is not known precisely, even by the process operators themselves. This limitation was used to be able to observe which variables have an effect on pressure difference and how they differ in high and low pressure difference operation areas. Through data analysis, the behaviour of the process was understood better. It was noted that the low pressure difference area behaves more smoothly and is easier to control. The high pressure difference area shows high variance, meaning that operating conditions change more, making process control more challenging. There are many operation areas which need to be identified in order to define the variables leading to a problem situation, to investigate the dynamics and to be able to predict the problem.