Semi-Supervised Hybrid Local Kernel Regression for Soft Sensor Modelling of Rubber-Mixing Process



Introduction
Fed-batch processes play an important role in the chemical and biochemical industries. They are widely adopted in the production of a vast range of fermentation-derived products, such as fine chemicals, pharmaceuticals and food products. Rubber internal mixing [1] is a classical fed-batch process performed in an internal mixer to achieve an optimal Mooney viscosity for further processing. Since Mooney viscosity cannot be measured online and its laboratory assay is labour-intensive and time-consuming, soft-sensing approaches are investigated to provide a real-time estimate of it. Furthermore, data-driven rather than mechanism-based modelling methods are commonly used for its soft sensor modelling, because it is a complex nonlinear process without a well-developed mechanism. Additionally, its intrinsic time variation and the varying properties of natural rubber and additives, accompanied by process drifting caused by field conditions, e.g., equipment aging, introduce a great amount of complexity to the process. Moreover, in order to avoid affecting regular production, only a small number of samples is usually available, which further increases the difficulty of modelling rubber internal mixing.
In the past decades, many data-driven techniques have been proposed. Extensive reviews can be found in the work of Kadlec [2]. Among these methods, multivariate statistical techniques [3][4][5][6] have been widely used. However, these algorithms are relatively sensitive to measurement noise and commonly require a large number of samples to build a satisfactory soft sensor. Meanwhile, various artificial neural network (ANN) algorithms [7] have been proposed and successfully applied to polymerization processes, but how to construct the network topology effectively is still an open question. To overcome these shortcomings, kernel-based methods such as support vector regression [8] and least squares support vector regression [9] have been presented. These kernel techniques can attain better performance under small-sample conditions owing to the structural risk minimization criterion. Note that all the aforementioned algorithms are offline approaches, which can achieve a universal generalization performance but lack mechanisms to exploit time-varying characteristics of the process such as drifting. Kernel-based online modelling algorithms [10][11][12][13] were therefore presented. However, they require too many labelled samples of the current batch to build the model online, while in most industrial cases those samples themselves have to be predicted rather than obtained by lab assay. Therefore, neither online nor offline algorithms can effectively achieve a satisfactory model [14][15][16][17]. On the other hand, thanks to the development of both information technology and industrial automation, a large amount of historical process data is saved in the database of the manufacturing execution system [18]. To leverage those data, local learning modelling algorithms [19,20] were proposed. Nevertheless, those models are not stable, because outdated data may be used for training.
Meanwhile, unlabelled data, which contain production data without the index to be predicted, are abundant. According to semi-supervised learning theory, those unlabelled data can potentially be used to improve the predictive model. Therefore, how to effectively leverage both existing historical and online process data to create a robust soft sensing model still needs to be solved.
In our work, we explore the potential of a hybrid local semi-supervised mechanism that leverages both unlabelled and labelled data via the proposed time window, which mixes historical and online samples. To enhance its feasibility, the corresponding recursive calculation formulas are derived. Furthermore, soft sensors using the proposed and comparative algorithms are implemented to evaluate its performance. To the best of our knowledge, no such hybrid local semi-supervised algorithm has been presented in any article so far. The remainder of this paper is organized as follows. In Section 2, the details of the proposed semi-supervised hybrid local kernel regression (SHLKR) method, including the derivation of its recursive calculation, are presented. In Section 3, soft sensor modelling experiments on the rubber internal mixing process, using the SHLKR method and comparative algorithms with real industrial field data, are presented. Finally, in Section 4, the main contribution of this paper is summarized.

Materials and Methods
The idea of local learning is to create a predictive model dedicated to the prediction of the targeted unlabelled sample instead of building a global model from all samples. Since the model is only created when a prediction is needed, it is also called "just-in-time learning" or lazy learning [21]. Theoretically, it can yield a more precise model under the condition that similar inputs lead to similar outputs.
Basically, local learning modelling consists of three steps: (1) Similar sample selection: select similar samples from the historical data, using one or more similarity calculation algorithms, according to the features of the sample to be predicted. (2) Local modelling: build the local learning model from the selected samples with the corresponding algorithm. (3) Prediction: make the prediction and discard the predictive model.
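The three steps above can be sketched as follows. This is a minimal illustration, not the method of this paper: the function names, the Gaussian similarity with a single `width`, and the kernel-weighted average used as the local model are all assumptions chosen for brevity.

```python
import numpy as np

def gaussian_similarity(X, x_query, width=1.0):
    """Distance-based Gaussian similarity between each row of X and x_query."""
    sq_dist = np.sum((X - x_query) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * width ** 2))

def local_predict(X_hist, y_hist, x_query, k=5, width=1.0):
    """Just-in-time prediction: select similar samples, fit locally, predict."""
    # Step 1: select the k historical samples most similar to the query.
    sim = gaussian_similarity(X_hist, x_query, width)
    nearest = np.argsort(sim)[::-1][:k]
    # Step 2: the "local model" here is a kernel-weighted average (an assumption).
    weights = sim[nearest]
    # Step 3: predict; nothing is stored, so the model is discarded on return.
    return float(np.dot(weights, y_hist[nearest]) / np.sum(weights))

# Toy data (hypothetical): the output equals the single input feature.
X_hist = np.array([[0.0], [1.0], [2.0], [10.0]])
y_hist = np.array([0.0, 1.0, 2.0, 10.0])
pred = local_predict(X_hist, y_hist, np.array([1.0]), k=3)
```

With `k=3` the distant sample at 10.0 is excluded, and the two neighbours at 0.0 and 2.0 receive equal weight, so the prediction for the query at 1.0 is 1.0.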
Clearly, the key points of local learning are the algorithm that evaluates the similarity of samples and the algorithm that builds the local model. Currently there are two categories of similarity calculation algorithms: correlation-based [19] and distance/angle-based [10]. In this work, a distance-based kernel is used because simple algorithms are more readily adopted in industrial applications.
The aforementioned local learning algorithms have two major disadvantages: (1) In many cases the online time variation and drifting characteristics cannot be tracked, since only similar historical data are used for modelling. (2) Many unlabelled historical and online samples lie, in order, between the labelled samples. In theory, these time-series data can be used to improve the model based on the manifold hypothesis [22], but they are currently left unused.
To leverage these widely available but unused unlabelled data, we previously proposed recursive weighted kernel regression (RWKR) [23], which has been validated in soft sensor modelling of the penicillin production process. However, it does not perform well for some other fed-batch processes, such as rubber internal mixing: this process drifts much more, and the time-based weighting mechanism does not work because the Mooney viscosity of rubber does not increase monotonically, unlike the penicillin concentration in the penicillin fermentation process. Therefore, in this paper, semi-supervised hybrid local kernel regression (SHLKR) is proposed to fully leverage both labelled and unlabelled data selected from historical and online data.
SHLKR differs from traditional local kernel learning algorithms in two respects: (1) Besides the labelled samples, the unlabelled samples associated with them are also used, as a time window, during the training of SHLKR. (2) Both historical data and online manufacturing data are used during training. According to the index of the current run within the batch, a hybrid training set is formed by selecting the corresponding historical samples and joining them with the online manufacturing samples, which can potentially improve the practicability and precision of the soft sensor.

SHLKR Flow.
As shown in Figure 1, the time window is defined as a run's labelled sample $(x_t, y_t)$ together with $X_t^u$, the sequence of unlabelled samples that precede it, up to run $t-1$, in the current batch. In this way, each labelled sample and its associated unlabelled samples form an ordered sequence, which is used in its entirety to model the soft sensor in a semi-supervised manner. According to the manifold hypothesis of semi-supervised learning theory [24][25][26][27], samples tend to be similar within a small local space, and unlabelled samples make the data space denser, so that the characteristics of the data are described more precisely. In theory, therefore, the proposed semi-supervised data combination mechanism can model the soft sensor more effectively than using labelled samples alone.
At the first run of the first batch, the number of labelled samples of the current batch is 0. If the process data of the current run are only collected for future modelling, they are added to the unlabelled sample set of the current batch. Otherwise, since only historical data can be used for modelling at this time, the $k$ most similar historical labelled samples, evaluated by their similarity with $x_t$, are selected together with the unlabelled samples within their corresponding time windows to train the model in a semi-supervised manner. On the other hand, if the current batch already has $N_t^l$ labelled samples $(X_t^l, Y_t^l)$, they and their associated unlabelled samples are both used for training. In this case, if $N_t^l \ge k$, only online process data are used; otherwise, the $k - N_t^l$ most similar historical labelled samples and their corresponding unlabelled samples are also used to train the model.
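The selection rule described above can be sketched as follows. The tuple layout `(x, y, window)`, the function name `build_hybrid_training_set`, and the use of squared Euclidean distance as the similarity ranking are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def build_hybrid_training_set(online, history, x_query, k):
    """Combine online and historical labelled samples for one prediction.

    Each sample is a tuple (x, y, window): a labelled point and the array of
    unlabelled points in its time window (hypothetical layout).
    """
    # Enough online labelled samples: use online data only.
    if len(online) >= k:
        return list(online)
    # Otherwise top up with the (k - N) historical samples closest to x_query.
    n_needed = k - len(online)
    ranked = sorted(history, key=lambda s: float(np.sum((s[0] - x_query) ** 2)))
    return list(online) + ranked[:n_needed]

# Toy example with one feature and empty time windows.
empty = np.zeros((0, 1))
history = [
    (np.array([0.0]), 1.0, empty),
    (np.array([5.0]), 2.0, empty),
    (np.array([1.0]), 3.0, empty),
]
online = [(np.array([0.9]), 4.0, empty)]
x_query = np.array([1.0])

labels_k2 = [s[1] for s in build_hybrid_training_set(online, history, x_query, 2)]
labels_k1 = [s[1] for s in build_hybrid_training_set(online, history, x_query, 1)]
labels_cold = [s[1] for s in build_hybrid_training_set([], history, x_query, 2)]
```

When no online labelled sample exists yet (the first runs of a batch), the same function falls back to the `k` most similar historical samples, which matches the cold-start case in the text.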

SHLKR Recursive Calculation Derivation.
A harmonic function is adopted to train the model in a semi-supervised manner. Its effectiveness and its recursive form have been validated before [23]. The historical data in the training set cannot be added recursively, since their selection depends on $x_t$; the online process data, however, can be added recursively because all of them are used for training. The more online samples there are, the more computation the following recursive derivation saves.
Here we follow the approach presented by Zhu et al. [28], in which the regularization framework is defined by Equation (1), where $y_i$ is the real label of sample $i$ and $\omega_{ij}$ can be treated as the similarity between samples $i$ and $j$. Since the Gaussian kernel is usually used to calculate the similarity, $\omega_{ij}$ is typically defined by Equation (2). The Gram matrix can be partitioned into four blocks for the labelled samples $L$ and the unlabelled samples $U$ (Equation (3)).
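For orientation, the harmonic-function solution in its commonly cited form from Zhu et al. can be sketched as below. This is an assumption about the standard formulation, not a reproduction of this paper's Equations (1)-(3): the weights are taken as $\omega_{ij} = \exp(-\lVert x_i - x_j \rVert^2 / \sigma^2)$ with a single width $\sigma$, and the unlabelled values are obtained as $f_U = (D_{UU} - W_{UU})^{-1} W_{UL} f_L$, where $D$ is the diagonal degree matrix. The function names are hypothetical.

```python
import numpy as np

def gaussian_weights(X, width):
    """Pairwise Gaussian similarities w_ij = exp(-||x_i - x_j||^2 / width^2)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    return np.exp(-sq_dist / width ** 2)

def harmonic_predict(X_l, y_l, X_u, width=1.0):
    """Harmonic-function values on the unlabelled points X_u."""
    X = np.vstack([X_l, X_u])
    n_l = len(X_l)
    W = gaussian_weights(X, width)
    # Graph Laplacian D - W; the diagonal of W cancels out in this difference.
    laplacian = np.diag(W.sum(axis=1)) - W
    # Blocks for the unlabelled rows: (D_UU - W_UU) and W_UL.
    L_uu = laplacian[n_l:, n_l:]
    W_ul = W[n_l:, :n_l]
    # Solve (D_UU - W_UU) f_U = W_UL f_L instead of forming an explicit inverse.
    return np.linalg.solve(L_uu, W_ul @ y_l)

# Toy example: one unlabelled point midway between two labelled points.
X_l = np.array([[0.0], [2.0]])
y_l = np.array([0.0, 2.0])
X_u = np.array([[1.0]])
f_u = harmonic_predict(X_l, y_l, X_u, width=1.1)
```

Because the unlabelled point is equally similar to both labelled points, its harmonic value is their average, 1.0. Harmonic values always lie between the smallest and largest labelled value.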
[Figure 1: SHLKR flowchart. The incoming online process data $x_t$ are combined with the unlabelled samples $X_t^u$ of the current batch. If $x_t$ is to be predicted, the number of labelled samples of the current batch is compared with the parameter $k$: either only the online labelled samples $(X_t^l, Y_t^l)$ and their time-window unlabelled samples are used, or the $k - N_t^l$ most similar historical labelled samples with their time-window unlabelled samples are additionally selected according to $x_t$ (Formula (2)). The model is trained recursively in a semi-supervised manner from $\Delta_{t-1}^{-1}$ (Formulas (4)-(11)), the prediction is made, and, until the last run of the current batch, the current $\Delta_t^{-1}$ is saved as $\Delta_{t-1}^{-1}$ for the next modelling step.]

Experimental Data.
Authorized by a rubber manufacturer, 222 batches containing 19,148 runs of historical samples were retrieved from the system. Of these, 2,140 runs are labelled and 17,008 are unlabelled, containing only manufacturing information without a Mooney viscosity value. All samples come from one rubber internal mixing formula, to exclude the impact of formula variation. In an industrial application, better performance can also be obtained by modelling the soft sensor separately for each rubber internal mixing formula. Each sample includes: (1) Index of current run.
(2) Density.
In Figure 3, the Mooney viscosity values of unlabelled samples are shown as 0 and the dashed lines separate different batches. Clearly, the number of runs per batch varies considerably owing to industrial manufacturing requirements, and the lab assay is generally performed every 8 runs. Besides that, although the Mooney viscosity is required to be consistent, in practice it varies considerably within and between batches without any obvious rule. This supports our hypothesis that data-driven algorithms are suitable for training the soft sensor in this situation.

Result and Discussion
To validate the performance of SHLKR, soft sensors based on the support vector machine (SVM) and on Harmonic Functions, in which only labelled samples are used, are also implemented for comparison. To be fair, all three algorithms use the same labelled samples, and only the unlabelled samples associated with those labelled samples are additionally used in SHLKR.
The predictive results of the three algorithms are plotted in Figure 4. The results cover the last 27 of the 222 batches, i.e., 1,777 of the 19,148 runs, including 1,577 unlabelled runs and 200 runs to be predicted. To predict those 200 samples, 1,940 labelled and 15,431 unlabelled samples are used to train the soft sensor.
The first step of training is to choose the parameter $k$. After the kernel width of 1.1 is determined by leave-one-out cross-validation [29], the results for different values of $k$ from 2 to 20 are shown in Figures 5(a)-5(c).
Because the SVM cannot be solved when $k < 8$, only SHLKR and Harmonic Functions have results shown in those figures. Clearly, both perform best when $k = 5$; when $k < 5$ both behave unstably, and when $k > 5$ both get gradually worse but remain stable.
Then the solution of Equation (1) is formulated as Equation (4). Here $\Delta_t^{-1}$ can also be divided into four parts: one block is the kernel matrix between the online manufacturing data and the historical data at time $t$, another is its transpose, and the remaining two are the kernel matrices of the online manufacturing data and of the historical data, respectively. The derivation then proceeds block by block.
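The block partition above is what makes a recursive update possible. The paper's own Formulas (4)-(11) are not reproduced here; as a hedged illustration of the general idea only, the sketch below uses the standard block-matrix (Schur complement) identity to update the inverse of a symmetric matrix when one new sample adds a row and a column. The function name `grow_inverse` and the test matrix are assumptions.

```python
import numpy as np

def grow_inverse(A_inv, b, c):
    """Inverse of [[A, b], [b^T, c]] from a known A_inv, for symmetric A.

    A_inv : (n, n) inverse of the current matrix A
    b     : (n,) new off-diagonal column (similarities to the new sample)
    c     : scalar new diagonal entry
    """
    Ab = A_inv @ b
    # Schur complement of A in the enlarged matrix (a scalar here).
    schur = c - b @ Ab
    top_left = A_inv + np.outer(Ab, Ab) / schur
    side = -Ab / schur
    return np.block([
        [top_left, side[:, None]],
        [side[None, :], np.array([[1.0 / schur]])],
    ])

# Check against a direct inverse on a small symmetric positive definite matrix.
rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
M = B @ B.T + 4.0 * np.eye(4)
grown = grow_inverse(np.linalg.inv(M[:3, :3]), M[:3, 3], M[3, 3])
```

The update costs on the order of $n^2$ operations per new sample instead of the $n^3$ needed to invert the enlarged matrix from scratch, which is the kind of saving a recursive scheme aims for as online samples accumulate.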

Application System.
The Smart Internal Mixing system is a product of MESNAC Co., Ltd., and is widely used in many rubber factories in China. It consists of four main parts: internal mixing modelling, Mooney viscosity prediction, internal mixing process optimization and an internal mixing expert system. As shown in Figure 2, the Smart Internal Mixing system is embedded in the manufacturing execution system, which can monitor the online manufacturing data and retrieve the historical manufacturing data.
Advances in Polymer Technology
Since $k$ only controls the number of historical samples and not the number of online samples, the model suffers not only from too small a sample size but also from too many historical samples, so there is an optimal $k$ that trades off underfitting against overfitting. Because of that, $k$ can theoretically be selected automatically by traversing from smaller to larger values. Besides the algorithm, $k$ also depends on the scale of the historical data and the varieties of noise.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
Haiqing Yu and Jun Ji contributed equally to this work.

Conclusion
In this paper, we propose a new semi-supervised hybrid local kernel regression model for soft sensor modelling of the rubber internal mixing process. Unlike traditional supervised models, it leverages the unlabelled samples associated with labelled ones, and thus benefits from the widely available unlabelled data. A hybrid mechanism is proposed to use both historical and online manufacturing data effectively and so improve its practicability. Moreover, a recursive formula is derived to enhance its feasibility. Soft sensors using the proposed and comparative algorithms are implemented and evaluated with on-site data. The experimental results demonstrate that the proposed model performs better than the classical ones. In our future work, SHLKR will be applied at various rubber manufacturers, and more features, such as raw rubber information and the energy cost of each rubber internal mixing phase, will be added to our model to further increase its precision.
Data Availability
The rubber mixing processing data used to support the findings of this study were supplied by Haiqing Yu under license