Data anomaly detection for structural health monitoring by multi-view representation based on local binary patterns



Introduction
To gain an insightful understanding of the condition of civil structures, structural health monitoring (SHM) systems are developed and deployed on structures such as bridges, buildings, tunnels, and turbines. They are designed to measure structural behavior and evaluate structural performance in real time. SHM allows a structure's operational and damaged conditions to be monitored, which are then employed for diagnostic and prognostic purposes [1]. Normally, multiple types of sensors, such as accelerometers, strain gauges, thermo sensors, and GPS sensors, are incorporated in SHM systems to acquire data from different perspectives [2]. However, sensors are prone to producing flawed data because of imperfect manufacture of sensor systems, environmental effects, operational accidents, etc. In such cases, manual data quality checks are always needed to avoid incorrect estimations and false alarms. Nevertheless, manual checking is time-consuming and demands a huge workforce. Furthermore, large-scale SHM systems can consist of hundreds or even over one thousand sensors [3,4]. Such systems are often used for long-term monitoring, accumulating enormous continuous measurements. This exacerbates the side effects of flawed data on the final predictions of SHM systems, even though the data scale provides opportunities for mining deeper information about structural behavior. Therefore, automatic and effective methods for data anomaly detection are indispensable.
To date, much research has focused on detecting and overcoming a single type of data anomaly. For instance, Yuen and Mu [5] proposed a probabilistic method for outlier detection. Yang and Nagarajaiah [6] developed a data denoising method to remove outliers using principal component pursuit. Tang et al. [7] proposed a group sparsity-aware Convolutional Neural Network (CNN) for continuous missing data recovery. Yang and Nagarajaiah [8] designed a sparse representation with low-rank structure to recover randomly missing structural vibration data. Chen et al. [9] developed a data imputation method that analyzes and models inter-sensor relationships for strain monitoring data. Zhang et al. [10] applied Bayesian dynamic regression for missing data reconstruction. Goebel and Yan [11] presented a fusion scheme to correct drift sensor faults. However, in actual engineering scenarios, data anomalies can appear in different forms, given factors such as imperfect manufacture of sensor systems (including sensors, transmission systems, etc.), complex environmental effects such as low temperature, and operational accidents. Detecting only one or two types of abnormal data is insufficient for SHM systems.
To address this problem, data anomaly detection methods that can identify multiple fault types have been proposed. Kullaa [12,13] modeled a sensor network as a Gaussian process, in which each sensor was estimated in turn using minimum mean square error estimation; seven sensor fault types were investigated and modeled: bias, gain, drifting, precision degradation, complete failure, noise, and constant with noise. Hernandez-Garcia and Masri [14] implemented statistical methods using latent-variable techniques to detect bias, drift, scaling, and minor faults in data. Huang et al. [15,16] proposed a canonical correlation analysis based method and a Bayesian combination of weighted principal-component method to detect bias and gain faults in data. The traditional model-based methods for data anomaly detection have been reviewed by Yi et al. [17]. In general, data anomalies can be accurately detected based on the test results. However, the need for a sophisticated mathematical model for each type of data anomaly may limit the widespread use of such methods in engineering practice.
In recent years, with the rapid development of artificial intelligence (AI), deep learning [18,19], as the core methodology of AI, has been widely applied and investigated in the SHM field. Since the behavior of civil structures is easily influenced by numerous uncertainties, it is practical to use deep learning to model correlations between structural responses and evaluation indexes in a data-driven manner, as demonstrated in articles on vibration-based damage detection [20-22], image-based damage detection [23-25], and binary data anomaly detection using 1D-CNN [26]. As the expression of visual data is more intuitive than that of time-series data, using computer vision (CV) techniques [27,28] is becoming a prevailing trend, providing the potential to solve complex problems. CV is also beneficial for data anomaly detection because multiple types of fault data can be easily recognized by visual inspection of acceleration data. Therefore, data anomaly detection methods that combine CV and deep learning have been proposed. Bao et al. [29] designed a deep neural network (DNN) that processes gray-scale images of acceleration data and predicts the data class. The method can detect six types of data anomalies: missing, minor, outlier, square, trend, and drift. An acceleration dataset acquired from a long-span cable-stayed bridge was used to test the method, and an overall test accuracy of 85.6% was obtained. Building on this achievement, Tang et al. [30] proposed a CNN-based data anomaly detection method that uses both the gray-scale image of the acceleration data and the corresponding FFT spectrum as a dual-channel input for the CNN, predicting the same six types of data anomalies. Liu et al. [31] and Chou et al. [32] applied a Generative Adversarial Network (GAN) + CNN and GoogLeNet, respectively, to image representations of acceleration data for data anomaly detection. These two methods were also validated on the one-month sub-dataset of the data used in [29,30].
These deep learning and CV-based methods can successfully detect multiple types of data anomalies with high accuracy and greatly reduce the labor cost of manual data quality checks. At the same time, however, the typical drawbacks of deep learning are inevitable, e.g., enormous numbers of hyperparameters to be fine-tuned, poor interpretability, and a high demand for labeled data. Therefore, research on data anomaly detection still has large room for improvement and exploration, making it an open challenge.
In this article, a simple yet efficient approach is proposed to overcome the above problems. Our contributions can be summarized as follows: (1) we develop a novel multi-view approach based on local binary patterns (LBP) and random forests (RF) for data anomaly detection; (2) we design a fusion strategy that combines the complementary information from multiple LBP features under different parameter settings (multi-view) to further enhance detection performance; (3) extensive experimental results on a long-span cable-stayed bridge dataset with distinct case settings and training conditions demonstrate the high performance of the proposed approach; (4) small-scale (few-shot) learning schemes are also investigated, in which the advantages of the proposed approach are further validated. The article is organized into six sections. After the introduction, the proposed method is explained in Section 2. Section 3 introduces the dataset for training and testing. Section 4 describes the layout and settings of the experiments. Section 5 presents the experimental results and discussion. Lastly, the article closes with conclusions and future work.

Overview of the proposed method
The proposed method for data anomaly detection is illustrated in Fig. 1. It consists of multi-view LBP descriptors and a RF classifier (MLRF). The method includes three parts: pre-processing, feature extraction, and classification. Firstly, time-series vibration data are converted to gray-scale waveform images. Secondly, the multi-view texture features of the image data are extracted by applying LBP with three different parameter settings. For each LBP descriptor, a binary code is generated and converted to decimal for each pixel. The co-occurrence statistics of the encoded LBP in the waveform image are represented as a histogram. The values of the bins in each histogram directly form a feature vector as the representation of the vibration data. As the proposed method uses three variants of LBP descriptors, three different views of the representation of each vibration record can be obtained. After that, the final representation of the vibration data is constructed by concatenating the three feature vectors into a compact one. Such a multi-view representation consists of explicit features with high interpretability, since the features have clear physical meaning; the learning process can thus be viewed as a white box. By contrast, deep-learning-based data anomaly detection methods [29-32] learn implicit features derived from the hidden layers of the networks. Such implicit features have poor interpretability, making the learning process a black box. Finally, a RF classifier is trained for data anomaly detection.
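To make the workflow concrete, the following minimal sketch outlines the three stages in Python, assuming scikit-image and scikit-learn as the implementation vehicles; the function names, the 'uniform' LBP encoding, and the RF settings are illustrative assumptions rather than the exact original implementation.

```python
# Minimal sketch of the MLRF pipeline (pre-processing is assumed to have
# already rendered each 1-hour record as a 100x100 gray-scale image).
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.ensemble import RandomForestClassifier

# Three LBP parameter settings (p, r) form the multi-view descriptors.
VIEWS = [(24, 3), (16, 2), (8, 1)]

def multi_view_lbp(image):
    """Concatenate LBP histograms from all views into one feature vector."""
    feats = []
    for p, r in VIEWS:
        # 'uniform' is one common encoding choice; the paper's exact LBP
        # variant is an assumption here. Uniform codes lie in 0..p+1.
        codes = local_binary_pattern(image, P=p, R=r, method="uniform")
        # Unit-width bins covering every possible integer code value.
        hist, _ = np.histogram(codes, bins=np.arange(0, p + 3), density=True)
        feats.append(hist)
    return np.concatenate(feats)

def train_mlrf(X_img, y):
    """Fit the RF classifier on the fused multi-view representations."""
    X = np.stack([multi_view_lbp(img) for img in X_img])
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return clf.fit(X, y)
```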

Local binary pattern
LBP is a texture descriptor [33] that computes a binary pattern for each pixel of an image. It has shown its power in many CV tasks [34,35]. In LBP, the binary pattern of each pixel is extracted by comparing the center pixel intensity to a circularly symmetric neighborhood. Let p represent the number of neighboring pixels around the center pixel, and r the radius of the circularly symmetric region that is equally divided by the p neighbors, as shown in Fig. 2. Denote the value of the center pixel by g_c; the corresponding surrounding pixels can then be expressed as g_n (n = 0, …, p − 1). The center pixel is compared with its neighboring pixels one by one, and the bit-wise binary pattern B of the center pixel is generated according to Eqs. (1) and (2), as illustrated in Fig. 2(c). Then, the decimal value of the local binary pattern is computed according to Eq. (3), with the corresponding weights shown in Fig. 2(d). Finally, the occurrence histogram of the different binary patterns is computed to form the feature representation of the image.
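Eqs. (1)-(3) did not survive the text extraction; the standard LBP definition they describe, reconstructed from the surrounding text, reads:

```latex
% (1) Bit-wise comparison of each neighbor g_n against the center pixel g_c
B_n = s(g_n - g_c), \qquad n = 0, \dots, p - 1
% (2) Thresholding step function
s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}
% (3) Decimal LBP code of the center pixel, with the weights 2^n of Fig. 2(d)
\mathrm{LBP}_{p,r} = \sum_{n=0}^{p-1} B_n \, 2^{n}
```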

Random forests
RF [36] is a statistical machine learning method for solving classification or regression problems. It fits a number of decision trees [37] on various sub-samples of the dataset. A voting or averaging strategy is then applied to the predictions of all the decision trees to determine the final prediction. Such an approach can offset the errors of any single decision tree's prediction and better approximate the ground truth.
The RF algorithm introduces two types of randomness during training. First, each tree in the forest is constructed from data sampled from the training set with replacement. Second, when dividing each node while growing a tree, the optimal split is determined from a random subset of the features. The purpose of introducing this randomness is to decorrelate the individual decision trees, since individual decision trees normally tend to overfit. Introducing the above randomness and applying a voting or averaging strategy offsets the errors caused by the high variance of individual decision trees, leading to a good balance between over-fitting and reliable performance.
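In scikit-learn terms (an assumed implementation vehicle, consistent with the settings reported in Section 4), the two sources of randomness map onto two constructor arguments:

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=100,     # number of trees in the forest
    bootstrap=True,       # each tree sees a bootstrap sample of the
                          # training set (first source of randomness)
    max_features="sqrt",  # each split considers a random feature subset
                          # (second source of randomness)
)
```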

Decision tree
A decision tree is a hierarchical classifier. In this study, we adopt Classification and Regression Trees (CART) [37] as the decision trees. CART builds binary trees based on the feature and threshold that give each node the highest information gain. The details of CART are as follows. Given training data x_i with corresponding labels y_i, i = 1, 2, …, n, a decision tree recursively partitions the feature space in a binary manner to group samples with the same labels together. Let the N_m samples of data at node m be Q_m. Each candidate split θ = (j, t_m), consisting of a feature j of the training data and a threshold t_m, partitions the data at node m into two subsets, Q_m^left(θ) and Q_m^right(θ), as described in Eqs. (4) and (5).
The quality of a candidate split of a node is then evaluated using an impurity function H and a loss function L. The parameters θ that minimize the loss L are selected as the split θ* at the node, as expressed in Eqs. (6) and (7). The procedure recurses on the subsets Q_m^left(θ) and Q_m^right(θ) until the maximum allowable depth is reached or N_m = 1. The Gini function is used to calculate the impurity H, as shown in Eq. (8), in which p_mk is the proportion of class k at node m, see Eq. (9).
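Eqs. (4)-(9) were likewise lost in extraction; the standard CART formulation they refer to, reconstructed here, is:

```latex
% (4)-(5) Candidate split theta = (j, t_m) partitions node m
Q_m^{\mathrm{left}}(\theta)  = \left\{ (x, y) \mid x_j \le t_m \right\}
Q_m^{\mathrm{right}}(\theta) = Q_m \setminus Q_m^{\mathrm{left}}(\theta)
% (6) Weighted impurity (loss) of a candidate split
L(Q_m, \theta) = \frac{N_m^{\mathrm{left}}}{N_m} H\big(Q_m^{\mathrm{left}}(\theta)\big)
               + \frac{N_m^{\mathrm{right}}}{N_m} H\big(Q_m^{\mathrm{right}}(\theta)\big)
% (7) Best split at node m
\theta^{*} = \arg\min_{\theta} L(Q_m, \theta)
% (8) Gini impurity
H(Q_m) = \sum_{k} p_{mk} \,(1 - p_{mk})
% (9) Class-k proportion at node m
p_{mk} = \frac{1}{N_m} \sum_{(x_i, y_i) \in Q_m} \mathbf{1}(y_i = k)
```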

Predictions of random forests
All of the decision trees in the forest vote on the predicted class of an input sample, weighted by their probability estimates. In other words, the predicted class is the one with the greatest mean probability estimate across all decision trees.
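This soft-voting rule can be written out explicitly; the sketch below (variable names illustrative) reproduces what a fitted scikit-learn forest does internally:

```python
import numpy as np

# rf: a fitted RandomForestClassifier; X: feature vectors to classify.
# Average the per-tree class-probability estimates, then take the class
# with the greatest mean probability -- equivalent to rf.predict(X).
probas = np.mean([tree.predict_proba(X) for tree in rf.estimators_], axis=0)
predictions = rf.classes_[np.argmax(probas, axis=1)]
```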

Bridge and monitoring dataset
A long-span cable-stayed bridge in China is the structure studied in this article. A side-view drawing of the bridge is shown in Fig. 3. The main span is 1088 m long, and the two side spans are 300 m long. An SHM system has been installed on the bridge to monitor its service state and behavior. In total, 18 accelerometers are integrated into the SHM system to continuously measure the vibration of the bridge.
Regarding their distribution, 14 dual-axis accelerometers were deployed on the deck, two dual-axis accelerometers on the tops of the two towers, and two tri-axis accelerometers at the bottoms of the towers, as shown in Fig. 4. In total, 38 channels of vibration data are collected, each sampled at 20 Hz. The whole dataset comprises one year of continuous monitoring data, of which a one-month portion was used for the 1st International Project Competition for Structural Health Monitoring (IPC-SHM 2020), as reviewed in [38]. This study uses the one-month sub-dataset for training and validating the proposed method. The acceleration data were annotated hour by hour; as a result, the dataset includes 28,272 labeled data (31 days × 24 h × 38 channels). Each datum is a continuous 1-hour measurement consisting of 72,000 sample points (3600 s × 20 Hz). The data anomaly detection task is performed on these 1-hour single-channel acceleration records.
The dataset includes seven classes, i.e., normal, missing, minor, outlier, square, trend, and drift, labeled 0 to 6, respectively. To apply computer vision techniques to the dataset, all the acceleration data were plotted as hourly gray-scale images with the shape 100 × 100 × 1. Examples of each category are shown in Fig. 5. The amounts of data in all categories are summarized in Table 1. The normal class is dominant, and the numbers in the fault categories are limited, especially in the outlier and drift classes. This imbalanced distribution may increase the difficulty of training a high-performance classifier for data anomaly detection. Therefore, data augmentation was applied to the fault categories to increase their numbers and alleviate the imbalance problem. In this way, by keeping the same amount of training data as used in the literature, we automatically increase the testing data scale, which can further validate the effectiveness of the proposed approach. The operations for data augmentation include horizontal flipping, vertical flipping, and diagonal flipping; an example of these operations is shown in Fig. 6. Consequently, an augmented dataset consisting of 72,363 data was obtained. The augmentation operations effectively increase the number of fault data, alleviating the imbalance of the dataset. The ratios of each class of data before and after augmentation are compared in Fig. 7.
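The three flipping operations act directly on the image arrays; a minimal NumPy sketch (the helper name is hypothetical):

```python
import numpy as np

def augment(image):
    """Return the three flipped variants of a gray-scale waveform image.
    Applied only to the fault categories to alleviate class imbalance."""
    horizontal = np.fliplr(image)  # mirror left-right
    vertical = np.flipud(image)    # mirror top-bottom
    diagonal = image.T             # flip about the main diagonal
    return [horizontal, vertical, diagonal]
```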

Experimental protocol
To investigate the proposed MLRF method comprehensively, its performance is compared to the methods in the literature [29-31,39], which used the same dataset for training and testing. As there is no common protocol for algorithm comparison on this dataset, we did our best to make an effective comparison by keeping the same amount of training data as in the literature and using the same evaluation metric. In total, we designed 8 cases with different training data scales. Cases 1-6 have numbers of training data identical to those in the literature [29-31,39], while the small-scale Cases 7 and 8 allow MLRF to be evaluated in more comprehensive training situations, i.e., with varied training data scales. The detailed training data distribution of the 8 cases is described in Table 2. The results in the articles [32,40] are not compared because those studies either used a different experimental protocol or omitted the drift class, which makes a fair comparison difficult.

Implementation details
First, the acceleration data are converted to gray-scale waveform images I ∈ R^(W×H×C), where W, H, and C represent the width, height, and number of channels of the image. In our experiments, we set W = H = 100 and C = 1. To obtain the representation of the image data, three LBP feature descriptors are applied to the image data in parallel, with parameters set to (p = 24, r = 3), (p = 16, r = 2), and (p = 8, r = 1), respectively. Then, by calculating the co-occurrence statistics of the binary encoded features on the three feature maps, three corresponding representations of the image data are obtained. Finally, the three representations are concatenated into the final feature vector of the image, which is fed to the RF classifier.
For the RF classifier, the random state and the number of decision trees are two parameters with an obvious effect on performance. Therefore, to search for optimized hyper-parameters, random states are searched from 0 to 10,000 and the number of trees from 1 to 300; the model with the best test result is selected. The other parameters of the RF classifier are as follows: the Gini impurity function is applied to measure the quality of a split; the maximum number of features is the square root of the length of the feature vector; nodes are expanded until all leaves are pure or contain fewer than 2 samples; the minimum number of samples for splitting an internal node is 2; and the minimum number of samples required at a leaf node is 1.
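A plain grid search over the two sensitive hyper-parameters might look like the sketch below; the step sizes and the train/test arrays (X_train, y_train, X_test, y_test) are illustrative assumptions, since exhaustively scanning every combination in the stated ranges would be expensive.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

best_model, best_acc = None, 0.0
for random_state in range(0, 10001, 100):    # paper searches 0-10,000
    for n_estimators in range(1, 301, 10):   # paper searches 1-300
        rf = RandomForestClassifier(
            n_estimators=n_estimators,
            random_state=random_state,
            criterion="gini",       # Gini impurity for split quality
            max_features="sqrt",    # sqrt of the feature-vector length
            max_depth=None,         # grow until leaves are pure
            min_samples_split=2,
            min_samples_leaf=1,
        ).fit(X_train, y_train)
        acc = accuracy_score(y_test, rf.predict(X_test))
        if acc > best_acc:
            best_model, best_acc = rf, acc
```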

Experimental results and discussions
In this section, the experimental results of the proposed method are first analyzed and compared to the literature in Section 5.1. Subsequently, to demonstrate the advantage of the proposed MLRF method, ablation studies on different features and fusion strategies are discussed in Sections 5.2-5.4.

Results of MLRF
Following the workflow of the proposed method as introduced in Fig. 1, the acceleration data are first converted into gray-scale images. Then, three LBP feature extractors are applied to all the image data. Fig. 9 shows examples of the feature maps of each class of image data extracted by an LBP descriptor (p = 24, r = 3), together with the corresponding feature statistics. Various texture features are displayed clearly in Fig. 9; the edge areas of the waveforms tend to carry more texture features. After that, co-occurrence statistics of the binary encoded features are calculated as histograms. Based on Eqs. (1)-(3), the encoded values in each feature map are integers; thus, the bin interval of the histograms is set to 1. For each LBP feature descriptor, the bins of the histogram range from 0 to the maximum binary encoded feature. As a result, the lengths of the feature vectors obtained by the three LBP feature descriptors are 26, 19, and 10, respectively, and the three feature vectors are concatenated into the final representation of the image data with a length of 55. Fig. 10 compares the LBP feature maps and corresponding histograms extracted by the three LBP feature descriptors, presenting the complementary correlations of the extracted features. For a general observation of the LBP features in each data class, the distributions of the 24th and 25th features in the fused feature vectors are illustrated as representatives in the box charts in Figs. 11 and 12. Those two figures clearly show the different distributions of the feature values in each data class. Interestingly, the ranges of Classes 1 and 4 (missing and square) are very narrow in the boxplots. This phenomenon can be explained by the low diversity of the image data of those two classes: most of the pixel intensities in a missing image are 1, while most pixel intensities in a square-anomaly image are 0.

Subsequently, the feature vectors are fed into the RF classifier for training and testing. In total, 8 cases with different numbers of training data are investigated, as introduced in Section 4. Accuracy, as defined in Eq. (10), is used to evaluate the performance of the proposed method; it equals the number of correct predictions divided by the total number of predictions. The RF classifier directly outputs an integer representing the class of the data; if the prediction for a datum is identical to its true label (class number), it is counted as a correct prediction.
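Eq. (10) was also lost in extraction; from its verbal definition above it is simply:

```latex
\text{accuracy} = \frac{N_{\text{correct}}}{N_{\text{total}}} \tag{10}
```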
The results of the 8 cases are summarized in Table 3. All 8 cases achieved about 100% training accuracy. Cases 1-6 obtained over 97% test accuracy. Remarkably, Cases 7-8 also obtained over 95% test accuracy even though the numbers of training data decreased dramatically. Meanwhile, no overfitting appeared in any of the 8 cases, demonstrating the reliability of the proposed MLRF method for this task. Comparing the results of the proposed MLRF method to the referenced articles, MLRF outperforms all other methods. This comparison is also visualized in Fig. 13, which shows that the proposed MLRF method achieved accuracies 1.68% to 11.55% higher across all the cases. In Cases 1-6, when the training data are at normal scales, the proposed MLRF method shows 1.68% higher accuracy than the GAN + CNN method, about 3-5% higher accuracy than the CNN-based method, up to 4.78% higher accuracy than the Shapelet + RF method, and over 10% higher accuracy than the DNN method. The referenced articles did not validate performance on small-scale training sets, e.g., 200 and 50 data, so the performance of their methods in such circumstances is unknown. However, training deep learning models with such small numbers of training data is generally very challenging. By contrast, our MLRF method achieves remarkable accuracy when training at such small scales. From Case 1 to Case 8, the number of training data decreased from 22,626 to 50 (a 99.78% reduction), yet the accuracy dropped by only 2.68%. This proves that the proposed MLRF method has very low dependence on the number of training data.
The detailed test results of MLRF in each case are reported in the confusion matrices in Fig. 14. The colors in the confusion matrices reflect the prediction accuracy for each class of data: the higher the accuracy, the darker the color. According to the confusion matrices, Classes 1 and 4 (missing and square) obtained over 96.4% accuracy in all cases. Classes 3 and 6 (outlier and drift) achieve accuracies in the range of 75.4-94.4% despite the small-scale and unbalanced training data, as well as some labeling errors. In Case 7, when the number of training data suddenly decreases to 200, only Class 3 (outlier) drops to 75.4% accuracy; the other six classes maintain high accuracies of over 92%. In Case 8, when the number of training data nearly reaches the minimum, the accuracies of Classes 0, 1, 3, and 4 are still higher than 96%, and those of the other three classes exceed 81%. These results show that the proposed method is robust to unbalanced training sets.
The notable performance can be explained by the cooperation of the discriminative multi-LBP features and the powerful RF classifier. To demonstrate the high quality of the multi-LBP features, we visualized the data structure of the features with the t-SNE algorithm [41], as shown in Fig. 15. In each class, 500 data were picked randomly, and the dimensionality of their multi-LBP features was reduced to 2. In Fig. 15, data belonging to the same class are grouped into small, tightly overlapping clusters. The clear boundaries among the data classes prove the strong discriminating ability of the LBP features, which ensures the high accuracy of the proposed method for data anomaly detection.
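The visualization can be reproduced along the following lines; the sketch assumes scikit-learn's TSNE, at least 500 samples per class, and illustrative variable names (X for the multi-LBP features, y for the labels).

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Randomly pick 500 samples per class and embed their features in 2-D.
idx = np.concatenate([
    np.random.choice(np.where(y == k)[0], 500, replace=False)
    for k in range(7)
])
embedded = TSNE(n_components=2).fit_transform(X[idx])
plt.scatter(embedded[:, 0], embedded[:, 1], c=y[idx], cmap="tab10", s=4)
plt.show()
```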

Comparison to single LBP feature descriptor
To demonstrate the effectiveness of the feature fusion strategy in the proposed MLRF method, the multi-LBP feature descriptors in MLRF are replaced by a single LBP feature descriptor. The new framework with a single LBP feature descriptor, named SLRF, is illustrated in Fig. 16. Whether MLRF can outperform SLRF is then investigated.
The results of SLRF with the three parameter settings (p = 24, r = 3), (p = 16, r = 2), and (p = 8, r = 1) are compared to MLRF in Table 4. The accuracies of the SLRFs in the 8 cases are all lower than those of MLRF, indicating the effectiveness of the multi-LBP feature fusion strategy; the improvement of MLRF is in the range of 0.07-2.77%. In Fig. 17, we visualize the test accuracies of the two methods, and the better results of MLRF are apparent.

Comparison to single LBP feature descriptor with feature selection
From the histograms in Figs. 9 and 10, it is evident that a minority of bin values are much greater than the others. Thus, in this section, the average feature representation of the whole dataset is computed with the (p = 24, r = 3) descriptor, as shown in Fig. 18. Noticeably, only the values in the 24th, 25th, 15th, 13th, and 17th bins are larger than 0.01. This naturally raises two questions: (1) whether the small values (<0.01) in the feature vector stem from noise, and (2) whether the model can achieve better performance using only the large values (>0.01) as the representation of the image data. Therefore, the proposed MLRF method is adjusted to the framework shown in Fig. 19: only the features higher than 0.01 (the top 5) in the feature vector are picked to form a compact representation for prediction. Thus, the framework is called SLRF-C (C for compact). The results of MLRF and SLRF-C are compared in Table 5 and Fig. 20. SLRF-C shows accuracies lower than MLRF in all cases, by 0.81-6.02%. As the amount of training data decreases, the performance advantage of MLRF becomes more obvious. These results indicate the high quality of the multi-LBP feature extraction strategy, especially when tackling small-scale training data. Meanwhile, the high accuracies of SLRF-C prove the feasibility of using LBP features to represent waveform images, even though the length of the feature vector is only 5.
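The compact SLRF-C representation can be obtained by thresholding the dataset-mean histogram; a sketch under the assumption that the single-descriptor histograms are already stacked in an array feats of shape (n_samples, n_bins):

```python
import numpy as np

mean_hist = feats.mean(axis=0)        # average histogram over the dataset
keep = np.where(mean_hist > 0.01)[0]  # the top-5 bins in the paper's case
feats_compact = feats[:, keep]        # length-5 representation per image
```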

Comparison to decision fusion
Feature fusion and decision fusion are two typical fusion schemes. Feature fusion refers to combining different features in the feature extraction stage, while decision fusion integrates the decisions of different classifiers in the classification stage. Hence, in this subsection, we propose a variant of MLRF with decision fusion (named MLRF-df), with the framework shown in Fig. 21. The gray-scale image data flow through three parallel paths, each with a single LBP feature extractor and a RF classifier. The settings of the three LBP feature extractors are (p = 24, r = 3), (p = 16, r = 2), and (p = 8, r = 1), respectively. In the classification phase, the probability of each class for the test data is predicted by each RF classifier, the mean of the three sets of probabilities is calculated, and the class with the highest mean probability is taken as the final prediction.
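The decision-fusion step averages the class probabilities of the three per-view classifiers; a sketch with illustrative names (rfs holds the three fitted RF classifiers, Xs the matching single-view feature matrices):

```python
import numpy as np

# Mean of the three probability estimates, then arg-max over classes.
probas = np.mean([rf.predict_proba(X) for rf, X in zip(rfs, Xs)], axis=0)
fused_predictions = rfs[0].classes_[np.argmax(probas, axis=1)]
```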
The results of MLRF-df are compared with MLRF in Table 6 and Fig. 22. MLRF obtained higher accuracies than MLRF-df in all cases, showing that feature fusion achieves better results than decision fusion here; the differences in accuracy are in the range of 0.08-0.18%.

Conclusions
In this article, we propose MLRF, a simple yet efficient data anomaly detection method using multi-view LBP feature descriptors and RF. The method uses gray-scale images converted from time-series data as input; multi-view LBP features are then extracted from the image data to train the RF classifier. From our investigations and analyses, several conclusions can be drawn.
Firstly, the effectiveness of the proposed MLRF method for data anomaly detection has been fully demonstrated: compared to the results in the literature, state-of-the-art results are achieved by the proposed MLRF method, with high accuracies in all 8 cases. Secondly, the feasibility of using LBP features to represent image data of acceleration waveforms is proved; the variants of the proposed MLRF, i.e., SLRF, SLRF-C, and MLRF-df, also achieve remarkable performance compared to the literature.
Thirdly, the multi-LBP feature fusion strategy can considerably improve data anomaly detection performance, as MLRF achieves higher accuracies than both the methods in the literature and the single-LBP variants in the ablation study. Furthermore, the advantage of MLRF tends to grow as the training data scale becomes small. Finally, the proposed MLRF method has a very low dependency on the number of training data: the outstanding performance in the small-scale training cases with only 200 and 50 unbalanced data sheds light on the potential of few-shot learning scenarios.

To sum up, the proposed data anomaly detection approach shows great potential for use in SHM systems as a data quality control module, which can alleviate the significant labor cost of manual data quality checks. Moreover, the proposed method can also be applied to other types of time-series data, such as displacement, temperature, and strain. In our future work, from the data aspect, other 2-D representations of vibration data, e.g., the spectrogram or Mel-frequency cepstrum, will be investigated instead of waveform images. From the algorithm side, an end-to-end few-shot learning scheme will be developed from the proposed method.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


Fig. 11. Distributions of the 24th feature in the representation of each class of data.

Fig. 12. Distributions of the 25th feature in the representation of each class of data.

Table 1. Distribution of data.

Table 2. Training data distribution in each class.

Table 3. Comparison between the proposed MLRF and the literature.

Table 4. Comparison of the test accuracies between MLRF and SLRF.

Table 5. Comparison of the test accuracies between MLRF and SLRF-C.

Table 6. Comparison of the test accuracies between MLRF and MLRF-df.