1 Introduction

Turbine compressors, industrial energy recovery turbines (TRT), and other equipment are widely used in petrochemical, metallurgy, wind tunnel tests and other national pillar industries. The blade is a key component with a complex structure. Its machining quality directly affects the performance of the turbine power system.

In manual polishing, the dimensional accuracy of the blade depends entirely on the operator's experience. Because experience differs greatly between operators and operators tire easily, blade dimensions are inconsistent. Polishing also produces serious dust pollution and is labor intensive, which currently makes it difficult to recruit workers under 40 years old.

In manual blade grinding, workers control the material removal rate (MRR) by experience, judging it from the appearance of the grinding sparks. To achieve automatic polishing, it is therefore necessary to solve the problem of processing grinding-spark images.

Fan [1] proposed a low-level image structure feature extraction method based on a deep neural network and a stacked sparse denoising auto-encoder (SSDA), which could extract features from both natural and infrared images. Chelsea Finn [2] proposed an unsupervised deep-learning training method in which predictive models learned from raw perception were combined with model predictive control (MPC), enabling a robot to carry out vision-based tasks in complex, unstructured environments. Avelino [3] addressed the problems of rapid visual movement and delays in image acquisition systems. By combining automatic stabilization mechanisms with advanced detection algorithms, the prediction accuracy of pedestrian position was improved and human-robot interaction was strengthened.

Our team has studied the abrasive belt polishing process for many years. Ren [4] investigated the relationship between the spark field characteristics and the MRR. Ren [5] established a contact stress model based on Hertzian contact theory and developed a material removal rate model to recognize the material removal in surface polishing. Wang Nina [6, 7] established an MRR model using images and sound. Wang Nina [8] proposed a new method using 2D-CNN learning to monitor material removal. Wang Nina [9, 10] investigated the effect of the wear state of abrasive belts on the MRR during workpiece polishing and established a method for monitoring the wear state of belts based on machine vision and image processing.

The above methods used a variety of sensors and investigated the effects of spark images and sound on the MRR. However, the following issues remain. The segmentation of spark images is not fully automated. When the light is dim, it is hard to distinguish the spark image from parts of the background. Spark-image segmentation is performed only after a complete machining process has finished, so the methods cannot be used as an online system.

Huang [24] used the deep learning algorithm YOLO5 to perform target detection on spark images and obtain spark image regions. This paper builds on that study to establish an automatic spark image segmentation system that can quickly detect target image regions in complex environments and separate them from the background. The objective is to lay the foundation for studying the relationship between spark images and the MRR and to achieve automatic workpiece processing.

The above processing methods for spark images are closely related to computer vision technology. Target detection and image segmentation are research hotspots, and many new algorithms have been proposed. Among the target detection algorithms, a typical one is the YOLO algorithm [11, 12]. The algorithm has developed through five generations, its accuracy and speed have greatly improved, and it can achieve multitarget detection [13, 14]. Its counterparts are the Fast-RCNN [15] and Faster-RCNN [16] algorithms, which detect targets accurately but still have some disadvantages. The YOLO algorithm combines fast speed with high accuracy and is suitable for real-time target detection.

Image segmentation methods include semantic segmentation, pixel-level segmentation, and others. Many scholars have combined deep learning with image segmentation techniques and proposed new segmentation algorithms. Ronneberger et al. proposed the U-Net algorithm [17]. He et al. proposed the Mask R-CNN algorithm [18], which achieved semantic segmentation of images. In addition, many scholars have improved the classical algorithms. For example, Lian [19] proposed an attention feature-based method for finding small targets, which detected small targets in traffic scenes and improved the accuracy of target prediction. Wenkel [20] presented a methodology for evaluating the confidence level in target detection and obtained the best performance point to improve efficiency. Wang [21] applied the YOLOv3 model to complete the experimental targets. Arunabha [22] used a YOLOv4 algorithm to improve the accuracy of plant disease detection.

To establish a model relating spark images to the MRR, separating the spark images from the background is an important step. When we tried to segment the spark images, we found that they were irregular in shape, similar to fireworks, with complex boundaries. When deep learning methods were used directly for segmentation, the target images had to be labeled first, which was very time consuming and introduced manual marking errors, and the final segmentation after training was not satisfactory. Therefore, we propose a new method: first detect the target region with YOLO5, then use a fully connected CRF to further segment and refine the spark image within the region, and finally extract the complete spark image. The content of this article is outlined in Table 1.

Table 1 Introduction to the content of this article

Section 2 introduces the experimental setup and the method of spark image acquisition. Section 3 outlines the overall image segmentation steps. Section 4 discusses the process and steps of YOLO5 target detection, which separates the spark image region from the background. Section 5 introduces the principle and method of fully connected CRF image segmentation, which further segments the complete spark image from the spark region, and shows the segmentation results. Section 6 compares the method with other algorithms and discusses the results. Section 7 concludes the paper.

2 Experiment Setup

2.1 Belt Grinding Mechanism

To study spark images, we built an experimental platform to record a complete workpiece machining process and capture the spark images produced during machining. Figure 1 shows the experimental setup, which mainly includes a three-axis machine, two cameras, and two computers. The workpiece was GCr15 with a hardness of HRC58 and a size of 170 mm × 41 mm × 50 mm. The motor speed was 0–5000 rpm, and the speed of the driven belt was 0–34 m/s. The contact wheel was rubber with a Shore A hardness of 85. The belt abrasive was corundum, and the belt width was 20 mm. When the motor rotates at high speed, the belt cuts the workpiece and produces a spark field.

Fig. 1 Belt grinding system

A Beckhoff CX5130 embedded PC was selected as the controller; Table 2 shows its parameters.

Table 2 Beckhoff CX5130 controller

The camera model is MT-E200GC-T (CMOS); its parameters are shown in Table 3.

Table 3 The indicators of cameras

The CMOS camera collects images in real time with Mindvision software at a frame rate of 100 Hz.

2.2 The Mechanism of Spark Generation

During the grinding process, a large number of sparks are generated by the high-speed rotation of the abrasive belt. Two industrial cameras were mounted at the front and side of the spark field to record and save the spark images at a frame rate of 100 Hz. Figure 2a shows the acquired side spark images, and Fig. 2b shows the frontal spark images.

Fig. 2 a Side spark image. b Frontal spark image

3 Steps of Image Segmentation

The overall image segmentation steps are shown in Fig. 3. In Sect. 2, we built an experimental platform and obtained spark images. The acquired dataset was then labeled and used to train YOLO5 to detect the target region of the spark image, as explained in more depth in Sect. 4. Finally, the images in this region were further segmented with a fully connected CRF to separate the complete spark image from the region. This method is described in detail in Sect. 5.

Fig. 3 Image segmentation steps

4 YOLO5 Target Detection Process and Steps

4.1 Image Annotation

YOLO5 is trained on a labeled dataset. We used annotation software to label all images, as shown in Fig. 4, with "fire" as the only label. The dataset was divided into a testing set and a training set with a ratio of 0.2:0.8. We annotated the data in the training dataset.
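The 0.2:0.8 split described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the file names, the seed, and the helper `split_dataset` are hypothetical, and a seeded random shuffle is assumed.

```python
import random

def split_dataset(image_names, test_ratio=0.2, seed=0):
    """Shuffle image names and split them into (train, test) lists."""
    names = sorted(image_names)          # fixed starting order for reproducibility
    random.Random(seed).shuffle(names)   # seeded shuffle; global RNG is untouched
    n_test = round(len(names) * test_ratio)
    return names[n_test:], names[:n_test]

# Hypothetical file names standing in for the recorded spark frames.
frames = [f"spark_{i:04d}.png" for i in range(100)]
train, test = split_dataset(frames)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible, so that detection results from different training runs are evaluated on the same test images.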

Fig. 4 Image annotation

4.2 YOLO5 Model

The YOLO target detection algorithm has undergone many years of development. The latest version at the time of this work is YOLO5, which comes in several variants. The smallest architecture is YOLO5s, which is only 14 MB and can detect targets quickly and accurately. The basic architecture of YOLO5s is shown in Fig. 5.

Fig. 5 YOLO5s architecture

The network mainly includes four parts: the data input side, Backbone, Neck, and Prediction. The data input side adopts Mosaic data enhancement to improve the effect of target detection. The Backbone adopts a Focus structure and a Cross-Stage Partial (CSP) structure. The original 608 × 608 × 3 image is input to the Focus structure and becomes a 304 × 304 × 12 feature map after slicing. After convolution with 32 kernels, it becomes a 304 × 304 × 32 feature map. At the end of the Backbone, SPPF (Spatial Pyramid Pooling - Fast) is used to fuse feature maps of different receptive fields, thereby improving the expressive ability of the feature maps. The CSP1_X structure is applied in the Backbone, and the CSP2_X structure is applied in the Neck. The Neck adopts the form of a Feature Pyramid Network (FPN) combined with a Path Aggregation Network (PAN). The FPN obtains the feature maps used for making predictions. In the PAN structure, two feature maps are combined using a concat operation.
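The Focus slicing step can be illustrated with a short NumPy sketch that reproduces the shapes quoted above. It is not the YOLO5 implementation: the function `focus_slice` is a hypothetical name, the channel ordering is an assumption, and a random 1 × 1 kernel stands in for the real convolution only to show how 32 kernels yield 32 output channels.

```python
import numpy as np

def focus_slice(img):
    """Rearrange an (H, W, C) image into (H/2, W/2, 4C) by taking every second pixel."""
    return np.concatenate(
        [img[0::2, 0::2], img[1::2, 0::2], img[0::2, 1::2], img[1::2, 1::2]],
        axis=-1,
    )

image = np.zeros((608, 608, 3), dtype=np.float32)
sliced = focus_slice(image)
print(sliced.shape)      # spatial size halves, channel count quadruples

# Assumed stand-in for the convolution: 32 random 1x1 kernels over 12 channels.
rng = np.random.default_rng(0)
weights = rng.standard_normal((12, 32))
features = sliced @ weights
print(features.shape)
```

No pixel is discarded by the slicing: the four sub-sampled copies together contain the whole input, so the resolution is halved without losing information before the first convolution.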

4.3 Training

The computer configuration for training YOLO5 is shown in Table 4. The computer is equipped with an RTX3060 GPU, which significantly improves the training speed. Training was performed on both the frontal and side spark image datasets.

Table 4 Configuration for YOLO5

The results of frontal and side spark image target detection are shown in Figs. 6 and 7. The complete spark image area was detected with an accuracy of 0.9 or higher.

Fig. 6 YOLO5 frontal spark image training results

Fig. 7 YOLO5 side spark image training results

Software was used to crop the above images according to the bounding boxes of the YOLO5 detections to obtain the frontal and side spark image regions, as shown in Fig. 8. These regions are further segmented below to separate the spark images from the background.
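The cropping step can be sketched as below. This is an illustrative assumption about the data layout, not the software actually used: the detection is taken to be a pixel-coordinate box `(x1, y1, x2, y2)`, and `crop_box`, the frame size, and the box values are hypothetical.

```python
import numpy as np

def crop_box(image, box):
    """Crop an (H, W, C) image to box = (x1, y1, x2, y2) in pixels, clipped to the image."""
    h, w = image.shape[:2]
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    x1, x2 = max(0, x1), min(w, x2)      # keep the box inside the image width
    y1, y2 = max(0, y1), min(h, y2)      # keep the box inside the image height
    return image[y1:y2, x1:x2]           # rows are y, columns are x

# Hypothetical frame and detection box.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
region = crop_box(frame, (100.4, 50.2, 420.7, 300.0))
print(region.shape)
```

Clipping the box to the image bounds matters when a detection touches the frame edge, which can happen when the spark field extends beyond the camera's field of view.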

Fig. 8 a YOLO5 cropped side spark image. b YOLO5 cropped frontal spark image

5 Fully Connected CRF Image Segmentation Method

5.1 Fully Connected CRF Image Segmentation Principle

The method uses an efficient fully connected conditional random field (CRF) model for semantic image segmentation. In the fully connected pairwise CRF model, the corresponding Gibbs energy [23] is shown in Eq. (1).

$$ E(x) = \sum\limits_{i} {\psi_{u} (x_{i} ) + \sum\limits_{i < j} {\psi_{p} (x_{i} ,x_{j} )} } . $$
(1)

The indices i and j range from 1 to N, where N is the number of pixels. The unary potential $\psi_{u}(x_{i})$ is computed independently for each pixel and incorporates shape, texture, and location. The pairwise potentials are shown in Eq. (2).

$$ \psi_{p} (x_{i} ,x_{j} ) = \mu (x_{i} ,x_{j} )\sum\limits_{m = 1}^{K} \omega^{(m)} k^{(m)} (f_{i} ,f_{j} ). $$
(2)

Each $k^{(m)}$ is a Gaussian kernel, the vectors $f_{i}$ and $f_{j}$ are the feature vectors of pixels i and j, $\omega^{(m)}$ is a linear combination weight, and $\mu$ is a label compatibility function.
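To make the pairwise term concrete, the sketch below evaluates one common choice of kernels: an appearance kernel over position and color plus a smoothness kernel over position only, with a Potts compatibility function. The source does not state which kernels or parameters were used, so the two-kernel form, the function names, and all weights and bandwidths (`w_app`, `w_smooth`, `theta_alpha`, `theta_beta`, `theta_gamma`) are assumptions for illustration.

```python
import numpy as np

def weighted_kernels(p_i, p_j, c_i, c_j, w_app=1.0, w_smooth=1.0,
                     theta_alpha=60.0, theta_beta=10.0, theta_gamma=3.0):
    """Sum over m of w^(m) k^(m)(f_i, f_j) for two assumed Gaussian kernels."""
    d_pos = np.sum((np.asarray(p_i, float) - np.asarray(p_j, float)) ** 2)
    d_col = np.sum((np.asarray(c_i, float) - np.asarray(c_j, float)) ** 2)
    # Appearance kernel: nearby pixels with similar color interact strongly.
    appearance = np.exp(-d_pos / (2 * theta_alpha**2) - d_col / (2 * theta_beta**2))
    # Smoothness kernel: depends on distance only, discourages isolated labels.
    smoothness = np.exp(-d_pos / (2 * theta_gamma**2))
    return float(w_app * appearance + w_smooth * smoothness)

def potts(label_i, label_j):
    """Potts compatibility: penalize only pixels that take different labels."""
    return 0.0 if label_i == label_j else 1.0

def pairwise_potential(label_i, label_j, p_i, p_j, c_i, c_j):
    """Eq. (2): compatibility times the weighted kernel sum."""
    return potts(label_i, label_j) * weighted_kernels(p_i, p_j, c_i, c_j)

# Two adjacent bright pixels versus a bright and a dark pixel (made-up values).
similar = pairwise_potential(0, 1, (10, 10), (10, 11), (250, 200, 90), (248, 198, 92))
different = pairwise_potential(0, 1, (10, 10), (10, 11), (250, 200, 90), (20, 20, 20))
print(similar > different)
```

Under these assumptions, assigning different labels to two similar neighboring pixels costs more than assigning different labels across a strong color edge, which is what lets the CRF follow irregular spark boundaries.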

The mean field approximation computes a distribution Q(X) that minimizes the KL divergence D(Q‖P) from the exact distribution P(X), using the iterative update shown in Eq. (3).

$$ Q_{i} (x_{i} = l) = \frac{1}{Z_{i}} \exp \left\{ - \psi_{u} (x_{i}) - \sum\limits_{l^{\prime} \in L} \mu (l,l^{\prime}) \sum\limits_{m = 1}^{K} \omega^{(m)} \sum\limits_{j \ne i} k^{(m)} (f_{i} ,f_{j} ) Q_{j} (l^{\prime}) \right\} . $$
(3)

This update equation leads to the following inference algorithm [23].

Algorithm 1 Mean field in fully connected CRF [23]
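The update in Eq. (3) can be sketched as a naive mean-field loop on a toy example. This is an illustration only: it evaluates all pixel pairs directly at O(N²) cost instead of using the efficient filtering-based message passing of [23], and the six-pixel signal, the single Gaussian kernel, and its parameters are made up.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax; subtracting the row maximum keeps exp() stable."""
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def mean_field(unary, kernel, compat, n_iters=5):
    """Naive mean field.

    unary:  (N, L) unary potentials psi_u
    kernel: (N, N) weighted kernel sum, sum_m w^(m) k^(m)(f_i, f_j)
    compat: (L, L) label compatibility mu(l, l')
    """
    k = kernel.copy()
    np.fill_diagonal(k, 0.0)                  # the sum in Eq. (3) excludes j = i
    q = softmax(-unary)                       # initialize Q from the unaries
    for _ in range(n_iters):
        message = k @ q                       # sum_j k(f_i, f_j) Q_j(l')
        pairwise = message @ compat.T         # sum_l' mu(l, l') * message
        q = softmax(-unary - pairwise)        # local update and normalization
    return q

# Toy 1-D "image": three bright (spark) pixels followed by three dark ones.
intensity = np.array([1.0, 0.9, 1.0, 0.1, 0.0, 0.1])
position = np.arange(6, dtype=float)
# Assumed classifier output; pixel 2 is wrongly scored as background.
spark_prob = np.array([0.9, 0.9, 0.4, 0.1, 0.1, 0.1])
unary = -np.log(np.stack([1.0 - spark_prob, spark_prob], axis=1))

d_pos = (position[:, None] - position[None, :]) ** 2
d_int = (intensity[:, None] - intensity[None, :]) ** 2
kernel = 2.0 * np.exp(-d_pos / (2 * 2.0**2) - d_int / (2 * 0.2**2))
potts = 1.0 - np.eye(2)                       # mu(l, l') = [l != l']

q = mean_field(unary, kernel, potts)
print(unary.argmin(axis=1))                   # labels from the unaries alone
print(q.argmax(axis=1))                       # labels after mean-field refinement
```

In this toy case the bright pixel that the unary term mislabels is pulled back to the spark label by its similar neighbors, while the dark pixels are unaffected because the kernel between bright and dark pixels is close to zero.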

5.2 Fully Connected CRF Image Segmentation Results

We further segmented the spark image regions obtained from YOLO5 detection (Fig. 8) with a fully connected CRF. The extracted complete spark images are shown in Fig. 9. The figure shows that this method removes the complex background and retains the complete spark image.

Fig. 9 a Side spark image after image segmentation. b Frontal spark image after image segmentation

6 Discussion

6.1 U-Net Algorithm for Image Segmentation

We also segmented the spark images with the U-Net algorithm. Figure 10 shows the segmentation results. Some of the background is not completely removed, and parts of the spark images are lost. This indicates that the overall result is not as good as that of our method.

Fig. 10 Comparison of segmentation results between U-Net and CRF

6.2 PASCAL VOC 2007

We also used the PASCAL VOC 2007 dataset to validate the algorithm. We randomly partitioned the images into two groups: 80% for the training set and 20% for the test set. Segmentation accuracy was measured using the standard VOC measure. Table 5 reports the segmentation accuracy in comparison with U-Net, and Fig. 11 shows the segmented images. Compared with the U-Net algorithm, our algorithm improved accuracy by 10% and was faster.
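The standard VOC segmentation measure is the intersection over union (IoU) between predicted and ground-truth pixels of each class, averaged over classes. The sketch below computes it for a single pair of label maps; it is a simplified per-image version for illustration (the benchmark accumulates counts over the whole test set and ignores void pixels), and the function name and toy label maps are hypothetical.

```python
import numpy as np

def mean_iou(pred, truth, n_classes):
    """Mean over classes of |pred AND truth| / |pred OR truth|."""
    ious = []
    for c in range(n_classes):
        p, t = pred == c, truth == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue                      # class absent from both maps: skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

# Toy 2 x 4 label maps: 0 = background, 1 = object.
truth = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1]])
pred = np.array([[0, 1, 1, 1],
                 [0, 0, 1, 0]])
print(mean_iou(pred, truth, n_classes=2))
```

Because the union in the denominator counts both false positives and false negatives, leftover background and lost object pixels are both penalized, which matches the two failure modes observed in the U-Net comparison.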

Table 5 Quantitative results on the PASCAL VOC dataset

Fig. 11 Qualitative results on the PASCAL VOC 2007 dataset

6.3 Evaluation

Once training was completed, the optimal model was used to predict the images of the test sets in the spark image dataset and the PASCAL VOC dataset. The results are shown in Figs. 12 and 13.

Fig. 12 The test results on the spark image dataset

Fig. 13 The test results on the PASCAL VOC dataset

As shown in Figs. 12 and 13, when the optimal model generated by fully connected CRF was used to predict test images, the accuracy reached over 0.96, and the segmentation of a single image took 0.2 s.

7 Conclusion

This paper uses YOLO5 combined with a fully connected CRF to segment images. The method is suitable for targets with irregular edges and complex backgrounds. Through experiments, we demonstrated that the target region can be detected with YOLO5 to narrow the area to be segmented, and that the region can then be further segmented with a fully connected CRF to extract the spark image. This method provides a new idea for image segmentation. In future work, we will further process the spark image, calculate the area of the spark region, and find the relationship between the spark image and the MRR.