A New Image Segmentation Method Based on the YOLO5 and Fully Connected CRF

Huang, Jian; Zhang, Guangpeng; Ren, Li juan; Wang, Nina

doi:10.1007/s44196-023-00365-9

A New Image Segmentation Method Based on the YOLO5 and Fully Connected CRF

Research Article
Open access
Published: 14 November 2023

Volume 16, article number 180, (2023)
Cite this article

Download PDF

You have full access to this open access article

International Journal of Computational Intelligence Systems Aims and scope Submit manuscript

A New Image Segmentation Method Based on the YOLO5 and Fully Connected CRF

Download PDF

Jian Huang^1,2,
Guangpeng Zhang¹,
Li juan Ren¹ &
…
Nina Wang¹

527 Accesses
Explore all metrics

Abstract

When manually polishing blades, skilled workers can quickly machine a blade by observing the characteristics of the polishing sparks. To help workers better recognize spark images, we used an industrial charge-coupled device (CCD) camera to capture the spark images. Firstly, the spark image region detected by yolo5, then segment from the background. Secondly, the target region was further segmented and refined in a fully connected conditional random field (CRF), from which the complete spark image obtained. Experimental results showed that this method could quickly and accurately segment whole spark image. The test results showed that this method was better than other image segmentation algorithms. Our method could better segment irregular image, improve recognition and segmentation efficiency of spark image, achieve automatic image segmentation, and replace human observation.

Image segmentation evaluation: a survey of methods

Article 18 April 2020

A comprehensive survey of image segmentation: clustering methods, performance parameters, and benchmark datasets

Article 09 February 2021

A survey on instance segmentation: state of the art

Article 03 July 2020

1 Introduction

Turbine compressors, industrial energy recovery turbines (TRT), and other equipment are widely used in petrochemical, metallurgy, wind tunnel tests and other national pillar industries. The blade is a key component with a complex structure. Its machining quality directly affects the performance of the turbine power system.

The blade size accuracy of manual polishing is completely dependent on the operator’s experience, which leads to poor blade size consistency due to the big difference between personal experience and easy fatigue of operation; the dust pollution in the process of polishing is serious, the labor intensity is hard, which makes it difficult to hire young workers under 40 years old at present.

In manual blade grinding, workers control the material removal rate (MRR) by experience through observing the characteristics of grinding sparks. To achieve automatic polishing, it is necessary to solve the problem of image processing in grinding sparks.

Fan [1] proposed an image low-level structure feature extraction method based on a deep neuron network and stacked sparse noise reduction auto-encoder (SSDA). It could extract the features of natural images and infrared image features. An unsupervised deep-learning training method was proposed by Chelsea Finn [2]. Combining predictive models obtained through raw perception with model predictive control (MPC) enabled the robot to show significant computer vision tasks under complex, unstructured environments. Avelino [3] addressed the problems of rapid visual movement and delays in image acquisition systems. By combining automatic stabilization mechanisms and advanced detection algorithms, the prediction accuracy of pedestrian position improved, and the human − robot interaction behavior strengthened.

Our team has studied the abrasive belt polishing workpiece process for many years. Among them, Ren [4] investigated the relationship between the spark field characteristics and the MRR. Ren [5] established a contact stress model on Hertzian contact theory, and developed a material removal rate mode to recognize the material removal in surface polishing. Wang Nina [6, 7] established an MRR model using images and sound. Wang Nina [8] proposed a new method using 2D-CNN learning to monitor material removal. Wang Nina [9, 10] investigated the effect of the wear state of abrasive belts on the MRR among the workpiece polishing, established a method for monitoring the wear state of belts based on the machine vision and image processing.

The above methods used a variety of sensors and investigated the effects of spark images and sound on the MRR. However, there are still the following issues. The segmentation of spark images is not fully automated; when the light is dim, it is hard to distinguish spark image from some background images; the processing of spark-image segmentation after a complete machining process is finished; an online system cannot be mentioned.

Huang [24] used the deep learning algorithm YOLO5 to perform target detection on spark images and obtain spark image regions. However, this paper is a further study after that one, to establish an automatic spark image segmentation system that can quickly detect target image regions in complex environments and separate them. The objective is to lay the foundation for studying the relevance between spark images and MRR and to achieve automatic workpiece processing.

The above processing methods for spark images are closely related to computer vision technology. The target detection and image segmentation are a research hotspot. Many new algorithms have proposed. Among the target detection algorithms, a typical one is the YOLO algorithm [11, 12]. The algorithm has developed through five generations, the accuracy and speed of target detection have greatly improved, and it can achieve multitarget detection [13, 14]. Its counterparts are Fast-RCNN [15] and Faster-RCNN [16] algorithms, with accurate target detection, but it still have some disadvantages. The YOLO algorithm is better by fast speed and high accuracy and is suitable for application in real-time target detection.

Image segmentation methods include semantic segmentation method, pixel segmentation method, and others. Many scholars have combined deep learning with image segmentation techniques and proposed new segment algorithms. Olaf R. proposed the U-Net algorithm [17]. He, K. proposed the MASK R-CNN algorithm [18], which achieved semantic segmentation of images. In addition, many scholars have improved the classical algorithm. For example, Lian [19] proposed an attention feature-based method for finding small targets, achieved the detection of small targets in traffic scenes and improved the accuracy of target prediction. Wenkel [20] presented a methodology to evaluate the confidence level in target detection and obtained the best performance point to improve the efficiency. Wang [21] applied the YOLOv3 model to complete the experimental targets. Arunabha [22] used a YOLOv4 algorithm to improve the accuracy of plant disease detection.

In order to establish the model between spark images and MRR, separating the spark images from the background is an important part. When we tried to segment the spark images, we found that the spark images were irregular in shape, similar to the shape of fireworks, with complex boundaries. Although using deep learning methods for segmentation, the target image had to be labeled first, which was very time consuming, had more manual marking errors, and the final segmentation effect after training was not very satisfactory. Therefore, we proposed a new method, first detected the target region with YOLO5, then used fully connected CRF to further segment and refine the sparkle image within the region, and finally extracted the sparkle image completely. Concretely, as shown in Table 1.

Table 1 Introduction to the content of this article

Full size table

Section 2 introduces the experimental and the method of spark image acquisition. Section 3 discusses the process and steps of YOLO5 target detection, which achieves the separation of the spark image region from the background. Section 4 introduces the principle and method of fully connected CRF image segmentation, which further segments the complete spark image from the spark region. Section 5 shows the image segmentation using the above method and compares it with other algorithms. Section 6 discusses the result.

2 Experiment Setup

2.1 Belt Grinding Mechanism

To realize the study of spark images, we built an experimental platform to record a complete workpiece machining process and capture the spark images collected during the machining process. Figure 2 shows the experiment, which mainly includes a three-axis machine, two cameras, and two computers. In Fig. 1, the workpiece selected GCr15 with a hardness of HRC58 and a size of 170 mm × 41 mm × 50 mm. The motor speed was 0 ~ 5000 rpm, the speed of the driven belt was 0–34 m/s. The contact wheel was rubber with a Shore A hardness of 85. The belt material was corundum and a width of 20 mm. When the motor rotates at high speed, it could cut the workpiece and produce a spark field.

A Beckhoff CX5130 embedded was selected as the controller; Table 2 shows the parameters.

Table 2 Beckhoff CX5130 controller

Full size table

The type of the camera is MT-E200GC-T CMOS; its parameters are shown in Table 3.

Table 3 The indicators of cameras

Full size table

The CMOS camera can collect images in real-time with Mindvision software, its frame frequency is 100 Hz.

2.2 The Mechanism of Spark Generation

During the grinding process, a number of sparks generated due to the high-speed rotation of the abrasive belt. Two industrial CCD cameras were mounted on the front and side of the spark field to record and save the spark images at a frame rate of 100 Hz. Figure 2a shows the acquired side spark images, Fig. 2b shows the frontal spark images.

3 Steps of Image Segmentation

The whole processing steps of image segmentation are shown in Fig. 3. In the second section, we built an experimental platform and obtained spark images. Then, the acquired dataset was labeled and segmented and then trained with YOLO5 to the target region of the spark image. It will explain in a more in-depth explanation in Sect. 4. Finally, the images in this region were further subdivided and segmented with fully connected CRF, separate the complete spark image from it. The method used will describe detail in Part 5.

4 YOLO5 Target Detection Process and Steps

4.1 Image Annotation

YOLO5 training the dataset. We used software to label all images, as shown in Fig. 4, with only “fire” for labeling. The dataset divided into two parts: testing and training dataset, with a ratio of 0.2:0.8.We annotated the data in the training dataset.

4.2 YOLO5 Model

The YOLO target detection algorithm has undergone many years of development. The latest version is YOLO5, which has several versions. The minimum architecture is YOLO5S, which is only 14 Mb and can detect targets quickly and accurately. The basic architecture of YOLO5S shown in Fig. 5.

It mainly includes four parts: Data input side, Backbone, Neck, and Prediction. Among them, the data input side adopts Mosaic data enhancement to improve the effect of target detection. Backbone adopts a Focus structure and Cross-Stage Partial (CSP) structure. The original 608*608*3 image is input to the Focus structure. It becomes a 304 * 304 * 12 feature map after slicing. Through 32 convolution kernels, it finally becomes a feature map of 304*304*32. At the end of Backone, SPPF (Fast Spatial Pyramid Pooling) is used to fuse feature maps of different receptive fields, thereby improving the expression ability of feature maps. The CSP1_X structure applied to the Backbone core network, and the other CSP2_X structure applied to Neck. Neck adopts the form of Feature Pyramid Networks (FPN) and Pixel Aggregation Networks (PAN). The FPN obtains the feature map for making predictions. In the PAN structure, two feature maps combined using concat operation and the feature.

4.3 Train

The computer configuration for training YOLO5 is shown in Table 4. The computer is equipped with an RTX3060 GPU, could significantly improve the training speed. The training performed on the frontal and side spark image datasets.

Table 4 Configuration for Yolo5

Full size table

The results of frontal and side spark image target detection are shown in Figs. 6 and 7. The complete spark image area with an accuracy of 0.9 or higher detected.

The software used to crop the above image according to the bounding box of the YOLO5 detection region to obtain the front and side spark image regions. As shown in Fig. 8, the region will be further segmented below to separate the spark images from the background.

5 Fully Connected CRF Image Segmentation Method

5.1 Fully Connected CRF Image Segmentation Principle

Fully connected CRF uses an efficient full-connected conditional random field model for image semantic segmentation. In the fully connected pairwise CRF model, the corresponding Gibbs energy [23] is shown in Eq. (1).

$$ E(x) = \sum\limits_{i} {\psi_{u} (x_{i} ) + \sum\limits_{i < j} {\psi_{p} (x_{i} ,x_{j} )} } . $$

(1)

The range of i and j is from 1 to N. The unary potential ψ_u(x_i) is computed independently for each pixel. The unary potential incorporates shape, texture, and location. The pairwise potentials are shown in Eq. (2).

$$ \phi_{p} (x_{i} ,x_{j} ) = \mu (x_{i} ,x_{j} )\sum\limits_{m = 1}^{K} {\omega^{(m)} k^{(m)} } (f_{i} ,f_{j} ). $$

(2)

The k^(m) is a Gaussian kernel, the vector f_i and f_j are feature vectors for pixels, ω^(m) is a linear combination weight, and µ is a label compatibility function.

The mean field approximation computes a distribution Q(X) that minimizes the KL divergence using the following iterative update equation, as shown in Eq. (3).

$$ Q_{i} (x_{i} = l) = \frac{1}{{Z_{i} }}\exp \{ - \psi_{u} (x_{i} ) - \sum\limits_{{l^{\prime} \in L}} {\mu (l,l^{\prime})\sum\limits_{m = 1}^{K} {\omega^{(m)} } \sum\limits_{j \ne i}^{K} {k^{(m)} (f_{i} ,f_{j} )Q_{j} (l^{\prime})} \} } . $$

(3)

This updated equation leads to the following inference algorithm [23].

5.2 Fully Connected CRF Image Segmentation Results

We further subdivided the area map of the spark image obtained after YOLO5 detection in Fig. 8 with a fully connected CRF. Then, the extracted complete spark image is shown in Fig. 9. It can be seen from the figure that the spark image processed by this method removes the complex background and obtains the complete spark image.

6 Discussion

6.1 U-net Algorithm for Image Segmentation

In addition, we also segmented the spark images with the U-Net algorithm. Figure 10 shows the segmentation results. It can be seen that some of the backgrounds in the segmentation not completely removed cleanly, some of the spark images are lost. It indicates that the overall effect is not as good as using our method.

6.2 PASCAL VOC 2007

We also used the PASCAL VOC 2007 dataset to validate the algorithm in our experiment. We randomly partitioned the images into two groups: 80% is training set, and 20% is the test set. Segmentation accuracy was measured using the standard VOC measure. The accuracy was shown in Table 5. Table 5 reports the segmentation accuracy against the U-Net. The segmentation images shown in Fig. 11 and in Table 5, our algorithm has improved accuracy by 10% as compared to the U-NET algorithm and faster.

Table 5 Qualitative and quantitative results on the PASCAL VOC dataset

Full size table

6.3 Evaluation

Once the training completed, the optimal model was used to predict the images of the test set in the spark images dataset and PASCAL VOC dataset. The results of which are shown in Figs. 12 and 13.

As shown in Figs. 12 and 13, when the optimal model generated by fully connected CRF was used to predict test images, the accuracy reached over 0.96, and the segmentation of a single image took 0.2 s.

7 Conclusion

The paper uses YOLO5 + fully connected CRF to achieve the segmentation of images. The method suitable for graphics with irregular image edges and complex background images. Through experiments, we demonstrate that we successfully detected the target region image with YOLO5 to narrow the image segmentation and then further segment the region image with fully connected CRF to extract the spark image. This method provides a new idea for image segmentation. Later, we will further process the spark image, calculate the area of the spark region, and find the relationship between the spark image and the MRR.

Data Availability

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the data also forms part of an ongoing study.

References

Fan, Z.: Low-level structure feature extraction for image processing via stacked sparse denoising autoencoder[J]. Neurocomputing 243, 12–23 (2017)
Article Google Scholar
Finn, C.: Deep visual foresight for planning robot motion[J]. IEEE. Int. Conf. Robot. Autom. arXiv:1610.00696v2 [cs.LG] (2017)
Avelino, J.: On the perceptual advantages of visual suppression mechanisms for dynamic robot systems[J]. Procedia Comput. Sci. 88, 505–511 (2016)
Article Google Scholar
Ren, L.J., Zhang, G.P., Wang, Y., Zhang, Q., Huang, Y.M.: A new in-process material removal rate monitoring approach in abrasive belt grinding. Int. J. Adv. Manuf. Technol. 104, 2715–2726 (2019)
Article Google Scholar
Ren, L.J., Zhang, G.P., Zhang, L., Zhang, Z., Huang, Y.M.: Modelling and investigation of material removal profile for computer controlled ultra-precision polishing. Precis. Eng. 55, 144–153 (2019)
Article Google Scholar
Wang, N., Zhang, G., Pang, W., Ren, L., Wang, Y.: Novel monitoring method for material removal rate considering quantitative wear of abrasive belts based on LightGBM learning algorithm. Int. J. Adv. Manuf. Technol. 114, 3241–3253 (2021)
Article Google Scholar
Wang, N., Zhang, G., Pang, W., Wang, Y.: Vision and sound fusion-based material removal rate monitoring for abrasive belt grinding using improved LightGBM algorithm. J. Manuf. Process. 66, 281–292 (2021)
Article Google Scholar
Wang, N., Zhang, G., Ren, L., Li, Y., Yang, Z.: In-process material removal rate monitoring for abrasive belt grinding using multisensor fusion and 2D CNN algorithm. Int. J. Adv. Manuf. Technol. 120, 599–613 (2022)
Article Google Scholar
Wang, N., Zhang, G., Ren, L., Pang, W., Li, Y.: Novel monitoring method for belt wear state based on machine vision and image processing under grinding parameter variation. Int. J. Adv. Manuf. Technol. 122, 87–101 (2022)
Article Google Scholar
Wang, N., Zhang, G., Ren, L., Yang, Z.: Analysis of abrasive grain size effect of abrasive belt on material removal performance of GCr15 bearing steel. Tribol. Int.. Int. 171, 107531 (2022)
Google Scholar
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unifified, real-time object detection. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788
Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. 2018. arXiv:1804.02767
Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: Optimal Speed and Accuracy of Object Detection. 2020. arXiv:2004.10934v1
Girshick, R.: Fast R-CNN. 2015. arXiv:1504.08083v2.
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. 2016. arXiv:1506.01497v3
Olaf, R., Philipp, F., Thomas, B.: U-Net: Convolutional Networks for Biomedical Image Segmentation. 2015. arXiv:1505.04597v1
He, K., Gkioxari, G., Dollar, P., Girshick, R.: Mask R-CNN. 2018. arXiv:1703.06870v3
Lian, J., Yin, Y., Li, L., Wang, Z., Zhou, Y.: Small object detection in traffic scenes based on attention feature fusion. Sensors 21, 3031 (2021). https://doi.org/10.3390/s21093031
Article Google Scholar
Wenkel, S., Alhazmi, K., Liiv, T., Alrshoud, S., Simon, M.: Confifidence score: the forgotten dimension of object detection performance evaluation. Sensors 21, 4350 (2021). https://doi.org/10.3390/s21134350
Article Google Scholar
Wang, J., Wang, N., Li, L., Ren, Z.: Real-time behavior detection and judgment of egg breeders based on YOLO v3. Neural Comput. Appl. 32, 5471–5481 (2020). https://doi.org/10.1007/s00521-019-04645-4
Article Google Scholar
Arunabha, R., Roy, M., Bose, R., Bhaduri, J.: A fast accurate fine-grain object detection model based on YOLO4deep neural network. Neural Comput. Appl.Comput. Appl. 34, 3895–3921 (2022)
Article Google Scholar
Philipp, K., Vladlen, K.: Efficient inference in fully connected CRFs with Gaussian edge potentials. 2012, arXiv:1210.5644v1
Huang, J., Zhang, G.: A study of an online tracking system for spark images of abrasive belt-polishing workpieces. Sensors 23, 2025 (2023). https://doi.org/10.3390/s23042025
Article Google Scholar

Download references

Funding

This work was supported by the National Natural Science Foundation of China under Grant No. 52275511 (professor Guangpeng Zhang). This work was also supported by the Natural Science Basic Research Program of Shaanxi under Grant No. 2023-JC-QN-0428 (Doctor Lijuan Ren).

Author information

Authors and Affiliations

School of Mechanical and Precision Instrument Engineering, Xi’an University of Technology, Xi’an, 710048, China
Jian Huang, Guangpeng Zhang, Li juan Ren & Nina Wang
School of Computer Science, XiJing University, Xi’an, 710123, China
Jian Huang

Authors

Jian Huang
View author publications
You can also search for this author in PubMed Google Scholar
Guangpeng Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Li juan Ren
View author publications
You can also search for this author in PubMed Google Scholar
Nina Wang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Guang Peng Zhang planned the entire experiment. Jian Huang performed the analysis and summarized the experimental data and was a major contributor to the writing of the manuscript. Lijuan Ren and Nina Wang participated in grinding experiments. All authors read and approved the final manuscript. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Jian Huang.

Ethics declarations

Conflict of Interest

The authors declare no competing interests.

Ethical Approval

All data in this paper comes from machining grinding experiments and does not involve ethical issues.

Consent to Participate

Not applicable.

Consent to Publish

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Huang, J., Zhang, G., Ren, L.j. et al. A New Image Segmentation Method Based on the YOLO5 and Fully Connected CRF. Int J Comput Intell Syst 16, 180 (2023). https://doi.org/10.1007/s44196-023-00365-9

Download citation

Received: 12 July 2023
Accepted: 01 November 2023
Published: 14 November 2023
DOI: https://doi.org/10.1007/s44196-023-00365-9

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

A New Image Segmentation Method Based on the YOLO5 and Fully Connected CRF

Abstract

Similar content being viewed by others

Image segmentation evaluation: a survey of methods

A comprehensive survey of image segmentation: clustering methods, performance parameters, and benchmark datasets

A survey on instance segmentation: state of the art

1 Introduction

2 Experiment Setup

2.1 Belt Grinding Mechanism

2.2 The Mechanism of Spark Generation

3 Steps of Image Segmentation

4 YOLO5 Target Detection Process and Steps

4.1 Image Annotation

4.2 YOLO5 Model

4.3 Train

5 Fully Connected CRF Image Segmentation Method

5.1 Fully Connected CRF Image Segmentation Principle

5.2 Fully Connected CRF Image Segmentation Results

6 Discussion

6.1 U-net Algorithm for Image Segmentation

6.2 PASCAL VOC 2007

6.3 Evaluation

7 Conclusion

Data Availability

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of Interest

Ethical Approval

Consent to Participate

Consent to Publish

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation