Conveyor-Belt Detection of Conditional Deep Convolutional Generative Adversarial Network

Abstract: In underground mining, the belt is a critical component, as its state directly affects the safe and stable operation of the conveyor. Most existing non-contact detection methods based on machine vision can detect only a single type of damage and require pre-processing operations, which leads to heavy computation and low detection precision. To solve these problems, a belt tear detection method based on a multi-class conditional deep convolutional generative adversarial network (CDCGAN) is proposed in this paper. In the traditional DCGAN, the image generated by the generator has a certain degree of randomness. Here, a small number of labelled belt images are taken as conditions and added to the generator and discriminator, so that the generator produces images with the characteristics of belt damage under these conditions. Moreover, because the conventional discriminator cannot identify multiple types of damage, a multi-class softmax function is used as the output function of the discriminator; it outputs a vector of class probabilities and can accurately classify cracks, scratches, and tears. To avoid incomplete feature learning, skip-layer connections are adopted in the generator and discriminator. This not only minimizes the loss of features but also improves the convergence speed. Experimental results show that, compared with other algorithms, the proposed method attains the lowest generator and discriminator loss values and converges faster. Its mean average precision reaches 96.2%, which is at least 6% higher than that of the other algorithms.


Introduction
Belts are a critical component of conveyors, as the condition of the belt directly affects the safe and stable operation of the conveyor [1]. However, the working environments in mining are extremely complex. When coal is mixed with angular gangue, thin rods, and other objects, uneven forces during transportation easily wear the surface of the conveyor belt, resulting in scratches, cracks, and severe tears [2]. To ensure mining safety, a belt-detection system must both detect the working status of a belt quickly and determine the damage location accurately. Rapidity refers to the ability to judge the damage severity of the belt and respond in time, which requires a detection algorithm with low complexity. Accuracy refers to discerning different types of damage, such as scratches, cracks, and tears, and labelling the locations of severe damage. At present, belt tear-detection methods can be divided into contact detection [3] and non-contact detection [4]. Contact detection is generally conducted by detecting the force on the rollers, as in swing roller detection [5] and tear pressure detection [6][7][8]. These methods are fast and simple, but when a large coal block passes through the blanking port and collides with the buffer roller, false or missed detections easily occur. In contrast, non-contact detection methods are usually based on non-destructive detection, such as ultrasonic detection [1,9,10]. Because of the complex noise in underground mining, it is difficult for an ultrasonic system to receive the echoes of longitudinal tears and detect them accurately. With the development of machine vision, non-contact detection has gradually begun to use edge extraction to capture saliency areas, along with other methods, to monitor the acquired images. In practical applications, most of these image-based detection methods can detect only a single type of damage. They also require pre-processing operations, such as binarization, edge extraction, and image denoising, which can easily lead to long computation times.
Deep learning excels at extracting object features by training a network on massive amounts of data. A generative adversarial network (GAN) is a deep-learning model based on the idea of the zero-sum game: it extracts image features through competition between a discriminator and a generator. The former tries to identify the image data generated by the generator to minimize the error, while the latter tries to maximize the error. Finally, a Nash equilibrium is reached between the two, and the foreground and background are segmented according to the difference in the features. When a GAN or one of its improved variants is applied to belt-damage detection, the images generated by the generator are not constructed specifically for belt damage, and thus the features extracted during up-sampling appear random, resulting in feature deviations. Moreover, the discriminator mostly uses a binary classification function. It can output only two categories of images, i.e., real and fake, and cannot distinguish multiple types of damage. In addition, the generator and discriminator networks have a large number of layers. When the dimension of the convolutional layers is reduced, only the part of the information considered useful is retained, which may cause the loss of important features. Therefore, an improved conditional deep convolutional generative adversarial network (CDCGAN) is proposed and applied to belt-damage detection.
The contributions of this paper are the following.
(1) Because the images generated by the generator have a certain randomness, a small number of labelled belt images are taken as conditions and added to the generator and discriminator. According to the conditions, the generator generates images with the corresponding belt-damage features. This facilitates learning the characteristics of the damaged parts in an image and improves the accuracy of damage detection.
(2) Because the discriminator of a DCGAN cannot identify multiple types of damage, a multi-class softmax function is adopted as the output function of the discriminator in the proposed CDCGAN. The output vector of class probabilities is used to classify cracks, scratches and tears accurately.
(3) To avoid incomplete feature learning, skip-layer connections are adopted in the generator and discriminator. This not only minimizes the loss of features but also improves the convergence speed.
The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces the problem of the traditional DCGAN and the corresponding solution. Section 4 describes the proposed conveyor-belt detection system, including the system design and the algorithm design of the detection module based on the multi-class CDCGAN. Experimental results and analysis are provided in Section 5. Finally, conclusions are presented in Section 6.

Related Work
With the development of machine vision, non-contact detection has gradually come to use edge extraction to capture saliency areas, as well as other methods, to monitor acquired images. For example, Wang et al. [11,12] proposed a tear-detection method based on Haar features instead of traditional geometric features. A weak classifier is trained on the Haar features extracted from the dataset and is boosted into a strong classifier with the AdaBoost algorithm. Because the method is trained only on the features of the tear area, other damage, e.g., cracks and scratches, cannot be detected. Yang et al. [13][14][15] designed a belt longitudinal tear warning method that uses infrared spectroscopy analysis. The maximum target background-radiation contrast is obtained, and the infrared-radiation field matrix is acquired through the infrared-radiation difference. By setting the radiation field as the carrier and acquiring its characteristic coefficient T in the frequency domain, the demodulation of the carrier wave is used to detect the damage. However, this frequency-domain approach is limited to tear detection and lacks generalization ability. Qiao et al. [16][17][18][19][20] proposed a Harris corner point detection algorithm with a filtering function that helps to eliminate the influence of pseudo-corners in feature recognition. Combined with a Hough transform, the original image is divided into a highlight area and a dent area, and the damage is detected from the difference between the two types of areas. However, in the detected images the dent area usually contains a single crack, so the method is not suitable for detecting multiple damage types. Hao et al. [21][22][23] proposed a multi-class support-vector-machine (SVM) detection method based on visual saliency that uses an SVM to transform the nonlinearly separable samples of the extracted seven-dimensional feature vectors into linearly separable samples in a high-dimensional space. It classifies the test samples by using the radial basis function. Although this method can detect three types of damage, i.e., scratches, cracks, and tears, the collected images must be pre-processed by binarization and a grey histogram to obtain the features of the damaged positions, which is time-consuming and costly.
Deep learning has been widely applied in image segmentation [24][25][26] and image detection [27][28][29], owing to its ability to extract object features by training a network on massive amounts of data. Among deep-learning models, GANs [30,31] extract image features through competition between a discriminator and a generator. The foreground and background are then segmented by feature differences. Traditional convolutional neural networks usually require a large number of images to be labelled manually [32,33]. By contrast, a GAN needs only a small amount of labelled data, because the model can automatically learn the data distribution from the training samples and generate new sample data. However, the network is usually trained with the gradient-descent method, and the generator may keep training along a single feature, resulting in failure to converge and mode collapse. DCGANs [34,35] adopt strided convolution instead of up-sampling layers, and convolution instead of fully connected layers. Thus, they can learn their own spatial down-sampling to obtain image features. However, since the noise vector is random, it is not constructed for a specific type of image. When the generator performs up-sampling, the extracted features appear random to some degree, resulting in feature deviation. In CGANs, the condition is the target label, which the images generated by the generator are expected to match [36][37][38]. The discriminator not only identifies whether the generated image is real, but also discerns whether the image and condition (c) match. However, the model has a large number of layers, and some features may be lost during forward propagation, so that the features are acquired incompletely. Ren et al. [39][40][41][42] added skip-layer connections among the network layers, which promote feature re-use between layers and preserve useful information. Even if some features are lost in training, the key features can be well retained.

Problem Statement
To detect belt tears effectively, a belt damage detection system must first be designed. Such a system includes image-acquisition, data-transmission, image-detection and system response modules. These modules realize image acquisition, image transmission, image detection, and real-time response, respectively. Among them, the image-detection module is the core part of belt-damage detection. The design of its algorithm determines whether the real-time and accuracy requirements of belt-tear detection can be met; therefore, the design of this part is particularly critical.
When a traditional DCGAN is used for image detection, the input of the generator is a random noise vector. Because DCGANs are not constructed specifically for belt damage, the features extracted during the generator's up-sampling appear random, which causes feature deviation. In response to this problem, a small number of labelled belt images with damage are taken herein as conditions and added to the generator and discriminator. Thus, the model can generate damaged images according to the corresponding conditions. Furthermore, the output of the discriminator mostly uses a binary classification function. If it is adopted, only the torn and non-damaged parts of the belt can be detected, and no warning can be issued regarding the potential danger of scratches. To solve this problem, the output of the discriminator is changed to a multi-class softmax function that detects and classifies three types of damage: scratches, cracks, and tears. In addition, because of the large number of network layers in the generator and discriminator, dimensionality reduction in the convolution layers is essential. During this process, some important features are easily lost because they are treated as useless, and the resulting incomplete features impair the accurate detection of belt tearing. To address this issue, skip-layer connections are used in the generator and discriminator; they not only improve the convergence speed but also avoid the loss of features in propagation, thereby improving detection accuracy.

System Design
The detection-system architecture is divided into three parts, namely, the image-acquisition, data-transmission and decision subsystems, as shown in Fig. 1. Of these, the image-acquisition subsystem is shown in Fig. 2. It includes a surface light source and image-acquisition equipment, which are installed at the bottom of the conveyor belt to collect belt-damage images. The surface light source illuminates the belt surface vertically to improve the brightness of the image, and a charge-coupled-device (CCD) camera (Mind Vision PMV-GE100M-T, ShenZhen, China) is placed at a suitable angle to collect images under the surface light source. Hundreds of images are collected as samples with the test belt operating at speed. Once an appropriate image is obtained, the system begins to process it.
The decision-making subsystem is divided into a detection module and a response module. The former uses the algorithm detailed in Section 4.2, accelerated by a graphics processing unit (GPU) (NVIDIA), to detect damage in the images. The latter responds to the results in real time. If a tear occurs, the conveyor stops immediately; if a crack occurs, the system issues a warning but does not stop; if the belt is detected as normal or a scratch appears, the system operates normally.
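The response rules of the response module amount to a small lookup from detected class to action. The following is a minimal sketch of that lookup, not the system's actual control code: the names `Action`, `RESPONSE_POLICY` and `respond`, the string class labels, and the rule that the most severe action wins when several damage types appear in one image are all assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"   # normal operation
    WARN = "warn"           # issue a warning, keep running
    STOP = "stop"           # stop the conveyor immediately


# Hypothetical mapping from detected class to response, following the rules in the text.
RESPONSE_POLICY = {
    "normal": Action.CONTINUE,
    "scratch": Action.CONTINUE,
    "crack": Action.WARN,
    "tear": Action.STOP,
}

# Actions ordered from least to most severe.
SEVERITY = [Action.CONTINUE, Action.WARN, Action.STOP]


def respond(detected_classes):
    """Return the most severe action required by any detected class (assumed rule)."""
    actions = [RESPONSE_POLICY[c] for c in detected_classes] or [Action.CONTINUE]
    return max(actions, key=SEVERITY.index)
```

For example, under these assumptions `respond(["scratch", "crack"])` yields a warning, while any image containing a tear yields a stop.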

Multi-class CDCGAN
The multi-class CDCGAN is used in the belt-tear-detection module. The generator model of a traditional network is a deconvolution neural network. Through the input layer and deconvolution layers for up-sampling feature extraction, it transforms random noise into fake images that are very close to the real images. To avoid feature deviation, a small number of labelled images with damage are set as condition (c) to help guide data generation. Adding the conditions to the DCGAN extends it to a conditional model. In the generator, both condition (c) and the noise are input. Similarly, in the discriminator, condition (c), the real data and the images generated by the generator are taken as input, which helps to train the networks purposefully so that the characteristics of belt damage are obtained precisely.
The goal of the generator is to minimize the difference between the real and generated data. It tries to make the discriminator unable to distinguish them. However, the discriminator tries to maximize the difference. Here, the objective function is set to illustrate a continuously iterating process to obtain the optimal solution by minimizing the generator and maximizing the discriminator.
In this paper, the objective function V(D, G) is shown in Eq. (1), where min_G max_D V(D, G) represents the optimization process of minimizing over the generator and maximizing over the discriminator, E(·) is the expected value over the indicated distribution, p_data(x) is the distribution of real data, p_z(z) is the distribution of the noise data, D(x | c) is the discriminator with condition (c), and G(z | c) is the generator with a condition and noise.
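For reference, the standard conditional GAN objective that is consistent with these definitions can be written as follows. This is a sketch under the definitions above rather than a verbatim copy of Eq. (1); in particular, writing the discriminator's input for generated samples as D(G(z | c)) follows the common conditional GAN convention and is an assumption here.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x \mid c)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D\left(G(z \mid c)\right)\right)\right]
```

The first term rewards the discriminator for recognising real images under condition c, and the second rewards it for rejecting generated ones; the generator is trained to make the second term small.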
The conventional discriminator model is a convolutional neural network. Its inputs are real images and fake images generated by the generator. The output layer adopts the sigmoid binary classification function, with an output value in the range [0, 1]. An output of 1 indicates that the input image is real data, whereas an output of 0 means that the input image is a fake image generated by the generator. Owing to this binary classification characteristic, only the torn and non-damaged parts can be detected, while cracks and scratches cannot be identified. In this paper, the softmax function is used as the output function of the discriminator to identify scratches, cracks, and tears. The resulting model is called the multi-class CDCGAN.
Assuming that the random vector z has a uniform noise distribution p_z(z), the generator model G(z | c) maps it to the data space of the real image. The input x of the discriminator is a real image or a fake image with condition (c), and its distribution is p_data(x | c). The output of the fully connected layer in the discriminator is l = {l_1, l_2, ..., l_{k+1}}, which is a (k + 1)-dimensional vector. The softmax function converts it to the (k + 1)-dimensional vector of class probabilities p = {p_1, p_2, ..., p_{k+1}}. A real image is assigned to one of the first k classes, and a fake image is assigned to the (k + 1)-th class. The softmax function is shown in Eq. (2), where l_i and l_j are elements of the vector output by the fully connected layer, p_j is the output probability of class j, and e is the base of the natural logarithm, approximately 2.71828.
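As an illustration of this step, the softmax maps each element l_j to p_j = e^(l_j) divided by the sum of e^(l_i) over all k + 1 elements. The sketch below assumes k = 3 damage classes plus one fake class, ordered as in the labelling used later in this paper; the logit values and the names `softmax`, `CLASSES` and `predicted` are hypothetical. Subtracting the maximum logit is a common numerical-stability choice that does not change the result.

```python
import math


def softmax(logits):
    """Map k+1 logits to k+1 class probabilities that sum to 1."""
    shift = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(l - shift) for l in logits]  # e^(l_j - shift) for every class j
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for k = 3 damage classes plus one "fake" class.
CLASSES = ["tear", "crack", "scratch", "fake"]
logits = [2.0, 0.5, 0.1, -1.0]

probs = softmax(logits)
predicted = CLASSES[probs.index(max(probs))]      # class with the highest probability
```

The predicted class is simply the index of the largest probability; an input whose largest probability falls in the last position is treated as a generated (fake) image.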
In this paper, the cross-entropy function is selected as the loss function of the discriminator D(y | x) to determine the closeness between the actual and expected output. The smaller the loss value, the better the model has learned. Therefore, it is necessary to optimize the network model by minimizing the loss function. The loss of D(x | c) is defined in Eq. (3), where j represents the class, c the expected category and p_j the category probability of the output.
One-hot coding is adopted for c and c ; in other words, if the discriminator's output is the j-th class, the corresponding position is coded as 1, while the remaining positions are coded as 0.
When the input is a real image, Eq. (3) can be further expressed by Eq. (4), where c denotes the expected class and p_j the probability of the output category.
When the input is a fake image, it can be simplified to Eq. (5), where p_{k+1} is the category probability of the fake image.
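Under the definitions above, the cross-entropy loss and its two special cases can be sketched as follows. This is a reconstruction from the surrounding text, assuming a one-hot expected label with entries c_j over the k + 1 classes; the symbols L_D, L_real and L_fake are names introduced here for illustration.

```latex
% General case: cross-entropy between the one-hot expected label c and the softmax output p
L_D = -\sum_{j=1}^{k+1} c_j \log p_j

% Real image: c_{k+1} = 0, so only the first k classes contribute
L_{\mathrm{real}} = -\sum_{j=1}^{k} c_j \log p_j

% Fake image: c_{k+1} = 1 and all other entries are 0
L_{\mathrm{fake}} = -\log p_{k+1}
```

Because the label is one-hot, each sum reduces to the negative log-probability of the single expected class, so the loss is small exactly when the discriminator assigns high probability to that class.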
In this paper, the damage-detection method is based on the multi-class CDCGAN, and the type of belt damage is identified by the softmax function. Damage labelled by 1-4 indicates the detection of tears, cracks, scratches, and fake images, respectively.
In addition, skip-layer connections are implemented to enhance feature propagation and enable feature re-use between two convolution layers. Without skip-layer connections, features obtained from previous layers will be gradually lost after a series of convolution layers, and the convergence rate of the model will decrease during the training period.

Algorithm Design and Description
The algorithm design process is the following.
Step 1: Collect belt images with the area light source through the CCD camera, and label some of them with damage type. This dataset contains a small amount of labelled data and a large number of unlabelled data. Fig. 3 shows the images collected in the dim environment of a coal mine. Belt damage is marked as follows: the red box represents tears, the green box represents cracks, and the blue box represents scratches.

Figure 3: Labelled belt damage
Step 2: Build the generator model by taking a 100-dimensional random noise vector z and condition (c) (a small number of images with the damage labelled) as input. The noise is converted to an 8192-dimensional vector through a fully connected layer and then reshaped into a 4 × 4 × 512 feature map. Through deconvolution layers 1-4 for up-sampling, a 64 × 64 × 3 belt image is finally generated. The generator model's structure is shown in Fig. 4.

Figure 4: Generator model structure
Step 3: Build the discriminator model by taking a 64 × 64 × 3 image generated by the generator and condition (c) (a small number of damaged images labelled) as input. After convolution layers 1-4 are used for down-sampling, the final output is a 4 × 4 × 512 feature map, which is flattened into an 8192-dimensional (4 × 4 × 512) vector. Through the fully connected layer, the softmax function outputs the probability values of scratches, cracks, tears, and fake images, and the type of conveyor-belt damage is judged. The discriminator model structure is shown in Fig. 5.
Step 4: Train the network by adding skip-layer connections in the generator and discriminator. This helps the network to learn the characteristics of belt damage and keeps the important information during network propagation, which can improve the precision of detecting cracks, scratches, and tears.
Step 5: Based on the predicted results, the system responds in real time. If a tear occurs, the belt stops immediately. If a crack occurs, the system issues a warning and does not stop. If the belt is detected to be operating normally or a scratch occurs, the system operates normally.
The detection process diagram of belt images is shown in Fig. 6.
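The layer sizes given in Steps 2 and 3 can be checked with a short calculation. The sketch below assumes that every deconvolution doubles and every convolution halves the spatial size (stride 2 with "same" padding), which is consistent with going between 4 × 4 and 64 × 64 in four layers but is not stated in the text. The intermediate channel widths and the helper names `deconv_out` and `conv_out` are likewise assumptions; only the 100-dimensional noise, the 8192-dimensional vector, the 4 × 4 × 512 map and the 64 × 64 × 3 image come from the steps above.

```python
def deconv_out(size, stride=2):
    """Spatial size after a transposed convolution with 'same' padding."""
    return size * stride


def conv_out(size, stride=2):
    """Spatial size after a strided convolution with 'same' padding (ceil division)."""
    return -(-size // stride)


# Generator: 100-d noise -> fully connected 8192 -> reshape 4 x 4 x 512 -> 4 deconvolutions.
fc_units = 8192
h, w, ch = 4, 4, 512
assert h * w * ch == fc_units  # the reshape is only valid if the sizes agree

gen_channels = [256, 128, 64, 3]   # intermediate widths are assumed
gen_shapes = [(h, w, ch)]
for c in gen_channels:
    h, w = deconv_out(h), deconv_out(w)
    gen_shapes.append((h, w, c))

# Discriminator: 64 x 64 x 3 image -> 4 convolutions -> 4 x 4 x 512 -> flatten.
dis_channels = [64, 128, 256, 512]  # intermediate widths are assumed
h, w = 64, 64
dis_shapes = [(h, w, 3)]
for c in dis_channels:
    h, w = conv_out(h), conv_out(w)
    dis_shapes.append((h, w, c))

flat_dim = h * w * dis_channels[-1]  # length of the vector fed to the fully connected layer
```

Under these assumptions the generator ends at 64 × 64 × 3 and the discriminator's flattened vector has the same 8192 elements as the generator's first fully connected output.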

Data Acquisition and Pre-processing
Once the conveyor reaches a constant speed, a surface light source is added to make the collected images clearer. The CCD camera then captures images of the surface of the conveyor belt, and the captured images are transmitted to the computer through the data transmission line. Accelerated by the GPU, the processing module classifies the damage images, and the control module responds in real time based on the type of damage, either maintaining normal operation or stopping the conveyor immediately.
The image-acquisition process was tested under ideal conditions, that is, without water, dust or any other environmental factors that might affect the test results. A total of 3200 images were collected and divided into four groups of 800 images each. The experimental parameters are the height of the CCD camera and the speed of the conveyor belt. The height determines the size of the image, while the speed affects its clarity; both affect the recognition accuracy. In the first set of experiments, the belt was run at low speed (1 m/min), the CCD height was set to 0.4 m and the resolution was 900 × 700. In the second set, the belt was again run at low speed (1 m/min), the CCD height was set to 0.8 m and the resolution was 1800 × 1400. In the third set, the belt was run at high speed (2 m/min), the CCD height was set to 0.4 m and the resolution was 900 × 700. In the fourth and final set, the belt was run at high speed (2 m/min), the CCD height was set to 0.8 m and the resolution was 1800 × 1400. From each group, 200 images were selected randomly for labelling. As a result, there were 800 labelled images and 2400 unlabelled images.
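The four acquisition groups and the resulting labelled/unlabelled split can be summarised as a small data structure. This is only an illustrative sketch: the class name `AcquisitionGroup` and its field names are hypothetical, and reading each resolution as (width, height) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcquisitionGroup:
    speed_m_per_min: float   # belt speed
    camera_height_m: float   # CCD camera height
    resolution: tuple        # assumed to be (width, height) in pixels
    n_images: int = 800      # images collected per group
    n_labelled: int = 200    # images randomly selected for labelling


GROUPS = [
    AcquisitionGroup(1, 0.4, (900, 700)),     # low speed, low camera
    AcquisitionGroup(1, 0.8, (1800, 1400)),   # low speed, high camera
    AcquisitionGroup(2, 0.4, (900, 700)),     # high speed, low camera
    AcquisitionGroup(2, 0.8, (1800, 1400)),   # high speed, high camera
]

total = sum(g.n_images for g in GROUPS)
labelled = sum(g.n_labelled for g in GROUPS)
unlabelled = total - labelled
```

Summing over the groups reproduces the totals stated above: 3200 images, of which 800 are labelled and 2400 are unlabelled.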

Model Training and Results
The experiment was run on the PyCharm 2017 software platform. The Python libraries used included TensorFlow, SciPy, and NumPy. The hardware ran Windows 10 on an i5-9300HQ@2.40 GHz CPU, and the GPU was an NVIDIA GeForce GTX 1650. In the experiment, the data were loaded in batches of size seven; that is, seven images were loaded in one batch during each training iteration. An epoch represents one full pass over the image data of the entire dataset. The number of epochs was set to 300, and the collected images were uniformly resized to 64 × 64 pixels.
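The batch size and epoch count fix the number of training iterations. As a rough worked example, the sketch below assumes that all 3200 collected images are used in every epoch and that the final, smaller batch is kept rather than dropped; neither detail is stated in the text, and the variable names are illustrative.

```python
import math

n_images = 3200   # total collected images (assumed to all be used for training)
batch_size = 7
n_epochs = 300

# Number of complete batches and leftover images in one pass over the data.
full_batches, remainder = divmod(n_images, batch_size)

# Keep the final partial batch, so round up.
steps_per_epoch = math.ceil(n_images / batch_size)
total_steps = steps_per_epoch * n_epochs
```

Because 3200 is not a multiple of seven, each epoch under these assumptions ends with one batch that holds a single image.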
In this work, the Adam optimizer was used to optimize the network, and skip-layer connections were used to accelerate the convergence of the CDCGAN. Figs. 7 and 8 show the training curves of the generator and discriminator, respectively, with and without a skip-layer connection. The horizontal coordinate represents the epoch number and the vertical coordinate the value of the loss function. The smaller the loss value, the more realistic the image generated by the generator and the better the model fits. In Fig. 7, the loss function of the generator exhibits the same downward trend regardless of whether it contains a skip-layer connection. However, the generator with a skip-layer connection has a loss value of 0.41, while that without has a loss value of 0.71; that is, the loss of the former is approximately 0.3 lower than that of the latter. In Fig. 8, the smaller the loss value, the closer the discriminator's prediction is to the real damage. Compared with the discriminator without a skip-layer connection, the loss value for the discriminator with a skip-layer connection decreased to 0.63. It can be seen that the algorithm model proposed in this paper is better than that without a skip-layer connection in belt-tear detection.
The experimental evaluation indexes mainly include precision, recall and the mean average precision (mAP) curve, which are used to evaluate the overall performance of the model. Among them, average precision (AP) is the area under the curve of precision plotted against recall, and mAP is the average of the AP values over the classes. In this paper, the precision and recall rates are calculated as shown in Eqs. (6) and (7), respectively, where TP is the number of pixels in the damaged area of the belt that are correctly judged, FP is the number of pixels incorrectly judged as damaged, and FN is the number of damaged pixels that are missed.
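In their standard form, precision is TP / (TP + FP) and recall is TP / (TP + FN). The sketch below implements these standard definitions together with mAP as the unweighted mean of per-class AP values; it is an illustration rather than the evaluation code used in the experiments, and the function names and pixel counts are hypothetical.

```python
def precision(tp, fp):
    """Fraction of pixels judged as damaged that really are damaged."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of truly damaged pixels that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mean_average_precision(ap_per_class):
    """mAP as the unweighted mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical pixel counts, for illustration only.
p = precision(tp=90, fp=10)
r = recall(tp=90, fn=30)
```

How each per-class AP is obtained from the precision-recall curve (for example, the interpolation scheme) is not specified in the text, so the sketch takes the AP values as given.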
To analyse the generalization ability, the algorithm was compared with the DCGAN and CGAN on the belt-image dataset. Tab. 1 compares the detection results of the different algorithms, where the rows from top to bottom represent scratches, cracks, tears, scratches + cracks, scratches + tears, tears + cracks and scratches + cracks + tears. Tab. 2 compares the precision and recall of the different algorithm models. Fig. 9 compares the mAP curves of the different algorithm models.
It can be seen from Tab. 1 that the DCGAN and CGAN perform as well as the algorithm proposed in this paper on a single crack. However, when multiple types of damage are present, the proposed algorithm performs relatively better. Because the discriminators of the DCGAN and CGAN are both binary classification models, they perform noticeably worse at detecting multiple types of damage.
It can be seen from Tab. 2 that the precision and recall rates of the model proposed in this paper are higher than those of the DCGAN and CGAN. As can be seen from Fig. 9, the proposed algorithm, which adopts skip-layer connections, converges relatively faster. Because the discriminators of the DCGAN and CGAN are both binary classification models that lack multi-class detection, the mAP of the DCGAN is 88.3% and that of the CGAN is 90.1%. In contrast, the mAP of the proposed algorithm is up to 96.2%, which is at least 6% higher than that of the others.

Conclusions
A reliable and fast tear-detection method for mining conveyor belts is presented in this paper. The model can obtain the corresponding damaged image by adding conditions to the generator and discriminator. The use of a skip-layer connection can not only improve the convergence speed, but also avoid the loss of features during the propagation process, and the output of the discriminator is a multi-class softmax function, which can detect and classify damage very well. Experimental results show that compared with other methods, the method advanced herein is suitable for detecting multiple types of damage in an image with both high accuracy and reliability.