Compressed dual-channel neural network with application to image-based smoke detection

Effective detection of smoke from visual scenes can play a vital role not only in industrial safety as an early warning system but also in forest fire prevention. However, smoke is difficult to detect based on texture and color alone. Much research has therefore been devoted to this issue, yielding detection methods based on convolutional neural networks (such as DNCNN and DCNN). However, as convolutional layers are stacked, the number of network parameters grows gradually and imposes a large computational burden, which leads to unsatisfactory operating efficiency. This paper therefore introduces the depthwise separable convolution into the state-of-the-art DCNN developed specifically for smoke detection, dubbed the improved DCNN (IDCNN). Compared with standard convolution, the depthwise separable convolution greatly reduces the number of convolution parameters and the corresponding amount of computation, so that the network can process more data in a shorter time and thus operate more efficiently. Experimental results demonstrate the effectiveness of IDCNN, in terms of parameter count and running speed, as compared with state-of-the-art deep networks for smoke detection based on standard convolution.


INTRODUCTION
As is well known, fire is one of the main hazards endangering personal and property safety worldwide. Figure 1 shows the scene of a forest fire. To prevent and limit the damage caused by fire, detectors based on temperature and smoke concentration are deployed in such scenarios. However, in actual use, when these detectors are close to the fire source, they are prone to break down or be damaged under harsh environmental conditions. Recently, owing to continuous improvements in image processing technology, image-based smoke detection has emerged as a means of providing fire alarms. Smoke is a sign of fire as well, and smoke detection can deliver fire information earlier [1, 2]. In addition, smoke detection has important applications in industrial automation systems. Hence, image-based smoke detection methods have been widely studied and applied in recent years. In [3], Yu et al. put forward a new video smoke detection method using both color and motion features. Yuan proposed an accumulative motion model based on integral images that estimates the motion orientation of smoke swiftly [4]. In [5], Yuan combined the histograms of the local binary pattern (LBP) and local binary pattern variance (LBPV) to design a video smoke detection method based on a pyramid histogram sequence. In [6], Wang et al. presented a fire smoke detection algorithm based on optical flow methods and texture features, which can be adopted for fire warning. The above smoke detection approaches are mostly designed based on images and videos.

FIGURE 1 The scene of a forest fire producing smoke

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2021 The Authors. IET Image Processing published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.
However, with the development of technology, deep learning has become increasingly popular and many achievements have been made in image recognition and classification. Typical deep neural networks such as Res-Net [7], Alex-Net [8], ZF-Net [9], GoogLe-Net [10], Dense-Net [11] and VGG-Net [12] have appeared. Consequently, methods based on convolutional neural networks have gradually been applied to smoke detection. In [13], Gu et al. put forward a deep dual-channel neural network (DCNN), which extracts the detailed features and basic information of smoke and dust separately and achieves excellent performance in smoke and dust detection. In [14], Yin et al. devised a deep normalized convolutional neural network (DNCNN) for detecting smoke from images. DNCNN embeds batch normalization (BN) into the convolutional layers of a sequential convolutional neural network and employs data augmentation techniques to address the imbalance and shortage of positive and negative training samples. Gu et al. proposed FSDR-Net [15], founded by systematically merging meta-learning and selective ensembles; however, its generalization capacity and recognition accuracy for soot detection still leave room for improvement. In [16], Gu et al. showed that VMFS can detect flare smoke in a timely manner and ensure adequate flare gas combustion; it is fast in implementation and delivers strong monitoring performance. Nevertheless, VMFS is currently weak in color segmentation, and its application in vision-guided control systems for flare soot suppression remains an open problem.
Although smoke detection methods based on convolutional neural networks have made impressive progress, there remains much room for improvement. At present, standard convolution is widely adopted in deep-learning-based smoke detection. Standard convolution obtains multiple output mapping layers by convolving the input layer with different filters; these mapping layers are then stacked together to form a single layer. Through this operation, the width and height of the image decrease while its depth increases. Nevertheless, it brings plenty of parameters and creates a computational burden, so a smoke detection method implemented with a traditional convolutional network may not be effective when the data scale is large.
To address the above problems, this paper introduces depthwise separable convolution [17, 18] into the smoke detection convolutional network DCNN, providing a better solution for detecting smoke effectively. In depthwise separable convolution, each channel of the input layer is first convolved with its own filter, and the resulting channels are then convolved with smaller convolution kernels. Finally, stacking these mapping layers reproduces the effect of standard convolution. Since smaller convolution kernels are employed, the number of network parameters and operations is reduced compared with standard convolution [19]. Specifically, in our method, we replace part of the standard convolutional layers in the DCNN network structure with depthwise separable convolutional layers. However, because depthwise separable convolution cuts down a large number of parameters, the performance of small-scale models may significantly degenerate if it is substituted for standard convolution indiscriminately. Moreover, we use the number of parameters and the running time as indicators to compare the IDCNN network with several typical convolutional networks and with DCNN. The results indicate that the IDCNN network is superior to the other networks.
The highlights of this paper can be summarized as follows. First, on the basis of the DCNN network structure, some standard convolutional layers are replaced by depthwise separable convolutional layers, and the two modified deep subnetworks are merged to form a new network structure called IDCNN, which improves the feasibility and effectiveness of smoke detection. Second, the depthwise separable convolution in IDCNN decomposes the convolution into depthwise convolution and pointwise convolution, and then performs the convolution in each channel of the feature map with a different convolution kernel, which sharply reduces the number of model parameters and improves the speed and efficiency of network training. Finally, we combine the information between channels, and the detection performance is superior to state-of-the-art approaches.
The rest of the article is outlined as follows. Section 2 first illustrates the structure of the DCNN network, then introduces the general process of depthwise separable convolution and its advantages, and finally explains the details of the improvements of the IDCNN network over the DCNN network. The effectiveness of the IDCNN network is confirmed through experimental data in the third section. In the fourth section, we discuss the possible effects of the introduced depthwise separable convolutions and the idea of combining smoke detection networks with other convolution variants. Finally, we summarize the whole article in Section 5.

PROPOSED IDCNN NETWORK
We can realize smoke detection by introducing depthwise separable convolution into the DCNN network, thereby achieving network compression. In this part, we first illustrate the structure of the DCNN network. Then, we introduce the process of depthwise separable convolution and compare it with standard convolution. Finally, we design an improved depthwise-separable-convolution-based DCNN, which is called IDCNN.

FIGURE 2 The structure of SBNN, where "Fc", "Nac", "Maxp", and "Conv", respectively, represent full connection, normalization and convolution, max-pooling, and convolution operations

FIGURE 3 The structure of SCNN, where "GAP", "Nac", "Maxp", and "Conv", respectively, represent global average pooling, normalization and convolution, max-pooling, and convolution operations

DCNN
The DCNN is composed of two deep sub-networks, SBNN and SCNN. We therefore briefly demonstrate the structures of SBNN and SCNN and then present the detailed construction of DCNN. The architecture of SBNN is based on a CNN, and Figure 2 shows the specific details. In SBNN, six convolution layers and three max-pooling layers are connected in sequence for feature extraction. The ReLU function is utilized as the activation function of the network [20]. Moreover, SBNN adds BN layers [21] to the last four convolution layers and introduces learnable reconstruction parameters in the normalization. Finally, SBNN appends three fully connected layers behind the max-pooling layer A9 to extract features.
The SCNN introduces skip connections [22] and global average pooling [23] on the foundation of SBNN. In the entire structure of SCNN, 11 convolution layers, 7 BN layers and 2 max-pooling layers are connected in sequence to build a network for feature extraction, as shown in Figure 3. To provide better feature protection, as in SBNN, the first two convolution layers are not followed by BN layers. In addition, the sixth and eleventh convolution layers are also not followed by a BN layer. Compared with SBNN, the obvious change in the structure of SCNN is that the first feature map is connected to the fifth feature map through a skip connection, and the two are merged together by a cascading operation. Another distinct change in SCNN is that the fully connected layers of SBNN are replaced by global average pooling.
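To make the role of global average pooling concrete, the following minimal NumPy sketch (our illustration, not code from the paper) shows how it collapses an H × W × C feature map into a length-C vector, which is what lets it stand in for a fully connected layer:

```python
import numpy as np

def global_average_pool(feature_map):
    """Global average pooling: average each H x W channel map down to a
    single scalar, turning an (H, W, C) tensor into a length-C vector."""
    return feature_map.mean(axis=(0, 1))

# A toy 4 x 4 feature map with 3 channels; element (i, j, c) = 12i + 3j + c.
fmap = np.arange(48, dtype=float).reshape(4, 4, 3)
pooled = global_average_pool(fmap)
print(pooled.shape)  # (3,)
print(pooled)        # [22.5 23.5 24.5]
```

Unlike a fully connected layer, this operation has no trainable parameters, which is consistent with the compression goal discussed later in the paper.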
The SBNN extracts the detailed information of a smoke image well, while the SCNN excels at obtaining the basic information of a smoke image. Therefore, the DCNN network employed for smoke detection is composed of both. Figure 4 shows the whole structure of DCNN, where SBNN with its last three fully connected layers removed yields SBNN0, and SCNN with its global average pooling layer deleted yields SCNN0. However, the output size of SBNN0 does not match that of SCNN0, so we further revise SBNN0 by deleting A9 before concatenating SBNN0 and SCNN0. The resulting dual-channel network structure can be used for feature extraction and feature fusion, but some layers are still required for classification in DCNN. Consequently, global average pooling is appended to the last convolution to calculate two averages. This completes the entire structure of DCNN.

Comparison of depthwise separable convolution with standard convolution
In standard convolution, the convolution kernel must learn channel correlation and spatial correlation simultaneously, which leads to a large number of parameters and calculations during training. Nevertheless, the channel correlation and spatial correlation of a convolution layer can be decoupled, so depthwise separable convolution maps them separately. In depthwise separable convolution, every channel of the feature map is first mapped to a new space, and the spatial correlation within each channel is learned in this process; the correlation between channels is then learned through a pointwise convolution with 1 × 1 kernels.
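The two-step process can be sketched in NumPy as follows. This is a toy illustration of the technique, not code from the paper: it uses "valid" padding and stride 1, and all shapes and names are ours.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_kernels):
    """Depthwise separable convolution, valid padding, stride 1.

    x          : (H, W, Cin) input feature map
    dw_kernels : (C, C, Cin) one C x C spatial filter per input channel
    pw_kernels : (Cin, k) 1 x 1 filters mixing the Cin channels into k outputs
    """
    H, W, Cin = x.shape
    C = dw_kernels.shape[0]
    Ho, Wo = H - C + 1, W - C + 1

    # Depthwise step: each channel is filtered independently (spatial correlation).
    depth = np.zeros((Ho, Wo, Cin))
    for s in range(Cin):
        for i in range(Ho):
            for j in range(Wo):
                depth[i, j, s] = np.sum(x[i:i+C, j:j+C, s] * dw_kernels[:, :, s])

    # Pointwise step: 1 x 1 convolution fuses information across channels.
    return depth @ pw_kernels  # shape (Ho, Wo, k)

x = np.random.rand(8, 8, 3)
out = depthwise_separable_conv(x, np.random.rand(3, 3, 3), np.random.rand(3, 4))
print(out.shape)  # (6, 6, 4)
```

In a deep learning framework the same factorization is typically expressed as a grouped convolution (one group per channel) followed by a 1 × 1 convolution.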
Let us first analyze the process of standard convolution. In the traditional convolution algorithm, when an input N × H × W feature map (spatial size N × H with W channels) is convolved with k convolution kernels of size C × C × W, with padding ⌊C/2⌋ and stride 1, we obtain a feature map of size N × H × k. In this process, the parameter amount is W × C × C × k and the computation cost is N × H × C × C × W × k. The specific convolution process is shown in Figure 5.

Depthwise separable convolution proposes a new idea: for different input channels, different convolution kernels are used. It decomposes the convolution into two processes: depthwise convolution and pointwise convolution [24]. In depthwise convolution, each input channel is convolved with its own kernel of size C × C, which corresponds to assembling the spatial characteristics of each channel; Figure 6 shows this process. In this process, the parameter amount of the depthwise convolution is W × C × C and the computation cost is N × H × C × C × W. In pointwise convolution, k convolution kernels with a size of 1 × 1 × W are used to fuse the information between channels, which corresponds to assembling the characteristics of each point. The convolution process is shown in Figure 7. In this process, the parameter amount of the pointwise convolution is W × 1 × 1 × k and the computation cost is N × H × W × k.

We now compare the parameters and computation cost of standard convolution and depthwise separable convolution. By separating depthwise convolution and pointwise convolution, the depthwise separable convolution compresses the parameters of the ordinary convolution by the ratio

P1 = (W × C × C + W × k) / (W × C × C × k) = 1/k + 1/C².

The computation cost is compressed by the ratio

P2 = (N × H × C × C × W + N × H × W × k) / (N × H × C × C × W × k) = 1/k + 1/C².

In general, k is much larger than C², so P1 ≈ 1/C² and P2 ≈ 1/C². If we perform a 5 × 5 convolution, the parameters and computation cost become roughly 1/25 of those of the ordinary convolution.
This shows that depthwise separable convolution can not only cut down the number of parameters but also greatly improve running efficiency.
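The parameter and computation counts given above can be checked with a few lines of arithmetic. The sketch below uses the same symbols as the text (N × H spatial map, W input channels, C × C kernels, k output channels); the concrete numbers plugged in are our own illustrative choices:

```python
def conv_costs(N, H, W, C, k):
    """Compression ratios of depthwise separable vs. standard convolution,
    using the counts from the text."""
    std_params = W * C * C * k
    std_mults = N * H * W * C * C * k
    dws_params = W * C * C + W * k                    # depthwise + pointwise
    dws_mults = N * H * W * C * C + N * H * W * k
    return dws_params / std_params, dws_mults / std_mults

# Illustrative sizes: 224 x 224 map, 32 channels, 5 x 5 kernels, 256 outputs.
p_ratio, m_ratio = conv_costs(N=224, H=224, W=32, C=5, k=256)
print(round(p_ratio, 4), round(m_ratio, 4))  # 0.0439 0.0439
```

Both ratios equal 1/k + 1/C² = 1/256 + 1/25 ≈ 0.0439, confirming the roughly 1/25 compression claimed for a 5 × 5 kernel once k is large.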

IDCNN's network structure
In smoke detection methods based on CNNs, as the convolution depth increases, the amount of parameters and computation inevitably becomes too large, degrading real-time performance. In the previous section, we compared and analyzed the processes of standard convolution and depthwise separable convolution and found that the latter is clearly better than the former in terms of operating efficiency. Therefore, in order to achieve better real-time performance and higher efficiency in smoke detection, we introduce depthwise separable convolution into the DCNN smoke detection network and reconstruct it.
In the IDCNN network, if the current position is the first convolution layer, or if the number of convolution kernels at the current position has increased compared with the previous layer, we retain standard convolution. Otherwise, we adopt depthwise separable convolution in place of the standard convolution. To help readers understand which convolution layers have been replaced, we list the number and size of the convolution kernels of each layer in the original SBNN and SCNN in Tables 1 and 2. The reason we choose to replace these convolutions is that a layer whose kernel count does not grow can be substituted without reducing the representational capacity of the network, while still saving parameters. Through this replacement, the original convolution is converted into two convolutions, and each filter of the depthwise convolution is convolved with only one channel of the feature map, which can be written as

P(x, y, s) = Σ_{i,j} K(i, j, s) · F(x + i, y + j, s),

where K is a C × C × W sized kernel, and the s-th filter in K is applied to the s-th channel in F to generate the s-th channel of the filtered output feature map P. After the depthwise convolution is completed, a pointwise convolution is used for combination; that is, the results of the previous convolution are fused through 1 × 1 convolution kernels.
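The substitution rule stated above (keep standard convolution at the first layer or wherever the kernel count grows; otherwise switch to depthwise separable convolution) can be expressed as a small helper. The kernel counts used in the example are hypothetical, purely for illustration; the actual per-layer counts are those in Tables 1 and 2:

```python
def choose_conv_types(kernel_counts):
    """Decide, per layer, whether to keep standard convolution or switch to
    depthwise separable convolution, following the IDCNN substitution rule."""
    kinds = []
    prev = None
    for n in kernel_counts:
        if prev is None or n > prev:
            kinds.append("standard")          # first layer, or channel count grew
        else:
            kinds.append("depthwise-separable")
        prev = n
    return kinds

# Hypothetical kernel counts per layer, for illustration only.
print(choose_conv_types([32, 32, 64, 64, 64, 128]))
# ['standard', 'depthwise-separable', 'standard',
#  'depthwise-separable', 'depthwise-separable', 'standard']
```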

EXPERIMENTAL RESULTS
Aiming at solving the problems of the large number of parameters and the low operating efficiency of convolutional-neural-network-based smoke detection methods, this research introduces depthwise separable convolution on the basis of DCNN.
During network training, we train ISBNN and ISCNN separately. In the ISBNN training process, the Glorot uniform method is introduced to initialize the network weights [25] and the trial-and-error method is applied to search for the optimized network structure; we then utilize the stochastic gradient descent method [26] to train ISBNN, setting the momentum coefficient to 0.9, the rate attenuation coefficient to 0.0001 and the initial learning rate to 0.01. Similarly, we perform the same procedure to train ISCNN and then merge the two networks. Finally, we tune the overall parameters of IDCNN to find the best configuration. During the experiments, we compared the performance of IDCNN with DCNN and other networks.

FIGURE 8 The structure of ISBNN; the red layers represent layers that introduce depthwise separable convolution

FIGURE 9 The structure of ISCNN; the red layers represent layers that introduce depthwise separable convolution
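A single update of the training procedure described above can be sketched as follows. This is our illustration of a standard SGD-with-momentum step using the reported hyperparameters (momentum 0.9, initial learning rate 0.01); interpreting the "rate attenuation coefficient" of 0.0001 as L2 weight decay is an assumption on our part:

```python
def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9, weight_decay=0.0001):
    """One SGD-with-momentum update on a scalar weight.

    Assumption: the paper's rate attenuation coefficient (0.0001) is treated
    here as L2 weight decay folded into the gradient.
    """
    g = grad + weight_decay * w
    velocity = momentum * velocity - lr * g
    return w + velocity, velocity

w, v = 1.0, 0.0
w, v = sgd_momentum_step(w, grad=0.5, velocity=v)
print(round(w, 6))  # 0.994999
```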

Testing methods and indicators
To verify the validity of the method proposed in this paper, we first deployed a publicly available smoke detection database consisting of four subsets: Set-1, Set-2, Set-3 and Set-4 [27]. The datasets are used to train the network; there are about 2200 smoke image blocks and about 8500 smoke-free image blocks. In addition, we can increase the number of smoke image blocks to approximately the number of smoke-free image blocks by rotating them through certain angles. In our experiments, the experimental environment was a server with an Intel(R) Core(TM) i7-8550U CPU at 1.80 GHz (8 CPUs) and an NVIDIA GeForce MX130 under Windows 10. In order to better illustrate the performance of the IDCNN network and the other networks, we applied three common evaluation indexes: accuracy rate (AR), detection rate (DR) and false alarm rate (FAR).
Their definitions are:

AR = (P + N) / (S + Y), DR = P / S, FAR = F / Y.

Here, S and Y, respectively, represent the numbers of positive and negative samples; P, F and N are the number of true positive samples correctly detected, the number of negative samples falsely classified as positive, and the number of true negative samples correctly detected. A good model should obtain high AR and DR values and a low FAR value.
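These three indexes are straightforward to compute from the five counts defined in the text; the sketch below uses the same symbols, with sample counts of our own choosing:

```python
def smoke_metrics(P, F, N, S, Y):
    """AR, DR and FAR as defined in the text: S / Y are the positive /
    negative sample counts; P true positives, F false positives
    (negatives classified as positive), N true negatives."""
    AR = (P + N) / (S + Y)   # accuracy rate
    DR = P / S               # detection rate
    FAR = F / Y              # false alarm rate
    return AR, DR, FAR

# Illustrative counts: 100 smoke and 100 smoke-free blocks.
AR, DR, FAR = smoke_metrics(P=95, F=4, N=96, S=100, Y=100)
print(AR, DR, FAR)  # 0.955 0.95 0.04
```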

Performance comparison
Firstly, we illustrate the advantages of our method in terms of computational complexity, including time complexity and spatial complexity. To be specific, we compared the IDCNN network with several excellent neural networks. In terms of running time, IDCNN takes the least time and is significantly superior to the other networks. On the other hand, the spatial complexity determines the number of parameters of the model. Owing to the curse of dimensionality, the more parameters a model has, the more training data it requires. We therefore compared the number of parameters used in each network; when the number of parameters is too large, performance may be reduced by excessive computation or by overfitting. Consequently, we expect a network with fewer parameters to have stronger generalization ability. The number of parameters of our proposed IDCNN is the smallest among all the networks we compared, less than one-tenth of that of DCNN. Secondly, we examine the detection performance of the proposed IDCNN and compare it with eight popular and state-of-the-art neural networks: VGG-Net, Res-Net, Alex-Net, ZF-Net, DNCNN, Google-Net, Dense-Net and DCNN. The results are presented in Table 4. As we can see, DCNN and IDCNN perform similarly; the former behaves slightly better in the AR, DR and FAR indicators, but its parameter count is four times larger than that of the latter, and the training time of DCNN is 0.143 milliseconds longer than that of IDCNN. Moreover, reducing the number of parameters and speeding up the running time is vital for neural networks. In addition, Dense-Net and IDCNN outperform each other on Set-1 and Set-2, respectively, and the performance of IDCNN is also sharply better than that of the remaining networks.
For example, considering the AR indicator, the IDCNN network improves by 0.9% and 1.3% on Set-1 and Set-2 compared with Res-Net; compared to Google-Net, IDCNN improves by 1.5% and 0.9% on the AR indicator. To sum up, IDCNN is the best performer among the nine state-of-the-art networks listed in Table 4.

DISCUSSION
In recent years, smoke detection methods based on convolutional neural networks have become increasingly popular, and researchers have studied them intensively; the DCNN network is one such method. This paper introduces depthwise separable convolution on the basis of the DCNN network structure and improves the efficiency of the network. The depthwise separable convolution reduces the number of parameters in the convolution; therefore, for small-scale networks, if depthwise separable convolution replaces standard convolution, the performance of the network may degenerate significantly, resulting in a suboptimal network. However, if applied properly, depthwise separable convolutions help to improve efficiency without decreasing the performance of the network. In addition, the combination of smoke detection networks with grouped convolution, spatially separable convolution etc. may also be considered, which may contribute to improving the efficiency of the network.

CONCLUSION
This paper has introduced the depthwise separable convolution on the basis of the deep dual-channel neural network (DCNN), aiming to alleviate the problems of excessive parameters and poor operating efficiency in smoke detection networks. To solve these problems, based on the DCNN network structure, we replaced some standard convolution layers with depthwise separable convolution layers and then merged the two modified deep subnetworks to form a new network structure, IDCNN. We then trained the network on the features of the smoke images. The depthwise separable convolution in IDCNN decomposes the convolution into a depthwise part and a pointwise part; each channel of the feature map is convolved with a different convolution kernel, and the information between the channels is then merged. To confirm the effectiveness of our proposed network, we conducted experiments on an openly available smoke detection image database. By comparing our network with the DCNN network and other popular neural networks, we found that our network indeed uses fewer parameters than the other networks and that its running speed is greatly improved. However, when the network model is too small, it is not suitable to introduce depthwise separable convolutions, owing to the difficulty of extracting enough features to fully reflect the good performance of the network. In the future, we will devote ourselves to further studies so as to overcome this tricky problem.