Research on Application of the Feature Transfer Method Based on Fast R-CNN in Smoke Image Recognition



Introduction
In today's rapidly developing cities, fire often threatens people's lives and property as well as the normal operation of enterprises. Fires are sudden and highly destructive [1,2], so there is a clear practical demand for fire detection and early warning. At present, there are three main approaches to fire detection and alarm: sensor-based methods, image processing methods, and deep learning methods. The causes and locations of fires are diverse, which complicates early warning and firefighting. Traditional sensor-based detection uses a variety of sensors to detect the smoke, flame, heat, and other signals generated during a fire, and it identifies a fire by analyzing these different phenomena. For example, a temperature sensor detects changes in heat, and a smoke sensor detects the smoke concentration in the air. By sensing changes in these different parameters, the system determines whether a fire is present. The main types of fire detectors currently in use are heat-sensing, light-sensing, and smoke-sensing detectors. These traditional detectors are cheap and accurate, but they generally have defects that are difficult to overcome. For example, because smoke takes a relatively long time to spread and the temperature takes time to rise, traditional sensors inevitably respond with a delay. In addition, sensors are usually installed close to potential fire points and are exposed to large amounts of dust for long periods, which makes them prone to failure. Moreover, they are particularly unsuitable for places with high fire hazards, such as tall spaces or outdoor scenes. Developing more effective and reliable detection methods has therefore always been a goal of fire control efforts, and effective identification of smoke in the early stage of a fire has important theoretical significance and application value [3].
Traditional image-processing-based detection extracts the dynamic and static features of flames or smoke from video using image processing techniques and then determines whether a fire is present with a recognition algorithm. Because it makes effective use of existing video surveillance hardware, research on video-based fire detection has important theoretical and application value. Fire detection based on video surveillance is a noncontact method grounded in machine vision, and it is especially suitable for fire detection in large spaces, outdoor areas, and similar places [4].
This kind of method not only has strong anti-interference ability and a fast response but also offers a wide application range and low cost, and it has become an important interdisciplinary research field in fire detection. Traditional smoke identification methods usually rely on physical signals for monitoring. For example, Yamada [5] proposed a smoke sensor based on a layer-by-layer self-assembled electrolyte membrane for smoke perception and recognition. Keller et al. [6] proposed using photoacoustic sensors to monitor flame smoke. Cheon [7] proposed combining temperature sensors and smoke sensors to identify flame smoke. However, such methods depend strongly on the environment: if the surrounding environment changes, their recognition accuracy drops sharply, and they may even fail. Deep-learning-based detection, by contrast, trains a fire detection model on labeled fire images and then feeds the image under test into the model for recognition.
Advances in image processing and pattern recognition in recent years have provided new solutions for smoke recognition. Yu et al. [8] proposed a video smoke monitoring method based on optical flow, which determines the moving pixels and regions in the video through background estimation.
The Lucas-Kanade method is then used to extract optical flow features. Abadi et al. [9] proposed machine-vision-based smoke recognition, in which a Gaussian mixture model extracts candidate smoke regions and the dynamic and static characteristics of smoke are obtained; finally, a support vector machine is trained and used for prediction. Compared with existing smoke recognition techniques based on physical signals, this method effectively reduces cost and improves the accuracy and stability of recognition.
To further improve the effectiveness and accuracy of smoke recognition, we propose a smoke recognition method based on Fast R-CNN. By reducing space and time complexity, the network can be trained in a single stage rather than in several separate stages, which improves both efficiency and accuracy during detection.

Related Work
The ever-changing shapes and colors of smoke and the difficulty of modeling its movement pose great challenges for video smoke detection. Many researchers have exploited various properties of smoke, such as turbulence and fluttering, and have analyzed its essential characteristics. The movement pattern of a target provides important information for smoke detection. Guillemant et al. [10] used grayscale embedding and other methods to generate a linked table, extracted the motion characteristics of the target from this table to detect smoke, and proposed a fire monitoring method. Kopilovic et al. [11] explored the motion characteristics of smoke through the entropy of the distribution of optical flow directions to realize smoke detection. The fuzziness and fluctuation of smoke over time have also been studied using the wavelet transform. Tung et al. [12] proposed a four-stage video smoke tracking and monitoring algorithm. First, the approximate median method detects moving regions; next, fuzzy c-means clustering groups the moving regions to obtain the areas where smoke may appear; then, the spatiotemporal features of these areas are extracted; finally, a support vector machine (SVM) classifies them to produce the result. Yuan Feiniu et al. [13] explored a rapid detection model for translucent occlusion using high-pass filtering.
Through a study of the laws of smoke motion, a fast smoke detection method integrating image and color saturation monitoring was put forward. On this basis, techniques for eliminating isolated noise and chaotic motion interference were further studied.
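As a rough illustration of the motion cue used by Kopilovic et al. [11], the sketch below computes the Shannon entropy of the histogram of optical flow directions: rigid motion concentrates directions in a few bins (low entropy), while turbulent smoke spreads them out (high entropy). It is a minimal sketch, assuming a dense flow field (u, v) has already been estimated by some optical flow method; the function name, the bin count, and the magnitude threshold are illustrative choices, not values taken from [11].

```python
import numpy as np

def flow_direction_entropy(u: np.ndarray, v: np.ndarray,
                           bins: int = 8, min_mag: float = 1e-3) -> float:
    """Shannon entropy (bits) of the histogram of optical flow directions.

    u, v: horizontal and vertical flow components (same shape).
    Vectors with magnitude below min_mag are treated as static and ignored.
    """
    mag = np.hypot(u, v)
    moving = mag > min_mag
    if not np.any(moving):
        return 0.0
    angles = np.arctan2(v[moving], u[moving])  # direction in [-pi, pi]
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    p = hist / hist.sum()
    p = p[p > 0]                               # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())
```

With 8 bins the value ranges from 0 (all pixels moving in one direction) to 3 bits (directions spread evenly over all bins).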
Extracting the color and texture features of smoke is very important in smoke detection. In the RGB model, the gray values of smoke in the three channels are very close to each other and are mainly distributed between 80 and 220. Krstinić et al. [14] compared the RGB, YCbCr, CIELab, and HSI color models and proposed smoke features based on the HSI model. Gubbi et al. [15] studied a smoke detection method based on the wavelet transform and support vector machines, in which the geometric mean, skewness, arithmetic mean, kurtosis, and entropy are extracted from the resulting subimages to describe the variation characteristics of smoke. A smoke detection method based on separating fused images has also been proposed: it computes the blended image of smoke and background and solves for the smoke opacity by optimization. Yuan [16] studied a smoke monitoring method based on pyramid multiscale feature fusion, which partitions the detection window regularly to reduce the shape dependence of the AdaBoost method and thereby obtains a robust video smoke feature.
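The RGB observation above (three channel values close to each other, gray level roughly between 80 and 220) can be turned into a simple candidate-pixel rule. The sketch below is illustrative only: the 80 to 220 range comes from the text, while the function name and the `max_spread` tolerance that defines "very close" are assumptions.

```python
import numpy as np

def smoke_color_mask(rgb: np.ndarray, low: int = 80, high: int = 220,
                     max_spread: int = 20) -> np.ndarray:
    """Boolean mask of pixels whose color is consistent with grayish smoke.

    rgb: H x W x 3 uint8 image.
    A pixel qualifies when its three channel values are close to each other
    (max - min <= max_spread, an assumed tolerance) and its mean intensity
    lies in [low, high].
    """
    img = rgb.astype(np.int16)                  # avoid uint8 overflow in subtraction
    spread = img.max(axis=2) - img.min(axis=2)  # how far apart R, G, B are
    intensity = img.mean(axis=2)
    return (spread <= max_spread) & (intensity >= low) & (intensity <= high)
```

A mask like this only proposes candidate regions; the color rule alone cannot separate smoke from other gray objects, which is why the methods above combine it with motion and texture features.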
Depending on the application, existing video-based smoke monitoring methods fall into two main categories. The first type combines smoke detection with flame detection and focuses on frameworks suitable for both tasks. The second type focuses on smoke detection alone and proposes more discriminative methods to improve detection accuracy and reduce the false positive rate. Whichever type is used, the basic detection framework can be summarized in four steps: video image preprocessing, extraction of suspected smoke regions, description of smoke features, and smoke recognition.

Algorithm Implementation
Deep learning has also attracted extensive attention in fire recognition research. A convolutional neural network (CNN) extracts image features by using multiple convolution kernels to build features from low level to high level and from local to global, fuses the feature information in the final fully connected layer, and recognizes fire images with a softmax classifier [17]. Compared with traditional flame recognition algorithms that rely on manually selected image features, this approach obtains more diverse and comprehensive features, improves the accuracy of fire recognition, and reduces the false alarm rate of the model [17,18].
Existing research shows that smoke identification provides an important basis for fire early warning. However, traditional machine learning and deep learning methods require large amounts of data and cannot be applied directly to smoke recognition. Smoke data collected in a fixed scene are relatively homogeneous, so the model generalizes poorly when the environment, such as the smoke scene, changes. Therefore, in this paper, the feature extraction layers of a VGG-16 model pretrained on the ImageNet dataset, which also consists of image data, are transferred to the classification task on the target dataset. Transferring the model's feature extraction capability in this way broadens the applicability of the smoke recognition method. This capability includes the extraction of edge and texture features as well as of shape and other high-level abstract features.
Although ImageNet data differ from the target smoke recognition data, some universal features, such as edges, textures, and shapes, remain invariant at the feature level. These features are common to both the ImageNet dataset and the target smoke dataset, so feature transfer in a homogeneous feature space can be carried out.

The Model Flowchart of Transfer Learning.
The definition of transfer learning is very broad, and the technique appears under many names in related research, such as learning to learn, lifelong learning, multitask learning, knowledge transfer, and meta-learning. Multitask learning is the most closely related to transfer learning: it trains multiple related tasks at the same time, finds the similarities between them, and uses these shared characteristics to guide the learning of each individual task. Since 2005, transfer learning has been given a new definition: learning knowledge from one or more source tasks and using that knowledge to guide the learning of a target task [19]. This definition clarifies the purpose of transfer learning. Unlike multitask learning, transfer learning trains only on the source task and learns knowledge from it, rather than training the source and target tasks simultaneously. Figure 1 shows the difference between traditional machine learning and transfer learning: traditional machine learning must train a separate model for each task, whereas transfer learning only needs to train a model on the source task.
Deep learning requires large amounts of data. In the field of fire alarms, sources of annotated images are very limited, which makes the model prone to overfitting and degrades recognition; transfer learning can effectively mitigate overfitting and improve recognition accuracy when the dataset is small [20,21]. Traditional machine learning needs a large amount of labeled training data. Without it, the trained model performs poorly, and labeling large amounts of data costs considerable time and manpower. Transfer learning does not require as much annotated data, because it can apply the knowledge or model learned in the source domain to the target domain to complete a specific task. Building a new discrimination model with transfer learning involves the following steps. First, the original images are cropped, augmented, and normalized to form the initial dataset. Second, the dataset is divided into training, validation, and test sets; the model pretrained on ImageNet is loaded, its fully connected layers are reset, and the training set is fed into the pretrained model for training. During training, the model is fine-tuned and a new model is obtained. Then, the model is evaluated and tested on the validation and test sets, and the best model with transfer ability is selected for detecting smoke images at fire scenes. Figure 2 shows the flowchart of transfer learning, which consists of four parts. First, the data are preprocessed: all images are sorted by category and unified into three channels, and after random transformations such as rotation, cropping, and flipping, followed by normalization, each image has size 3 × 150 × 150. Second, a network based on deep transfer learning is constructed.
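The preprocessing and dataset split described above can be sketched in NumPy as follows. This is a minimal sketch under several assumptions not stated in the paper: rotation is restricted to multiples of 90°, inputs are assumed to be at least 150 × 150 (resizing smaller images is omitted), normalization uses the widely used ImageNet channel means and standard deviations, and the 70/15/15 split ratio and function names are hypothetical.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])  # assumed normalization constants
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def augment(img: np.ndarray, rng: np.random.Generator, size: int = 150) -> np.ndarray:
    """Randomly flip, rotate, crop, and normalize an H x W x 3 uint8 image.

    Returns a float32 array of shape (3, size, size). Assumes H, W >= size.
    """
    x = img.astype(np.float32) / 255.0
    if rng.random() < 0.5:
        x = x[:, ::-1, :]                        # random horizontal flip
    x = np.rot90(x, k=int(rng.integers(4)))      # rotation by a multiple of 90 degrees
    h, w, _ = x.shape
    top = int(rng.integers(h - size + 1))
    left = int(rng.integers(w - size + 1))
    x = x[top:top + size, left:left + size, :]   # random crop
    x = (x - IMAGENET_MEAN) / IMAGENET_STD       # per-channel normalization
    return np.ascontiguousarray(x.transpose(2, 0, 1), dtype=np.float32)  # HWC -> CHW

def split_indices(n: int, rng: np.random.Generator,
                  val_frac: float = 0.15, test_frac: float = 0.15):
    """Shuffle indices 0..n-1 and split them into train/validation/test subsets."""
    idx = rng.permutation(n)
    n_test = int(round(n * test_frac))
    n_val = int(round(n * val_frac))
    return idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test]
```

In practice the augmentation is applied on the fly to training images only, while validation and test images receive just the crop or resize and the normalization.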
In this process, a fully connected network is first pretrained on the smoke dataset using VGG-16 features. Specifically, the smoke dataset is fed into the convolutional part of a VGG-16 network trained on ImageNet, and this output is used to train a fully connected network. Then, the retained parameters of the convolutional part of the ImageNet-trained VGG network are transferred and connected to the previously pretrained fully connected network to obtain a model based on deep transfer learning. Finally, the model is trained and its parameters are tuned, and the resulting model is used for prediction.
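The two-stage procedure above (cache the frozen convolutional features, train a fully connected head on them, then chain the two) can be illustrated without downloading VGG-16 weights. In Keras the frozen base would typically come from `tf.keras.applications.VGG16(weights="imagenet", include_top=False)`; in the sketch below a fixed random projection followed by ReLU is a hypothetical stand-in for that base, and a logistic-regression layer stands in for the fully connected network. The final fine-tuning stage, in which some transferred layers are unfrozen, is omitted.

```python
import numpy as np

def frozen_extractor(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the pretrained VGG-16 convolutional base.

    w is never updated, mimicking transferred ImageNet parameters;
    x is (N, D) flattened inputs.
    """
    return np.maximum(x @ w, 0.0)

def train_head(feats: np.ndarray, y: np.ndarray, lr: float = 0.1, epochs: int = 500):
    """Stage 1: fit a logistic-regression head on cached (precomputed) features."""
    n, d = feats.shape
    wh = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        z = feats @ wh + b
        p = 0.5 * (1.0 + np.tanh(0.5 * z))  # numerically stable sigmoid
        grad = p - y                        # gradient of the log loss w.r.t. z
        wh -= lr * feats.T @ grad / n
        b -= lr * grad.mean()
    return wh, b

def predict(x: np.ndarray, w_frozen: np.ndarray, wh: np.ndarray, b: float) -> np.ndarray:
    """Stage 2: chain the frozen base and the pretrained head into one model."""
    return (frozen_extractor(x, w_frozen) @ wh + b > 0).astype(int)
```

Caching the base's output means the expensive convolutional layers run once per image, which is what makes the head pretraining cheap on a small dataset.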

The Core Process of Fast R-CNN.
In the new Fast R-CNN structure, feature extraction, candidate box extraction, bounding box regression, and classification are all integrated into one network, which greatly improves model performance, especially detection speed. Fast R-CNN is mainly composed of convolutional (Conv) layers, a Region Proposal Network (RPN), RoI pooling, and classification [21]. The inputs to Fast R-CNN are mainly image samples and their calibration (annotation) parameters [22]. By computing the overlap between the calibration boxes and the object proposals of each sample image, a set of regions of interest (RoIs) is obtained for each image. Table 1 lists the specific composition of the RoIs for each image. The convolutional neural network uses several convolutional and max-pooling layers to obtain the convolutional features of the samples, as shown in Figure 2. Then, the RoI pooling layer produces a normalized feature vector from the convolutional features within each region of interest.
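The overlap computation that turns object proposals into labeled RoIs can be sketched as below. The intersection-over-union (IoU) measure is standard; the labeling thresholds (IoU ≥ 0.5 for foreground, 0.1 ≤ IoU < 0.5 for background, anything else ignored) are borrowed from the original Fast R-CNN paper as an assumption, since this paper does not state the values behind Table 1. Boxes are assumed to be in (x1, y1, x2, y2) form, and the function names are hypothetical.

```python
import numpy as np

def iou_matrix(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    """Pairwise IoU between (M, 4) and (K, 4) boxes in (x1, y1, x2, y2) form."""
    x1 = np.maximum(boxes_a[:, None, 0], boxes_b[None, :, 0])
    y1 = np.maximum(boxes_a[:, None, 1], boxes_b[None, :, 1])
    x2 = np.minimum(boxes_a[:, None, 2], boxes_b[None, :, 2])
    y2 = np.minimum(boxes_a[:, None, 3], boxes_b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def label_rois(proposals, gt_boxes, gt_classes, fg_thresh=0.5, bg_low=0.1):
    """Assign each proposal a class label: >= 1 object, 0 background, -1 ignored."""
    iou = iou_matrix(proposals, gt_boxes)      # (num_proposals, num_gt)
    best_gt = iou.argmax(axis=1)               # best-matching calibration box
    best_iou = iou.max(axis=1)
    labels = np.full(len(proposals), -1, dtype=int)
    labels[(best_iou >= bg_low) & (best_iou < fg_thresh)] = 0
    fg = best_iou >= fg_thresh
    labels[fg] = gt_classes[best_gt[fg]]
    return labels, best_iou
```

For a single smoke box, a proposal covering 80% of it is labeled as smoke, one overlapping it by a third becomes background, and a distant proposal is ignored.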
These feature vectors are processed by fully connected layers whose output is shared by two sibling output layers. One layer performs softmax regression to estimate the probability of each object class, and the other outputs the detection box coordinates for each object class in the image.
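The two sibling output layers can be sketched as two linear maps applied to the same RoI feature vectors: one followed by a softmax over the classes, the other producing four box offsets per class. This is a minimal NumPy sketch; the weight shapes follow the usual Fast R-CNN layout (K object classes plus background), and the function names and random weights in the usage note are hypothetical.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def detection_heads(feats, w_cls, b_cls, w_box, b_box):
    """Two sibling output layers on shared RoI feature vectors.

    feats: (R, D) feature vectors from the fully connected layers.
    w_cls: (D, K + 1) -> class probabilities p (background + K classes).
    w_box: (D, 4 * (K + 1)) -> per-class box offsets (tx, ty, tw, th).
    """
    p = softmax(feats @ w_cls + b_cls)                       # (R, K + 1)
    t = (feats @ w_box + b_box).reshape(len(feats), -1, 4)   # (R, K + 1, 4)
    return p, t
```

For smoke detection K = 1, so each RoI yields two probabilities (background, smoke) and two sets of box offsets, of which only the smoke box is used at inference.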

RoI Pool Layer Calculation.
The RoI pooling layer uses max pooling to convert the features inside each RoI into a normalized feature map with a fixed spatial extent of H × W. Each RoI is represented by a four-tuple (r, c, h, w), where (r, c) is its top-left corner and (h, w) are its height and width. The RoI pooling layer divides the h × w RoI window into an H × W grid of subwindows, each of approximate size h/H × w/W, and max-pools the values in each subwindow into the corresponding output grid cell. With this layer, the input images used in training no longer need to have uniform dimensions, because the RoI pooling layer maps RoI feature matrices of different sizes to a common size.

Proposed Algorithm.
As shown in Figure 3, two output layers are finally obtained, from which the classification results and the detection box coordinates are computed, respectively. The first output layer computes, by softmax regression, the probability distribution p of each RoI over the K + 1 classes (K object classes plus background). The other output layer computes the detection box coordinates for each of the K classes. Finally, a multitask loss is used to jointly train the classification and the detection box regression of each calibrated RoI:

$$L(p, u, t^u, v) = L_{cls}(p, u) + [u \ge 1]\, L_{loc}(t^u, v)$$
where $[u \ge 1]$ equals 1 when $u \ge 1$ and 0 otherwise, so background RoIs ($u = 0$) contribute no box loss. $L_{cls}(p, u) = -\log p_u$ is the log loss for the true class $u$, and $L_{loc}$ is the detection box coordinate loss. It is computed from the ground-truth box target $v = (v_x, v_y, v_w, v_h)$ of class $u$ and the predicted coordinates $t^u = (t^u_x, t^u_y, t^u_w, t^u_h)$ of class $u$ as follows:

$$L_{loc}(t^u, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t^u_i - v_i), \qquad \mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$

In the training process, each stochastic gradient descent mini-batch consists of N sample images and R RoIs. In this paper, R = 128 and N = 2, that is, 2 images per batch and 64 RoIs per image.
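The multitask loss and the mini-batch composition can be sketched in NumPy as follows. The loss follows the definitions above (log loss plus smooth L1 box loss gated by $[u \ge 1]$). The sampler draws 64 RoIs from each of the N = 2 images; the 25% foreground fraction is an assumption taken from the original Fast R-CNN paper, not a value stated here, and all function names are hypothetical.

```python
import numpy as np

def smooth_l1(x: np.ndarray) -> np.ndarray:
    """Elementwise smooth L1: 0.5 x^2 if |x| < 1, else |x| - 0.5."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def multitask_loss(p, u, t, v) -> float:
    """Mean Fast R-CNN loss over R RoIs.

    p: (R, K+1) class probabilities; u: (R,) true labels (0 = background);
    t: (R, K+1, 4) predicted offsets; v: (R, 4) regression targets.
    """
    r = np.arange(len(u))
    l_cls = -np.log(p[r, u])                          # log loss of the true class
    l_loc = smooth_l1(t[r, u] - v).sum(axis=1)        # box loss for class u only
    return float(np.mean(l_cls + (u >= 1) * l_loc))   # [u >= 1] drops background box loss

def sample_minibatch(labels_per_image, rois_per_image=64, fg_fraction=0.25, rng=None):
    """Pick RoI indices for each image (N = 2, R = 128 -> 64 RoIs per image)."""
    rng = rng or np.random.default_rng()
    batch = []
    for labels in labels_per_image:
        fg = np.flatnonzero(labels >= 1)
        bg = np.flatnonzero(labels == 0)
        n_fg = min(len(fg), int(fg_fraction * rois_per_image))
        n_bg = min(len(bg), rois_per_image - n_fg)
        batch.append(np.concatenate([rng.choice(fg, n_fg, replace=False),
                                     rng.choice(bg, n_bg, replace=False)]))
    return batch
```

Because background RoIs contribute only the classification term, the predicted box for class 0 can take any value without affecting the loss.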

Advances in Multimedia
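During backpropagation through the RoI pooling layer, the gradient is computed as follows:

$$\frac{\partial L}{\partial x_i} = \sum_{r} \sum_{j} [i = i^*(r, j)] \frac{\partial L}{\partial y_{rj}}$$

where $x_i$ is the $i$-th input activation of the RoI pooling layer, $y_{rj}$ is the $j$-th output of the $r$-th RoI, and $i^*(r, j)$ is the index of the input selected by max pooling for $y_{rj}$. The bracket $[\cdot]$ determines whether the residual is propagated backward through the RoI pooling layer: for each input node $i$, one checks whether it was the maximum within the subwindow that produced $y_{rj}$. If so, the bracket equals 1 and the residual is accumulated; otherwise, the bracket equals 0.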
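The forward pass described in the RoI pooling subsection and the backward formula above can be sketched together: the forward pass records, for every output cell, the index $i^*(r, j)$ of the winning input, and the backward pass routes each output gradient to that index, accumulating when RoIs overlap. This is a plain NumPy sketch for clarity rather than speed; it assumes integer RoIs lying inside the feature map, uses floor/ceil bin boundaries, and the function names are hypothetical.

```python
import numpy as np

def _ceil_div(a: int, b: int) -> int:
    """Integer ceil(a / b) for positive b."""
    return -(-a // b)

def roi_pool_forward(fmap: np.ndarray, rois, H: int = 2, W: int = 2):
    """Max-pool each RoI (r, c, h, w) of a (C, Hf, Wf) feature map to (C, H, W).

    Returns pooled outputs (num_rois, C, H, W) and, for every output cell,
    the flat index i*(r, j) of the input element that won the max.
    """
    C = fmap.shape[0]
    out = np.zeros((len(rois), C, H, W), dtype=fmap.dtype)
    argmax = np.zeros((len(rois), C, H, W), dtype=np.int64)
    for n, roi in enumerate(rois):
        r, c, h, w = (int(val) for val in roi)
        for i in range(H):
            y0, y1 = r + (i * h) // H, r + _ceil_div((i + 1) * h, H)
            for j in range(W):
                x0, x1 = c + (j * w) // W, c + _ceil_div((j + 1) * w, W)
                for ch in range(C):
                    win = fmap[ch, y0:y1, x0:x1]
                    k = int(win.argmax())          # flat index inside the subwindow
                    dy, dx = divmod(k, x1 - x0)
                    out[n, ch, i, j] = win[dy, dx]
                    argmax[n, ch, i, j] = np.ravel_multi_index(
                        (ch, y0 + dy, x0 + dx), fmap.shape)
    return out, argmax

def roi_pool_backward(grad_out: np.ndarray, argmax: np.ndarray, fmap_shape) -> np.ndarray:
    """dL/dx_i = sum_r sum_j [i = i*(r, j)] dL/dy_rj."""
    grad_in = np.zeros(int(np.prod(fmap_shape)), dtype=grad_out.dtype)
    np.add.at(grad_in, argmax.ravel(), grad_out.ravel())  # accumulates repeated indices
    return grad_in.reshape(fmap_shape)
```

`np.add.at` is used instead of fancy-index assignment because the same input element can be the maximum for several RoIs, and each of those residuals must be summed.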

Experiment and Analysis
In this paper, publicly available flame and smoke datasets, together with images collected from the Internet, are used for the experiments. The experimental data consist of a training set for model construction and a test set for model testing. The specific experimental data are listed in Table 2, and Figure 4 shows examples from the dataset.
As Table 2 and Figure 4 show, the dataset used in this paper is small. The ImageNet classification dataset, by contrast, contains image data from 1000 categories, and this large quantity and variety of data strongly supports building a model based on deep transfer learning. TensorFlow [9] and Keras [23] are used to train the Fast R-CNN proposed in this paper; for comparison, deep CNNs are also implemented with TensorFlow and Keras. Training is performed on an Intel Xeon computer with an NVIDIA GTX 1060 GPU. The detection rate (DR), false alarm rate (FAR), and accuracy rate (AR) are used to compare the proposed algorithm with the others; they are defined as follows:

$$\mathrm{DR} = \frac{P_p}{Q_p}, \qquad \mathrm{FAR} = \frac{N_p}{Q_n}, \qquad \mathrm{AR} = \frac{P_p + Q_n - N_p}{Q_p + Q_n}$$

where $Q_p$ is the number of positive samples, $P_p$ the number of positive samples detected correctly, $Q_n$ the number of negative samples, and $N_p$ the number of negative samples misclassified as positive. A better recognition algorithm yields larger AR and DR and a smaller FAR. The proposed algorithm is compared with several traditional smoke detection methods, such as HLTPMC [24] and MCLBP [25], as well as with typical deep CNNs, including AlexNet [26] and ZF-Net [27]. The comparison results are shown in Table 3.
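The three evaluation measures reduce to simple ratios of counts. The helper below is a minimal sketch of those formulas (the function name is hypothetical); it returns fractions, which can be multiplied by 100 to report percentages.

```python
def detection_metrics(q_p: int, p_p: int, q_n: int, n_p: int):
    """Return (DR, FAR, AR) as fractions.

    q_p: number of positive (smoke) samples; p_p: positives detected correctly;
    q_n: number of negative samples; n_p: negatives misclassified as positive.
    """
    dr = p_p / q_p                            # detection rate
    far = n_p / q_n                           # false alarm rate
    ar = (p_p + q_n - n_p) / (q_p + q_n)      # correct decisions over all samples
    return dr, far, ar
```

For hypothetical counts (not taken from Table 2 or Table 3) of 100 positives with 95 detected and 200 negatives with 10 false alarms, the sketch gives DR = 0.95, FAR = 0.05, and AR = (95 + 190) / 300 = 0.95.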
Compared with the traditional algorithms and several classical deep CNNs, the deep-learning-based algorithm (Fast R-CNN) achieves a higher detection rate and a lower false alarm rate. According to the comparison of the indicators in Table 3, on the same test set, the algorithm in this paper improves on HLTPMC, MCLBP, AlexNet, and ZF-Net. As a typical region-proposal-based detection algorithm, however, Fast R-CNN spends considerable time generating candidate regions with the RPN (region proposal network). Although the overall performance of the algorithm is good, detecting a single fire image still takes a long time; even though the detection time of Fast R-CNN is much shorter than that of the other algorithms, it does not yet meet the real-time requirements of fire detection. The experimental results show that the algorithm not only detects large fire areas well but also effectively detects small fire areas. When faced with fire-like objects, it can effectively distinguish street lamps, firefighters, and sanitation workers. Its missed detection and false detection rates are low, and it performs well in fire detection.

Conclusion
To address the shortcomings of existing fire detection methods, in particular their low smoke recognition rate, we proposed a Fast R-CNN smoke detection method. In this paper, a deep convolutional neural network is used to extract convolutional features from example images of the visual task, and the RoI normalization and parallel regression of Fast R-CNN are used for computation, finally yielding a smoke detection model for the visual task. The experimental results show that the proposed Fast R-CNN smoke detection method effectively improves the detection rate and reduces the false alarm rate. Compared with similar fire detection algorithms, the improved method is more robust and performs well in both accuracy and speed. In future work, the network will be further optimized to improve detection performance and speed. In addition, the existing fire dataset will be expanded to increase the diversity of its samples and improve the quality of the training set [18][19][20].

Data Availability
The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.