A new online detection method of tobacco impurities for tobacco robot

In the tobacco industry, impurity detection is an important prerequisite for ensuring the quality of tobacco. However, in the actual production process, the complex background environment and the variability of impurity shapes can affect the accuracy of impurity detection by tobacco robots, which leads to a decrease in product quality and an increase in health risks. To address this problem, we propose a new online detection method of tobacco impurities for tobacco robot. Firstly, a BCFormer attention mechanism module is designed to effectively mitigate the interference of irrelevant information in the image and improve the network's ability to identify regions of interest. Secondly, a Dual Feature Aggregation (DFA) module is designed and added to Neck to improve the accuracy of tobacco impurities detection by augmenting the fused feature maps with deep semantic and surface location data. Finally, to address the problem that the traditional loss function cannot accurately reflect the distance between two bounding boxes, this paper proposes an optimized loss function to more accurately assess the quality of the bounding boxes. To evaluate the effectiveness of the algorithm, this paper creates a dataset specifically designed to detect tobacco impurities. Experimental results show that the algorithm performs well in identifying tobacco impurities. Our algorithm improved the mAP value by about 3.01% compared to the traditional YOLOX method. The real-time processing efficiency of the model is as high as 41 frames per second, which makes it ideal for automated inspection of tobacco production lines and effectively solves the problem of tobacco impurity detection.


Introduction
Tobacco is a critical cash crop, and the quality of its products directly impacts consumer health and satisfaction, as well as the tobacco industry's development and competitiveness.The presence of impurities such as metals, plastics, paper and plant materials can seriously affect the quality and safety of tobacco products (Girma Regassa and Chandravanshi, 2016).In this context, neural networks offer a promising solution for the effective detection and removal of these impurities.Using deep learning and computer vision techniques (Lu et al., 2022), tobacco robots can autonomously detect and classify impurities, thereby improving the quality of tobacco products.At present, the common impurities detection methods used in the tobacco production line are mainly manual visual detection and mechanical sorting (Chao et al., 2020).Artificial visual detection has the disadvantages of low efficiency, large error, high labor intensity, etc., which is difficult to meet the requirements of highspeed production lines.Mechanical sorting refers to the use of mechanical devices for screening, blowing, adsorption and other operations to separate the impurities from the tobacco.Although this method can improve the detection speed, for the shape and size of small impurities similar to tobacco, such as metal shavings, plastic pieces, etc., it is difficult to effectively identify and remove, and the instrument is expensive, which limits its promotion and application.In recent years, deep learning (Kumar et al., 2022;Lin et al., 2022;Zhang et al., 2023;Han et al., 2024a,b;Lakatos et al., 2024) and neural network techniques (Sun et al., 2023(Sun et al., , 2024a,b) ,b) have flourished in the field of computer vision, and object detection methods based on these techniques have been widely used in industrial scenarios (Qi et al., 2020).Compared to traditional computer vision methods deep learning uses multiple layers of complex nonlinear mapping (Nagamine et al., 2016), these methods can learn more complex features and improve the accuracy and speed of detection.Secondly, deep learning techniques are capable of end-to-end learning (Coleman et al., 2017) and can automatically capture valid features.The development of object detection techniques based on deep learning is mainly divided into two types: two-stage detection methods and single-stage detection methods.The two-stage detection methods are R-CNN (Girshick et al., 2014), Fast R-CNN (Girshick, 2015), Faster R-CNN (Ren et al., 2016), Cascade R-CNN (Cai and Vasconcelos, 2018) and Mask R-CNN (He et al., 2017).Based on Faster R-CNN, Ma et al. (2022) and others used the K-means algorithm to generate clustering centers based on the actual distribution characteristics of object sizes, and performed a homogenization operation on the clustering centers to generate the adaptive anchor box parameters, which improves the ability of the regional suggestion network to search for multi-scale objects.Sha et al. (2022) used a multi-level fusion structure to generate multiscale feature maps with precise position information and semantic features, and then corrected the scale of candidate regions in RPN to improve the detection accuracy of multi-scale aircraft objects in remote sensing images.Xin et al. (2023) removed redundant deep features to improve the network accuracy and reduce the number of parameters by 38.4%.To reduce the background clutter, the CBAM attention module (Woo et al., 2018) is introduced into the backbone of the feature extraction network, which improves the model's detect ability.
For single-stage object detection, commonly used detection algorithms are the SSD (Liu et al., 2016), RetinaNet (Lin et al., 2017), YOLO series (Redmon et al., 2016), EfficientDet (Tan et al., 2020), and CornerNet (Law and Deng, 2018).Yin and Wang (2022), proposed Attention Feature Fusion SSD (AFF).First, the shallow feature information is fused using the Attention Feature Fusion module to reduce noise and improve the correlation of distant pixels in the feature map.Second, a focused classification loss function is used to solve the model degradation problem caused by the imbalance of positive and negative samples during the training process.Wenlong et al. (2024) used Swin Transformer (Liu Z. et al., 2021) as the backbone network to improve the feature extraction capability of the network.The Adaptive Contextual Feature Extraction module is proposed to adaptively adjust the sensory field using deformable convolutions with different zero rates to extract contextual features and improve the effect of multi-scale object detection.The FreeAnchor module (Wang et al., 2022) is introduced to solve the problem of dense small objects in images by designing an optimized anchor frame matching strategy from the perspective of large release estimation.Xiao-pei et al. (2023) introduced the channel-global attention mechanism (Liu Y. et al., 2021) in the backbone network to improve the feature extraction ability for objects of different scales and suppress the interference of redundant information.A dense upsampling convolution   (2009) presented a methodology for identifying foreign matter that involved constructing a standard color library and a typical foreign matter color library.They removed typical foreign matter colors and revised the standard color library.Additionally, they proposed monitoring the action of solenoid valves as a novel approach to detecting larger foreign objects.Additionally, Fuguang and Xiaoqing (2012) utilized 20 × 20 pixel blocks as the unit of measure for cigarette images.These blocks were categorized into two groups: containing foreign objects and not containing foreign objects.A clustering algorithm based on the Adaptive Iterative Self-Organizing Data Analysis Technique (ISODATA) was employed to cluster the pixel blocks that did not contain foreign objects, and subsequently to determine the clustering center.Finally, the authors converted the clustering center into HSI color space and traversed the cells in the cigarette image to detect foreign objects.Furthermore, Li et al. (2021) proposed a stacked convolutional neural network, which effectively recognized and detected moldy tobacco images by extracting and aggregating image features using convolutional kernels of varying sizes from three branches in a stepwise manner.
The following conclusions can be drawn from the above analysis: (1) These methods deserve more processing power and memory when dealing with complex feature extraction tasks.
(2) When the object size is too small, the model will have misdetection and false detection, which will seriously affect the detection accuracy and lead to the failure of the reliable tobacco impurity detection task.
(3) Impurity detection methods are capable of detecting fewer types of impurities, resulting in an inability to .
/fnbot. .effectively identify and analyse different types of impurities in complex industrial scenarios, thus affecting product quality and safety.To address the above issues and improve the detection accuracy of tobacco impurities, a new online detection method of tobacco impurities for tobacco robot is proposed in this article.The main innovation points of this article are as follows: (1) A BCFormer module is proposed to improve the network's ability to detect regions of interest by weakening non-critical information in the image.
(2) An DFA module is proposed to enhance the fusion of detail information from shallow feature maps and semantic information from deep feature maps.
(3) An optimized loss function is proposed to provide more feature information and reduce the inference time during the training phase, thereby improving the accuracy and robustness of the detection algorithm against tobacco impurities.
(4) In this paper, images of tobacco impurities in different weather conditions have been collected to create a tobacco impurity dataset.

Methods . Overall network structure
Figure 1 illustrates the structure of our network.The network uses YOLOX as a base model for detecting tobacco impurities in complex environments.Firstly, we added the BCFormer attention mechanism to the backbone network to extract key feature information of the object region by learning the importance of each feature.Secondly, the Dual Feature Aggregation module (DFA) module is added to the Neck part to better achieve the fusion of the low-level detail information with the high-level semantic information, so as to reduce the interference of the complex background on the detection.Finally, the GIoU loss function is  used as a regression loss function to enable the model to learn more accurate bounding box predictions, further improving the accuracy of object detection.

. BCFormer module
Attention is an important mechanism that focuses on highlighting important information and adjusting weights to improve model performance.It is widely used in various object detection tasks to improve the accuracy and robustness of detection by selecting and focusing on regions or features that are relevant to the object.In general, the attention mechanism, usually referred to as self-attention, is a key technique in the Transformer model.It assigns weights by calculating the dependencies between different positions in the input sequence, thus adjusting the level of attention to each position in the sequence.However, in the original Transformer network, this structure requires the computation of the correlation between any two elements in the input sequence.The computational complexity grows exponentially with the length of the sequence, so models using such techniques require a large amount of computational resources.
To solve the above problem, we propose a visual transformer model called BCFormer, which introduces a Bi-Level Routing Attention (BRA) module in the visual transformer.This module improves the feature representation by enabling information interaction between global and local attention levels.This is because the global attention mechanism captures the overall structure and global information of the image, while the local attention mechanism captures the details and local features of the image.Therefore, the introduction of the Bi-Level Routing Attention (BRA) module can better deal with the global and local relationships in the image and effectively capture the structural and detailed features of the image.This model significantly improves the performance in the detection task, and its structure is shown in Figure 2.
In this paper, Bi-Level Routing Attention is used as the basic building block, so the BCFormer module first uses 3 × 3 convolution to implicitly encode the relative position information, and then performs the normalization operation by Layer Normalization (LN).Then, after the BRA (Bi-Level Routing Attention) module, the feature map is divided into S × S nonoverlapping regions, and most of the irrelevant key-value pairs are filtered out from the rough regions, and only a small portion of the routing regions are retained.Finally, a multilayer perceptron (MLP) and CspLayer are used to select the most appropriate weights and biases for feature transformation, information reorganization and feature extraction.
In the BRA module we define several key concepts: Q is used to compute the weighted relevance of a query with respect to a keyword, K denotes the keyword or identifier that provides information or is used for matching purposes, V is the value associated with the query result and the keyword information, C is a scalar factor used to adjust the allocation of attention and control the focus of attention, A is the adjacency matrix used to represent the semantics between two regions of correlation, O is the output of the attention mechanism.

. Dual Feature Aggregation module
There are objects of different sizes in the image, and objects of different sizes have different features.Simple objects can be discriminated using shallow features, while complex objects can be discriminated using deep features.Previous studies have shown that shallow features are able to extract more information such as location and details due to their high resolution, but lack semantic information and are susceptible to noise; on the contrary, deep features contain more semantic information but have lower resolution.
The YOLO algorithm will gradually lose some of the image's feature information as the network layers deepen during the detection process, resulting in insufficient feature learning for small objects and poor detection performance.Therefore, some researchers try to adjust the feature maps to the same size and then merge them.However, this method may lose some information of the feature map itself during the merging process.
To address this problem, we propose the Dual Feature Aggregation (DFA) module, shown in Figure 3, which aims to improve the information fusion between feature maps at different scales.The specific workflow of this module is as follows: first, we perform a simple alignment fusion process on the feature tensor of two branch networks for subsequent processing.Next, the fusion results are used as guidance information to generate weight parameters by the GPM module.Finally, these weights are used to perform a weighted transformation on the fused feature tensor, and the weighted features are used as inputs to the next module.
Since the DFA module cannot be directly integrated into the structure of YOLOX, we have modified it to suit YOLOX's needs.The specific structure is shown in Figure 3.We apply the DFA module structure to replace the Concat operation in Neck to establish an attention mechanism between Input1 and Input2.DFA module is not only able to fuse the information of feature maps of different scales, but also able to learn to establish the attention to feature maps of different scales, which improves the accuracy of the network in detecting tobacco impurities.

. Loss function
The loss function of YOLOX mainly includes regression loss, confidence loss and classification loss.Among them, the cross-entropy loss function is used for confidence loss and classification loss, while the IoU loss function is used for regression loss.
In regression loss, the IoU loss function is used to calculate the degree of overlap between the predicted frame and the real frame, but it cannot reflect the distance between the two frames, resulting in a gradient of 0, which cannot be optimized.To solve this problem, this paper uses GIoU instead of IoU as the loss function.The GIoU is shown in Equations 1 and 2: Where A represents the prediction frame, B represents the true frame, and C represents the A and B minimal outer box, as shown in Figure 4.
When YOLOX adopts the GIoU loss as the regression loss function, the function integrates the overlap area and the scale, which better satisfies non-negativity and symmetry during training and improves detection accuracy.The L GIoU formula is shown in Equation 3: Therefore, the loss function of this model is specified is shown in Equation 4: Where λ is the regression loss compensation coefficient, which in this paper is assumed to be 5. N pos is the number of anchor points divided into positive samples, and confidence loss and classification loss are shown in Equation 5: Where y = 1 is a positive sample and y = 0 is a negative sample.

Experiments . Dataset
To verify the effectiveness of the algorithm in detecting tobacco impurities, a dataset of tobacco impurities was constructed in this paper.The images of this dataset were taken by researchers from the Zhengzhou Tobacco Research Institute of China Tobacco Corporation on a real production line in two scenarios: day and night, as shown in Figure 5.
The total dataset contains 2,454 images, including 200 fabrics, 250 sponges, 100 sawdust, 150 belts, 200 PV bands, 256 latex, 344 plastics, 150 plastic films, 100 leaves, 150 filaments, 100 wire tubes, 200 rubber, 50 feathers and 204 paper scraps, among other common tobacco impurities categories.In the dataset, 2,000 images were selected as the training set and the remaining images were used as the validation set.The specific data classification is shown in Table 1.
In this dataset, the object area is <32 × 32 pixel, we consider it a small object; when the object area is between 32 × 32 pixels and 96 × 96 pixels, it is considered as a medium object; and when >96 × 96 pixels, it is considered as a large object.In this case, we can classify small, medium and large objects based on the area ratio of the label box and the image.After classification, the number of small, medium and large objects in our dataset is 1,869, 322 and 263, respectively.The specific sample distribution for each category and the overall sample distribution are shown in Figure 6.

. Implementation details
The experimental environment of this paper uses a computer with Windows 10, 64-bit operating system, Intel(R) Core(TM) i7-11700 processor, NVIDIA RTX4000 discrete graphics card, and 8G of RAM.It runs under Python 3.6, Pytorch 1.7.0 and CUDA 11.8.
In this work, a freeze and thaw procedure is used in the training phase.At the beginning of training, for the first 50 generations, a batch size of 16 and a learning rate of 1e-3 are used for freeze training.From 50 to 300 generations, a batch size of 8 and a learning rate of 1e-4 is used for thaw training.The optimizer used was adam with an internal dynamics parameter of 0.937.Due to the limitations of the mAP calculation principle, the network has to obtain almost all prediction frames when calculating the mAP.Therefore, the confidence level was set at 0.5.Note that all images were automatically scaled to 640 × 640 for the detection task.

. Evaluation metrics
In this experiment, Precision (P), Recall (R), Mean Average Precision (mAP) and FPS are used as evaluation indices.P, R, and mAP are calculated as shown in Equations 6-9: And mAP is the mean of all APs in N classes.
Frontiers in Neurorobotics frontiersin.orgWhere TP are positive samples predicted as positive samples, FP are negative samples predicted as positive samples, FN are positive samples predicted as negative samples, AP κ refers to the average precision of the kth class of detected objects, i.e. the area under the PR curve, and mAP is the sum of the average precision of all classes divided by the number of classes.

. The ablation experiment
In this paper we present a series of comparative experiments, each designed to evaluate the effectiveness of the proposed modules.Table 2 shows the results of the removal experiments performed on the Tobacco Impurities dataset, with the bold graph showing the best results.Our results show that the addition of the BCFormer module leads to a 0.16% improvement in P, a 0.37% improvement in R and a 0.93% improvement in mAP, despite the reduction in FPS.This suggests that the BCFormer module improves the focus of the network on relevant information and suppresses irrelevant data.Replacing the Concat section in Neck with the DFA module resulted in a 4.2% improvement in R and a 2.19% improvement in mAP, with a decrease in FPS.The applicability of the DFA module for tobacco impurity detection is demonstrated.In addition, the inclusion of the GIoU loss improved P by 1.89%, R by 3.63% and mAP by 2.19%, demonstrating the effectiveness of this module in the detection of tobacco impurities.Finally, the BCFormer, DFA and GIoU losses were combined to give YOLOX+BCFormer+DFA+GIoU.The experimental results showed that this model increased precision, recall and mean average precision by 0.91%, 9.86%, and 3.01% respectively.
Table 2 displays the results of the ablation experiments that were conducted on tobacco Impurities data using this method.

. Comparison with existing methods
To demonstrate the effectiveness of the method in detecting tobacco impurities, the same trained parameters were used to compare with other state-of-the-art methods on the same tobacco impurity dataset.Table 3 compares the mAP, FPS, F1, and Params obtained by different algorithms, with the bold graph showing the best results.
From Table 3 it can be concluded that the mAP and F1 values obtained by this algorithm are better than the other algorithms and have a smaller number of parameters.Among the 14 types of impurities detected, nine of them achieved 100% mAP detection.Compared with other types of impurities, the detection accuracy of xj is relatively low, mainly due to the severe occlusion of xj, which makes it difficult to extract effective feature information.However, in our method, the realtime detection performance of the model is affected to some extent, as the number of parameters increases while the object detection accuracy improves.
According to Figure 7, we can find that 17,28,8,13,17,7,11,14,24,94,18,34,7,6 were detected for cloth, sponge, wood chip, belt, pv belt, latex, plastic, plastic film, leaf, filament, wire pipe, rubber, feather and paper, respectively, which is more than the original model, with two more belts, one more leaves, and two more filaments detected.The experimental data proved the correctness of the method proposed in this paper, and provided a certain reference significance for the improvement of the detection technology of tobacco filament impurities.
Figure 8 shows the detection results of four types of impurities, namely pvd, pd, xj, and xg, in daytime scenarios.It can be seen that compared with the other two state-of-the-art algorithms, the algorithm proposed in this paper not only effectively detects impurities, but also has a relatively high confidence level.As can be seen, the addition of the BCFormer module to extract multiscale feature information can significantly improve the sensory field of the network.In addition, incorporating the DFA module can enable the network to focus more precisely on the object region of interest, thereby improving the object detection accuracy.Thus, it is evident that the proposed in this paper is capable of significantly reducing the leakage object detection rate under daytime conditions.
The detection results of different algorithms for impurities such as bm in the night environment are shown in Figure 9. Similar conclusions can be drawn that our method can achieve excellent impurity detection results, the clutter detection level of which is higher than that of the comparison methods.In addition, this algorithm can also achieve excellent detection performance for impurities with strong occlusion.
In summary, this algorithm can achieve accurate impurity detection in all weather conditions, providing accurate data for accurate impurity removal in subsequent tobacco production lines.
Figure 10 shows the recall detection results of 14 types of impurities obtained by the algorithm in this article.It can be seen that the algorithm in this article can achieve a high recall rate for most impurities, but the detection effect for belts is poor.After analysis, it was found that the shape volume of belts is small and their morphological features are similar to those of tobacco, which further affects the accuracy of belt feature extraction.

. Practical scenario application
According to the algorithm proposed in this paper, a tobacco impurity detection system is designed and applied to a tobacco impurity detection robot, which mainly consists of an image data acquisition subsystem and a data processing subsystem, as shown in Figure 11.
Figure 12 shows the detection effect of tobacco impurities on the actual production line.It can be seen that the system can achieve accurate detection of various impurities, and the average confidence level is more than 0.9, which fully verifies the effectiveness of the algorithm in this paper.At present, the system has been put into operation and is in good working condition.The accuracy of tobacco impurity detection is high, effectively improving the intelligent level of tobacco impurity detection.

Conclusions
To improve the detection accuracy of tobacco impurities, this paper proposes a new online detection method of tobacco impurities for tobacco robot, which significantly improves the accuracy of tobacco impurity detection.The YOLOX-based algorithm with BCFormer module and DFA module enables the tobacco robot to pay more attention to the region of interest and effectively suppress irrelevant information in the image.The GIoU loss function further reduces the model complexity and improves the detection efficiency.The tobacco robot based on this proposed method, implemented in a real production line, performs well and achieves accurate detection of impurities with high confidence.However, there are still some challenges in detecting leakage in complex contexts.To further improve the efficiency of this method, we will strive to optimize it in our next research to reduce the leakage detection rate in complex environments.
In future work, we will continue to collect images of more types of tobacco impurities to improve the model to detect a larger number of impurities.Meanwhile, we will also improve and optimize the model according to detection tasks in other scenarios to extend the applicability of the model in this paper.

FIGURE
FIGUREOverall structure of the network.

FIGURE
FIGURENetwork structure of DFA module.

FIGURE
FIGUREIllustration of GIoU loss formula.

FIGURE
FIGURE Tobacco impurities image acquisition platform and image.(A) Tobacco debris image acquisition platform.(B) Images collected during the day.(C) Images collected at night.

FIGURE
FIGUREObject class size allocation.

FIGURE
FIGUREFourteen types of recalls obtained.

FIGURE
FIGURERobotic system for tobacco impurity detection.

FIGURE
FIGUREDetection e ect of tobacco impurities on the production line.

TABLE
Introduction to the dataset.
TABLE The results of the ablation experiments.
TABLE A comparison of this method with other methods.