Detection and Classification of Cotton Foreign Fibers Based on Polarization Imaging and Improved YOLOv5

It is important to detect and classify foreign fibers in cotton, especially white and transparent foreign fibers, to ensure the quality of subsequent yarn and textile products. The actual cotton foreign fiber removal process suffers from several problems, such as missed inspection of some foreign fibers, low recognition accuracy for small foreign fibers, and low detection speed. A polarization imaging device for cotton foreign fibers was constructed based on the differences in optical properties and polarization characteristics between cotton and foreign fibers. An object detection and classification algorithm based on an improved YOLOv5 was proposed to achieve small foreign fiber recognition and classification. The methods were as follows: (1) The lightweight network Shufflenetv2 with the Hard-Swish activation function was used as the backbone feature extraction network to improve the detection speed and reduce the model volume. (2) The PANet network connection of YOLOv5 was modified to obtain a fine-grained feature map to improve the detection accuracy for small targets. (3) A CA attention module was added to the YOLOv5 network to increase the weight of useful features while suppressing the weight of invalid features, improving the detection accuracy for foreign fiber targets. Moreover, we conducted ablation experiments on the improved strategy. The model volume, mAP@0.5, mAP@0.5:0.95, and FPS of the improved YOLOv5 reached 0.75 MB, 96.9%, 59.9%, and 385 f/s, respectively; compared to the original YOLOv5, mAP@0.5, mAP@0.5:0.95, and FPS increased by 1.03%, 7.13%, and 126.47%, respectively, which proves that the method can be applied to the vision system of an actual production line for cotton foreign fiber detection.


Introduction
Cotton is the largest natural fiber in the textile industry. During the processes of cotton cultivation, harvesting, transportation, and processing, a large number of foreign fibers are inevitably mixed in due to various factors, such as cotton hulls, leaves, mulch films, chemical fibers, and paper pieces. These foreign fibers have adverse effects on textile products, not only reducing the spinning efficiency, but also causing fabric defects and reducing the product grade [1]. Therefore, the detection of foreign fibers in cotton is an important and necessary step before spinning. It is time-consuming and inefficient to rely on workers to manually pick foreign fibers out of cotton, and the manual detection accuracy is low [2,3]. In recent years, numerous detection methods for foreign fibers have been developed, including photoelectric, ultrasonic, and optical detection, classified according to the detection principle [4,5]. However, photoelectric detection technology can only detect colored foreign fibers but not white transparent foreign fibers [6]. Ultrasonic detection technology can only detect foreign fibers over a large area, and its speed is slower [7]. Presently, foreign fiber detection mainly uses machine vision technology, which offers a high recognition rate, high detection speed, and low cost. In this paper, the improved Shufflenetv2 and PANet methods are introduced into the YOLOv5 algorithm, and an improved YOLOv5 combined with a Coordinate Attention (CA) module is proposed, which can realize the real-time detection of multiple types of small foreign fiber targets.
The following contributions are made by our work:
• A polarization imaging device for cotton foreign fibers was constructed using line laser polarization imaging technology.
• In order to reduce the model volume and improve the detection speed, the lightweight network Shufflenetv2 with the Hard-Swish function was adopted as the backbone feature extraction network.
• In order to increase the detection accuracy for small foreign fiber targets, an improved PANet was added to YOLOv5.
• The CA module was added before the Head of YOLOv5 to allocate the weights of the channel and spatial features, improving the accuracy of foreign fiber recognition and classification.
In summary, the line laser polarization imaging approach proposed in this study has important guiding value for the online identification and classification of cotton foreign fibers and for controlling foreign fiber generation during cotton planting and picking. Compared with other typical object detection algorithms, our proposed algorithm has a higher detection speed, smaller model size, and higher detection accuracy and is more suitable for foreign fiber detection tasks.

Experiment Materials
The cotton and foreign fiber samples used in the experiment were provided by the Henan Fiber Inspection Bureau and originated from the Xinjiang Uygur Autonomous Region, China. The experiment was conducted using 20 common types of foreign fibers in cotton, as shown in Figure 1, and the sizes of the foreign fibers were categorized as 0.5 mm², 1 mm², 1.5 mm², 3 mm², and 5 mm². Group 1 comprised colored foreign fibers, which were easier to distinguish in cotton, whereas Group 2 comprised white transparent foreign fibers that were more difficult to detect because they are extremely similar to cotton fiber in color and appearance.


Experiment Equipment
In actual detection, cotton containing foreign fibers was first made into a thin layer with a width of approximately 10 cm and thickness of approximately 2 mm. The cotton thin layer sample was irradiated by a uniform line laser, and the scattered light of cotton was mist-like. Mulch film, plastic and paper pieces, and other white foreign fibers are mostly dense materials, and the reflected light is approximately a mirror reflection [12].
The experiment found that the characteristic information of the cotton foreign fiber image was most prominent when the incident angle of the line laser was about 45°. For example, when the laser incident angle was 45°, the average gray value (M(X)) of the foreign fiber image was larger, and the contrast value (Var(x, y)) was the largest, as shown in Table 1. Because of the different polarization characteristics of different foreign fibers, the reflected light waves carry polarization information of the foreign fibers, and different types of foreign fibers can be distinguished through polarization imaging [14].
A physical image of the cotton foreign fiber polarization imaging detection device is shown in Figure 2. The sensor of the camera (MV-CH050-10UP, HIKROBOT) was equipped with four-way (0°, 45°, 90°, 135°) pixel-level polarization filters with a resolution of 2448 × 2048 and a target surface size of 2/3", using USB power output. The light source was a 405 nm line laser (SL-405-35-S-B-90-24V, OSELA) with a power of 35 mW.


Dataset, Environment, and Parameters
The target detection dataset in this study was acquired using the image acquisition system shown in Figure 2, containing a total of 3944 foreign fiber target images of 20 categories, which were divided into training, validation, and test sets. The data were augmented by Gaussian blur, affine transformation, brightness transformation, pixel-dropping transformation, and flip transformation [33,34]. The augmented dataset consisted of 21,381 images, and the data format was JPG. Table 2 lists the statistical information of the dataset. The hardware environment and software versions of the experiments are listed in Table 3. In this study, the SGD (stochastic gradient descent) method was used to optimize the learning rate, and the number of epochs was determined by comparing the loss functions of the training and validation sets. The parameters of the training network are listed in Table 4.
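As an illustration of two of these augmentations, the minimal NumPy sketch below implements a horizontal flip and a brightness transformation; it is an illustrative example only, not the augmentation pipeline used in the study (which also applied Gaussian blur, affine, and pixel-dropping transforms).

```python
import numpy as np

def flip_horizontal(img: np.ndarray) -> np.ndarray:
    """Mirror the image along its width axis; expects (H, W, C) layout."""
    return img[:, ::-1, :]

def adjust_brightness(img: np.ndarray, factor: float) -> np.ndarray:
    """Scale pixel intensities by `factor`, clipping to the valid 8-bit range."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

# Toy 4 x 4 RGB image with values 0..47
img = np.arange(48, dtype=np.uint8).reshape(4, 4, 3)
flipped = flip_horizontal(img)
brighter = adjust_brightness(img, 1.5)
```

Applying both functions to each source image (along with the other transforms) is how a 3944-image dataset can be expanded several-fold.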

Loss Function and Model Evaluation Metrics
The loss function of YOLOv5 consists of three components: confidence loss, bounding box regression loss, and classification loss. L_obj, L_box, and L_cls represent the confidence loss, bounding box regression loss, and classification loss, respectively; λ1, λ2, and λ3 are weight coefficients for the three losses, and changing these coefficients adjusts the emphasis placed on each loss.

In YOLOv5, L_box is calculated using L_CIoU [35], which improves both the speed and accuracy of bounding box regression. In the CIoU loss, b and b^gt represent the predicted box and the ground truth box, respectively; w^gt, h^gt, w, and h represent the width and height of the ground truth box and the predicted box, respectively; ρ represents the distance between the centers of the two boxes; c represents the maximum distance between the boundaries of the two boxes; and α is a weight coefficient.

Both L_obj and L_cls use BCEWithLogitsLoss, which combines a Sigmoid layer with BCELoss and is suitable for multi-label classification tasks; y_n represents the ground truth label, and x_n represents the predicted label.
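The displayed equations were lost in extraction; the standard forms of these losses, consistent with the symbol definitions above, are:

```latex
L = \lambda_1 L_{obj} + \lambda_2 L_{box} + \lambda_3 L_{cls}

L_{CIoU} = 1 - IoU + \frac{\rho^2\!\left(b, b^{gt}\right)}{c^2} + \alpha v,
\qquad
v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2},
\qquad
\alpha = \frac{v}{(1 - IoU) + v}

\ell_n = -\left[\, y_n \log \sigma(x_n) + (1 - y_n)\log\bigl(1 - \sigma(x_n)\bigr) \right]
```

Here σ is the sigmoid function applied inside BCEWithLogitsLoss, and the per-element losses ℓ_n are averaged over the batch.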
To verify the performance of the improved YOLOv5 model, we measured the mAP, FPS, model volume, etc. Commonly used metrics of precision (P), recall (R), average precision (AP), F1 score (F1), and mean average precision (mAP) were selected to evaluate the model performance [36]. In these metrics, TP denotes the positive samples predicted correctly, FP denotes the negative samples predicted incorrectly, FN denotes the positive samples predicted incorrectly, and N denotes the number of sample categories.
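For concreteness, the standard definitions of these metrics can be computed as follows; this minimal sketch assumes per-class AP values are already available and is not the evaluation code used in the study.

```python
def precision(tp: int, fp: int) -> float:
    """P = TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """R = TP / (TP + FN)."""
    return tp / (tp + fn)

def f1_score(p: float, r: float) -> float:
    """F1 = 2PR / (P + R), the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

def mean_ap(ap_per_class: list[float]) -> float:
    """mAP = (1/N) * sum of AP over the N classes."""
    return sum(ap_per_class) / len(ap_per_class)

# Hypothetical counts for one class: 90 correct detections, 10 false alarms, 30 misses
p = precision(90, 10)
r = recall(90, 30)
f1 = f1_score(p, r)
m = mean_ap([0.95, 0.90, 1.00])
```

AP itself is the area under the precision–recall curve for one class; mAP@0.5 averages it over classes at an IoU threshold of 0.5.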

Improvement of YOLOv5 Network Architecture

YOLOv5 Network Architecture
YOLOv5 combines the characteristics of YOLOv1, YOLOv2, YOLOv3, and YOLOv4. YOLOv5 mainly contains four network models, namely, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, whose model sizes and parameter counts increase sequentially. This study was based on the YOLOv5s network structure, as shown in Figure 3.

The YOLOv5 network structure consists of a backbone, neck, and head, and the input image first goes through the backbone for continuous feature extraction. The focus module performs a slice operation on the input image; for example, if the input image size is 640 × 640 × 3, the slice operation takes a value for every other pixel on the image, and the results are stacked on the channel axis to obtain a feature layer of 320 × 320 × 12. It is commonly understood as expanding the image channels while compressing the image height and width. The focus-module structure is shown in Figure 4.

The second layer of the backbone is the CBS module with a convolution kernel size of 3 × 3, which performs the convolution calculation, batch standardization calculation, and SiLU activation function on the input data, adds nonlinearity to the network, and accelerates the convergence of the network.

The third layer is the C3 module, which is mainly composed of n bottleneck modules, three CBS modules, and two convolution layers of size 1 × 1, and is designed to better extract the deep features of the image. The structures of the bottleneck and C3 modules are shown in Figures 5 and 6, respectively.

The last layer of the backbone is the SPP module. First, the number of channels of the input image is halved using the first CBS module, and then the feature map output from the first CBS module is passed through three maximum pooling layers of different sizes (13 × 13, 9 × 9, and 5 × 5), and the residual edges constructed together with the output of the first CBS module are connected in parallel. Finally, the number of channels is halved by the second CBS module to ensure that the heights and widths of the feature maps of different-size inputs can be kept consistent after pooling; the structure of the SPP module is shown in Figure 7.
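The focus slice described above can be sketched in a few lines of NumPy. This is an illustrative reimplementation under the assumption of channel-last (H, W, C) layout, not the original YOLOv5 code (which operates on channel-first tensors).

```python
import numpy as np

def focus_slice(img: np.ndarray) -> np.ndarray:
    """Take every other pixel along H and W, producing four sub-images,
    and stack them on the channel axis: (H, W, C) -> (H/2, W/2, 4C)."""
    return np.concatenate(
        [img[0::2, 0::2], img[1::2, 0::2], img[0::2, 1::2], img[1::2, 1::2]],
        axis=-1,
    )

# The 640 x 640 x 3 example from the text becomes 320 x 320 x 12
x = np.zeros((640, 640, 3), dtype=np.float32)
y = focus_slice(x)
```

No information is lost: the operation merely trades spatial resolution for channel depth before the first convolution.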
The neck network constructs feature pyramids for enhanced feature extraction to obtain more contextual information. Three feature maps are generated in the backbone network; the three feature layers are 80 × 80, 40 × 40, and 20 × 20 from shallow to deep. After the three effective feature layers are obtained, the FPN feature pyramid structure is constructed first: the 20 × 20 feature layer is upsampled to obtain a 40 × 40 feature layer and then stacked with the corresponding 40 × 40 feature layer in the backbone network. A feature layer of 80 × 80 is obtained by upsampling twice in the FPN structure, and strong semantic features are transferred. Subsequently, the PAN structure is constructed to convey stronger localization features: the 80 × 80 feature layer is downsampled to obtain a 40 × 40 feature layer, which is stacked with the 40 × 40 feature layer in the FPN structure. The PAN structure is downsampled twice, and the final outputs are 80 × 80, 40 × 40, and 20 × 20 enhanced effective feature layers, respectively. Finally, these three enhanced feature layers are input into the Yolo Head to obtain the regression and classification prediction results.

Proposed Approach: YOLOv5-CFD
This study made corresponding improvements to the backbone, neck, and head of YOLOv5. First, Shufflenetv2 was introduced as the backbone feature extraction network under the premise of ensuring detection accuracy; the weight parameters and volume of the network were reduced, realizing a lightweight model design. Moreover, because the foreign fibers were mostly small-sized targets, the FPN + PAN structure was modified to obtain feature maps with more fine-grained information. Finally, the CA attention module was added in front of the Yolo Head to improve the detection accuracy. The improved YOLOv5 (YOLOv5-CFD) network structure is illustrated in Figure 8.


Improvement of Backbone Network
ShufflenetV2 was proposed by Ma et al. [37] based on ShufflenetV1 and four efficient network design principles. The ShufflenetV2 model excels in both speed and accuracy, making it an ideal lightweight network for deployment on mobile devices. First, ShufflenetV2 divides the input feature channels into two branches by the "Channel Split" operation. One branch is kept unchanged as an identity mapping, and the other branch consists of three convolutions with the same number of input and output channels. The two branches are concatenated after convolution to keep the number of channels constant. Finally, the "Channel Shuffle" operation is used to ensure information exchange between the two branches. ShufflenetV2 contains a basic unit and a unit for spatial downsampling (2×), as shown in Figure 9.
In this paper, ShufflenetV2 units with stride = 2 and stride = 1 were chosen to construct a new backbone network, and the output of each stage in the new backbone was connected to PANet. Moreover, we replaced the activation function in the ShufflenetV2 unit with the H-swish activation function, as shown in Equation (9).
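A minimal NumPy sketch of the two operations named above, "Channel Shuffle" and the H-swish activation of Equation (9), is given below. It is an illustration under the assumption of channel-last (H, W, C) tensors, not the network code itself.

```python
import numpy as np

def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Interleave channels across groups so the two branches exchange
    information; expects (H, W, C) with C divisible by `groups`."""
    h, w, c = x.shape
    return (
        x.reshape(h, w, groups, c // groups)
         .transpose(0, 1, 3, 2)
         .reshape(h, w, c)
    )

def hard_swish(x: np.ndarray) -> np.ndarray:
    """H-swish(x) = x * ReLU6(x + 3) / 6, a cheap piecewise approximation
    of the Swish activation."""
    return x * np.clip(x + 3, 0, 6) / 6

# 1 x 1 spatial map with 4 channels [0, 1, 2, 3]; 2 groups
x = np.arange(4, dtype=np.float32).reshape(1, 1, 4)
y = channel_shuffle(x, 2)
```

For groups = 2 the channel order [0, 1, 2, 3] becomes [0, 2, 1, 3], which is exactly the interleaving that lets features from one branch reach the other in the next unit.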


Improvement of PANet Network
Among the three effective feature maps output by the FPN + PAN structure, the 20 × 20 and 40 × 40 feature maps were used to detect larger targets, whereas foreign fibers in cotton are mostly small-sized targets. Moreover, the image size of our input network was 2448 × 2048, and the grid pixels corresponding to the 20 × 20 and 40 × 40 feature maps were 128 × 108 and 64 × 54, respectively, when performing the bounding box regression. The k-means clustering statistics showed that nearly 75% of the foreign fiber targets were below 60 pixels, as shown in Figure 10. Compared to these targets, the anchors ([116, 90], [156, 198], [373, 326]) and ([30, 61], [62, 45], [59, 119]) were larger, and many operations were useless when performing the bounding box regression. The 20 × 20 and 40 × 40 feature maps and the corresponding large-target detection heads were therefore discarded, making the bounding box regression more accurate and minimizing the computational cost.

To solve the problem of an excessive number of small targets, the PANet network connection was improved to obtain a feature map with fine-grained information. A new 160 × 160 feature map was generated by upsampling the output of the backbone network twice and fusing it with the feature map of the corresponding size from the backbone. Because the improved backbone network generated three layers of feature maps of 320 × 320, 160 × 160, and 80 × 80, the FPN did not require secondary upsampling; hence, the final YOLO detection heads were 160 × 160 and 80 × 80. Figure 11 shows the schematic diagram of the improved PANet connection of YOLOv5.

CA Module Design
Hou et al. [38] proposed a novel attention mechanism for mobile networks called "Coordinate Attention" by embedding location information into channel attention in 2021, as shown in Figure 12.

Coordinate Attention focuses on the image width and height and encodes precise position information. First, the input feature map is divided into the width and height directions for global average pooling, yielding a pair of direction-aware feature maps that integrate features from the two directions. The module can capture long-distance relationships in one direction while retaining spatial information in the other, helping the network locate targets more accurately.
The feature maps in the width and height directions of the obtained global perceptual field are then stitched together, and the channels are compressed to C/r using a 1 × 1 convolution. Subsequently, BatchNorm and the H-swish activation function are used for encoding, followed by a 1 × 1 convolution to adjust the number of channels of the feature map to be equal to that of the input feature map. The attention weights g^h and g^w of the feature map along the height and width, respectively, are obtained after the sigmoid function. Finally, a weighted fusion is performed on the original feature map to obtain the final feature map with attention weights in the height and width directions.

Based on the characteristics of the many types of small targets among the different fibers, this study added a CA module at the front end of each of the two detection heads of the Yolo Head to improve the performance of the network at low cost, thus improving the overall accuracy of target detection.
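The displayed equations for these steps were lost in extraction; as defined in the Coordinate Attention paper by Hou et al. [38], and consistent with the description above, they are:

```latex
z_c^{h}(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i),
\qquad
z_c^{w}(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w)

f = \delta\!\left(F_1\!\left(\left[z^{h}, z^{w}\right]\right)\right)

g^{h} = \sigma\!\left(F_h\!\left(f^{h}\right)\right),
\qquad
g^{w} = \sigma\!\left(F_w\!\left(f^{w}\right)\right)

y_c(i, j) = x_c(i, j) \times g_c^{h}(i) \times g_c^{w}(j)
```

Here F_1, F_h, and F_w are the 1 × 1 convolutions, δ is the BatchNorm + H-swish encoding, and σ is the sigmoid function.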
The flow chart of the foreign fiber detection method used in this study is shown in Figure 13.
Finally, a weighted fusion is performed on the original feature map to obtain the final feature map with a ention weights in the height and width directions, as shown in the following equation: Based on the characteristics of multiple types and small targets with different fibers, this study added a CA module at the front end of each of the two detection heads of the Yolo Head to improve the performance of the network at a low cost, thus improving the overall accuracy of target detection.
Figure 14 shows the loss reduction curves of the YOLOv5-CFD model for the training and validation sets of foreign fiber images. As can be observed from the loss curve, the loss value dropped to a relatively small value when the number of training rounds was 20, and the network stabilized when the number of training rounds was 120.
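The stabilization point read off the loss curve can also be estimated programmatically. A minimal sketch, where the window size and tolerance are illustrative assumptions rather than values from the paper:

```python
def stabilization_epoch(losses, window=5, tol=1e-3):
    """Return the first epoch index at which the mean absolute change of
    the loss over the previous `window` epochs falls below `tol`,
    or None if the curve never settles."""
    for e in range(window, len(losses)):
        recent = losses[e - window:e]
        changes = [abs(recent[i + 1] - recent[i]) for i in range(window - 1)]
        if sum(changes) / len(changes) < tol:
            return e
    return None
```

Applied to the recorded per-epoch losses, such a check gives an objective version of the visual judgment that the network had stabilized by round 120.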

Results and Discussion
The confusion matrix of the YOLOv5-CFD model is shown in Figure 15. It can be observed from the figure that most targets of the different fiber types were correctly predicted, with a low miss rate, indicating that the model exhibited good performance. Figure 16 shows the PR curves of the YOLOv5-CFD test set, i.e., the precision and recall of the detection results for the twenty kinds of foreign fiber targets. Overall, the detection result mAP@0.5 was 96.9%.
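The mAP@0.5 figure is the mean, over the twenty classes, of the average precision obtained from each class's PR curve. A minimal sketch of all-point-interpolated AP from sampled (recall, precision) pairs, offered as an illustration of the metric rather than the exact evaluation code used in the paper:

```python
import numpy as np

def average_precision(recalls, precisions):
    """All-point interpolated AP from a PR curve (recalls ascending)."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([1.0], precisions, [0.0]))
    # make precision monotonically non-increasing from right to left
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # sum the area of the rectangles between distinct recall values
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```

mAP@0.5 would then be the mean of `average_precision` over the per-class PR curves computed at an IoU threshold of 0.5.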

Ablation Experiment
The effect of each improvement on the model performance was analyzed by ablation experiments. For comparison purposes, the experiments were divided into five groups. The first group was the original YOLOv5 network. In the second group, the ShufflenetV2 module was introduced into the backbone feature extraction network of YOLOv5. The third group modified the PANet network connection method of YOLOv5. In the fourth group, a CA module was added in front of each of the two detection heads of YOLOv5. The last group was the full model proposed in this study. The experimental results are listed in Table 5.
As seen in Table 5, the use of the ShufflenetV2 module in the backbone feature extraction network reduced mAP@0.5 and mAP@0.5:0.95 by 1.95% and 2.73%, respectively, but the model volume decreased considerably.

Comparison of Different Models
To verify the superiority of the YOLOv5-CFD model in cotton foreign fiber detection, we compared it with widely used detection models: YOLOv5, YOLOv4, SSD, and Faster-RCNN. The relevant experimental parameters were strictly controlled: a uniform image size was used as the input, and the same training and test sets were used for all experiments.
Comparing the overall mAP@0.5 test results of Faster-RCNN, SSD, YOLOv4, YOLOv5, and YOLOv5-CFD, as shown in Figure 17, the YOLOv5-CFD model performed best.
Sensors 2023, 23, x FOR PEER REVIEW
Figure 17. P-R curves of different detection models.
The pictures used in the comparative experiment in Figure 18 are from the test set of this paper [39]. Each experiment was conducted in the same environment. Figure 18 shows the detection effects of the different models in different cases. The images contain complex light environments, small-target foreign fibers, and multiple types of foreign fibers, so the problems of multiple types of small-target foreign fibers in a complex light environment are fully considered, providing a convenient way to fully demonstrate the robustness and generalization ability of the models.

From the image detection results, it can be observed that all five models recognized most of the large foreign fibers, and YOLOv5-CFD had the highest correct classification rate. For small foreign fibers, YOLOv5-CFD had the highest recognition rate and correct classification rate. For the first image, YOLOv5-CFD identified and classified all targets correctly. In the second image, YOLOv5-CFD had the highest recognition rate, with only one missed target, and YOLOv5 and Faster-RCNN had the highest correct classification rates. For the last image, YOLOv5-CFD, YOLOv5, and Faster-RCNN all identified the targets correctly, but only YOLOv5-CFD and SSD classified them correctly; however, the SSD model produced multiple overlapping detection boxes. In summary, the YOLOv5-CFD model outperformed the other four models in terms of the test results.
As shown in Table 6, the model volume, mAP@0.5, mAP@0.5:0.95, and FPS of YOLOv5-CFD were 0.75 MB, 96.9%, 59.9%, and 385 f/s, respectively, which were better than those of YOLOv5 (13.82 MB, 95.87%, 52.77%, and 170 f/s), followed by YOLOv4 (244.78 MB, 93.59%, 50.50%, and 88 f/s) and SSD (100.29 MB, 83.07%, 39.06%, and 128 f/s); the results of Faster-RCNN (108.91 MB, 75.68%, 33.60%, and 9 f/s) were the worst. These results show that the overall performance of the proposed YOLOv5-CFD was the best [40]. The main improvements of the YOLOv5-CFD model are its model volume and detection speed; these enhancements meet the high requirements of actual production line detection of cotton foreign fibers, and the detection accuracy of YOLOv5-CFD for small-target foreign fibers is also the highest. Based on the above analysis, the YOLOv5-CFD object detection algorithm proposed in this study improves the detection speed and accuracy of foreign fiber targets and significantly reduces the model size.
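The improvement figures quoted elsewhere in the paper (1.03%, 7.13%, and 126.47% over YOLOv5) can be reproduced from the Table 6 values; note that the two mAP gains are absolute percentage-point differences, while the FPS gain is a relative percentage:

```python
def point_gain(new, old):
    """Absolute gain in percentage points (used for the mAP metrics)."""
    return round(new - old, 2)

def relative_gain(new, old):
    """Relative gain in percent (used for FPS)."""
    return round((new - old) / old * 100, 2)

# Table 6 values: YOLOv5 vs YOLOv5-CFD
map50_gain = point_gain(96.9, 95.87)    # mAP@0.5
map5095_gain = point_gain(59.9, 52.77)  # mAP@0.5:0.95
fps_gain = relative_gain(385, 170)      # frames per second
```

Keeping the two conventions distinct matters when comparing models: a 126.47% relative FPS gain corresponds to only 215 extra frames per second, whereas a 1.03-point mAP gain is a small relative change on an already high baseline.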

YOLOv5-CFD Test Results
To test the robustness and anti-interference ability of the YOLOv5-CFD model, this study repeatedly measured the misrecognition rate, misjudgment rate, precision, recall, and F1 score of the model under different illumination levels, incident angles, cotton foreign fiber samples, foreign fiber positions, foreign fiber sizes, and environments. Combined with the sampling frequency of the camera, the speed of the conveyor belt was set to 4 m/min. The misrecognition rate is the rate of failing to identify the presence of foreign fibers, and the misjudgment rate is the rate of judging a position with no foreign fiber as containing one. For each test condition, the precision and recall were first calculated for each category, and the per-category values were then averaged. The test results of the YOLOv5-CFD model are shown in Table 7.
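The per-category averaging described above can be sketched as follows; this is a minimal illustration assuming per-class true-positive/false-positive/false-negative counts, and it computes F1 from the averaged precision and recall (one common macro-averaging convention; the paper does not spell out which variant it uses):

```python
def precision(tp, fp):
    """Fraction of predicted foreign fibers that are real."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of real foreign fibers that are found."""
    return tp / (tp + fn)

def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

def macro_metrics(per_class):
    """per_class: list of (tp, fp, fn) tuples, one per fiber class.

    Returns macro-averaged precision and recall (the mean of the
    per-class values) plus the F1 score of those averages."""
    ps = [precision(tp, fp) for tp, fp, _ in per_class]
    rs = [recall(tp, fn) for tp, _, fn in per_class]
    p_avg = sum(ps) / len(ps)
    r_avg = sum(rs) / len(rs)
    return p_avg, r_avg, f1(p_avg, r_avg)
```

Macro averaging weights every fiber class equally, so rare classes such as small transparent fibers influence the reported scores as much as common ones.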
Detection and classification experiments were conducted on foreign fibers including mulch film, foam, feather, white paper, polyethylene, polypropylene, and chemical fiber. The results showed that changes in environmental light intensity had some influence on foreign fiber classification but little effect on detection. Interference from strong light, such as sunlight, increased the misrecognition rate. The classification performance of the model was best under dark conditions and worst under sunlight. Foreign fibers were difficult to identify at small or large incidence angles, such as 15° or 90°; when the incident angle was around 45°, the detection and classification of foreign fibers were optimal. For the different sample varieties, the YOLOv5-CFD model could generally detect foreign fibers well, and the average F1 score of the three numbered samples was about 0.69. Under the condition of different foreign fiber positions, there were no omissions or misjudgments, and the classification results were the same. Under the condition of different foreign fiber sizes, the minimum size of foreign fiber detected by the YOLOv5-CFD model was 0.5 mm². Smoke and dust had almost no effect on the linear laser polarization imaging. In summary, the proposed method has good robustness and anti-interference ability, meets the basic requirements for detecting cotton foreign fibers on an actual production line, and has practical application value.

Conclusions
To address the problem of foreign fiber detection in cotton, a polarization imaging device for cotton foreign fibers was constructed using the difference in optical properties and polarization characteristics between cotton fibers and foreign fibers. Moreover, an object detection algorithm for cotton foreign fibers based on the improved YOLOv5 was proposed, which consisted of three key steps: the lightweight network Shufflenetv2 with the Hard-Swish activation function was used as the backbone feature extraction network, an improved PANet was added to YOLOv5, and a CA module was added before the Head of YOLOv5. The robustness and anti-interference ability of the improved YOLOv5 model under various conditions were also tested. Compared with the YOLOv5 foreign fiber detection model, the improved model achieved better mAP@0.5, mAP@0.5:0.95, and FPS, which increased by 1.03%, 7.13%, and 126.47%, respectively. The improved model is capable of performing online identification and classification of small foreign fiber targets of various types during cotton transportation.