Defect Detection in Synthetic Fibre Ropes using Detectron2 Framework

Fibre ropes with the latest technology have emerged as an appealing alternative to steel ropes for offshore industries due to their low weight and high tensile strength. At the same time, frequent inspection of these ropes is essential to ensure the proper functioning and safety of the entire system. The development of deep learning (DL) models in condition monitoring (CM) applications offers a simpler and more effective approach for defect detection in synthetic fibre ropes (SFRs). The present paper investigates the performance of Detectron2, a state-of-the-art library for object detection and instance segmentation. Detectron2 with the Mask R-CNN architecture is used for segmenting defects in SFRs. Mask R-CNN with various backbone configurations has been trained and tested on an experimentally obtained dataset comprising 1,803 high-dimensional images containing seven damage classes (placking high, placking medium, placking low, compression, core out, chafing, and normal) for SFRs. By leveraging the capabilities of Detectron2, this study aims to develop an automated and efficient method for detecting defects in SFRs, enhancing the inspection process, and ensuring the safety of the fibre ropes.


Introduction
In offshore industries, SFRs made from materials like Dyneema are nowadays used by cranes to lift and hoist heavy equipment to the platform facilities. These newly developed ropes are lightweight, hydrophobic, UV resistant, and offer 15 times higher strength than the traditional steel ropes [McKenna et al., 2004, Hoppe, 1997]. Due to their prolonged use in critical systems, SFRs suffer from plastic wear, abrasive wear, slack strands, and slack wires, necessitating continuous testing and evaluation of defects and damages to estimate their remaining useful life (RUL) [Feyrer, 2007, Ridge et al., 2001, Onur and İmrak, 2011]. Manual inspection of SFRs is a costly, challenging, inefficient, and time-consuming task. Therefore, there is a need to develop a faster and more automatic inspection scheme for inspecting damages in SFRs, reducing the total maintenance cost. In the literature, non-destructive testing (NDT) techniques [Antin et al., 2019] such as magnetic [Yan et al., 2017, Zhang et al., 2006], acoustic emission [Casey and Taylor, 1985], ultrasound, X-rays, and γ-rays [Schlanbusch et al., 2017], fibre optics [Huang et al., 2020, Paixao et al., 2021], and computer vision methods (CVM) [Vallan and Molinari, 2009, Platzer et al., 2010] have been used for defect detection in steel wire ropes (SWRs). In recent years, DL models have been used for various applications such as condition monitoring, image analysis, and video surveillance.
DL models process data collected from sensors and cameras to detect defects or damages. DL models in computer vision for object detection have been designed to provide additional information on the location and shape of objects. In condition monitoring applications, these models have shown remarkable performance in detecting damages for a variety of settings such as electrical systems [Tabernik et al., 2020], conveyor belts [Yang et al., 2019], manufacturing industries [Yang et al., 2020], railway tracks [Wei et al., 2019], and roads [Pham et al., 2020]. Recently developed one-stage detectors such as the YOLO versions (v2, v3, v4, v5, v6, v7, and v8) [Zhu et al., 2021] and the Single Shot MultiBox Detector (SSD) [Liu et al., 2016] have been used in defect detection due to their simple structure and fast speed. A single-stage detector uses one feed-forward fully convolutional network (FCN) [Long et al., 2015] to predict both the bounding boxes (BBox's) and the object classes. The FCN architecture allows the model to efficiently process the input image to generate predictions for BBox coordinates and corresponding object classes. This integrated approach simplifies the detection pipeline and reduces computational complexity, making it suitable for real-time applications. However, its accuracy is comparatively lower than that of two-stage detectors based on Region-based Convolutional Neural Networks (R-CNN) [Bharati and Pramanik, 2020]. The two-stage detectors have separate stages for region proposal and classification, allowing more precise localization and improved classification accuracy. R-CNN was a breakthrough for object detection and semantic segmentation [Cai and Vasconcelos, 2018, Wang et al., 2017], as it introduced a novel approach by combining DL with a region proposal step that generates candidate object regions in an image. These proposed regions are then processed by a CNN-based classifier to determine the presence of objects and classify them into predefined categories. This multi-stage approach improved accuracy and efficiency compared to traditional methods. Variations such as Fast R-CNN [Girshick, 2015], Faster R-CNN [Ren et al., 2015], and FCN [Long et al., 2015] have demonstrated the influence of deep R-CNN in achieving high performance for semantic segmentation tasks to detect and localize objects in an image. Semantic segmentation classifies each pixel in an image into a set of pre-defined classes. Later, the Mask R-CNN [He et al., 2017, Bolya et al., 2019] framework was developed, enabling instance segmentation that not only classifies each pixel, as in the case of semantic segmentation, but also divides an image into separate, distinct parts corresponding to the objects of interest. Panoptic segmentation [Kirillov et al., 2019] combines the features of semantic and instance segmentation for a better understanding of an image by providing class labels for each pixel and unique instance IDs for each object. The recently developed Detectron2, a successor of Detectron, is a state-of-the-art library providing a wide range of functionalities for object detection and segmentation tasks [Wu et al., 2019]. Detectron2 includes a collection of models, such as Mask R-CNN, Faster R-CNN, Fast R-CNN, RetinaNet, TridentNet, DensePose, Cascade R-CNN, and TensorMask. Detectron2 supports three different types of segmentation: semantic, instance, and panoptic segmentation.
In the present paper, the state-of-the-art Detectron2 framework with the Mask R-CNN architecture was used to detect defects on experimentally collected high-resolution SFR image datasets. Detectron2 utilizes pre-trained models to harness the power of transfer learning (TL). These models are pre-trained on massive image datasets and then fine-tuned on the specific task. In this way, Detectron2 is a scalable and efficient approach for faster inference in real-time scenarios. The Mask R-CNN architecture in Detectron2 effectively handles complex and overlapping damage instances, thereby ensuring accurate and detailed segmentation. The primary objective of this study is to train a model capable of accurately detecting and segmenting the defects in the SFRs dataset by assigning specific labels to each pixel or region of interest. The model's performance is then assessed using appropriate evaluation metrics. The present paper analyses the problem of defect detection for experimentally obtained high-dimensional SFR image datasets. We use Detectron2, the successor of Detectron and maskrcnn-benchmark. Mask R-CNN extends the capabilities of Faster R-CNN by incorporating valuable information about object masks in addition to bounding box coordinates and class labels.
The rest of the paper is structured as follows: Section 2 provides an overview of the proposed Detectron2 framework, covering aspects such as experimental setup, data collection methodology, annotation process, and model configuration. Additionally, the evaluation metrics employed in this study are presented, which serve as the basis for assessing the model's performance. Section 3 discusses the results, showcasing the performance of the proposed work. Finally, Section 4 concludes the paper, summarizing the findings and discussing their implications.

Model
Figure 1 illustrates the fundamental architecture of the Detectron2 framework. The architecture consists of two main stages: the backbone and the head. The backbone stage allows a combination of sub-networks, including ResNet, ResNeXt, Feature Pyramid Networks (FPN), and other similar architectures, as long as their input and output dimensions are compatible. Each network has its own strengths and characteristics, making it suitable for specific tasks. ResNet introduces residual blocks with skip connections, addressing the challenge of training very deep neural networks and allowing them to learn complex features with high accuracy. ResNeXt, an extension of the ResNet network, introduces cardinality (the number of parallel internal paths or branches) within a single block to enhance its feature extraction capabilities. This improves the network accuracy more effectively than increasing the number of layers of the network. FPN, in contrast, addresses the issue of multi-scale object detection and feature learning by creating a pyramid of features, making it effective for tasks where objects can appear in various sizes. In the present work, the performance of different backbone architectures is compared to choose the most suitable architecture for detecting defects in SFRs. The chosen architecture is utilized to extract significant features from the input image. These extracted features are then directed to a region proposal network (RPN) responsible for determining the presence of objects within specific regions. Subsequently, these regions are processed by the head stage through an FCN layer to predict the coordinates of BBox's and associated class labels. During this phase, the Intersection over Union (IoU) metric is computed. Whether a predicted BBox is correct is determined by the IoU score, defined by:
\[ \mathrm{IoU} = \frac{|PB \cap GB|}{|PB \cup GB|} \qquad (1) \]
where IoU ≥ 0.5 indicates a good match and a lower value indicates no match. Here, ∩ and ∪ denote the intersection and union of the predicted BBox (PB) and the actual/ground-truth BBox (GB), respectively.
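As a minimal sketch of the IoU score in equation (1), the following function computes it for two axis-aligned boxes. The corner convention (x_min, y_min, x_max, y_max) and the helper names box_iou and is_match are assumptions made for this illustration; they are not taken from the Detectron2 code base.

```python
from typing import Tuple

# Assumed box convention for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_iou(pb: Box, gb: Box) -> float:
    """Return the IoU of a predicted box (PB) and a ground-truth box (GB)."""
    # Corners of the overlap rectangle (may be empty).
    ix_min, iy_min = max(pb[0], gb[0]), max(pb[1], gb[1])
    ix_max, iy_max = min(pb[2], gb[2]), min(pb[3], gb[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_pb = (pb[2] - pb[0]) * (pb[3] - pb[1])
    area_gb = (gb[2] - gb[0]) * (gb[3] - gb[1])
    union = area_pb + area_gb - inter  # |PB ∪ GB| = |PB| + |GB| - |PB ∩ GB|
    return inter / union if union > 0 else 0.0


def is_match(pb: Box, gb: Box, threshold: float = 0.5) -> bool:
    """A prediction counts as a good match when IoU >= threshold."""
    return box_iou(pb, gb) >= threshold
```

For example, two 2 x 2 boxes offset by one unit in each direction overlap in a 1 x 1 square, so their IoU is 1/7 and they do not count as a match at the 0.5 threshold.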
The capabilities of the model are further enhanced by incorporating a mask branch into the existing architecture. This introduces a segmentation mask for each detected object region, facilitating precise object segmentation. These additional components enable robust object detection, accurate BBox prediction, and effective instance segmentation.

Experimental Setup
The experimental setup consists of a motor, three sheaves (one sheave for holding weight, two rotation pulley blocks, two wire guide wheels), four LED lights, NVIDIA Jetson Nano P3450, and three defective SFRs each of length 8 cm subjected to a weight of 50 kg. During the data collection process, the SFRs were rotated on sheaves supported by rotation pulleys and wheels for guiding the rope used for lifting purposes. This rotational movement helps simulate real-world scenarios where the ropes are subjected to rotational forces. The experimental setup is shown in Figure 2.

Rope Description
Dyneema [dyn, 2023] is a gel-spun, multi-filament fibre manufactured from HMPE (high-modulus polyethylene) or UHMWPE (ultra-high-molecular-weight polyethylene). It possesses several notable characteristics such as high strength, low weight, low elongation at break, and resistance to most chemicals and harsh environments. These excellent mechanical properties, combined with low density, result in a high performance-to-weight ratio. It serves as a valuable resource for researchers and industry professionals to monitor and assess the conditions of fibre ropes as potential replacements for steel wire ropes (SWRs). In the present work, Dyneema fibre is used to construct the rope for conducting the experiment. The characteristics of the rope used in the experiment are given as follows:
• Fibers: Dyneema SK 75/78

Data Collection
The image dataset has been acquired using a Basler acA2000 camera with a Basler C11-5020-12M-P Premium 12-megapixel lens. To read the images from the camera, an NVIDIA Jetson Nano P3450 was used as the processing platform (for reading and analyzing the captured data). To provide sufficient and uniform illumination, four Aputure AL-MC RGBWW LED lights, each having an illumination of 1000 lux, have been used. A total of 1,803 images having a resolution of 2000 x 1080 pixels have been collected to apply the defect detection algorithms. The ISO standard 9554:2019 "Fibre ropes - General specifications" provides comprehensive information regarding the potential damages that may occur during the lifespan of SFRs [ISO, 2019]. This standard serves as a valuable reference for understanding the types of defects that can occur in these ropes. The experiment has been performed on three SFRs where defects have been artificially introduced by an expert roper from Dynamica Aps, Denmark [dyn, 2023]. Figure 3 depicts the various rope damages considered in the paper. The three ropes contain loop defects (pulled strands) as one of the potential defects that can occur in various applications. The severity of the loop defects varied, ranging from high to medium levels. In addition to the loop defects, each of the three defective ropes also contained other types of defects, including compression, abrasion, and core-out. By including these specific defects in the experiment, the aim was to provide a realistic representation of the types and severity levels of defects that can occur in SFRs during their operational lifespan. The detailed layout of the collected SFRs dataset, covering the number of images in each set, defect type, and distribution, is depicted in Table 1.

Object Detection and Segmentation
Detectron2 has robust object detection and segmentation capabilities. It can accurately identify and differentiate objects even when they overlap or occlude each other. This is especially important in scenes with multiple objects of varying sizes and orientations. As a result, Detectron2 provides rich information beyond classification, allowing for spatial understanding, precise localization, and instance-level analysis.
The primary outcome of an object detection method is in the form of a BBox. In the case of SFRs, the defects to be detected are generally asymmetric in shape. Therefore, a rectangular BBox may not be suitable for annotation. In such cases, polygons can be used as an alternative solution for annotating defects. A polygon may have an arbitrary number of points, making it more accurate for defining the defects in SFRs. In the present work, labeling was performed on the collected dataset with the polygon annotation method using the VGG Image Annotator (VIA), an open-source tool. The labeled dataset includes seven classes: loop high, loop medium, loop low, compression, core out, abrasion, and normal. Figure 4 depicts the annotated image using the VGG annotator.
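A polygon annotation can still be reduced to an enclosing BBox when a detector needs one. The sketch below shows one way this might be done for a single VIA polygon region, assuming the VIA JSON layout with all_points_x and all_points_y under shape_attributes. The region attribute key "defect", the class ordering, and the output field names are hypothetical choices for this illustration, not details given in the paper.

```python
# Class order is an assumption; the paper lists the seven classes but not their ids.
CLASS_NAMES = ["loop high", "loop medium", "loop low",
               "compression", "core out", "abrasion", "normal"]


def polygon_to_bbox(xs, ys):
    """Tightest axis-aligned box around a polygon: [x_min, y_min, x_max, y_max]."""
    return [min(xs), min(ys), max(xs), max(ys)]


def polygon_area(xs, ys):
    """Polygon area via the shoelace formula."""
    total = 0.0
    for i in range(len(xs)):
        j = (i + 1) % len(xs)  # next vertex, wrapping around
        total += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(total) / 2.0


def via_region_to_annotation(region, class_names=CLASS_NAMES):
    """Convert one VIA polygon region into a flat annotation record."""
    shape = region["shape_attributes"]
    xs, ys = shape["all_points_x"], shape["all_points_y"]
    label = region["region_attributes"]["defect"]  # hypothetical attribute key
    flat = [c for xy in zip(xs, ys) for c in xy]   # [x1, y1, x2, y2, ...]
    return {
        "bbox": polygon_to_bbox(xs, ys),
        "segmentation": [flat],
        "area": polygon_area(xs, ys),
        "category_id": class_names.index(label),
    }
```

The polygon is kept as the segmentation target, so the irregular defect outline is not lost when the box is derived from it.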

Model Configuration
The training configuration is described as follows:
• Training dataset: A custom train dataset of SFRs was introduced to the platform, consisting of 1,315 images each having a dimension of 2040 x 1086 pixels.
• Learning rate (LR): The learning rate for the model was set to 0.00025.
• Total iterations: The training phase was performed for a total of 30,000 iterations.
• Number of classes: The model was configured to have 7 classes: loop high, loop medium, loop low, compression, core out, abrasion, and normal. These classes represent various damages on the SFRs.
• Threshold: The testing threshold for object detection was set to 0.70. During inference, objects with a detection score above 0.70 are considered positive detections.
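The settings listed above map naturally onto Detectron2 configuration keys. The sketch below collects them as plain Python values so that it runs without Detectron2 installed; the pairing of each setting with a key, the choice of base config file, and the helper name to_opts are assumptions for illustration, not the authors' actual training script.

```python
# Model zoo config assumed as the starting point (Mask R-CNN, R50-FPN, 3x schedule).
BASE_CONFIG = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"

# Values taken from the configuration list; key names are Detectron2 config keys.
TRAIN_OVERRIDES = {
    "SOLVER.BASE_LR": 0.00025,                  # learning rate
    "SOLVER.MAX_ITER": 30000,                   # total training iterations
    "MODEL.ROI_HEADS.NUM_CLASSES": 7,           # seven damage classes
    "MODEL.ROI_HEADS.SCORE_THRESH_TEST": 0.70,  # testing score threshold
}


def to_opts(overrides):
    """Flatten {key: value} into the [key, value, key, value, ...] list form."""
    opts = []
    for key, value in overrides.items():
        opts.extend([key, value])
    return opts


# With Detectron2 installed, the overrides could be applied roughly as follows:
#   from detectron2 import model_zoo
#   from detectron2.config import get_cfg
#   cfg = get_cfg()
#   cfg.merge_from_file(model_zoo.get_config_file(BASE_CONFIG))
#   cfg.merge_from_list(to_opts(TRAIN_OVERRIDES))
```

Keeping the overrides in one dictionary makes it easy to rerun the same settings with a different backbone by changing only the base config file.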

Results
The Detectron2 framework was tested using the Mask R-CNN architecture for detecting defects on the experimentally collected dataset of SFRs. The framework was implemented on a machine with a CPU speed of 3 GHz, 32 GB RAM, and an Nvidia GeForce RTX 4090 GPU. The entire dataset of 1,803 images has been divided into three subsets: train, validation, and test images. The model was trained with 1,315 images and then fine-tuned and validated over 331 images. The remaining 157 images have been used as test images for evaluating the model's performance. Additionally, due to the limited size of the dataset, a series of data augmentation techniques such as rotation, flipping, and adjustments in brightness and contrast were implemented in order to avoid overfitting.
52.191 66.195 58.131 25.319 53.682
* AP is average precision; AP50 and AP75 are the AP computed at IoU values of 0.50 and 0.75, respectively.
* APm and APl are the AP for medium and large objects.

Data Augmentation
The training dataset encompasses a considerable number of defective images and defect types. However, despite this diversity, an inherent imbalance within the dataset may still exist. To address this, various data augmentation techniques have been applied to the collected SFR dataset in the present work. These techniques include resizing, horizontal/vertical flipping, rotation, and random adjustments to contrast and brightness. The augmentation was applied prior to training the model, with the purpose of increasing the number of damage instances within the dataset. For resizing, the dataset was transformed with min and max sizes set at 800 pixels. The brightness and contrast augmentation was applied with a random probability of 0.5 and a maximum limit of 0.2. Similarly, vertical flipping was applied with a random probability of 0.5. In the present work, horizontal flipping does not have any impact on the current dataset. Additionally, rotation was incorporated within a range of ±15 degrees on the collected dataset. The model was subsequently trained with these augmented datasets. Table 4 provides an overview of the performance metrics for the chosen Mask R-CNN with R50-FPN-3x architecture for augmented and non-augmented SFRs datasets. The results indicate that the performance of the ResNet-50-FPN architecture with augmentation is notably inferior to that of the non-augmented model, resulting in AP50 scores of 50.147 % and 49.520 % for object detector bounding boxes and instance segmentation masks, respectively. The brightness and contrast technique did not yield any noticeable improvement in the performance. This suggests that the collected dataset already encompasses images with different light and weather conditions. Also, the collected dataset contains images with different orientations due to the rotation of ropes across the sheaves while collecting the dataset. Therefore, rotation techniques do not have any impact on the performance of the model. Similarly, other augmentation techniques, including flipping and resizing, did not contribute to enhancing the model's performance. In conclusion, none of these augmentation approaches appeared to significantly enhance the model's overall performance.
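The augmentation settings described in this section can be sketched as a random parameter sampler, together with the coordinate update that a vertical flip requires for polygon annotations. This is an illustrative sketch only: interpreting the "maximum limit of 0.2" as a multiplicative factor in [0.8, 1.2] is an assumption, and the function names are hypothetical rather than part of any augmentation library.

```python
import random


def sample_augmentation(rng):
    """Draw one set of augmentation parameters for a single image."""
    params = {
        "vflip": rng.random() < 0.5,               # vertical flip, probability 0.5
        "rotation_deg": rng.uniform(-15.0, 15.0),  # rotation within +/-15 degrees
        "brightness": 1.0,                         # 1.0 means unchanged
        "contrast": 1.0,
    }
    if rng.random() < 0.5:  # brightness/contrast applied with probability 0.5
        # Assumed reading of the 0.2 limit: scale factors within [0.8, 1.2].
        params["brightness"] = rng.uniform(0.8, 1.2)
        params["contrast"] = rng.uniform(0.8, 1.2)
    return params


def vflip_polygon(points, height):
    """Mirror polygon vertices top-to-bottom so the mask follows the flipped image."""
    return [(x, height - y) for x, y in points]
```

Passing a seeded generator such as random.Random(0) makes the sampled parameters reproducible, which helps when comparing augmented and non-augmented training runs.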

Visual Evaluation
Figure 6 presents the segmentation mask and accuracy of the polygon applied to each instance in SFRs for the Mask R-CNN with ResNet-50-FPN architecture. Each predicted polygon also has a corresponding label and a confidence percentage ranging from 0 to 100 %, from low to high confidence of having a defect on the SFR. The confidence score estimates how likely the detected defect is to be a true positive (TP), and a threshold on this score decides which detections are kept. In this case, the threshold value is set to 0.70. If the threshold is set very low, the results may include a large number of unreliable detections.
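The effect of the 0.70 threshold can be illustrated with a small filter over detection scores. The detection records below are invented for this example and use hypothetical field names; they are not outputs of the trained model.

```python
def filter_detections(detections, threshold=0.70):
    """Keep only detections whose confidence score is above the threshold."""
    return [d for d in detections if d["score"] > threshold]


# Hypothetical detections for one image (scores are made up for illustration).
example_detections = [
    {"label": "compression", "score": 0.91},
    {"label": "abrasion", "score": 0.70},
    {"label": "loop high", "score": 0.42},
]

kept = filter_detections(example_detections)
kept_labels = [d["label"] for d in kept]
```

Lowering the threshold in this sketch would also keep the low-score detections, which is the behaviour the text warns about.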

Model Assessment
The performance of the model is illustrated in Figure 7. Overall, the provided results demonstrate the effectiveness of the model in accurately detecting defects in SFRs, particularly those with irregular and complex shapes. The high accuracy achieved, along with the observed improvements in the FP and FN curves, suggests the model's potential for practical application in defect detection tasks for SFRs.

Conclusion and Future Work
The paper presents a state-of-the-art Detectron2 framework that uses Mask R-CNN with the ResNet-50-FPN architecture for object detection and instance segmentation. We conducted a comparative analysis involving ResNet-101-FPN and ResNeXt-101-FPN architectures alongside ResNet-50-FPN to determine the most suitable model for in-situ applications.
The effectiveness of the model was evaluated on an experimentally collected dataset obtained by introducing artificial damages on real SFRs. The model's performance demonstrates its precision in detecting damages across diverse categories, including those with irregular and complex shapes, and highlights its potential for practical application in damage detection tasks for SFRs. During the model's training, a key challenge arises from the need for manual annotation to evaluate and understand the performance of each backbone architecture. Another constraint emerges from the limited number of experimentally collected SFR images for training the model. To enhance the model's performance, synthetic data may be incorporated into the dataset. This approach is anticipated to contribute substantially to elevating the model's effectiveness in condition monitoring applications.

Figure 3: Examples of data collected for rope with various damages.
Figure 5 depicts the average precision for BBox and instance segmentation for the ResNet-50-FPN-3x architecture on the experimentally collected dataset of SFRs. Also, detailed performance metrics of different backbone architectures for different IoU values and object sizes have been obtained on the custom dataset of SFRs. Table 3 presents the performance metrics for ResNet-50-FPN, ResNet-101-FPN, and ResNeXt-101-FPN. In each case, the training was performed for 30,000 iterations with a base learning rate of 0.00025. It can be observed during the validation of backbone architectures that the Mask R-CNN with the ResNet-50-FPN backbone presented the best results for the precision metrics, for both BBox and mask type, in less training time than the ResNet-101-FPN and ResNeXt-101-FPN backbone architectures.
1. Accuracy: Figure 7 (a) illustrates the training accuracy of the model, which was found to be approximately 97.6 %. The accuracy represents the percentage of correct predictions made by the model compared to the actual data. Out of the 451 test images, 418 were correctly identified as defective by the model with a confidence score above the 0.70 threshold.
2. False Positive (FP) Curve: Figure 7 (b) shows the FP curve, which represents the instances where the model identified a defect incorrectly (FP) as the training progressed. It can be observed that the FP curve decreases as the training progresses, indicating the model's improved ability to correctly detect defects with a high confidence score.
3. False Negative (FN) Curve: Figure 7 (c) illustrates the FN curve, which represents instances where the model failed to identify a defect (FN) with a confidence score below 0.70. The FN curve decreased as the training steps increased, indicating that the model improved in minimizing false negatives and accurately identifying defects with high confidence.
4. Loss: The total loss and validation loss, depicted in Figure 7 (d) and (e), indicate the number of mistakes made by the model during each training or validation iteration. It can be observed that the total loss reduces to 0.18 after 30,000 iterations, indicating the model's improved performance over time.

Figure 7: Evaluation results for the proposed model.

Table 1: Layout of collected SFRs dataset.

Table 2: Comparison of the performance metrics of pre-trained models.
The architectures are chosen based on the high AP (average precision) on object detection BBox's and instance segmentation masks. Table 2 presents the performance comparison of Mask R-CNN on the pre-trained dataset and the custom dataset. The IoU evaluation metric given by equation (1) has been used to measure the accuracy of the object detector. Here, an IoU greater than 0.5 (AP50) is considered a good prediction. It can be observed that ResNeXt-101-FPN-3x shows high AP50 for both BBox (44.3 %) and segmentation mask (39.5 %), with a comparatively large training time of 0.690 s/iter in comparison to other backbone models. Although ResNeXt-101-FPN has better AP50 for BBox and mask on the pre-trained dataset, it takes a longer time to train or predict. However, in the case of the custom dataset, the results show high AP50 for BBox (77.009 %) and segmentation (77.975 %) in a shorter training time of 0.1267 s/iter for ResNet-50-FPN-3x. Therefore, Mask R-CNN with ResNet-50-FPN-3x is chosen as the backbone architecture for the present work.
Initially, the paper compares the performance of Mask R-CNN with several backbone architectures. The model's nomenclature follows the format [backbone]-[feature]-[learning schedule], where the backbone represents the chosen DL neural network architecture and the feature denotes the feature extraction methodology (FPN). The training was performed using TL, with the model's weights initialized from a model pre-trained on the COCO dataset with the 3x learning schedule (about 37 COCO epochs in the Detectron2 model zoo).
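The [backbone]-[feature]-[learning schedule] naming format can be unpacked with a short helper. This is a sketch for illustration; the function name parse_model_name is hypothetical, and it assumes that the last two hyphen-separated fields are always the feature and the schedule.

```python
def parse_model_name(name):
    """Split a name such as 'ResNet-50-FPN-3x' into its three parts."""
    # Split from the right so hyphens inside the backbone name are preserved.
    backbone, feature, schedule = name.rsplit("-", 2)
    return {"backbone": backbone, "feature": feature, "schedule": schedule}


parsed = parse_model_name("ResNeXt-101-FPN-3x")
```

Splitting from the right matters here because backbone names like ResNet-50 contain a hyphen themselves.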

Table 3: Comparison of the performance metrics of different models in order to choose the best-suited model for in-situ application.