Utilization of Both Machine Vision and Robotics Technologies in Assisting Quality Inspection and Testing

Inspecting the smoothness and durability of the exterior and interior surfaces of manufactured products such as large passenger aircraft requires precision instruments. Inspections are mainly performed manually, which is costly and inefficient; moreover, manual operation is prone to missed and false detections. To cope with these issues, effective precision instruments are needed as necessary tools for the intelligent inspection of impalpable (hard-to-see) damage. Although many inspection tasks have been investigated, very few public datasets are available for the intelligent inspection of precision instruments. On the other hand, precision instruments can be applied to a wide variety of damage types. In this study, a YOLOv3-based method is proposed that acquires and processes surface images to perform surface inspection with precision instruments. An image dataset of impalpable damage was first established. More specifically, the YOLOv3 detection network is leveraged to roughly localize the damage in the image and identify the damage type. Afterward, a designed level-set algorithm was employed to obtain more accurate damage locations within the image block by exploiting the characteristics of different damage types. Finally, a quantitative analysis was performed on the refined detection results. A deep architecture that can intelligently conduct damage detection was thus proposed. Besides, the proposed method exhibited a strong tolerance for unknown types of damage and excellent flexibility and adaptability. Extensive experimental results demonstrated that the proposed method can alleviate the shortcomings of traditional inspection methods. Thus, it provides technical guidance for applications of intelligent orbital inspection robots.
The proposed method, with a more precise and noninvasive inspection technique, remarkably reduces the inspection time for impalpable surface damage.


Introduction
The manufacturing industry is the core of any national economy. Although China is a major manufacturer worldwide, the quality of various manufactured and semimanufactured products still lags behind that of products manufactured in developed countries such as the United States. In particular, surface quality, namely, the smoothness and precision of various important components, is lacking in steel manufacturing. The surface quality of main steel products such as automobile plates, home appliance plates, and decorative stainless steel plates plays an indispensable role in the assessment of the overall quality of manufactured products and can even affect their market competitiveness. When traditional manual, visual inspection techniques are used to examine the surface quality of steel plates, false-positive and false-negative detections of surface defects are highly probable, owing to the so-called "visual fatigue." This is a problem that many steel companies must cope with when the quality of the steel surface is a concern.
This problem directly hurts economic benefits. Instead of manual inspection, machine vision techniques can be used to detect defects, for example, on aircraft parts. Notably, intelligently detecting and localizing impalpable damage on aircraft has the advantages of high efficiency, low cost, and no need for human intervention. However, more advanced techniques may be needed to detect impalpable damage on the surfaces of manufactured products, for example, machine vision and robotics applications based on deep architectures.
A systematic description of the deep architecture can be summarized as follows. The hardware of the proposed method is mainly composed of a complementary metal-oxide-semiconductor (CMOS) camera, a lens, a light source, and a computer. More specifically, the CMOS camera and lens are responsible for real-time image acquisition, converting optical images into digital signals; they feature small size, light weight, and low susceptibility to external interference. Therefore, they are pervasively used in industrial visual inspection. Considering both detection accuracy and working distance, Point Grey's industrial Blackfly PoE GigE color camera (model BFLY-PGE-31S4C-C) is utilized. The lens is chosen according to the selected camera model and the size of the field of view. The selected camera has a C-mount interface, and its resolution is fixed at 2048 × 1536. The sensor target surface is 1/1.8″, and the size of the aluminum foil in the actual test is approximately 9 mm. Thus, the maximum target surface is set to 1/1.8″, and a fixed-focus lens with a focal length of 25 mm is utilized. The light source is an important module in the implementation and largely determines the quality of the captured images [1,2]. A successful choice of lighting can substantially reduce the difficulty of subsequent image processing. Considering factors such as response speed, life span, and stability, a light-emitting diode (LED) light source is selected in the proposed method. On the front side, a ring light source with uniform illumination is used to highlight wrinkles on the surface of the aluminum foil. Besides, we add a light source on the back side to make the outer contour edge of the aluminum foil more prominent.
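As a rough check on this lens choice, the required focal length can be estimated from the sensor width and the desired field of view with the thin-lens magnification rule. The working distance and field-of-view width below are hypothetical values (the text gives only the foil size of about 9 mm), so this is an illustrative sketch, not the authors' actual calculation:

```python
def focal_length_mm(working_distance_mm, sensor_width_mm, fov_width_mm):
    """Approximate focal length so that fov_width fills the sensor:
    f ~= working_distance * sensor_width / fov_width (thin-lens rule)."""
    return working_distance_mm * sensor_width_mm / fov_width_mm

# Hypothetical numbers: a 1/1.8" sensor is roughly 7.2 mm wide; the ~9 mm
# foil plus margin gives a ~10 mm field of view; assume 35 mm working distance.
f = focal_length_mm(35.0, 7.2, 10.0)
print(round(f, 1))  # ~25 mm, consistent with the 25 mm fixed-focus lens
```

Under these assumed numbers the estimate lands near the 25 mm lens actually selected, which is why a fixed-focus lens of that focal length is plausible for this setup.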
The visual inspection software primarily performs image processing on the pictures acquired by the camera based on semantically analyzed results [3], and the interface of the visual inspection software is implemented with Qt on the VS2015 platform. To process the pictures from the camera in real time, the software adopts a multithreaded mode. The main thread is responsible for the interface display of the entire software and controls the opening and closing of the second thread. The main interface includes the menu bar, the image display area, and a results display area. The menu bar contains buttons with six different functions, presented in Table 1.
The function of the second thread is to handle camera communication and image processing. The camera communication module is mainly responsible for image transmission between the camera and the software. When the "turn on the camera" button is activated in the main interface, the second thread starts and a picture is read from the camera. Subsequently, the picture is passed to the image processing module, which carries out image processing tasks such as filtering, threshold segmentation, and edge extraction, and then displays the extracted results in the image display area of the main thread. The obtained parameters are then transmitted to the calculation module, which finalizes the process.
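The two-thread structure described above can be sketched with a worker thread and a queue. The frame source and the "filtering" step here are placeholders, not the actual Qt/VS2015 implementation:

```python
import queue
import threading

def camera_thread(frames, out_q):
    """Second thread: read frames and push processed results to the queue."""
    for frame in frames:                       # stand-in for real camera reads
        processed = [px // 2 for px in frame]  # placeholder "filtering" step
        out_q.put(processed)
    out_q.put(None)                            # sentinel: camera closed

def main_thread(frames):
    """Main thread: start the worker, then display (here, collect) results."""
    out_q = queue.Queue()
    worker = threading.Thread(target=camera_thread, args=(frames, out_q))
    worker.start()
    results = []
    while True:
        item = out_q.get()                     # blocks, like a UI event loop
        if item is None:
            break
        results.append(item)                   # stand-in for the display area
    worker.join()
    return results

print(main_thread([[10, 20], [30, 40]]))       # [[5, 10], [15, 20]]
```

The queue decouples acquisition from display, which is the same reason the software separates the camera thread from the interface thread.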
In this study, we investigate a visual processing algorithm based on YOLOv3, which primarily addresses deep-learning-based damage detection: known damage types are detected, and unknown damage is analyzed by utilizing the extracted knowledge. Empirical evaluations have shown that conventional manual, visual inspection has notable weaknesses. In contrast, the proposed method can remarkably reduce the inspection time for impalpable surface damage. Moreover, it can substantially improve the maintenance and inspection efficiency of manufactured products while using few parameters.
By combining charge-coupled device (CCD)-based applications and machine vision techniques, we propose a more precise and noninvasive inspection technique that can be leveraged to detect the surface quality of various products. The proposed method seamlessly combines high-speed CCD cameras with a computer vision system, which underscores the necessity of discovering deep visual features for visual quality prediction. It can then provide both detected images and accurate feedback in a timely manner. Experiments have shown that the proposed method is an effective tool for inspecting the surface quality of manufactured products, for example, steel plates. The rest of the manuscript is organized as follows: Section 2 presents the related work, covering the hardware, software, and configurations needed to capture and digitize images for machine vision. Section 3 presents the convolutional neural network, a deep learning method, and its fundamental structure. Section 4 introduces the proposed model and the experimental setup. Section 5 concludes the research.

Machine Vision Hardware.
Machine vision is a comprehensive field encompassing many techniques, including image processing, mechanical engineering, electric lighting, optical imaging, digital video, and computer software and hardware [3]. A typical machine vision application involves image acquisition, processing, and analysis. Considering hardware cost and the difficulty of platform integration, many systems use OpenVINO as the computer vision development kit to process the captured images, which benefits system performance; its platform requirements are relatively low compared with other computer vision toolkits. Herein, we chose Windows 10 with a 10th-generation Intel Core CPU. The video-capturing device was a handheld internet protocol (IP) camera. Network cameras are modern products that combine traditional cameras with network video technology [1,3,4]. The video captured by the camera is digitized and compressed by a high-efficiency compression encoder, and the compressed video is transmitted to the web server over the network. Users on the network can use a browser to monitor the camera through the web server in real time, and authorized users can also control camera actions such as pan/tilt or change the system configuration. An IP network camera is a digital device built on network transmission. The network camera also has a composite video output using a BNC (Bayonet Neill-Concelman) connector, and the camera can be connected directly to the local area network (LAN) for video transmission.

Deep Learning for Visual Modeling.
A convolutional neural network (CNN) was proposed by [5]. The CNN has a clear advantage when the input to the network is an image or video: each image/video can be fed directly into the network in an end-to-end manner, avoiding the complexity of traditional visual modeling. Deep-learning-based feature extraction and visual reconstruction have powerful advantages when handling images/videos. The network can automatically capture color, texture, shape, and topological structure from image characteristics [6][7][8][9], and the deep architecture can be customized for various visual tasks. AlexNet is one of the most famous CNN architectures for visual information processing; it not only established the dominance of CNNs in visual modeling but also aided the expansion of deep learning into speech recognition, natural language processing, and reinforcement learning [3]. AlexNet uses an end-to-end convolutional neural network to extract and classify image features. The network has eight learned layers, comprising five convolutional layers and three fully connected layers [7]. It introduces the rectified linear unit (ReLU) and dropout, and employs data augmentation and pooling.
This network has five convolutional layers, three of which are followed by max-pooling layers, and its structure can be described layer by layer. The first layer is as follows: the convolution kernel size is set to 11 × 11, the stride is set to 4 × 4, and the number of channels is set to 96, followed by a ReLU activation function and local response normalization.
Then, a max-pooling layer with a 3 × 3 kernel and a 2 × 2 stride is incorporated. The second layer is similar: the convolution kernel size is set to 5 × 5, the stride is set to 1 × 1, and the number of channels is set to 256, followed by a max-pooling layer with a 3 × 3 kernel and a 2 × 2 stride.
The third to fifth layers are as follows: the convolution kernel size is set to 3 × 3, the stride is set to 1 × 1, and the numbers of channels are 384, 384, and 256, respectively. After the convolutional layers, a pooling layer is typically incorporated. The pooling layer substantially reduces the size of the feature map, thereby reducing the number of parameters in the final fully connected layers; pooling also aggregates features and helps prevent overfitting. The fully connected layers perform the final image classification. Different neural networks have different numbers of layers, but in practice the functions are the same. The number of neurons entering the fully connected layers is greatly reduced after processing by the convolutional and pooling layers. Afterward, the final softmax output is determined by the number of classification labels.
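The spatial dimensions implied by these kernel/stride settings can be verified with the standard convolution output-size formula. The 227 × 227 input size and the padding of 2 for the second layer are assumptions from the usual AlexNet description, not stated in the text:

```python
def conv_out(size, kernel, stride, pad=0):
    """Output width/height of a conv or pooling layer:
    floor((n - k + 2p) / s) + 1."""
    return (size - kernel + 2 * pad) // stride + 1

# AlexNet-style first stages, assuming a 227x227 input image.
s = conv_out(227, kernel=11, stride=4)       # conv1: 11x11, stride 4
print(s)                                     # 55
s = conv_out(s, kernel=3, stride=2)          # max pool: 3x3, stride 2
print(s)                                     # 27
s = conv_out(s, kernel=5, stride=1, pad=2)   # conv2: 5x5, stride 1, pad 2
print(s)                                     # 27
```

The same formula applies to every pooling layer mentioned above, which is how one confirms that the 3 × 3 / stride-2 pools halve the feature-map side length.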

e Prediction Model of Visual Quality.
A video can be examined as a set of frames [10][11][12][13][14]. Thus, the assessment of video quality mainly reduces to the assessment of individual frames: the quality of each frame is estimated, and the quality of the whole video is obtained by combining the qualities of these frames. The peak signal-to-noise ratio (PSNR) is widely utilized for video quality assessment. However, it is agnostic to the source of the video's distortion: the calculation of the PSNR is based on pixel-wise differences, so it ignores the visibility of the distortion to human observers. Therefore, the PSNR cannot always reflect subjective human judgment. In web real-time communication (WebRTC) video transmission systems, no-reference video quality assessment plays a sizable role in video transmission. Reference [8] proposed to evaluate the compression and transmission distortions of the measured video separately; the quality score of the video is then obtained by combining the estimated distortions. Reference [10] suggested four parameters with which a video can be described in order to analyze video degradation, and introduced a no-reference framework for video over a lossy network that aims to model packet losses of this kind. Reference [11] examined a framework with quality of service (QoS) feedback to analyze the impact of the network on video quality. Reference [12] fitted a no-reference (NR) video quality model called NORM for H.264/AVC-coded video to analyze the impact of transmission errors on videos. At the macroblock level, NORM provides an estimate of the mean squared error (MSE) distortion. NORM shows a strong linear correlation with the distortion measured by full-reference (FR) methods.
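The pixel-difference nature of the PSNR discussed above can be seen in a minimal computation (8-bit images assumed; the pixel values are made up):

```python
import math

def psnr(ref, test, max_val=255):
    """PSNR = 10 * log10(MAX^2 / MSE), over flattened pixel lists."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")   # identical images
    return 10 * math.log10(max_val ** 2 / mse)

ref  = [52, 55, 61, 59]
test = [52, 57, 61, 58]       # two pixels off by 2 and 1; MSE = 1.25
print(round(psnr(ref, test), 2))   # ~47.16 dB
```

Note that the same MSE, and hence the same PSNR, can arise from visually very different distortions, which is exactly the weakness of the metric that the paragraph above points out.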
Figure 1 shows the defect detection pipeline of the proposed framework, which consists of three modules: (1) digital image acquisition, (2) an embedded system for image preprocessing, and (3) defect detection based on deep learning.

The Proposed Method and Experiments
The proposed method consists of three modules, namely, the image acquisition system, the image processor, and the convolutional neural network that defines the system. Module 1. The image acquisition system is an important part of online quality inspection systems for the surface of a manufactured product; it functions as the "eye" of the system. For the machine vision-based surface quality inspection system, the proposed method selects either a high-performance linear array charge-coupled device (CCD) camera or an area array CCD camera, and a comparison between the two is presented in [4]. The system is then applied to detect the surface quality of the plate on a cold rolling production line. The linear array high-speed CCD camera serves as the sensor of the system. The Gigabit Ethernet (GigE) camera, a high-speed line scan camera, leverages the CCD technique; its speed is consistent with Ethernet and the frame representation format.
The GigE camera generates frame data, wherein each frame is divided into multiple Ethernet packets during transmission. A frame is composed of a data header ("Leader"), a data load ("Payload"), and a data tail ("Trailer"). Each image is delivered in the data payload. The bandwidth of the camera is determined by the number of transmitted bytes and the transmission time. The number of bytes of each frame, X bytes/frame, is then calculated.
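This byte accounting can be sketched as follows. The 70-byte Leader and 66-byte Trailer match the values given in the text; the per-packet overhead of 78 bytes and the 30 fps frame rate are illustrative assumptions, not figures from the paper:

```python
import math

def frame_bytes(image_bytes, packet_payload, per_packet_overhead,
                leader=70, trailer=66):
    """Total bytes on the wire for one GigE frame (sketch):
    Leader + Trailer + image data + per-packet Ethernet overhead."""
    n_packets = math.ceil(image_bytes / packet_payload)  # round up
    return leader + trailer + image_bytes + n_packets * per_packet_overhead

# 2048x1536 image at 8 bits/pixel, 1500-byte packet payload,
# assumed 78 bytes of Ethernet/IP/UDP overhead per packet.
img = 2048 * 1536
total = frame_bytes(img, 1500, 78)
fps = 30                                    # hypothetical frame rate
print(total, total * fps * 8 / 1e9)         # bytes/frame, Gbit/s on the wire
```

Dividing the total frame size by the frame period gives the camera bandwidth mentioned above, which is how one checks that the stream fits within Gigabit Ethernet.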
The payload is the image information, and the packet size is the size of an Ethernet packet; the number of packets is the payload divided by the packet size, rounded up to the next integer. The overhead is the per-packet overhead of the Ethernet protocol. The Leader is set to 70 bytes, and the Trailer is set to 66 bytes. The camera is mounted perpendicular to the direction of movement of the measured object. Moreover, a synchronizer simultaneously controls the start time of each camera. The GigE camera first needs to set the host IP address and the camera IP address. Besides, it determines the host network interface controller (NIC) and the camera's media access control (MAC) address, and subsequently sets the gateway and subnet mask. The system server broadcasts the start signal over the Ethernet network, and the digital signal processor (DSP) receives the signal, which is used to trigger image acquisition by the camera through general-purpose input/output (GPIO). After completion, the server broadcasts the termination signal over the Ethernet network, and the DSP again controls the camera through GPIO. The choice of light source directly influences the quality of image acquisition and largely determines whether defects can be revealed in the image. Module 2. The choice of an embedded image processor is closely related to the real-time acquisition and processing of defective surface images of the manufactured product; it is a bottleneck in the system design. A high-performance image processor must be customized, since real-time processing of massive data requires fast data processing and calculation. A computer bus and a parallel multicomputer network could jointly fulfill the task; however, this would inevitably make the system complex, expensive, and difficult to maintain.
Therefore, we jointly utilize a GigE camera + DSP + server multiprocessor architecture to conduct the collection, processing, transmission, and storage of massive data. Texas Instruments' TMS320DM648 (DaVinci) digital signal processor (DSP) is chosen as the core module of the embedded image processing system. The TMS320DM648 is based on TI's TMS320C64x+ DSP core. The 64x series of DSP chips has very high clock frequencies (up to 900 MHz) and abundant peripheral interfaces, and is fully compatible with C64x-series DSP target code. The chip integrates resources such as a secondary cache, a 64-bit EMIF interface, a high-precision video port, a Gigabit Ethernet interface, and an inter-integrated circuit (I2C) bus interface [1]. The system chooses the TMS320DM648 DSP as the core device of the embedded image processing system, and four tailored submodules are incorporated to realize the different subfunctions. Each camera in the system is connected to the DSP through the Ethernet interface for parallel processing, and each image is transmitted to the system server through the tailored embedded system. Notably, a high image collection speed and an appropriate network topology are necessary to guarantee that no conflicts occur during operation.

Figure 1: Key modules in the proposed framework (digital image acquisition hardware; embedded system for image preprocessing; deep learning for defect detection).

Mathematical Problems in Engineering
Module 3. The convolutional neural network (CNN) was proposed by [5] in 1998. The advantages of the CNN grow significantly when the input of the network is an image set. When the image set can be fed directly into the deep network, the complexity of conventional visual recognition models is avoided. The feature extraction and data reconstruction processes of the CNN are highly competitive for modeling 2D data. For example, the network can by itself extract image features including color, texture, shape, and image topological structure. It demonstrates high robustness and computational efficiency and is invariant to translation and distortion of the visual input [4]. In the literature, AlexNet is among the best-known CNN architectures; it established the dominance of CNNs in computer vision and also spurred the growth of deep learning in speech recognition, natural language processing, and reinforcement learning. AlexNet trains an end-to-end convolutional neural network to extract and classify image features. The network has eight layers, comprising five convolutional layers and three fully connected layers [5]. It introduces ReLU and dropout and employs data augmentation and pooling. Of the five convolutional layers, three are followed by pooling layers in the AlexNet design. Generally, AlexNet, with its five convolutional layers, serves as a simple baseline that can be built upon for visual tasks of many kinds [10].
We develop the MobileNet network by upgrading the design of AlexNet. MobileNet improves on AlexNet under constrained hardware conditions [14]: it can reduce the number of parameters without losing precision. This lightweight network is used instead of VGGNet as the base convolutional network in the single-shot multibox detector (SSD). Figure 2 depicts the basic convolutional structure of MobileNet. Conv-Dw is a depthwise separable convolution structure that consists of a depthwise convolution layer (Dw) and a pointwise convolution layer (Pw). Dw uses a depthwise convolution with a 3 × 3 kernel, and Pw uses a pointwise convolution with a 1 × 1 kernel. The output of each convolution is processed by batch normalization (BN) [14] and a rectified linear unit (ReLU) activation. BN stabilizes the data distribution by learning two scaling parameters, avoiding vanishing gradients and difficult parameter tuning.
The computational cost of the MobileNet convolution is calculated by
D_K × D_K × M × D_F × D_F + M × N × D_F × D_F, (1)
and the cost of the standard convolution is defined by
D_K × D_K × M × N × D_F × D_F, (2)
where D_K represents the width of the convolution kernel, M is the number of input channels of the convolution, N denotes the number of convolution kernels (output channels), and D_F is the width of the input feature map. Comparing equations (1) and (2), the MobileNet convolution incurs only a fraction of the cost of the standard convolution, so the computationally cumbersome part is well reduced. To meet the speed requirements of defect detection, we combine the two to improve the computation speed of the SSD. MobileNet uses only 1/33 of the parameters of VGG-16 (Visual Geometry Group, 16 layers) to achieve the same classification accuracy when the computational burden is considered. Each layer has batch normalization and a ReLU nonlinearity during training, which improves the convergence of MobileNet and the stability of the learned model; the ReLU nonlinearity improves the behavior of the feature mapping. By accelerating the convolution operations, the speed of the SSD is increased by a factor of 1.3, with an accuracy loss of only 0.1%, which makes the network structure better suited to discovering defects in real time. A further strength of the proposed MobileNet architecture is its efficient use of the computing resources within the network: the depth and width of the network can be increased while keeping the computational budget unchanged. The input has four branches, which are convolved or concatenated with filters of different sizes, and the resulting features are stitched together. Convolution at multiple scales extracts features at different scales; richer features make the final classification decision more exact. Before conducting the 3 × 3 and 5 × 5 convolutions, we utilize a 1 × 1 convolution for dimensionality reduction. The input for training MobileNet comprises 112,000 images with one or more defective regions manually annotated.
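The saving implied by equations (1) and (2) can be checked numerically; the ratio of the two costs reduces algebraically to 1/N + 1/D_K². The layer sizes below are illustrative, not taken from the paper:

```python
def depthwise_separable_cost(dk, m, n, df):
    """Eq. (1): depthwise pass plus 1x1 pointwise pass."""
    return dk * dk * m * df * df + m * n * df * df

def standard_cost(dk, m, n, df):
    """Eq. (2): full standard convolution cost."""
    return dk * dk * m * n * df * df

# Illustrative layer: 3x3 kernel, 64 -> 128 channels, 56x56 feature map.
dk, m, n, df = 3, 64, 128, 56
ratio = depthwise_separable_cost(dk, m, n, df) / standard_cost(dk, m, n, df)
print(round(ratio, 4))   # ~0.1189, i.e. roughly 8.4x fewer multiply-adds
assert abs(ratio - (1 / n + 1 / dk ** 2)) < 1e-12
```

For a 3 × 3 kernel the ratio is dominated by the 1/9 term, which is the usual shorthand for why depthwise separable convolutions are about an order of magnitude cheaper.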
Then, the MobileNet is used to learn to detect defective regions. Finally, when a new test image is input, the proposed model annotates the defective region according to the output of the learned MobileNet. Herein, we use the mean average precision (mAP) to measure the accuracy of defect detection. The mAP utilizes two metrics, calculated by
P = TP / (TP + FP), (3)
R = TP / (TP + FN), (4)
where TP, FP, and FN denote true positives, false positives, and false negatives, respectively. Utilizing equations (3) and (4), the mAP is calculated as the area under the precision-recall curve, ∫ P(R) dR.
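For a single class, this metric can be sketched as follows; the confidence scores and match labels below are made up for illustration:

```python
def average_precision(scores, labels):
    """Area under the precision-recall curve for one class:
    sort detections by confidence, accumulate P(R) dR rectangles."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)                 # eq. (3)
        recall = tp / total_pos                    # eq. (4): TP / (TP + FN)
        ap += precision * (recall - prev_recall)   # rectangle of P dR
        prev_recall = recall
    return ap

# Made-up detections: confidences with 1 = matched a ground-truth defect.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1,   0,   1,   1]
print(round(average_precision(scores, labels), 4))   # 0.8056
```

The mAP reported in the experiments is simply this quantity averaged over the defect classes.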
The pipeline of the proposed method is represented by Algorithm 1.

The Comparison of the Results of the Proposed Method with Those of the Different Deep Models.
We fine-tune the parameters of the proposed model on the well-known ImageNet [11] as a parameter-tuning strategy. The optimizer is stochastic gradient descent (SGD) with an initial learning rate of 0.001, momentum of 0.9, and weight decay of 0.0005, and the batch size is fixed at 32 during training. The newly added convolutional layers are initialized with the Xavier method. We set the number of iterations to 80,000, and the positive-sample threshold is set to 0.4. The training process of the deep MobileNet model is as follows: we select the training and test sets from the VOC2007 dataset, and the SSD-MobileNet model is utilized to detect the five types of targeted data mentioned above.
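The update rule behind these hyperparameters can be sketched on a toy one-dimensional parameter; this is an illustration of SGD with momentum and weight decay, not the actual training code:

```python
def sgd_step(w, grad, velocity, lr=0.001, momentum=0.9, weight_decay=0.0005):
    """One SGD-with-momentum step; weight decay adds an L2 term to the grad."""
    g = grad + weight_decay * w            # L2 regularization term
    velocity = momentum * velocity - lr * g
    return w + velocity, velocity

# Toy quadratic loss w^2 has gradient 2*w; take three steps from w = 1.
w, v = 1.0, 0.0
for _ in range(3):
    w, v = sgd_step(w, 2 * w, v)
print(round(w, 6))                         # parameter shrinks toward 0
```

The momentum term accumulates past gradients, which is why, in the loss curves discussed below, the loss decreases smoothly rather than oscillating.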
The optimal results are obtained through multiple training iterations and parameter adjustment. The variation of the loss function during the training phase is shown in Figure 2: the loss decreases and stabilizes as the iteration number increases in both the training and test phases. Table 2 presents three different feature extraction networks, namely, ResNet18, VoVNet39, and ESPNetV2, against which the MobileNet proposed in this manuscript is compared in terms of defect detection accuracy. The mAP of the proposed MobileNet is better than those of the other deep feature extraction networks, being 3.5% higher.
This better accuracy indicates the advantage of the MobileNet for capturing defective regions in the proposed method.
Moreover, the defect detection speed is significantly higher than those of VoVNet39 and ESPNetV2, and it outperforms ResNet18 by over 2 frames per second. As the number of epoch iterations increases, the value of the loss function gradually decreases; finally, when the number of iterations reaches 15,000, the loss value stabilizes.

4.2. The Evaluation of the Objective Visual Quality.
The evaluation of objective visual quality is treated as the benchmark test of defect detection. Table 3 shows that the performance of the proposed method is close to the standard subjective quality evaluation. The highly competitive performance of the proposed method suggests the necessity of discovering deep visual features for visual quality prediction. Herein, a multiple deep quality evaluation architecture is employed to engineer the deep features of both the training and test images. Afterward, we incorporate histograms of oriented gradients (HOGs), an important feature, into the visual quality prediction model. The well-known gradient features are extracted to serve as the feature similarity measurement. To utilize these two features well, we concatenate them row-wise into a long vector. We compare the proposed method with multiple IQA models and present the prediction results in Table 3.
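The HOG-and-gradient concatenation can be sketched on a toy intensity grid; the descriptor below is a simplified HOG-like histogram, not the exact features used in the paper:

```python
import math

def gradient_histogram(img, bins=8):
    """Toy HOG-like descriptor: histogram of gradient orientations,
    weighted by gradient magnitude, over a 2-D intensity grid."""
    h = [0.0] * bins
    rows, cols = len(img), len(img[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = img[y][x + 1] - img[y][x - 1]    # central differences
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.atan2(gy, gx) % math.pi    # unsigned orientation
            h[min(int(ang / math.pi * bins), bins - 1)] += mag
    return h

img = [[0, 0, 0, 0],
       [0, 10, 10, 0],
       [0, 10, 10, 0],
       [0, 0, 0, 0]]
hog_feat = gradient_histogram(img)
grad_feat = [sum(hog_feat)]          # toy stand-in for the gradient feature
fused = hog_feat + grad_feat         # row-wise concatenation into one vector
print(len(fused))
```

The fused vector then feeds the quality prediction model; concatenation keeps both feature families intact rather than merging them prematurely.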
Thus, the employed feature-concatenation strategy is found to be more robust. To verify the generalization capability of the proposed method, we subsequently carried out the visual quality evaluation on three well-known datasets.
These results demonstrate the advantage of the proposed algorithm. Table 4 presents the competitive performance of our method compared with well-known IQA algorithms. SSIM, defined as a structural similarity measurement between the reference and test images, is used. However, SSIM cannot be exploited effectively in the quality assessment here owing to the structural misalignment of the compared images; the similarity feature alone is not supportive. Table 4 shows the results of three common gradient operators, and Table 5 shows the comparative results of different methods. The experiment is conducted under unchanged parameters. The SROCC score is obtained with the three gradient operators on the tuning dataset, and the best-performing gradient operator is chosen. Thus, we choose the Scharr operator to calculate gradients.
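The SROCC used for this operator selection is the Spearman rank-order correlation coefficient; a minimal sketch (toy predicted and subjective scores, assuming no ties):

```python
def srocc(x, y):
    """Spearman rank-order correlation; with no ties this reduces to
    the classic 1 - 6 * sum(d^2) / (n * (n^2 - 1)) formula."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Toy example: predicted quality scores vs. subjective opinion scores.
pred = [0.2, 0.9, 0.4, 0.7, 0.1]
subj = [1.0, 4.8, 2.5, 3.9, 0.5]
print(srocc(pred, subj))   # 1.0: the two score lists rank images identically
```

Because SROCC depends only on ranks, it rewards a predictor that orders images by quality correctly even when its absolute scores are miscalibrated, which is why it is the standard selection criterion here.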

Ablation Study.
Generally speaking, the proposed method comprises three key modules: (1) digital image acquisition, (2) hardware-based image preprocessing, and (3) MobileNet-based detection of defective regions. We verify the usefulness of each component by ablating each of them. First, we replace our digital image acquisition module with the functionally reduced one in [12] and report the defect detection accuracy in Table 6. The accuracy of defect detection decreases by more than 10% when module 1 is removed; this observation clearly shows the advantage of the first module presented in this study. Afterward, we evaluate the importance of the second module by removing it. Table 6 shows that the accuracy of defect detection increases by over 7% on the three datasets when the preprocessing operation is used, which shows the necessity of preprocessing the images before conducting defect detection. We also replaced the MobileNet with the different deep architectures presented in Table 7. The MobileNet reached the best performance, and its running-time requirement was noticeably the smallest.

Conclusions
This study proposes a perception-aware MobileNet-SSD method that consists of three key modules, namely, (1) digital image acquisition, (2) hardware-based image preprocessing, and (3) a MobileNet architecture to detect defective regions on the surfaces of wood-based products. MobileNet extracts the most necessary features and uses a few inception-style blocks attached to multiple feature maps to improve the network's ability to distinguish different types of defects without losing precision. Besides, its speed in detecting defective products is significantly higher than those of VoVNet39 and ESPNetV2, and it outperforms ResNet18 by over 2 frames per second.
Training the proposed model on the constructed surface-defect data helps classify and detect rough shavings, watermarks, and sand marks on the surfaces of wood-based panels in the test data. The mAP of the network model in this study reaches over 0.88, and the detection speed exceeds 60 frames/s. Compared with the other three feature extraction networks, the proposed feature detection model has better efficiency and effectiveness and provides a reference solution for the real-time detection of surface defects on wood-based panels. Besides, the hardware implementation makes the proposed method much more efficient than its counterparts.

Data Availability
Data will be provided by the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.