A Fast and Effective Image Preprocessing Method for Hot Round Steel Surface

With the development of computer vision technology, more and more enterprises have begun to use computer vision instead of manual inspection for steel surface defect detection. However, classical image processing methods often face great difficulties when dealing with images containing noise and distortions, which leads to low computational efficiency and poor detection accuracy. In view of the particularity of hot round steel production, a computational intelligence method is proposed in this paper. On the basis of preliminary image preprocessing, we combine the improved PCA with the genetic algorithm for feature selection and then use evolutionary computing and CUDA-based parallel computing to screen out the suspected defective images of the round steel surface intelligently, quickly, and accurately. This method can provide decision support for subsequent defect analysis and production process improvement.


Introduction
Digital image processing originated in the 1920s. In the 1950s, people began to study digital images systematically [1]. In the past 30 years, with the development of computers and other related fields, digital image processing has been widely valued and has made great achievements in many fields. For example, in the field of industrial engineering, computer vision is used for quality testing [2][3][4][5]. Some mainstream scientists believe that computing is based on computing devices and algorithms and that there are many problems that can be solved by people but cannot be solved by computing equipment [6]. Other scientists disagree. They believe that the use of algorithms is not the only way to use computing devices [7]. As an important branch of artificial intelligence, computer vision should draw on and develop the intelligent computing abilities of artificial intelligence, such as knowledge learning and evolution, so as to break through the bottleneck of adaptability and intellectualization of computer vision.
As early as the 1950s, scientists noticed that there may be a close relationship between human intelligence and machines. Subsequently, scientists tried to use machines to realize and simulate human intelligence, which gave rise to the subject of artificial intelligence. Up to now, however, the level of intelligence is still very limited [8]. In general, there are three basic fields of intelligent computing: fuzzy computing, neural computing, and evolutionary computing. In evolutionary computing, the genetic algorithm was founded in 1975 by Professor Holland and his students at the University of Michigan. It was the first evolutionary computing method to be studied and applied, and it has been successfully applied to different fields and has solved many problems, such as image processing [9][10][11] and data mining. Research on the genetic algorithm is still continuing, and more and more scholars are involved in its research and application. Among this work, the most important is to improve the traditional genetic algorithm to solve the problem of high-dimensional computing.
Surface defect detection techniques originated in the United States and Britain in the 1960s [12]. In the mid-1970s, Japan and the Netherlands also joined the field of surface quality testing and improved the ability of defect detection [13]. In the 1990s, with the development of CCD technology, computer pattern recognition theory, artificial intelligence theory, and other related technologies, research on automatic steel surface defect detection systems became more and more extensive, and many representative technologies emerged [14]. There are many studies on methods of surface defect detection for industrial products. For example, Liu et al. proposed a CNN-based defect classification method, which can detect six kinds of strip steel defects with high accuracy and meet the real-time requirements of an actual production line [15].
Soukup and Huber-Mörk trained CNNs on a database of photometric stereo images of metal surface defects; by using this method, rail defects can be recognized early in order to take countermeasures in time [16]. In these two studies, a CNN is used to automatically extract the image features. However, a CNN needs a lot of training data and has poor interpretability, which restricts its application in a practical production line. Fekri-Ershad and Tajeripour proposed a method for detecting abnormalities in surface textures based on single-dimensional local binary patterns. A high detection rate and low computational complexity are the advantages of the proposed approach [17]. Fekri-Ershad and Tajeripour also proposed a new noise-resistant and multiresolution surface quality detection method based on colour and texture features. This method has a high detection rate, low computational complexity, low sensitivity to noise, and rotation invariance [18].
Those studies rely on prior knowledge to select the features that best represent image information for defect detection. The disadvantage is that other useful feature information may be ignored, which results in inaccurate classification. At present, there are few studies on surface defect detection of round steel. It is therefore necessary to find an effective method for surface defect detection of round steel. We hope the method can meet the requirements of the actual production line and can find the features that best reflect the image information comprehensively and accurately, so as to quickly and effectively identify the defect images.
Original images acquired from the image acquisition system are affected by different conditions, such as uneven brightness and noise. Therefore, it is inefficient to use original images directly for defect detection. For this reason, the original images must be preprocessed at the early stage of image processing [19]. In general, image preprocessing can be divided into three processes: image graying, image geometric changing, and image enhancement. The purpose of image graying is to transform colour images into grayscale images to reduce the amount of data. The purpose of image geometric changing is to correct the image error caused by the image acquisition system. The purpose of image enhancement is to improve the image effect, remove the background noise, expand the difference between different object features in the image, and improve the image quality [20]. At present, most image preprocessing studies optimize their methods on the basis of these three processes. For example, Li and Liu proposed an improved image preprocessing method. In this method, image size normalization, median filtering, and image enhancement are adopted to achieve image denoising and enhancement.
This study proposed some new ideas based on the traditional image preprocessing method [21]. Li et al. proposed another new image preprocessing method. This method takes full account of skewed, blurred, and damaged images caused by various reasons and obtains a better image effect than the traditional preprocessing method [22]. However, the methods in these studies are all innovations on the traditional preprocessing method. They only consider the effectiveness of the methods in terms of image effect without considering the processing speed requirement.
Therefore, in a modern steel production process, it is difficult for these preprocessing methods to meet the real-time requirements. The basic idea of parallel computing is to use multiple processors to solve the same problem cooperatively. The significance of parallelism is to shorten the time of problem solving, increase the scale of the problems that can be solved, and obtain better solutions. The GPU processing scheme has the advantages of easily obtained components, convenient programming, and high cost-effectiveness, and it has been widely accepted and used to improve processing speed [23][24][25]. In 2007, NVIDIA introduced the CUDA programming interface as a new hardware and software architecture for parallel computing. It regards the GPU as a data-parallel computing device and distributes and manages computing on it without mapping it to a graphics API [26]. With CUDA, the GPU can be more easily used for general-purpose computing [27]. Many studies have shown that using CUDA in image processing can improve processing efficiency [28][29][30][31][32][33][34][35]. For example, Zhan et al. proposed a fast CUDA-based image preprocessing method which includes image graying, Gaussian filtering, histogram equalization, and other processes of image preprocessing and achieved high-speed parallel processing. Experiments on images with different resolutions were provided to prove the effectiveness of the method [35]. However, the experimental data used in these studies are all from public databases, without experimental support from an actual production line. Some studies proved the efficiency of CUDA-based parallel computing in one or two specific processes of image preprocessing [36][37][38][39][40]. For example, Xia et al. proposed a CUDA-based image denoising method for steel plate and proved that the CUDA method was faster than the traditional CPU method [40].
By taking the surface images from a hot round steel production line as research objects, this paper proposes a fast and effective image preprocessing method. The main contributions of this research are as follows:
(1) A preliminary preprocessing method is designed by considering the particularity of hot round steel surface images. CUDA is used in several processes of this preprocessing method for parallel computing, so that high-quality round steel surface images can be obtained rapidly.
(2) An image screening algorithm based on intelligent computing is designed. We combine improved PCA with genetic algorithm for feature selection to solve the time-consuming problem of traditional genetic algorithm due to the high dimensions. CUDA is also used in this process, so as to improve the processing speed.
(3) A defect image screening algorithm is designed. Through evolutionary computing and CUDA-based parallel computing, the suspected defective images of hot round steel can be screened out intelligently, quickly, and accurately.
The remainder of this paper is organized as follows: Section 2 describes the architecture of the work. Section 3 gives the preliminary preprocessing method by taking the images from a hot round steel production line as research objects, and the result images are provided at the end of each process. Section 4 introduces the image screening algorithm based on intelligent computing. The technological setup and the comparisons of experimental results are given in Section 5, and the comprehensive conclusion and future research directions are provided in Section 6.

Architecture of the Work
The general framework of the method proposed in this research is shown in Figure 1.
This article is organized around two steps: the image preliminary preprocessing method and the image screening algorithm. In both steps, we need to consider how to obtain the defect images rapidly and accurately. In the image preliminary preprocessing step, effective region extraction and image denoising are the two processes we select after considering the particularity of hot round steel surface images. In the process of effective region extraction, the Sobel operator is introduced for edge detection, and we use CUDA-based parallel computing to reduce the processing time; the projection calculation is adopted after edge detection to remove invalid and incomplete images. In the process of image denoising, illumination equalization is the prerequisite step. Then, the Gaussian filter is introduced after noise component analysis, and we again use CUDA-based parallel computing to reduce the processing time. The image screening algorithm step consists of two processes: feature extraction and defect image screening. In the process of feature extraction, we first use the improved PCA to reduce the feature dimensions, and then the features after PCA are selected as the input of the genetic algorithm to solve the time-consuming problem and improve the accuracy. The process of defect image screening contains a decision algorithm step and a learning algorithm step based on evolutionary computing. CUDA-based parallel computing is also used here to reduce the algorithm time.

Image Preliminary Preprocessing
A simple surface defect detection system in the hot round steel production line is given in Figure 2. Two image preprocessing units cooperate in parallel, and then the images are transmitted to the cloud for defect recognition and analysis. As a prerequisite step in defect detection, image preprocessing is extremely important to realize rapid defect detection.
The original images of this research are the hot round steel surface images acquired by six linear-array CCD cameras.
The diameter of the round steel products is 14 mm to 27 mm. The images are grayscale images. In order to facilitate image processing, the captured images are normalized to a resolution of 1024 × 1024. Figure 3 gives an example of the original image of a round steel (20 mm in diameter) taken by one CCD camera, where the effective region of the round steel is about 1/2 of the image. The effective region may vary according to the diameter of the round steel. The round steel is mostly in the centre of the image, with a black background on either side. The gray threshold algorithm is the most commonly used effective region extraction method. However, due to the high surface temperature of hot round steel, the oil particles in the air, and the complex construction site environment, the quality of the images taken by the cameras can vary. The result will be unstable if the gray threshold algorithm is used, which may cause background misjudgement. For this reason, the Sobel operator is adopted in this research for edge detection by considering the particularity of surface images of hot round steel.

Effective Region Extraction.
The Sobel operator is one of the most important operators in edge detection. It is composed of two templates corresponding to the X-axis and Y-axis edges, respectively (Figure 4). For each pixel in the image, these two templates are used for correlation calculation to obtain the edge of the image. When using f(i, j) as the input image, the output image f′(i, j) can be obtained by the following formula:
$$f'(i, j) = \left| f'_1(i, j) \right| + \left| f'_2(i, j) \right|,$$
where (1) $f'_1$ and $f'_2$ are the convolutions in the horizontal (X-axis) and vertical (Y-axis) directions, respectively; (2) |·| represents the absolute value.
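The correlation with the two templates can be illustrated with a small serial sketch. The template values below are the standard 3 × 3 Sobel templates, which we assume match Figure 4; treating pixels outside the image as 0 and returning the unclamped magnitude are also assumptions of this sketch, and the function names are hypothetical.

```python
# Standard 3x3 Sobel templates (assumed to match Figure 4).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal (X-axis) direction
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical (Y-axis) direction


def correlate3x3(img, kernel, i, j):
    """Correlate a 3x3 kernel centred at (i, j); out-of-range pixels count as 0."""
    h, w = len(img), len(img[0])
    total = 0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            y, x = i + di, j + dj
            if 0 <= y < h and 0 <= x < w:
                total += kernel[di + 1][dj + 1] * img[y][x]
    return total


def sobel(img):
    """Return |f1| + |f2| for every pixel of a grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    return [
        [abs(correlate3x3(img, SOBEL_X, i, j)) + abs(correlate3x3(img, SOBEL_Y, i, j))
         for j in range(w)]
        for i in range(h)
    ]


# A vertical step edge between columns 1 and 2 gives a strong response there.
step_image = [[0, 0, 255, 255] for _ in range(4)]
edges = sobel(step_image)
```

This serial version only fixes the per-pixel arithmetic; a parallel implementation would assign these independent per-pixel computations to GPU threads.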
Referring to the original images in Figure 3, in general, the effective edge of the image on the Y-axis is the two ends of the image after normalization to the specific size (1024 × 1024 in this research). However, the effective edge on the X-axis needs to be obtained by the horizontal direction convolution calculation with the Sobel operator. Traditionally, we can use the CPU method to get the result by serial processing of each pixel in the image. However, due to the high resolution of the image and the large amount of computation, this research proposes a CUDA-based parallel Sobel edge detection method. In CUDA, the CPU is responsible for logical task processing and serial computation, while the GPU focuses on executing highly threaded parallel tasks. The CUDA thread structure has three important concepts: grid, block, and thread. The relationship among them is shown in Figure 5. The CUDA thread configuration affects the final execution effect and is determined by the task characteristics and the computer hardware capacity. Multiple experiments are required to find the optimal configuration.
In the CUDA-based parallel Sobel edge detection method, we first introduce an edge pixel autocompletion technique for the pixels at the boundary position. The Sobel operator needs to read the neighbours of each pixel. However, for the pixels at the boundary position, some neighbours, such as those above the top row or to the left of the first column, fall outside the normal index range. In this case, these neighbours need to be completed with appropriate values. Since the size of the Sobel operator is 3 × 3, Figure 6 gives the ring of pixels that needs to be completed. The purpose of edge pixel autocompletion is to make the boundary pixels and nonboundary pixels receive the same processing, so as to increase the processing speed. Traditional edge pixel autocompletion methods include the symmetry method, the adjacent region replication method, and the zero-padding method.
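The three autocompletion methods can be compared with a minimal sketch. We read the "symmetry method" as mirroring about the image border and the "adjacent region replication method" as repeating the nearest border pixel; both readings, and the helper names, are assumptions of this illustration. The symmetric mode assumes the padding radius does not exceed the image size.

```python
def _clamp(k, n):
    """Index of the nearest valid position in range(n)."""
    return min(max(k, 0), n - 1)


def _mirror(k, n):
    """Index reflected about the border (border pixel repeated); needs |overshoot| <= n."""
    if k < 0:
        return -k - 1
    if k >= n:
        return 2 * n - 1 - k
    return k


def pad(img, r, mode="zero"):
    """Return img surrounded by a ring of r completed pixels on every side."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(-r, h + r):
        row = []
        for j in range(-r, w + r):
            if 0 <= i < h and 0 <= j < w:
                row.append(img[i][j])              # original pixel
            elif mode == "zero":
                row.append(0)                      # zero-padding method
            elif mode == "replicate":
                row.append(img[_clamp(i, h)][_clamp(j, w)])
            elif mode == "symmetric":
                row.append(img[_mirror(i, h)][_mirror(j, w)])
            else:
                raise ValueError("unknown mode: " + mode)
        out.append(row)
    return out


small = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
zero_padded = pad(small, 1)            # ring of width 1, as needed by a 3x3 operator
```

Zero padding needs no reads from the source image for the completed pixels, which is one plausible reason it is attractive for large images.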
Because of the high resolution of the image, this research adopts the zero-padding method, which directly sets the gray value of the completed boundary pixels to 0. Secondly, an optimal CUDA thread configuration is designed after experiments. The size of the input image in this step is 1024 × 1024, and the data type is unsigned char. After fully considering the task characteristics and the computer hardware capacity, the optimal thread configuration obtained by the experiments is as follows: the size of a block is 64 × 4, and the size of the grid is 1 × 256. The 256 blocks process the entire 1024 rows of data. Because each pixel requires a neighbourhood operation, each block must pass six rows of data including the original three rows of image data into the shared memory. On the basis of this thread configuration, each thread in one block serially calculates the output at the position of four pixels (Figure 7). Because adjacent pixels overlap in the areas required for the Sobel correlation operations, after the output of one position is calculated, the window shifts to the right by one column. At this point, the first two columns have been read before, so only the last column needs to be read to calculate the output. In this way, the memory access can be reduced to improve the processing speed. Furthermore, experiments prove that this configuration conforms to the access requirements of shared memory and does not result in bank conflicts. Therefore, the thread configuration proposed in this research is reasonable. The result of Sobel edge detection is shown in Figure 8.
After obtaining the Sobel edge detection result, in order to avoid background misjudgement, it is necessary to use projection calculation to search the image from the left and right sides towards the middle to obtain a more accurate and clearer effective region of the round steel.
Since the round steel is generally located in the centre of the image, in order to avoid misjudgement caused by interference factors, the left and right starting positions of the search are set at GrayScaleX. GrayScaleX is a configurable parameter, and a certain region on the left and right is skipped according to the diameter of the round steel. When there is a "mutation" (an abrupt change) in the projection, it is judged that the edge point of the round steel has been reached, and the left (right) edge is defined at this position. Similarly, the top and bottom boundaries can be found, but no region is skipped. The parameters GrayScaleX and GrayScaleY are shown in Table 1.
At the same time, invalid images need to be removed. If the width of the effective region of the image after projection calculation is less than DropWidth or the height is less than DropHeight, the image is considered invalid. The parameters DropWidth and DropHeight are shown in Table 2.
Then, we can obtain the final effective region extraction result, as shown in Figure 9:
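The projection search and the invalid-image check can be sketched as follows. The text does not state how a "mutation" in the projection is detected, so the jump threshold used here is an assumption, as are the function names; `skip` plays the role of GrayScaleX, and `drop_width` and `drop_height` play the roles of DropWidth and DropHeight.

```python
def column_projection(edge_img):
    """Sum the edge responses down each column of the Sobel result."""
    return [sum(col) for col in zip(*edge_img)]


def find_edge(projection, start, stop, step, jump):
    """Scan from start towards stop; return the first index whose projection
    differs from the previous position by more than jump, else None."""
    prev = projection[start]
    for k in range(start + step, stop, step):
        if abs(projection[k] - prev) > jump:
            return k
        prev = projection[k]
    return None


def left_right_edges(edge_img, skip, jump):
    """Search from both sides towards the middle, skipping `skip` columns."""
    proj = column_projection(edge_img)
    mid = len(proj) // 2
    left = find_edge(proj, skip, mid, 1, jump)
    right = find_edge(proj, len(proj) - 1 - skip, mid, -1, jump)
    return left, right


def is_valid(width, height, drop_width, drop_height):
    """An image is kept only if its effective region is large enough."""
    return width >= drop_width and height >= drop_height


# Toy edge image: strong edge responses in columns 3 and 8.
toy = [[0, 0, 0, 9, 0, 0, 0, 0, 9, 0, 0, 0] for _ in range(3)]
bounds = left_right_edges(toy, skip=1, jump=10)
```

The top and bottom boundaries could be searched the same way on the row projection with `skip=0`, since the text states that no region is skipped in that direction.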

Image Denoising.
The surface images of hot round steel are collected by six cameras. The brightness of the images may vary with the angle of the camera, the intensity of the light source at the production site, and the diameter of the round steel, which will affect subsequent image analysis. Therefore, before image processing, it is necessary to keep the brightness of all images consistent. The gray values of an 8-bit gray image range from 0 to 255, so we normalize the brightness of all images to the mean value 128 to ensure that the overall brightness of all images is uniform. In this research, we propose an illumination equalization method based on column projection. The specific steps are as follows:
(1) Use column projection to obtain the grayscale vector G of the image in the column.
(2) Calculate the average gray value of the image in the column.
(3) Calculate the compensation coefficient K of the image in the column.
(4) Multiply the value of each pixel by the compensation coefficient K.
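The four steps can be sketched in a few lines. The exact expression for the compensation coefficient is not reproduced in the text above, so this sketch assumes the simplest form consistent with the stated target, K = 128 / (column mean); the handling of all-black columns and the clamping to the 8-bit range are further assumptions, and the function name is hypothetical.

```python
TARGET_MEAN = 128  # target average gray value stated in the text


def equalize_columns(img):
    """Column-projection illumination equalization for a grayscale image."""
    h, w = len(img), len(img[0])
    # Step 1: column projection (sum of the gray values in each column).
    g = [sum(img[i][j] for i in range(h)) for j in range(w)]
    # Step 2: average gray value of each column.
    g_mean = [s / h for s in g]
    # Step 3 (assumed form): K = 128 / column mean; all-black columns are left as is.
    k = [TARGET_MEAN / m if m > 0 else 1.0 for m in g_mean]
    # Step 4: multiply each pixel by its column's K and clamp to the 8-bit range.
    return [[min(255, round(img[i][j] * k[j])) for j in range(w)] for i in range(h)]


# Column 0 is too dark (mean 64), column 1 is too bright (mean 150).
balanced = equalize_columns([[64, 200], [64, 100]])
```

After the call, both columns have a mean of 128 while the relative contrast inside each column is preserved.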
This formula ensures that the obtained compensation coefficient will compensate the average grayscale value of each column to 128 and that the difference between columns will not be too great. In this way, the overall brightness of the images can be improved and the details can be well retained. The result of the illumination equalization method is shown in Figure 10.
The noise component analysis is also necessary before image denoising. The images of the round steel surface are collected using linear-array CCD cameras. In general, zero-mean Gaussian white noise can be used as the model of the noise inside the CCD camera [41]. At the same time, as the image acquisition is carried out on the production line, external factors such as vibration of the lens and the material and coating of the round steel will affect the image quality. In addition, imperfection and sudden failure of the image acquisition system will also cause noise interference in the images. These noises have strong randomness and usually appear as signal-independent additive Gaussian noise [42]. Therefore, the noise in the round steel images is treated as signal-independent additive Gaussian noise, and the Gaussian filter is selected as the basic algorithm for image denoising.
The Gaussian filter is a linear smoothing filter. The template it uses is called a Gaussian kernel, which is calculated from the two-dimensional Gaussian function. The kernel size is called GaussSize, and it is a configurable parameter. The larger the GaussSize, the smoother the image, but the more image details are lost. For round steels with different diameters, the optimal GaussSize parameters obtained by experiments are shown in Table 3. This table contains the parameters for most round steel diameters. If the round steel diameter is not in the table, the parameter of the nearest diameter is used.
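How a Gaussian kernel of a given GaussSize can be computed is sketched below. The text specifies only the kernel size; the standard deviation `sigma` is therefore a hypothetical parameter of this sketch, and the kernel is normalized so that its weights sum to 1.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a normalized size x size Gaussian kernel (size must be odd)."""
    if size % 2 == 0:
        raise ValueError("kernel size must be odd")
    r = size // 2  # blur radius, e.g. 3 for size 7
    raw = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) for x in range(-r, r + 1)]
        for y in range(-r, r + 1)
    ]
    total = sum(sum(row) for row in raw)
    # Normalize so that filtering does not change the overall brightness.
    return [[v / total for v in row] for row in raw]


kernel = gaussian_kernel(7, 1.5)  # GaussSize 7; sigma = 1.5 is an assumed value
```

A larger `size` (with a correspondingly larger `sigma`) spreads the weights over more neighbours, which is why the image becomes smoother and loses more detail.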
Traditionally, we can use the CPU method to get the result of Gaussian filter denoising by serially processing each pixel in the image. However, due to the high resolution of the image and the large amount of computation, this research proposes a CUDA-based parallel Gaussian filter denoising method. This step is similar to the CUDA-based parallel Sobel edge detection method (Section 3.1). The design of the edge pixel autocompletion method is the first step in this process. Because the Gaussian filter needs to read the neighbours of each pixel, the pixels at the boundary position need to be completed according to the filter size. Traditional edge pixel autocompletion methods include the symmetry method, the adjacent region replication method, and the zero-padding method. Because of the high resolution of the image, this research adopts the zero-padding method, which directly sets the gray value of the completed boundary pixels to 0.
Secondly, an optimal CUDA thread configuration is designed after experiments. The size of the input image in this step is 1024 × 512, and the data type is unsigned char. Taking the images of round steel with 15 mm to 19 mm diameter as the example, the GaussSize is 7 and the blur radius is 3. After fully considering the task characteristics and the computer hardware capacity, the optimal thread configuration obtained by the experiments is as follows: the size of a block is 16 × 8, and the size of the grid is 1 × 64. Each thread processes 64 pixels. Based on this configuration, each thread in the block serially calculates the output at the position of 8 pixels. Because adjacent pixels overlap in the areas required for the Gaussian filter, after the output of one position is calculated, the window shifts to the right by one column. At this point, the first six columns have been read before, so only the last column needs to be read to calculate the output. In this way, the memory access can be reduced to improve the processing speed. Furthermore, experiments prove that this configuration conforms to the access requirements of shared memory and does not result in bank conflicts. Therefore, the thread configuration proposed in this research is reasonable. Then, we can obtain the final image denoising result, as shown in Figure 11:
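The workload per thread implied by this configuration can be checked with simple arithmetic. The helper names below are hypothetical; the numbers are the ones stated in the text (1024 × 512 image, 16 × 8 block, 1 × 64 grid, GaussSize 7).

```python
def pixels_per_thread(width, height, block, grid):
    """Pixels each thread must cover if the image is split evenly over all threads."""
    threads = block[0] * block[1] * grid[0] * grid[1]
    total = width * height
    if total % threads:
        raise ValueError("image does not divide evenly over the threads")
    return total // threads


def blur_radius(gauss_size):
    """Number of neighbour pixels needed on each side of the centre pixel."""
    return gauss_size // 2


per_thread = pixels_per_thread(1024, 512, block=(16, 8), grid=(1, 64))
radius = blur_radius(7)
```

With 64 blocks of 128 threads each, 8192 threads share 524288 pixels, which reproduces the figure of 64 pixels per thread, and a 7 × 7 kernel gives the blur radius of 3.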

Image Screening Algorithm
The purpose of the image screening algorithm is to divide the surface images of round steel into normal and defect images. The defect images will be numbered and uploaded to the database for subsequent defect recognition and analysis. In this step, feature selection and extraction is the most important part. The commonly used image features include colour features, texture features, shape features, and spatial relationship features. The features selected differ greatly according to the purpose. The judgment of image screening is closely related to image feature selection.

Feature Extraction Based on Improved PCA and Genetic Algorithm.
The genetic algorithm is a highly parallel, stochastic, and adaptive optimization algorithm based on "survival of the fittest." Through replication, crossover, and mutation, the "chromosome" group that encodes candidate solutions of the problem evolves from generation to generation and eventually converges to the most suitable group, so as to obtain the optimal or a satisfactory solution of the problem. Using the genetic algorithm to find the optimal feature subset of the problem space, which can greatly reduce the space of the classification system and improve the search efficiency, is one of the mainstream ideas of intelligent computing at present. However, one of the disadvantages of the traditional genetic algorithm is that it takes a long time to process and optimize high-dimensional problems. For this reason, we propose a feature extraction method based on improved PCA and the genetic algorithm. Firstly, we use an improved PCA method to reduce the feature dimension, and then the features after PCA are selected as the input features to reduce the time consumption of feature extraction in the genetic algorithm.

Improved PCA Method.
PCA is a multivariate statistical method and one of the most commonly used dimension reduction methods. By orthogonal transformation, a group of variables that may be correlated is transformed into a group of linearly uncorrelated variables. The transformed variables are called the principal components. In this research, we propose an improved PCA method. We first use traditional PCA to extract the main features (principal components) of the normal and defective samples, respectively; then, according to the projection residuals of these selected features in the two principal component spaces, the difference values can be evaluated. The larger the difference is, the stronger the ability of the selected features to distinguish between the normal and defect samples. 1000 images including 200 labeled defect images are selected as the samples in this step. The main steps are as follows:
(1) Use $A_n$ and $A_f$ to represent the normal sample matrix and the defect sample matrix, respectively, where $N_n$ and $N_f$ are the numbers of the two kinds of samples and $Q$ is the total number of features. Then, the PCA model of the normal sample is
$$A_n^e = A_n - \mathrm{repmat}(M_n, N_n, 1), \qquad A_n^e = T_n P_n + E_n,$$
where $A_n^e$ is the centralized matrix of $A_n$, $M_n$ is the mean vector of the normal samples, and $\mathrm{repmat}(M_n, N_n, 1)$ means that the vector $M_n$ is used as a module that is tiled into a matrix with $N_n$ copies along the rows and one copy along the columns. The number of rows of this matrix is $N_n$, and the number of columns is the length of the vector $M_n$. $T_n$, $P_n$, and $E_n$ represent the score matrix, load matrix, and residual matrix obtained by the principal component decomposition of the normal samples, respectively. The number of features (principal components) selected in this step is represented by $K_n$. Similarly, we can get the PCA model of the defect sample.
(2) Establish the residual projection of the defect sample in the principal component space of the normal sample. Here, $i$ represents the sample number, $x_{f,i}$ is the feature expression data of defect samples in normal samples, and the superscript T denotes matrix transposition.
(4) Calculate the difference between the mean residual variance of the projection of the defect sample in the principal component space and the mean residual variance of the projection of the normal sample in the principal component space.

Table 1: Parameters of starting position of left (right) and top (bottom) edge of column and row projection.

Table 2: DropWidth and DropHeight parameters.
Diameter (mm)  14   15   16   17   18   19   20   21   22   23   24   25   26   27
DropWidth      970  970  970  970  970  970  970  970  970  970  970  970  970  970
DropHeight     160  170  190  200  210  220  230  240  250  250  250  260  270  280

Feature Extraction Based on Genetic Algorithm.
After applying the improved PCA method, the first 32 features with the strongest distinguishing ability are obtained. In this step, we use these 32 features as the input features for feature extraction in order to reduce the time consumption of the genetic algorithm. The main processes of the genetic algorithm are shown in Figure 12.

Figure 12: Genetic algorithm for feature extraction. (Flowchart boxes: generate random initial population; evaluate fitness value for each individual; selection, crossover, mutation; criteria satisfied or termination conditions reached? (N/Y); put out the optimal data; performance testing.)

The establishment of the evaluation criterion is the key in this step. The purpose of this research is to find the defect images of round steel. Therefore, we use the variance between different regions and within the same region as the evaluation criterion, and the fitness value calculation formula is designed to evaluate the advantage of a feature subset. Assume that the number of sample classes is $C$, the number of samples in each class is $N_i$, and $p_i$ represents the prior probability of each class. Then the mean vector $m_i$ of each class and the total mean vector $m$ can be calculated. From these, we can get the total discreteness matrix $S_d$ between different sample classes and the total discreteness matrix $S_s$ within the sample class. The bigger the dispersion between different sample classes, the better the separability; and the smaller the dispersion within a sample class, the better the separability, too. Therefore, the fitness value calculation formula of the genetic algorithm for feature optimization is defined on the basis of these two dispersions. Other parameters are selected as follows: the group size $M = 80$, the crossover probability $P_c = 0.8$, and the mutation probability $P_m = 0.01$.
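A fitness function of this kind can be sketched as follows. Because the exact fitness formula is not reproduced in the text above, this sketch assumes one common choice, the ratio of the trace of the between-class dispersion to the trace of the within-class dispersion, evaluated only on the features switched on by a 0/1 chromosome; estimating the priors from class frequencies and the function name are also assumptions.

```python
def fitness(classes, mask):
    """Assumed fitness: trace(between-class scatter) / trace(within-class scatter).

    classes: list of classes, each a list of equally long feature vectors.
    mask:    0/1 chromosome selecting which features are used.
    """
    idx = [d for d, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature subset cannot separate anything
    n_total = sum(len(c) for c in classes)
    priors = [len(c) / n_total for c in classes]           # p_i (assumed: frequencies)
    means = [[sum(x[d] for x in c) / len(c) for d in idx] for c in classes]   # m_i
    total_mean = [sum(p * m[t] for p, m in zip(priors, means))                # m
                  for t in range(len(idx))]
    between = sum(
        p * sum((m[t] - total_mean[t]) ** 2 for t in range(len(idx)))
        for p, m in zip(priors, means)
    )
    within = sum(
        p * sum(sum((x[d] - m[t]) ** 2 for t, d in enumerate(idx)) for x in c) / len(c)
        for p, c, m in zip(priors, classes, means)
    )
    return between / within if within > 0 else float("inf")


# Feature 0 separates the two classes; feature 1 does not.
normal = [[0.0, 1.0], [1.0, 3.0]]
defect = [[10.0, 1.0], [11.0, 3.0]]
good = fitness([normal, defect], [1, 0])
poor = fitness([normal, defect], [0, 1])
```

In a genetic algorithm with the stated parameters, each of the 80 individuals would be such a mask over the 32 input features, and masks with a higher value would be favoured during selection.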

Defective Image Preliminary Screening.
After the genetic algorithm, 16-dimensional feature vectors of the image are selected. However, when deciding whether a region is defective, in addition to analyzing the features of the region itself, the image features around it must also be considered. For this reason, this research proposes a parallel defective image preliminary screening method based on CUDA and evolutionary computing. 1000 images including 200 defective images are selected as the samples in this step. The main steps are as follows.
(1) Design of the decision algorithm of defect images:
Step 1: input image $P$ (1024 × 512)
Step 2: divide image $P$ into $B$ blocks ($B = 128$)
Step 3: divide each block into $Z$ regions ($Z = 64$), and record the adjacent relations of these regions with the adjacency matrix
Step 4: carry out 64 computations, parallel computing the following tasks:
(2) Design of the learning algorithm of defect images:
Step 1: input defect image $P'$ (1024 × 512)
Step 2: divide image $P'$ into $B'$ blocks ($B' = 128$)
Step 3: divide each block into $Z'$ regions ($Z' = 64$) and record the adjacent relations of these regions with the adjacency matrix
Step 4: record the corresponding area of defects
Step 5: carry out 64 computations, parallel computing the following tasks:
(i) Randomly generate a group of criteria according to $T$ and set the initial global evaluation function $MT$: $MT = T_i - T_n$, where $T_i$ is the internal feature difference of one defective region and $T_n$ is the internal feature difference of one nondefective region.
(ii) Calculate the evaluation function $MT_z$:
(a) If $MT_z < MT$, the current judgment basis $T$ is recorded and $MT_z$ is updated as the global evaluation function $MT$
(b) If $MT_z > MT$, update the judgment according to $T$
(iii) Return to the previous step, until the evaluation function value remains unchanged for 100 times (the difference value is less than 0.000001)
Step 6: each calculation is synchronized to obtain the optimal judgment basis $T$ and the threshold value $MT$ to minimize the judgment error of the defect region
Step 7: replace the input defective image, return to Step 1, and calculate the new judgment feature $T$ and threshold value $MT$, until $T$ and $MT$ are stable
Step 8: output the optimal judgment feature $T$ and threshold value $MT$

Technological Setup.
The computer hardware configuration for program execution in this research is ① Graphic card: NVIDIA GeForce GTX 960 and ② CPU: Intel(R) Core(TM) i5-4660 3.20 GHz. More details are shown in Table 4.
According to the hardware and software configuration of the computer in Table 4, the computer performance parameters are shown in Table 5. The CUDA thread configuration proposed in this research is the optimal configuration after considering the selected computer performance.

Comparison of Experimental Results.
The main purpose of this research is to divide the surface images of round steel into normal and defective images. To facilitate the comparison, 2000 round steel surface image samples (including 300 defective images) from a hot round steel production line are selected.
There are four common defect types on the round steel surface: scratch defect, rolling skin, crack defect, and wire ear defect. We first use the traditional PCA method to detect the four kinds of defect samples and obtain the detection rate. Then the combination of improved PCA and the genetic algorithm proposed in this research is applied to the same samples. The experimental results are shown in Table 6.

As Table 6 shows, traditional PCA cannot detect the rolling skin, crack, and wire ear defects completely correctly. However, when the genetic algorithm is combined with it to reduce the feature dimension, the defect images are screened out completely correctly. The reason is that in traditional PCA, if the feature dimension is high but the training sample is small, the estimation of the model parameters will be inaccurate. In this case, the genetic algorithm should be combined with PCA to reduce the number of features and achieve better detection results.
At the same time, to demonstrate the effectiveness of the parallel computing, the tasks of Sobel edge detection, Gaussian denoising, and defect image preliminary screening are also carried out with OpenCV on the CPU side. The time comparison of CPU and CUDA is shown in Table 7. As can be seen from Table 7, the CUDA-based parallel computing greatly improves the processing speed in all three tasks. The Sobel edge detection task achieves the highest degree of parallelism, with an acceleration ratio of 74.24.
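For readers who want to reproduce this kind of comparison, the following minimal Python sketch shows one common way to measure a task and compute an acceleration ratio, assuming the ratio is defined as CPU time divided by GPU time. The helper names `time_call` and `speedup` are hypothetical, and the timings in the example are placeholders, not the values from Table 7.

```python
import time


def time_call(fn, repeats=5):
    """Best wall-clock time of fn() over several runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(cpu_seconds, gpu_seconds):
    """Acceleration ratio, assumed here to be CPU time / GPU time."""
    if cpu_seconds <= 0 or gpu_seconds <= 0:
        raise ValueError("timings must be positive")
    return cpu_seconds / gpu_seconds


# Placeholder timings for illustration only (not measured results).
example_ratio = speedup(cpu_seconds=0.5, gpu_seconds=0.01)
print("acceleration ratio:", round(example_ratio, 2))
```

When timing GPU work this way, the device should be synchronized before the clock is stopped, because kernel launches return before the computation finishes.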

Comprehensive Conclusion and Future Research
Taking the surface images from a hot round steel production line as research objects, this paper proposed a fast and effective image preprocessing method. Firstly, a preliminary preprocessing method is designed by considering the particularity of hot round steel surface images. Then, an image screening algorithm based on intelligent computing is designed. We combine improved PCA with the genetic algorithm for feature selection to address the long running time of the traditional genetic algorithm caused by the high feature dimension. Finally, a defect image screening algorithm is designed to efficiently separate the normal images from the defect images. CUDA-based parallel computing is also used in several processes to improve the speed of the method.
Future research can be carried out from the following aspects:
(1) The proposed image preprocessing method is based on the particularity of round steel surface images from a hot round steel production line. Future research can focus on combining CUDA parallel computing with other preprocessing processes by considering the task particularity, in order to get more accurate results.
(2) The CUDA thread configuration and the computer capacity have great influence on the processing speed. In the actual production process, we should focus on the task particularity and computer capacity and find a more suitable CUDA thread configuration.
(3) The genetic algorithm used in this paper is a classical intelligent computing algorithm. At present, many advanced intelligent algorithms are emerging, so we can combine these advanced intelligent algorithms with production practice to better serve production.
(4) Although the method proposed in this research can quickly and effectively screen out the defect images from normal images, classifying the identified defect type still requires an appropriate classification method matched to the actual production line to achieve better defect classification results.
(5) The proposed method can be applied to steel surface images with high contrast and similar background texture. Its applicability to other images with low contrast and changeable background texture still needs to be verified in practice.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.