Variation Mechanism of Three-Dimensional Force and Force-Based Defect Detection in Friction Stir Welding of Aluminum Alloys

As a direct reflection of the interaction between the stirring tool and the base metal in friction stir welding, the force signal is an important means of characterizing welding quality. In this paper, the variation mechanism of the three-dimensional force and its relation to welding quality were explored. Because the acquired signals were contaminated by high-frequency noise, mean filtering and variational mode decomposition were applied to recover the real signals. Analysis of the denoised signals showed that the traverse force led the lateral force by a phase angle of π/4, while the phase difference between the axial force and the other two forces changed with the process parameters. Using the least squares method and polynomial fitting, empirical formulas for the three-dimensional force were obtained, and these were applicable whether or not tunnel defects were present. When the welding speed increased from 80 mm/min to 240 mm/min, the minimum value of the lateral force increased several times more than that of the traverse force. When the polar radii of most data points exceeded 4, tunnel defects were highly likely to form. To predict welding quality more accurately, a prediction model based on long short-term memory was constructed. The model distinguished good welds from tunnel defects with 100% accuracy. Its ability to distinguish large defects from small defects was relatively poor, and the average accuracy of classifying the three categories of welding quality was 84.67%.


Introduction
As a solid-state welding technology, friction stir welding (FSW) has the advantages of small stress deformation and high joint strength. During FSW, the heat generated by friction between the stirring tool and the workpiece brings the base material to a plastically softened state. Under the stirring and extrusion action of the tool, the plasticized metal fills the cavity behind the tool, realizing the solid-phase joining of the material [1]. Because the process avoids the pores and cracks that are easily generated in the fusion welding of aluminum alloys [2], it has been widely used in automobile manufacturing, aerospace technology, and other fields. However, when the process parameters are inappropriately selected, internal defects are prone to emerge, seriously affecting the welding quality [3]. Common post-weld inspection methods such as X-ray detection and ultrasonic detection offer poor accessibility to complex structural parts [4,5]. Therefore, the development of real-time detection technology is of great significance for improving production efficiency and achieving feedback control of welding quality.

Experimental Setup
Welding tests were conducted on a two-dimensional friction stir welding apparatus HT-JM16×15/2, capable of joining aluminum alloys with a maximum thickness of 16 mm. In this study, 5 mm thick 2A14-T6 aluminum alloy sheets measuring 250 mm × 80 mm were selected as base materials. The chemical composition is shown in Table 1. The upper and lower surfaces were polished with a steel brush to remove the aluminum oxide film before welding. The 5 mm long stir pin used in the experiment was made of H13 tool steel with a three-conical structure. The upper and lower diameters of the pin were 8 mm and 3.86 mm, respectively. The tool shoulder was concave with a diameter of 18 mm. During the welding process, three-dimensional force sensors were employed for synchronous acquisition of the traverse force, lateral force, and axial force. The schematic diagram of the measurement platform is shown in Figure 1, in which the x-direction coincides with the welding direction. The maximum measurement value of the axial force was 240 kN and the limit values of the traverse force and lateral force were 160 kN, with a measurement accuracy of 0.3% F.S. The three-dimensional force sensor was based on the principle of resistance strain and consisted of elastic elements, strain gauges, and a shell. Upon contact between the tool and the base material, strain was generated in the elastic element in the shell. The sensitive grid of the strain gauge pasted onto the elastic element underwent deformation, resulting in a change of resistance value proportional to the change of strain in the base material. The resistance change was then converted into a voltage signal by the Wheatstone bridge circuit. Finally, the voltage was processed by the amplifier circuit and collected by the data acquisition card. After being converted to the actual force values, the force data were saved on the computer terminal.
During the experiments, the sampling frequency of the three-dimensional force was set to 7.8 kHz. All welds were produced with a tool-tilt angle of 2° and a plunge depth of 0.1 mm. In order to obtain defective and defect-free samples on the premise of sound surface-formation quality, the welding speed v ranged from 80 mm/min to 240 mm/min (in steps of 40 mm/min) and the rotational speed n ranged from 300 r/min to 600 r/min (in steps of 100 r/min). After welding, X-ray detection was employed to determine the presence of defects in the weld. Subsequently, several 5 mm × 20 mm metallographic samples were taken from the joint perpendicular to the welds. The samples were ground successively with 600#, 800#, 1000#, 1200#, and 1500# sandpaper and then polished with a diamond polishing agent. To clearly observe the macroscopic morphology of the weld, Keller reagent was applied to etch the welded joint for 30 s. The weld morphology was observed by optical microscopy to determine the internal quality of the joint.

The Variation Mechanism of Three-Dimensional Force
A group of typical three-dimensional force signals obtained at a welding speed of 160 mm/min and a rotational speed of 600 rpm was selected for analysis. As shown in Figure 2a, the complete welding process was divided into five stages: plunge I, dwell, plunge II, travel, and retract. For the first 1.5 s, the stir tool was driven by the electric cylinder at the set rotational speed. From 1.5 s, the stir tool made contact with the base material, and the axial force increased rapidly as the plunge depth increased. By 10 s, the stir pin had completely entered the base metal, which was fully softened by the friction heat and deformation heat. The axial force then began to decrease and settled at a stable value. After dwelling in position for a period of time, the stir tool was pressed down again to the set value under the control of the equipment operating program, at which moment the axial force surged again. The stir tool then started to travel, entering the formal welding stage. The axial force first increased and then gradually decreased to its dynamically stable value. Correspondingly, the traverse and lateral forces increased and then maintained dynamic stability. At the end of welding, the stir tool stopped moving and started to retract, and the three forces decreased. When the tool was completely removed from the aluminum alloy, the three forces fell to zero. The details of the three-dimensional force (the red box in Figure 2a) during stable welding are shown in Figure 2b. The blue curve represents the traverse force Fx, the red curve the lateral force Fy, and the yellow curve the axial force Fz. The figure demonstrates that all three forces showed periodic fluctuations even with the interference of noise, with the waveform contours of Fx and Fy being the more distinct.
For the friction stir welding process, the frequency of the low-frequency useful signal and the timing of the high-frequency noise signal are the major concerns.
Thus, it was necessary in this study to analyze the signal in the time-frequency domain. Since the wavelet transform has an adaptive window, i.e., high-frequency components are resolved with high time resolution and low-frequency components with high frequency resolution, it was adopted to process the three forces. The results are shown in Figure 3a-c, respectively. The wavelet coefficients are obtained by convolving the signal with the scaled wavelet. When the wavelet window was at the edge of the signal, the signal was zero-padded at the edge, which manifested in the time-frequency diagram as frequency broadening and a decrease of signal intensity. In order to delimit the influence of this edge effect, a curve was drawn, indicated by the white dashed lines in Figure 3a-c. The signal inside the curve had little or no edge effect, while the signal outside the curve showed a greater edge effect. It was found that the amplitudes of all three forces were at their maximum at 10 Hz, which corresponds to the tool rotation frequency at 600 rpm; at this frequency, the amplitude of the traverse force Fx was the greatest and that of the axial force Fz the smallest. In the range 10 Hz-100 Hz, the amplitude of Fx was the smallest, while that of Fz was the largest. Moreover, high-frequency information persisted significantly longer in Fz than in the other forces.
In order to remove the interference of high-frequency noise and obtain the essential characteristics of the force signal, a smoothing denoising method was employed to process the original three-dimensional force signal. Specifically, a sliding window of length k was moved over adjacent elements of the signal, and the mean of each local group of k points was calculated. Near the end points, where fewer than k elements were available, the window was truncated and the mean was taken over only the elements that filled it. In this paper, k was selected as 120, i.e., the mean was calculated over 120 points using a sliding window. The results after denoising are shown in Figure 3d-f, respectively. The curves of the three denoised forces were similar to sine curves. The spectrum of the denoised Fz was calculated, and the result is shown in Figure 3g. The denoised Fz evidently continued to be affected by high-frequency signals.
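The sliding-window mean described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `moving_mean` and the choice of a centered window (k // 2 samples behind, (k - 1) // 2 ahead) are assumptions, since the text specifies only the window length k = 120 and the truncation at the end points.

```python
from itertools import accumulate


def moving_mean(signal, k):
    """Centered moving mean whose window shrinks at both ends of the signal.

    Assumption: the window covers k // 2 samples behind and (k - 1) // 2
    samples ahead of the current sample; near the edges only the samples
    that actually exist are averaged.
    """
    if k < 1:
        raise ValueError("window length k must be at least 1")
    n = len(signal)
    back, ahead = k // 2, (k - 1) // 2
    # Prefix sums make every window mean an O(1) difference.
    prefix = [0.0] + list(accumulate(signal))
    smoothed = []
    for i in range(n):
        lo = max(0, i - back)
        hi = min(n, i + ahead + 1)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed


# Small demonstration with a window of 3 samples.
demo = moving_mean([1, 2, 3, 4, 5], 3)
```

At the 7.8 kHz sampling frequency used here, a 120-point window spans about 15.4 ms. A moving mean of this length has its first spectral null at 7800/120 = 65 Hz, so components above that are attenuated rather than removed, which is in line with the residual high-frequency content reported for Fz.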
In order to confirm the physical significance of the high-frequency components indicated in Figure 3g, the plunge force signal was collected before welding. This signal, shown in Figure 4a, represented the noise generated by the sensor and the surrounding environment. To analyze its components, a fast Fourier transform was conducted to obtain the frequency spectrum, as shown in Figure 4b. The biggest difference between pre-welding and stable welding was in the signal component that fluctuated at 10 Hz, which represented the real plunge force in the welding process. To separate the components from the signal, empirical mode decomposition (EMD), variational mode decomposition (VMD), and wavelet transform may be applied [22,23]. Among them, both VMD and EMD perform adaptive signal decomposition without being affected by the sampling frequency. Since the EMD algorithm has shortcomings such as mode aliasing and edge effects, the VMD method was used in the current study.
It was assumed that the original signal consisted of a series of narrow-band signals, each with a central frequency, i.e., the original signal sequence f consisted of K intrinsic mode components μ_k(t). These intrinsic mode components, also called intrinsic mode functions (IMFs), were a group of discrete signals. Each IMF had a different bandwidth in the time-frequency spectrum, expressed as follows:
$$\mu_k(t) = A_k(t)\cos\left(\varphi_k(t)\right) \qquad (1)$$
where A_k(t) is the instantaneous amplitude of μ_k(t) and φ_k(t) is a non-decreasing phase function. The adaptive decomposition was obtained by solving a variational problem that minimizes the sum of the estimated bandwidths of all modes. The constraint condition was that the sum of all modes was equal to the original signal, expressed as follows:
$$\min_{\{\mu_k\},\{\omega_k\}} \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * \mu_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \quad \text{s.t.} \quad \sum_{k=1}^{K} \mu_k(t) = f(t) \qquad (2)$$
where ω_k(t) = φ′_k(t) is the instantaneous frequency, δ(t) is the Dirac function, and * is the convolution operator. By introducing a Lagrange multiplier, the above problem was transformed into an unconstrained problem and all intrinsic mode functions were obtained. VMD was carried out on the pre-welding signal and five IMFs were obtained, as shown in Figure 4c. The fifth IMF had the largest energy, which was consistent with the power spectrum results obtained at a frequency resolution of 127.5198 Hz and a time resolution of 20.1282 ms, as shown in Figure 4d.
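For reference, the unconstrained form mentioned above can be written out as it appears in the standard VMD formulation. This is given as background, not as a statement of the exact variant used in this study: the quadratic penalty weight α, the multiplier λ(t), and the iteration index m are symbols from the standard method and do not appear in the text above.

```latex
% Augmented Lagrangian of the constrained bandwidth-minimization problem
\mathcal{L}\left(\{\mu_k\},\{\omega_k\},\lambda\right)
  = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * \mu_k(t) \right] e^{-j\omega_k t} \right\|_2^2
  + \left\| f(t) - \sum_{k=1}^{K} \mu_k(t) \right\|_2^2
  + \left\langle \lambda(t),\; f(t) - \sum_{k=1}^{K} \mu_k(t) \right\rangle

% Alternating updates in the frequency domain (hats denote Fourier transforms)
\hat{\mu}_k^{\,m+1}(\omega)
  = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{\mu}_i(\omega) + \hat{\lambda}(\omega)/2}
         {1 + 2\alpha\left(\omega - \omega_k\right)^2},
\qquad
\omega_k^{\,m+1}
  = \frac{\int_0^{\infty} \omega \left| \hat{\mu}_k(\omega) \right|^2 d\omega}
         {\int_0^{\infty} \left| \hat{\mu}_k(\omega) \right|^2 d\omega}
```

Each mode update acts as a Wiener-type filter centered on ω_k, and each center frequency is re-estimated as the power-weighted mean frequency of its mode, which is why every IMF ends up compact around one central frequency.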
Variational mode decomposition was performed on the axial force signal shown in Figure 3f, and five intrinsic mode functions were obtained, as illustrated in Figure 5. The morphology of the first three IMFs was highly consistent with that of the signal illustrated in Figure 4c. Therefore, it was determined that the curves in Figure 5a-c corresponded to the high-frequency components of the noise signal in Figure 4c. The curve in Figure 5d represents the low-frequency interference on the real signal. Figure 5e,f show the real signal and its frequency spectrum, respectively. The axial force signal after VMD was free of interference and the curve was smooth.
To check whether the periodic sinusoidal fluctuation of the forces held more generally, three-dimensional signals under two other sets of process parameters were investigated. The weld and X-ray test results obtained are shown in Figure 6a,c, respectively. Under a rotational speed of 400 rpm and a welding speed of 200 mm/min, the weld surface was well formed and there were no internal defects. Under a rotational speed of 300 rpm and a welding speed of 120 mm/min, the weld surface was again well formed, but there were tunnel defects inside. The three force signals in the stable welding stage were selected and processed, as shown in Figure 6b,d. Under the various process parameters, the three force signals showed periodic fluctuations with a common period that was related to the rotational speed: the higher the rotational speed, the shorter the period. In each process, the traverse force Fx led the lateral force Fy by π/4 in phase angle; the arrows in Figure 6b,d mark this phase difference. The phase difference between the axial force and the other two forces, by contrast, changed with the process parameters.
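A phase lead such as the π/4 reported above can be estimated by projecting each force signal onto a sine and a cosine at the rotation frequency. The sketch below illustrates that idea on synthetic data; it is not the procedure used in this study, and the function names, the synthetic amplitudes, and the requirement that the record cover a whole number of rotation periods are all assumptions.

```python
import math


def component_phase(signal, fs, f0):
    """Phase (rad) of the f0 component, modelled as A*sin(2*pi*f0*t + phase).

    Assumes the record spans a whole number of periods of f0, so that the
    constant offset and the sine/cosine projections do not leak into each other.
    """
    c = s = 0.0
    for i, x in enumerate(signal):
        angle = 2.0 * math.pi * f0 * i / fs
        c += x * math.cos(angle)  # proportional to A*sin(phase)
        s += x * math.sin(angle)  # proportional to A*cos(phase)
    return math.atan2(c, s)


def phase_lead(a, b, fs, f0):
    """Phase by which signal a leads signal b at f0, wrapped to (-pi, pi]."""
    d = component_phase(a, fs, f0) - component_phase(b, fs, f0)
    return math.atan2(math.sin(d), math.cos(d))


# Synthetic check: 7.8 kHz sampling, 400 r/min rotation, four full revolutions.
fs = 7800.0
f0 = 400.0 / 60.0          # rotation frequency in Hz
n_samples = 4 * 1170       # 1170 samples per revolution at these settings
t = [i / fs for i in range(n_samples)]
fx = [3.0 + 1.0 * math.sin(2 * math.pi * f0 * ti + 1.0) for ti in t]
fy = [1.0 + 0.4 * math.sin(2 * math.pi * f0 * ti + 1.0 - math.pi / 4) for ti in t]
lead = phase_lead(fx, fy, fs, f0)
```

On measured data, the record would first need to be trimmed to an integer number of tool revolutions, otherwise the constant force offset biases the estimate.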
In order to obtain accurate force expressions, the least squares method was applied to fit the nonlinear three-dimensional force signals. IMF5 obtained by VMD was used for the fitting of Fz. After fitting, the R-squared value was used to assess the goodness of fit, as expressed in Formula (3):
$$R^2 = 1 - \frac{\sum_i \left( y_i - \hat{y}_i \right)^2}{\sum_i \left( y_i - \bar{y} \right)^2} \qquad (3)$$
In the formula, y_i represents the original data, ȳ the mean of the original data, and ŷ_i the fitted data. A value closer to 1 indicates a better fit. The results of fitting the force signals obtained with the process parameters in Figure 6c are shown in Table 2 and Figure 7. The periods of Fx and Fy were approximately equal to 0.15 s. For the Fz signal, a sinusoidal signal with a period of 0.166 s was superimposed on the sine wave with a period of 0.15 s. Similarly, the force signal obtained with the process parameters shown in Figure 6d was fitted, and the results are shown in Table 3. The goodness of fit was higher than 0.95 regardless of whether defects were produced.
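As an illustration of the fitting step, the sketch below fits a single sinusoid plus a constant by linear least squares and scores it with R-squared. It rests on assumptions that go beyond the text: the period is taken as known in advance (which makes the problem linear in the unknowns), the model is a·sin(ωt) + b·cos(ωt) + c, and all function names are hypothetical. The fitting form used for the tables in this paper may differ.

```python
import math


def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


def fit_sine(t, y, period):
    """Least-squares fit of y ~ A*sin(w*t + phase) + offset for a known period."""
    w = 2.0 * math.pi / period
    basis = [[math.sin(w * ti), math.cos(w * ti), 1.0] for ti in t]
    # Normal equations (B^T B) x = B^T y for the three linear unknowns.
    btb = [[sum(r[i] * r[j] for r in basis) for j in range(3)] for i in range(3)]
    bty = [sum(r[i] * yi for r, yi in zip(basis, y)) for i in range(3)]
    a, b, offset = solve3(btb, bty)
    return math.hypot(a, b), math.atan2(b, a), offset


def r_squared(y, y_fit):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_fit))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic example: 0.15 s period sampled at 7.8 kHz for two periods.
period = 0.15
t = [i / 7800.0 for i in range(2340)]
y = [2.0 + 0.8 * math.sin(2 * math.pi * ti / period + 0.5) for ti in t]
amplitude, phase, offset = fit_sine(t, y, period)
y_fit = [offset + amplitude * math.sin(2 * math.pi * ti / period + phase) for ti in t]
score = r_squared(y, y_fit)
```

If the period is not known beforehand, the problem becomes nonlinear in ω and needs an iterative solver with a reasonable starting value, for example the tool rotation period.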

Effect of Process Parameters on Welding Quality and Force
According to the above section, the in-plane force signals can be expressed as sinusoidal functions, where A and B represent the coefficient term and constant term of Fx, C and D the coefficient term and constant term of Fy, and n and θ the rotational speed and phase angle, respectively. In order to clarify the influence of the process parameters on the force parameters A, B, C, and D, the force parameters under 20 groups of process parameters were calculated, and the results are shown in Table 4. On this basis, the relationship between the constant terms and the ratio of rotational speed to welding speed was analyzed, as shown in Figure 8a,b. The constant term of Fx was approximately linearly correlated with the ratio, but no clear relationship was found between the constant term of Fy and the ratio. The relationship between the coefficient terms and the ratio was also analyzed, and again no clear trend emerged. Therefore, polynomial fitting was adopted to determine the relationship between the force parameters and the process parameters. The results are shown in Figure 9. The empirical formulas obtained are shown in Equations (6)-(9), and the accuracy of fitting is given in Table 5. SSE is the sum of squares due to error, measuring the deviation of the responses from the fitted values. RMSE is the root mean squared error. For both, a value closer to 0 indicates a better fit.
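The two fit-quality measures, together with the evaluation of a second-order polynomial in v and n, can be sketched as follows. The coefficients are those printed in Equation (6); the function names and the optional `n_params` argument are assumptions. Some fitting tools divide SSE by the residual degrees of freedom rather than by the number of points, and the text does not say which convention Table 5 uses.

```python
import math


def coeff_a(v, n):
    """Force parameter A from Equation (6); v in mm/min, n in r/min."""
    return (27.45 - 0.5143 * n - 0.1603 * v
            + 0.003726 * n ** 2 + 0.002329 * v * n + 0.000418 * v ** 2)


def sse(observed, fitted):
    """Sum of squared deviations of the observations from the fitted values."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted))


def rmse(observed, fitted, n_params=0):
    """Root mean squared error.

    n_params = 0 divides by the number of points; passing the number of
    fitted coefficients (6 for a full quadratic in v and n) divides by the
    residual degrees of freedom instead. Which one applies is an assumption.
    """
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("not enough data points for the chosen n_params")
    return math.sqrt(sse(observed, fitted) / dof)


# The experimental grid described in the setup: 5 welding speeds x 4 rotational speeds.
grid = [(v, n) for v in range(80, 241, 40) for n in range(300, 601, 100)]
predicted_a = [coeff_a(v, n) for v, n in grid]
```

Comparing `predicted_a` against the measured values of A in Table 4 with `sse` and `rmse` would reproduce the kind of accuracy figures the text attributes to Table 5.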
$$A(v, n) = 27.45 - 0.5143n - 0.1603v + 0.003726n^2 + 0.002329vn + 0.000418v^2 \qquad (6)$$
In addition, the influence of the process parameters on the characteristics of the in-plane forces was investigated, including the maximum and minimum values as well as the amplitude, as shown in Figure 10. Under the same rotational speed, the maximum and minimum values of the in-plane forces increased with increasing welding speed. Although the value of the traverse force Fx was greater, the minimum value of the lateral force Fy increased several times more than that of Fx. When the rotational speed was 600 r/min and the welding speed increased from 80 mm/min to 240 mm/min, the minimum value of Fx increased 1.789-fold and that of Fy increased 5.157-fold. In general, as the welding speed increased, the amplitude of the in-plane forces tended to decrease, and the decrease of Fy was more pronounced than that of Fx.
After welding, the cross-section of the weld was observed with an optical microscope to detect defects. In order to accurately measure the size of defects in the weld, image processing technology [24] was applied to calculate the areas of the weld zone and the defect zone, respectively, as shown in Figure 11. Taking the calculation of the weld area as an example, the region of interest was first extracted, and the color image was then converted into a grayscale image. Since the dividing line between the retreating side and the advancing side was fuzzy, histogram equalization was applied to change the gray level of each pixel in the image and enhance the contrast of an image with a small dynamic range. A binarization operation was then conducted to convert each pixel of the image into 0 or 1. In order to separate the weld area, the binarized metallographic images were processed by the morphological operations of erosion and dilation. The median filter was then applied to eliminate noise, and the Sobel operator was employed to extract the edge contour. In view of the discontinuity at the bottom of the weld, the random sample consensus algorithm was adopted to fit the contour after deleting some of the data points. Finally, the number of pixels in the weld zone, S1, was calculated by filling and inversion operations. Processing of the defect image was relatively simple: the defect pixel number S2 was obtained by extracting the region of interest, grayscale transformation, image enhancement, and binarization. Thus, the area ratio of defects to the weld was obtained.
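The core of the area-ratio calculation (contrast enhancement, binarization, and pixel counting) can be sketched in a few lines. This is a simplified illustration only: images are plain lists of rows of 8-bit gray levels, the threshold value and its direction are assumptions, and the morphological operations, median filtering, Sobel edge extraction, and random sample consensus fitting described above are deliberately omitted.

```python
def equalize(gray, levels=256):
    """Histogram-equalize a grayscale image given as a list of rows of ints."""
    flat = [p for row in gray for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for h in hist:
        running += h
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if len(flat) == cdf_min:  # single gray level: nothing to stretch
        return [row[:] for row in gray]
    scale = (levels - 1) / (len(flat) - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in gray]


def binarize(gray, threshold):
    """Map each pixel to 1 if it is at or above the threshold, else 0."""
    return [[1 if p >= threshold else 0 for p in row] for row in gray]


def count_pixels(mask):
    """Number of pixels set to 1 in a binary mask."""
    return sum(sum(row) for row in mask)


def defect_area_ratio(weld_mask, defect_mask):
    """Ratio S2 / S1 of defect pixels to weld-zone pixels."""
    s1 = count_pixels(weld_mask)
    if s1 == 0:
        raise ValueError("weld mask is empty")
    return count_pixels(defect_mask) / s1


# Tiny synthetic example: a low-contrast patch stretched and thresholded.
patch = [[10, 10, 200], [10, 200, 200]]
mask = binarize(equalize(patch), 128)
ratio = defect_area_ratio([[1, 1], [1, 1]], [[0, 1], [0, 0]])
```

Because both S1 and S2 are pixel counts from images taken at the same magnification, the ratio is independent of the pixel size and no length calibration is needed.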
Figure 11. Calculation process of the area ratio of defects to the weld.
The weld cross-sections and the area ratios of defects to the weld under 20 groups of process parameters are shown in Figure 12. When the rotational speed was low and the welding speed was high, insufficient frictional heat prevented the material around the stirring tool from reaching a fully plasticized state. The material could not fill the cavity left by the moving tool, resulting in tunnel defects. In this paper, defects with an area ratio within 1% are defined as minor defects, and those over 1% as serious defects. According to Figure 12, 11 groups were well formed and nine groups had defects, of which four were minor and five were serious. On average, about 60 groups of force data were extracted from each weld for modeling and analysis.
The polar diagram of the in-plane force signals at a welding speed of 80 mm/min was analyzed at different rotational speeds, and the results are shown in Figure 13a. The details at points A, B, C and D in Figure 13a are shown in Figure 13b. No defects were found at these four rotational speeds, and the polar radii were all less than 4. Polar diagrams of the in-plane forces under other process parameters were also analyzed, as shown in Figure 14. Combined with the results in Figure 12, it was found that when the polar radii of most data points were greater than 4, the weld was highly likely to contain defects. The polar radius in polar coordinates is expressed as Formula (10).
As shown in Figure 10, the in-plane forces (traverse force and lateral force) increased with welding speed and decreased with rotational speed. Because tunnel defects formed at low rotational speed and high welding speed, the polar radius was relatively larger when tunnel defects occurred.
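The polar-radius criterion can be illustrated with a short sketch. The code assumes the usual polar transform of the two in-plane components, with radius sqrt(Fx² + Fy²) and angle atan2(Fy, Fx), which may differ in detail from Formula (10). The function names, the reading of "most data points" as more than half, and the sample values are hypothetical, and the forces are taken to be in the same unit as the threshold of 4.

```python
import math


def polar_radius(fx, fy):
    """Polar radius of one in-plane force sample (assumed sqrt(Fx^2 + Fy^2))."""
    return math.hypot(fx, fy)


def polar_angle(fx, fy):
    """Polar angle of one in-plane force sample, in radians."""
    return math.atan2(fy, fx)


def fraction_above(fx_series, fy_series, threshold=4.0):
    """Share of samples whose polar radius exceeds the threshold."""
    radii = [polar_radius(fx, fy) for fx, fy in zip(fx_series, fy_series)]
    return sum(1 for r in radii if r > threshold) / len(radii)


def likely_defective(fx_series, fy_series, threshold=4.0, majority=0.5):
    """Flag a weld when most samples lie outside the threshold circle.

    The 0.5 majority level is an assumption made for this sketch.
    """
    return fraction_above(fx_series, fy_series, threshold) > majority


# Invented samples for two welds (not measured data).
fx_a, fy_a = [3.0, 3.5, 4.0, 2.0], [4.0, 2.5, 3.0, 1.0]
fx_b, fy_b = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]

print("weld A:", fraction_above(fx_a, fy_a), likely_defective(fx_a, fy_a))
print("weld B:", round(fraction_above(fx_b, fy_b), 3), likely_defective(fx_b, fy_b))
```

A fixed-radius rule of this kind is only a coarse screen, which is consistent with the paper's move to a learned model in the next section.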

Recognition of Internal Defects Based on LSTM
To build a more accurate quality-recognition model, a neural network was trained on the data described above. General neural networks process each input in isolation; the previous input and the next input are treated as unrelated. The three-dimensional forces in friction stir welding vary continuously, and their variation within a period reflects the material flow. A diagnosis model that relates each input to the preceding one was therefore required to process the sequence information. According to Yu et al. [25], a recurrent neural network (RNN) is a neural network for processing sequential data that effectively extracts temporal information. Unlike in general neural networks, the hidden-layer values of an RNN at each moment are determined not only by the current input but also by the hidden-layer values at the previous moment. The structure is shown in Figure 15a. The output $H_t$ of the hidden layer and the output $Y_t$ of the output layer at time t are expressed as Equations (11) and (12), respectively.
where $X_t$ represents the input at time t; $H_{t-1}$ is the output of the hidden layer at time t − 1; and U, W and V are the weight matrices.
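A common way to write such a recurrence with the symbols defined above is sketched below. The assignment of U to the input, W to the previous hidden state and V to the output layer, the activation functions f and g, and the omission of bias terms are assumptions made for illustration rather than a transcription of Equations (11) and (12).

```latex
\begin{align}
H_t &= f\left(U X_t + W H_{t-1}\right) \\
Y_t &= g\left(V H_t\right)
\end{align}
```

In this form, $H_t$ depends on the whole input history through $H_{t-1}$, which is what allows the network to use the variation of the forces within a period.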
Because a traditional RNN suffers from vanishing and exploding gradients, a long short-term memory (LSTM) network was adopted. As a special type of RNN, the LSTM had four network layers, as shown in Figure 15b. Each line in the diagram represents the transfer of a vector from the output of one node to the input of another. Red circles are element-wise vector operations, and yellow rectangles represent neural network layers. Compared with the general RNN structure, the LSTM was distinctive in that it removed information from or added information to the cell state through gate structures, as described by Chen et al. [26]. A gate is essentially a sigmoid layer combined with an element-wise multiplication, which selects the information to pass. The sigmoid layer outputs values between 0 and 1 that indicate the proportion of information passed. The LSTM had three gates to control the cell state, as shown in Formulas (13)-(15). The forget gate $Z_f$ determined the information to be discarded from the cell state, the input gate $Z_i$ determined the information to be added to the cell state, and the output gate $Z_o$ determined the output value.
where $[X_t, H_{t-1}]$ denotes the concatenation of the input $X_t$ at time t and the hidden layer output $H_{t-1}$ at the previous moment, and $W_i$, $W_f$ and $W_o$ are the weight matrices between the input layer and the input gate, forget gate and output gate layers, respectively. The actual network input was Z. The final output was the same as in Formula (12). At this stage, the hidden layer information $H_t$ depended on the cell state $C_t$, where $C_t = Z_f \times C_{t-1} + Z_i \times Z$. This operation changed the old cell state $C_{t-1}$ into the new state $C_t$.
According to the above experimental results, welding quality could be divided into two categories: well-formed welds and welds with tunnel defects. When the tunnel defects were further subdivided by size, welding quality was divided into three categories: well-formed, slight tunnel defects, and serious tunnel defects. The three-dimensional force signals from stable welding under the various parameters were extracted, and each sample was formed from a sequence of 1600 points. For the binary classification model, 366 well-formed samples and 343 defective samples were selected. The samples were randomly shuffled; 559 were used as the training set and 150 as the test set. Since the input consisted of three force signals, it was specified as a sequence with a dimensionality of three. The LSTM layer, which analyzed the time series both forwards and backwards, mapped the input time series to 100 features. The hidden layer was then connected to a fully connected layer of size 2, followed by a softmax layer and a classification layer. The network was trained with a mini-batch size of 20 and an initial learning rate of 0.01, using the adaptive moment estimation (Adam) solver. The changes in the loss function and accuracy with the number of iterations are shown in Figure 16a.
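To make the gate mechanics and the cell-state update described above concrete, one LSTM time step can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the trained network. The tanh activations for Z and for the hidden output follow the conventional LSTM form and are assumed here, bias terms are omitted, the weight name `W_z` and all weight values are hypothetical, and two hidden units are used instead of 100. The step shown is a single forward-direction pass.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_o, W_z):
    """One LSTM time step without bias terms (illustrative assumption)."""
    xh = list(x_t) + list(h_prev)                    # concatenation [X_t, H_{t-1}]
    z_f = [sigmoid(a) for a in matvec(W_f, xh)]      # forget gate Z_f
    z_i = [sigmoid(a) for a in matvec(W_i, xh)]      # input gate Z_i
    z_o = [sigmoid(a) for a in matvec(W_o, xh)]      # output gate Z_o
    z = [math.tanh(a) for a in matvec(W_z, xh)]      # network input Z (tanh assumed)
    # Cell-state update: C_t = Z_f * C_{t-1} + Z_i * Z (element-wise)
    c_t = [f * c + i * g for f, c, i, g in zip(z_f, c_prev, z_i, z)]
    # Hidden output: Z_o * tanh(C_t) (conventional form, assumed)
    h_t = [o * math.tanh(c) for o, c in zip(z_o, c_t)]
    return h_t, c_t


# Hypothetical weights: 2 hidden units, 3 force inputs + 2 hidden values per row.
W_f = W_i = W_o = W_z = [[0.1] * 5, [-0.1] * 5]

# Invented three-dimensional force samples (Fx, Fy, Fz) for three time steps.
sequence = [(1.0, 0.5, 3.0), (1.2, 0.4, 3.1), (0.9, 0.6, 2.9)]

h, c = [0.0, 0.0], [0.0, 0.0]
for x in sequence:
    h, c = lstm_step(x, h, c, W_f, W_i, W_o, W_z)
print("final hidden state:", [round(v, 4) for v in h])
```

With all weights set to zero, every gate outputs 0.5 and Z is 0, so the step simply halves the previous cell state, which is a quick way to check the update rule.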
For the binary model, all 150 samples in the test set were classified correctly.
For the three-class model, 316 well-formed samples, 284 small-defect samples and 289 large-defect samples were selected. In total, 739 samples were used as the training set and 150 for testing. The structure of the model was the same as that of the binary classification model, except that the number of neurons in the fully connected layer and the classification layer was changed to three. The training accuracy of the classifier oscillated between about 70% and 90%, as shown in Figure 16b. The model was then used to classify the 150 test samples. As Figure 16c shows, the well-formed samples were all correctly identified. However, small defects and large defects were often confused with each other, so the overall accuracy was only 84.67%. The detailed test results are presented as a confusion matrix in Figure 16d, where 0 represents well-formed welds, and 1 and 2 indicate small defects and large defects, respectively.
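The reported overall accuracy can be cross-checked with simple arithmetic: with 150 test samples, 84.67% corresponds to 127 correct classifications (127/150 ≈ 0.8467). The sketch below computes overall accuracy and per-class recall from a confusion matrix. The matrix entries are hypothetical, chosen only to match the facts stated above (150 samples, every well-formed sample correct, errors confined to the two defect classes); the actual per-class counts are those of Figure 16d.

```python
def overall_accuracy(cm):
    """Overall accuracy: diagonal sum divided by the total sample count."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm):
    """Fraction of each true class (row) that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]


# Rows: true class, columns: predicted class.
# 0 = well-formed, 1 = small defect, 2 = large defect.
# HYPOTHETICAL counts, not the values of Figure 16d.
hypothetical_cm = [
    [50, 0, 0],
    [0, 38, 12],
    [0, 11, 39],
]

print(f"overall accuracy: {overall_accuracy(hypothetical_cm):.2%}")
print("per-class recall:", per_class_recall(hypothetical_cm))
```

Any matrix with 127 of 150 samples on the diagonal gives the same overall figure, so the per-class recall is the more informative view of where the small-defect and large-defect classes are confused.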
Based on the investigation of how the friction stir welding process parameters affect the variation characteristics of the three-dimensional forces, welding quality prediction models with high accuracy were built using a long short-term memory neural network. Compared with the single-layer neural network in [13] and the convolutional neural networks in [20,21], the method proposed in this paper improved the accuracy of distinguishing inner defects. Moreover, the paper attempts to predict different sizes of tunnel defects, although the model optimization needs to be carried out