High-speed vision measurement of vibration based on an improved ZNSSD template matching algorithm

This paper proposes an improved zero-mean normalized sum of squared differences (ZNSSD) algorithm to address the inability of traditional structural measurement methods to extract high-frequency vibration signals. In the proposed technique, a high-speed camera captures an image sequence of the vibrating target, and the images are processed on a computer with a ZNSSD template matching algorithm with subpixel accuracy. To reduce computation time and increase efficiency significantly, a modified search algorithm, the ZNSSD template matching algorithm based on an image pyramid (ZNSSD-P), is proposed. A jumping ZNSSD template matching algorithm based on an image pyramid (J-ZNSSD-P) is then proposed to further improve the efficiency of ZNSSD-P. Vibration signals were extracted in experiments on a grating ruler motion platform and on sound barriers. The results show that the vibration signal extraction method has high precision and efficiency.


Introduction
Vibration, a common natural phenomenon in daily life and industrial production, is also an important topic in mechanics research (see Zhang, 2019). Vibration is inevitable in engineering practice and can have harmful effects. Therefore, many researchers have studied vibration measurement, which is widely used in various engineering practices (see Peng et al., 2020).
Vibration measurement methods are divided into contact and non-contact types. Contact-type methods rely on sensors attached to the structure. Because a contact sensor adds mass and can change the structural characteristics of the object being measured, measurement errors may occur for lightweight structures. Additionally, measuring the vibration of large-scale structures requires installing many sensors on the structure surface, which consumes considerable manpower and material resources (see Poozesh et al., 2017). Contact measurement is also restricted by the measurement environment and other factors. In contrast, contactless measurement is free from the additional-load effect and from the stabilization limitations of the sensor. Non-contact measurement includes methods based on vision (see Aoyama et al., 2018; Wu et al., 2018) and on the Doppler effect (see Liu et al., 2015). With the rapid development of vision sensing, computer, and image processing technologies, vision-based vibration measurement has gradually become the mainstream non-contact method (see Guo & Zhu, 2016; Wang, 2016). In recent years, vision measurement technology has been applied extensively to vibration detection of large-scale structures with satisfactory experimental results (see Bell et al., 2012). For example, (2003) used a high-precision camera to measure the vertical displacement of a pier of the Vincent Thomas Bridge in Los Angeles, California, USA, and analyzed the frequency spectrum of the displacement signal.
The first two modal frequencies of the bridge vibration were obtained, indicating that visual measurement is an effective way to measure the vibration of large-scale structures. Feng and Feng (2017) used a camera to monitor the structural displacement response, successfully measured the natural frequencies and mode shapes of the object, and confirmed the effectiveness of the method through a displacement test on the Manhattan Bridge. Generally, the vibration frequency of large-scale structures is relatively low, so an ordinary camera can satisfy the measurement requirements. When an object vibrates at a high frequency, however, ordinary cameras cannot collect effective information because of their low frame rate (typically 30 frames per second [FPS]), and the vibration signal cannot be recovered. The growing availability of high-speed cameras makes it possible to extract vibration signals from objects with high vibration frequencies through visual measurement.
To address this limitation, high-speed cameras capable of recording video at 1000 FPS or higher have been developed. For example, You et al. (2014) combined high-speed camera technology with an ultraviolet (UV) band-pass filter to obtain high-contrast ultraviolet images under different welding conditions. They also studied the relationship between flame and spatter characteristics and welding quality, and realized the monitoring of process stability in metal welding. (2016) applied high-speed vision technology to vibration displacement measurement in an underwater environment, extracted the amplitude information of underwater structures under seismic excitation from high-speed video, analyzed the effect of the underwater environment on structural vibration, and calculated the natural frequency and damping of the structure. (2016) of the University of Science and Technology of China conducted in-depth research on information extraction algorithms for high-speed visual measurement systems and applied them to the vibration analysis of forklift steering wheel systems, sound signal measurement and recovery, and bolt damage detection.
However, traditional image processing techniques used with high-speed cameras place high demands on the motion tracking algorithm and are complex and cumbersome (see Aqel et al., 2016). In computer vision, the standard method for tracking features between two images is template matching. The basic template matching algorithm evaluates a distortion (dissimilarity) function at each position to measure the similarity between the template and the image. Area-based algorithms calculate a correlation function at each location of the search image in a raster-scan manner. Several area-based measures have been introduced over the years, including the mean absolute difference, product cross correlation (CC), the sequential similarity detection algorithm, the sum of absolute differences (SAD), and the sum of squared differences (SSD) (see Sahani et al., 2011). Among these template matching measures, the zero-mean normalized sum of squared differences (ZNSSD) is often used because of its better robustness (see Bing et al., 2010; Pangercic et al., 2008). Compared with other measures, ZNSSD is insensitive to linear changes in illumination between the two compared images, which makes it suitable for practical measurement.
However, ZNSSD cannot be calculated directly in the spectral domain with the more efficient fast Fourier transform (FFT). Because an exhaustive search evaluates the ZNSSD function at every candidate position, its computational cost is high and the efficiency of the ZNSSD algorithm is very low. In the field of video compression, various algorithms, such as image pyramid decomposition (see Malik & Soundararajan, 2019) and block matching (see Muhammad et al., 2020), have been proposed to improve efficiency.
For structural vibration, high-speed cameras sample at a high frequency, so the displacement between consecutive frames is only a few pixels or even a single pixel. Therefore, the matching position in each frame lies within a very small range around the position found in the previous frame. Exploiting these characteristics, this paper proposes an improved ZNSSD algorithm for displacement measurement that overcomes the low efficiency of traditional image processing techniques. The improved algorithm obtains the same vibration extraction results as the ZNSSD algorithm with a much shorter computation time per extraction, which offers the potential for online extraction from high-speed video. The algorithm can be used to extract and measure in-plane displacement. Finally, the algorithm was applied in a high-speed camera system to realize vibration measurement, and two experiments were carried out under laboratory and outdoor conditions to verify its performance. The results demonstrate the accuracy and efficiency of the camera system.

ZNSSD extraction algorithm
The ZNSSD algorithm uses a correlation value to represent the similarity between two images and serves as a matching algorithm for finding specific objects in an image. The ZNSSD correlation value is calculated as follows:

$$C_{ZNSSD}(u,v)=\sum_{i}\sum_{j}\left[\frac{T(i,j)-\bar{T}}{\sqrt{\sum_{i}\sum_{j}\left[T(i,j)-\bar{T}\right]^{2}}}-\frac{G(i+u,j+v)-\bar{G}_{u,v}}{\sqrt{\sum_{i}\sum_{j}\left[G(i+u,j+v)-\bar{G}_{u,v}\right]^{2}}}\right]^{2}\qquad(1)$$

where T(i, j) is the grey value at (i, j) in the template image T; G(i+u, j+v) is the grey value at (i+u, j+v) in the image G to be matched, with (u, v) the position of the template on G; $\bar{T}$ and $\bar{G}_{u,v}$ are the mean grey values of the template and of the sub-image of G that it covers; and the sums run over all template pixels. Considering only the direct relationship between corresponding grey levels makes a correlation measure very sensitive to illumination changes and image noise. Zero-mean normalization reduces this sensitivity: the ZNSSD function is insensitive to linear changes in image grey levels and has strong anti-interference ability.
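The insensitivity to linear grey-level changes can be checked numerically. The following Python/NumPy sketch is illustrative only (the function names `ssd` and `znssd` and the random test patch are assumptions, not taken from the paper); it evaluates plain SSD and the ZNSSD of Equation (1) at a single placement, for a patch and a copy of it under a gain-and-offset illumination change:

```python
import numpy as np

def ssd(t, g):
    """Plain sum of squared differences between two equally sized patches."""
    return float(np.sum((t - g) ** 2))

def znssd(t, g):
    """ZNSSD of two equally sized patches: zero-mean, unit-norm, then sum of squared differences."""
    t0 = t - t.mean()
    g0 = g - g.mean()
    return float(np.sum((t0 / np.linalg.norm(t0) - g0 / np.linalg.norm(g0)) ** 2))

rng = np.random.default_rng(0)
template = rng.uniform(0.0, 255.0, size=(15, 15))
# The same patch under a linear illumination change: gain 1.5, offset +20 grey levels.
relit = 1.5 * template + 20.0

print(ssd(template, relit))    # large: SSD responds to the grey-level change
print(znssd(template, relit))  # close to 0: the gain and offset cancel out
```

The invariance holds for a positive gain; a negative gain (contrast inversion) flips the sign of the normalized patch and is not cancelled.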
Given a matching image G and a template image T, the template is slid over G until the position with the largest correlation with T (that is, the smallest error) is found. As shown in Figure 1, T, of size w×h, is overlaid on G, of size W×H. The template is scanned over the whole image, and the ZNSSD coefficient between T and each corresponding sub-region of G is calculated, giving a coefficient matrix that describes how well T matches each region of G. The minimum of this matrix marks the region of G with the highest matching degree with T.
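A minimal full-search sketch of this procedure in Python/NumPy follows, assuming greyscale images stored as 2-D float arrays. The names `znssd_map` and `match` are hypothetical; each entry of the returned matrix is the ZNSSD coefficient of one template placement, and the best match is the position of its minimum:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def znssd_map(image, template):
    """ZNSSD coefficient for every placement of `template` inside `image`.

    Entry (r, c) compares the template with the sub-image whose top-left corner is (r, c);
    0 means a perfect match and larger values mean a worse match (the maximum is 4).
    """
    h, w = template.shape
    t = template - template.mean()
    t = t / np.linalg.norm(t)
    windows = sliding_window_view(image, (h, w))            # shape (H-h+1, W-w+1, h, w)
    g = windows - windows.mean(axis=(2, 3), keepdims=True)  # zero-mean each sub-image
    norms = np.linalg.norm(g, axis=(2, 3), keepdims=True)
    g = g / np.where(norms == 0, 1.0, norms)                # flat sub-images become all-zero
    return np.sum((t - g) ** 2, axis=(2, 3))

def match(image, template):
    """Full search: integer-pixel top-left position (row, col) with the smallest ZNSSD."""
    scores = znssd_map(image, template)
    r, c = np.unravel_index(np.argmin(scores), scores.shape)
    return int(r), int(c)

rng = np.random.default_rng(1)
image = rng.uniform(0.0, 255.0, size=(80, 100))
template = image[30:51, 45:66].copy()   # 21x21 patch cut from a known place
print(match(image, template))           # (30, 45)
```

Every placement is evaluated, so the cost grows with (W − w + 1)(H − h + 1) × wh; this is the cost that the pyramid and jumping variants below aim to cut.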
A simple simulation example is provided to validate the performance of the ZNSSD algorithm. In the simulation, the black circle in Figure 2 moves along a predetermined elliptical path with vibration frequency f = 1 Hz, sampled at f_s = 50 Hz, and the ZNSSD algorithm is applied to extract the path coordinates. These black-circle simulations were used to verify the speed and accuracy of the algorithm. The ZNSSD algorithm successfully extracted the motion signals in the horizontal and vertical directions: the horizontal signal has a waveform similar to the simulation input x, and the vertical signal is similar to input y. Thus, the results of the ZNSSD algorithm agree well with the true input. However, the algorithm resolves displacement only to whole pixels, whereas the actual displacement in a real measurement is generally not an integer number of pixels. Integer-pixel extraction therefore limits the positioning accuracy of the vision measurement system and may be inadequate for precision measurement. To address this issue, we used the surface fitting algorithm (see Gu et al., 2020) to improve the extraction accuracy.
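The simulation can be approximated with the sketch below. The ellipse semi-axes (6 and 3 pixels), the 64×64 frame, the disc radius, the simple edge anti-aliasing and the 25×25 template are assumptions chosen for illustration, since the paper's motion equations are not reproduced here; only f = 1 Hz and f_s = 50 Hz follow the text. Because the search is integer-pixel only, the recovered displacement is quantized:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def render_circle(cx, cy, size=64, radius=8.0):
    """White background with a black disc; rim pixels get intermediate grey (simple anti-aliasing)."""
    yy, xx = np.mgrid[0:size, 0:size]
    dist = np.hypot(xx - cx, yy - cy)
    coverage = np.clip(radius - dist + 0.5, 0.0, 1.0)   # 1 inside, 0 outside, linear ramp at the rim
    return 255.0 * (1.0 - coverage)

def match(image, template):
    """Integer-pixel full-search ZNSSD; returns the top-left (row, col) of the best match."""
    h, w = template.shape
    t = template - template.mean()
    t = t / np.linalg.norm(t)
    g = sliding_window_view(image, (h, w))
    g = g - g.mean(axis=(2, 3), keepdims=True)
    n = np.linalg.norm(g, axis=(2, 3), keepdims=True)
    g = g / np.where(n == 0, 1.0, n)
    scores = np.sum((t - g) ** 2, axis=(2, 3))
    r, c = np.unravel_index(np.argmin(scores), scores.shape)
    return int(r), int(c)

f, fs = 1.0, 50.0                  # vibration and sampling frequency from the text
ax, ay = 6.0, 3.0                  # ellipse semi-axes in pixels (assumed)
t = np.arange(int(fs)) / fs        # one period, 50 frames
true_x = 32 + ax * np.cos(2 * np.pi * f * t)
true_y = 32 + ay * np.sin(2 * np.pi * f * t)

first = render_circle(true_x[0], true_y[0])
r0, c0 = 20, 26                    # template top-left: a 25x25 ROI centred on the circle at (38, 32)
template = first[r0:r0 + 25, c0:c0 + 25]

est_dx, est_dy = [], []
for x, y in zip(true_x, true_y):
    r, c = match(render_circle(x, y), template)
    est_dx.append(c - c0)          # horizontal displacement in whole pixels
    est_dy.append(r - r0)          # vertical displacement in whole pixels

err_x = np.abs(np.array(est_dx) - (true_x - true_x[0]))
err_y = np.abs(np.array(est_dy) - (true_y - true_y[0]))
print(err_x.max(), err_y.max())    # worst-case quantization error, in pixels
```

The staircase-shaped difference between `est_dx`/`est_dy` and the true displacement is the whole-pixel limitation that the surface-fitting step is meant to remove.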

Subpixel accuracy improvement
The correlation coefficient matrix forms a smooth surface near its extreme point. If the best matching position of the template does not fall exactly on an integer pixel, the extreme point of the surface will not appear at that integer position but near it, and the location of that extreme point is the best matching position in the image. Therefore, by fitting a surface to the correlation coefficient matrix, the ideal peak of the surface can be calculated directly from the fitted surface function, giving a high-precision subpixel displacement estimate. Because the fitted surface can differ greatly from case to case, and to reduce computation and keep the fit practical, we use only the correlation coefficient at the extreme position and those at its eight surrounding points.
Because the correlation coefficient surface changes relatively gently, we fit it with a bivariate quadratic polynomial. Assuming that the integer-pixel matching position is (x_0, y_0), the fitted surface can be expressed as

$$f(x,y)=a_0+a_1x+a_2y+a_3xy+a_4x^{2}+a_5y^{2}$$

where (x, y) is measured relative to (x_0, y_0). Once the coefficients are determined, the extreme point of the quadratic surface gives the subpixel matching position.
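Setting both partial derivatives of the fitted quadratic to zero gives the extreme point in closed form. This is the standard derivation for this polynomial, written out here for reference rather than quoted from the paper:

$$\frac{\partial f}{\partial x}=a_1+a_3y+2a_4x=0,\qquad \frac{\partial f}{\partial y}=a_2+a_3x+2a_5y=0$$

$$x^{*}=\frac{a_2a_3-2a_1a_5}{4a_4a_5-a_3^{2}},\qquad y^{*}=\frac{a_1a_3-2a_2a_4}{4a_4a_5-a_3^{2}}$$

The subpixel matching position is then (x_0 + x*, y_0 + y*). The stationary point is an extremum only when 4a_4a_5 - a_3^2 > 0; otherwise the surface is a saddle or degenerate and the fit gives no valid subpixel estimate.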
The following example illustrates the calculation process and the practical effect of the surface-fitting subpixel algorithm. We considered a 3×3 correlation coefficient matrix; the offsets of the extreme point of the fitted surface, shown in Figure 3, are (−0.1605, 0.1208). After surface fitting, the subpixel optimal matching position is therefore (x_0 − 0.1605, y_0 + 0.1208). Surface fitting achieves a positioning accuracy of 0.005-0.01 pixel with high noise resistance and high computational efficiency, so the technique can be applied in practical image matching.
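The fit and peak calculation can be sketched in Python/NumPy as follows. The function name `subpixel_peak`, the least-squares fit over the nine points, and the fallback to the integer position when 4a_4a_5 - a_3^2 <= 0 are assumptions of this sketch; since the paper's 3×3 example matrix is not reproduced, a synthetic quadratic with a known extreme point is used instead:

```python
import numpy as np

def subpixel_peak(c3):
    """Fit f(x,y) = a0 + a1*x + a2*y + a3*x*y + a4*x^2 + a5*y^2 to a 3x3 score patch.

    c3[r, c] holds the score at y = r - 1, x = c - 1 relative to the integer match (x0, y0).
    Returns the (dx, dy) offset of the fitted surface's extreme point.
    """
    y, x = np.mgrid[-1:2, -1:2]                  # row index -> y, column index -> x
    x = x.ravel().astype(float)
    y = y.ravel().astype(float)
    A = np.column_stack([np.ones(9), x, y, x * y, x**2, y**2])
    a0, a1, a2, a3, a4, a5 = np.linalg.lstsq(A, np.asarray(c3, float).ravel(), rcond=None)[0]
    det = 4 * a4 * a5 - a3**2
    if det <= 0:                                 # saddle or degenerate: keep the integer position
        return 0.0, 0.0
    dx = (a2 * a3 - 2 * a1 * a5) / det
    dy = (a1 * a3 - 2 * a2 * a4) / det
    return float(dx), float(dy)

# Synthetic 3x3 patch from a quadratic whose maximum lies at x = -0.16, y = 0.12.
yy, xx = np.mgrid[-1:2, -1:2]
patch = 1.0 - (xx + 0.16) ** 2 - 0.5 * (yy - 0.12) ** 2
print(subpixel_peak(patch))   # approximately (-0.16, 0.12)
```

The same function works for a ZNSSD matrix, where the extreme point is a minimum rather than a maximum, because the closed-form solution only locates the stationary point.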

ZNSSD-P template matching algorithm
The ZNSSD template matching algorithm is used to match and recognize the target in the k-th frame. First, the k-th frame is decomposed into an image pyramid, and the target is searched for in the low-resolution top layer $G_k^{n}$. Second, the target centre coordinate $F_k^{n}(i_n, j_n)$ in this layer is determined using the ZNSSD algorithm and mapped to the higher-resolution image $G_k^{n-1}$ in the next layer, giving the new target centre coordinate $F_k^{n-1}(2i_n, 2j_n)$. Third, the ZNSSD algorithm searches for the target in $G_k^{n-1}$ within the mapping area $Q_k^{n-1}$ around this coordinate, yielding the revised target centre coordinate $F_k^{n-1}(i_{n-1}, j_{n-1})$. Repeating this process layer by layer finally determines $F_k^{0}(i_0, j_0)$ (as shown in Figure 4).
Here, for 0 < d < n, W(p, q) is a window function with low-pass characteristics used in constructing the pyramid layers. Equations (8)-(10) describe the process of finding target points in the n-th layer, where $G_n(i, j)$ represents the n-th layer of the matching image, $T_n(i, j)$ represents the n-th layer of the template image, G(i, j) represents the original image, and T(i, j) represents the original template image. Compared with the W × H matching computations required by a global search strategy, the image pyramid significantly reduces the number of matching computations and improves matching efficiency.
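A compact coarse-to-fine sketch of the ZNSSD-P idea in Python/NumPy is shown below. It is not the paper's implementation: a 2×2 block average stands in for the low-pass window W(p, q), the top layer is searched exhaustively, and each lower layer is searched only within ±`radius` pixels of the doubled coarse position (a square neighbourhood playing the role of Q). All function names are hypothetical, and positions are template top-left corners:

```python
import numpy as np

def reduce2(img):
    """One pyramid step: 2x2 block average (a simple low-pass window) plus 2x downsampling."""
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2   # drop an odd last row/column
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid(img, levels):
    """List [G_0, G_1, ..., G_levels], where G_0 is the original image."""
    out = [np.asarray(img, dtype=float)]
    for _ in range(levels):
        out.append(reduce2(out[-1]))
    return out

def znssd(a, b):
    """ZNSSD between two equally sized patches (0 = perfect match)."""
    a = a - a.mean()
    b = b - b.mean()
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 4.0                                   # flat patch: treat as the worst score
    return float(np.sum((a / na - b / nb) ** 2))

def best_in_range(img, tpl, rows, cols):
    """Integer top-left (row, col) with the lowest ZNSSD among the candidate positions."""
    h, w = tpl.shape
    _, r, c = min((znssd(img[r:r + h, c:c + w], tpl), r, c) for r in rows for c in cols)
    return r, c

def znssd_pyramid_match(image, template, levels=2, radius=2):
    """Full search on the top layer, then +/-radius refinement on each finer layer."""
    gi, gt = pyramid(image, levels), pyramid(template, levels)
    top_i, top_t = gi[-1], gt[-1]
    r, c = best_in_range(top_i, top_t,
                         range(top_i.shape[0] - top_t.shape[0] + 1),
                         range(top_i.shape[1] - top_t.shape[1] + 1))
    for lvl in range(levels - 1, -1, -1):
        img, tpl = gi[lvl], gt[lvl]
        max_r, max_c = img.shape[0] - tpl.shape[0], img.shape[1] - tpl.shape[1]
        r, c = min(2 * r, max_r), min(2 * c, max_c)  # map the coarse position to this layer
        r, c = best_in_range(img, tpl,
                             range(max(0, r - radius), min(max_r, r + radius) + 1),
                             range(max(0, c - radius), min(max_c, c + radius) + 1))
    return r, c

rng = np.random.default_rng(2)
image = rng.uniform(0.0, 255.0, size=(128, 128))
template = image[40:72, 24:56].copy()          # 32x32 patch at top-left (40, 24)
print(znssd_pyramid_match(image, template))    # (40, 24)
```

With a 32×32 template and two reduction steps, the top-layer search compares only 8×8 patches at 25×25 positions, and each refinement step evaluates at most 5×5 candidates.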

Jumping ZNSSD template matching algorithm based on image pyramid (J-ZNSSD-P)
The ZNSSD template matching algorithm is used to match and recognize the target in the k-th frame. First, the target is searched for in the top layer $G_k^{n}$, and its centre coordinate $F_k^{n}(i_n, j_n)$ in that layer is found by the ZNSSD algorithm. The number of layers to skip is determined by the similarity score s between the target and the template and by the current pyramid layer position:

$$m=\operatorname{Round}\left[k_1(s-p)+k_2c\right]$$

where m is the number of skipped layers, k_1 and k_2 are scale coefficients, s is the similarity score between the target and the template in the current layer (full score 100), p is the matching skip threshold, c is the position of the current pyramid layer, and Round denotes rounding to the nearest integer.
Mapping $F_k^{n}(i_n, j_n)$ to the pyramid image $G_k^{n-m}$ after the jump gives the new target centre $F_k^{n-m}(2^m i_n, 2^m j_n)$. Within the mapping area $H_k^{n-m}$ centred on this coordinate, the ZNSSD algorithm re-searches for the target in this layer, quickly finding the corrected target centre coordinate $F_k^{n-m}(i_{n-m}, j_{n-m})$. The process is repeated until $F_k^{0}(i_0, j_0)$ is obtained. If n − m < 0 occurs during the calculation, the search jumps directly to the bottom image; if m < 0 occurs, the search does not jump and maps directly to the next layer. In this paper, k_1, k_2, and p are set to 0.02, 0.25, and 70, respectively.
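The layer-jump rule can be expressed as a small helper. In this Python sketch, c is assumed to be the index of the current layer counted from the bottom (layer 0 is the original image), ties in Round are broken upward, and m = 0 is treated like m < 0 (step to the next layer) so that the search always makes progress; the paper does not specify these details. The constants are the values k_1 = 0.02, k_2 = 0.25, and p = 70 given above, and the function names are hypothetical:

```python
import math

K1, K2, P = 0.02, 0.25, 70.0   # k_1, k_2 and the skip threshold p from the text

def jump_target(c, s, k1=K1, k2=K2, p=P):
    """Pyramid layer to search next when leaving layer c with similarity score s (0-100).

    m = Round(k1 * (s - p) + k2 * c); ties round upward here (an assumption).
    """
    m = math.floor(k1 * (s - p) + k2 * c + 0.5)
    if m <= 0:
        # m < 0: no jump, map to the next layer. m == 0 is handled the same way
        # (assumption), otherwise the search would stay on layer c forever.
        return c - 1
    return max(c - m, 0)   # if c - m < 0 (n - m < 0 in the text), go straight to the bottom

def map_centre(i, j, layers_down):
    """Scale a centre (i, j) by 2**layers_down when moving layers_down layers toward the bottom."""
    scale = 2 ** layers_down
    return scale * i, scale * j

c, s = 4, 100.0
target = jump_target(c, s)                      # m = Round(0.6 + 1.0) = 2, so layer 2
print(target, map_centre(7, 5, c - target))     # 2 (28, 20)
```

For example, at layer c = 4 with s = 100, k_1(s − p) + k_2c = 0.6 + 1.0 = 1.6, so m = 2: the search skips a layer and the centre coordinates are multiplied by 2^2 = 4. A low score (s < p) shrinks m, so an uncertain match is refined layer by layer instead of being skipped.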

General flow chart of the J-ZNSSD-P matching algorithm
After a high-speed camera captures the image sequence of a vibrating object, the region of interest (ROI) is selected in the first frame as the template. The J-ZNSSD-P algorithm is then used to search for the target in each frame and determine its centre coordinates, and the surface fitting algorithm is applied to improve the extraction accuracy. The complete search process is shown in Figure 5.
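The per-frame flow can be sketched as the loop below (Python/NumPy). For brevity, the J-ZNSSD-P search is replaced by a search within ±`radius` pixels of the previous integer match, which relies on the observation in the Introduction that inter-frame displacement in high-speed video is only a few pixels; the quadratic surface fit is then applied to the 3×3 neighbourhood of the best score. The Gaussian-blob frames, the elliptical path amplitudes, and all function names are assumptions for illustration:

```python
import numpy as np

def znssd(a, b):
    """ZNSSD between two equally sized patches (0 = perfect match)."""
    a = a - a.mean()
    b = b - b.mean()
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 4.0                                     # flat patch: worst possible score
    return float(np.sum((a / na - b / nb) ** 2))

def subpixel_offset(s3):
    """Quadratic-surface fit on a 3x3 score patch; returns (dx, dy) of its extreme point."""
    y, x = np.mgrid[-1:2, -1:2]
    x, y = x.ravel().astype(float), y.ravel().astype(float)
    A = np.column_stack([np.ones(9), x, y, x * y, x**2, y**2])
    _, a1, a2, a3, a4, a5 = np.linalg.lstsq(A, s3.ravel(), rcond=None)[0]
    det = 4 * a4 * a5 - a3**2
    if det <= 0:
        return 0.0, 0.0
    return (a2 * a3 - 2 * a1 * a5) / det, (a1 * a3 - 2 * a2 * a4) / det

def track(frames, r0, c0, size, radius=3):
    """Track the ROI frames[0][r0:r0+size, c0:c0+size]; returns per-frame (dx, dy) in pixels."""
    template = frames[0][r0:r0 + size, c0:c0 + size]
    r, c, n = r0, c0, 2 * radius + 1
    out = []
    for img in frames:
        scores = np.full((n, n), 4.0)                  # out-of-image candidates keep the worst score
        for i in range(n):
            for j in range(n):
                rr, cc = r - radius + i, c - radius + j
                if 0 <= rr <= img.shape[0] - size and 0 <= cc <= img.shape[1] - size:
                    scores[i, j] = znssd(img[rr:rr + size, cc:cc + size], template)
        i, j = np.unravel_index(np.argmin(scores), scores.shape)
        r, c = r - radius + int(i), c - radius + int(j)  # new integer-pixel match
        dx = dy = 0.0
        if 0 < i < n - 1 and 0 < j < n - 1:              # a full 3x3 neighbourhood is needed
            dx, dy = subpixel_offset(scores[i - 1:i + 2, j - 1:j + 2])
        out.append((c - c0 + dx, r - r0 + dy))
    return np.array(out)

f, fs = 1.0, 50.0
t = np.arange(50) / fs
cx = 32 + 3.0 * np.cos(2 * np.pi * f * t)     # assumed elliptical path, in pixels
cy = 32 + 1.5 * np.sin(2 * np.pi * f * t)
yy, xx = np.mgrid[0:64, 0:64]
frames = [np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * 4.0**2)) for x, y in zip(cx, cy)]

disp = track(frames, r0=22, c0=25, size=21)   # 21x21 ROI centred on the blob at (35, 32)
err = np.abs(disp - np.column_stack([cx - cx[0], cy - cy[0]]))
print(err.max())                              # subpixel tracking error, in pixels
```

Because each frame evaluates only (2 × radius + 1)^2 = 49 candidates, the cost per frame no longer depends on the image size; the trade-off is that motion faster than `radius` pixels per frame would be lost.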

Performance analysis of the modified algorithm
To verify the performance of the modified algorithm, we repeated the black-circle simulation under the same test conditions as in Subsection 3.1. The experimental results are shown in Figure 6 and Table 1.
In this test, four methods are compared: the ZNSSD matching method, the J-ZNSSD-P matching method, and the UCC subpixel matching algorithm at two resolution settings. UCC is an advanced subpixel image registration technique whose resolution can be adjusted by changing the upsampling factor; the upsampling factor (usfac) is set to 1 for integer-pixel resolution and to 100 for 0.01-pixel resolution. Figure 6 shows the vibration extraction results of the four methods: the results of ZNSSD and UCC (usfac = 1) are shown in Figure 6(a) and (c), and those of J-ZNSSD-P and UCC (usfac = 100) in Figure 6(b) and (d). As Figure 6 shows, J-ZNSSD-P and UCC (usfac = 100) achieve significantly higher matching accuracy than the other two methods, and their extracted curves are smoother.
Table 1 gives a quantitative comparison of tracking error and computation time. As the subpixel resolution of the UCC algorithm is refined from 1 to 0.01 pixel, its mean absolute horizontal error decreases from 0.3029 to 0.0982 pixel and its mean absolute vertical error from 0.2398 to 0.0881 pixel, while its computation time increases from 11.4 to 18.44 ms/frame. Thus, for a given template size, the UCC algorithm cannot achieve high accuracy and high efficiency at the same time. The ZNSSD algorithm has lower accuracy and efficiency than the other three methods, whereas the J-ZNSSD-P algorithm has the highest accuracy and efficiency of the four. The improved algorithm therefore has wide potential application in vibration extraction with high-speed digital cameras.

Experiment on a grating ruler motion platform
To test the effectiveness of the improved J-ZNSSD-P algorithm for displacement extraction in practice, a visual measurement experiment was carried out on a grating ruler platform. This manually operated platform is equipped with a grating ruler sensor, as shown in Figure 7. The grating ruler is a commonly used high-precision displacement sensor based on the moiré fringe principle, with a large detection range, high detection accuracy, and fast response. It consists of a grating reading head and a scale grating, which is fixed on the translation table. The reading head is connected to the rotary screw through a connecting plate, so when the screw is turned by hand, the connecting plate and the reading head move horizontally together. In the experiment, the high-speed camera vision system and the grating ruler measured the platform motion simultaneously, and their results were compared to evaluate the visual displacement extraction algorithm. As shown in Figure 7, the grating pitch of the ruler is 0.02 mm, its resolution is 1 μm, its accuracy is ±3 μm, and its sampling frequency is 20 Hz.
Because the surface of the connecting plate is smooth and offers no obvious feature to track, a circular target with a diameter of 20 mm is attached to it as the shooting target of the vision system, as shown in Figure 7. The high-speed camera is placed 3 m from the platform and records video at 200 fps, with a captured image size of 160×160 pixels; one frame is shown in Figure 7(a). The red box in the figure is selected as the tracking template, with a size of 50×50 pixels. For pixel calibration, the diameter of the circle spans about 104.7 pixels, giving a pixel resolution of about 0.191 mm/pixel.
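The calibration arithmetic is simple enough to state as code; the constant and function names below are hypothetical, and the two numbers are the ones given above:

```python
DIAMETER_MM = 20.0    # physical diameter of the circular target
DIAMETER_PX = 104.7   # diameter of the target measured in the image

scale_mm_per_px = DIAMETER_MM / DIAMETER_PX   # about 0.191 mm per pixel

def px_to_mm(displacement_px):
    """Convert a displacement measured in pixels to millimetres using the target calibration."""
    return displacement_px * scale_mm_per_px

print(round(scale_mm_per_px, 3))   # 0.191
```

This single scale factor assumes the target plane is parallel to the image plane, so that one pixel corresponds to the same distance everywhere in the field of view.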
Figure 7 shows the displacement extracted from the target motion video with the J-ZNSSD-P algorithm, together with the grating ruler measurement for comparison. The displacement extracted by the improved algorithm is highly consistent with the grating ruler measurement, with average errors of 0.037 mm (Figure 7(b)) and 0.027 mm (Figure 7(c)). Therefore, the improved algorithm can extract displacement accurately and has high practical value.

Vibration analysis of the sound barrier
The first experiment confirmed the feasibility of the high-speed visual vibration measurement algorithm. This section presents the second experiment, on a sound barrier. A sound barrier, also known as a sound wall or noise barrier, is a wall structure designed to protect residents on both sides of a railway or highway from noise pollution, and it is one of the most effective means of reducing road, railway, and industrial noise. However, as train speeds increase, passing high-speed trains exert larger loads and impact forces on the barrier. These cause violent vibration of the barrier, which in turn can lead to material fatigue and loosened assemblies. Moreover, if a damaged sound barrier falls onto the railway track, the consequences could be disastrous.
Traditional contact measurement is difficult to apply here because installing sensors on an operating railway line is inconvenient. To solve this problem, a vision-based measurement approach is adopted. The equipment consists mainly of a laptop and a high-speed camera, which captures the displacement vibration video at 232 FPS. Figure 8 shows a sound barrier on a railway section and its displacement spectrum: Figure 8(a)-(c) show the experimental environment, the selected ROI, the lateral displacement of the sound barrier, and its spectrum analysis. The first three natural frequencies of the sound barrier are 10.27, 20.92, and 45.41 Hz. The dynamic characteristics of the sound barrier were analyzed on the basis of these results, which show that high-speed visual measurement can also be used in practical engineering applications.
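The modal frequencies are read from the peaks of the displacement spectrum. The sketch below shows one way to do this with NumPy's FFT; it runs on a synthetic two-mode signal (12 Hz and 30 Hz are arbitrary choices, not the measured barrier frequencies), and the helper name `dominant_frequencies` and the simple local-maximum peak picking are assumptions rather than the paper's processing chain:

```python
import numpy as np

def dominant_frequencies(x, fs, count=2):
    """Return the `count` strongest spectral peaks (Hz) of a displacement signal sampled at fs."""
    x = np.asarray(x, dtype=float)
    mag = np.abs(np.fft.rfft(x - x.mean()))      # remove the static offset before the FFT
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    # Local maxima of the magnitude spectrum (the two end bins are excluded).
    peaks = np.where((mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:]))[0] + 1
    strongest = peaks[np.argsort(mag[peaks])[::-1][:count]]
    return np.sort(freqs[strongest])

fs = 232.0                        # camera frame rate used in the sound-barrier test
t = np.arange(2320) / fs          # 10 s of samples, i.e. 0.1 Hz frequency resolution
# Synthetic displacement with two assumed modes; not measured data.
x = 0.8 * np.sin(2 * np.pi * 12.0 * t) + 0.3 * np.sin(2 * np.pi * 30.0 * t)
print(dominant_frequencies(x, fs))   # approximately [12. 30.]
```

The frame rate of 232 FPS sets the Nyquist frequency at 116 Hz, which covers the 45.41 Hz component reported above; real measurements would usually also need a window function to limit spectral leakage.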

Conclusion
A high-speed camera can collect more information about a vibrating object, but traditional image processing algorithms cannot keep pace with the camera's frame rate. This mismatch limits the real-time performance of vibration measurement based on high-speed vision. To improve the speed and accuracy of the traditional ZNSSD template matching algorithm, this paper proposes an improved ZNSSD template matching algorithm. Simulation results show that the improved algorithm is significantly more accurate and faster than the traditional algorithm. Furthermore, two experiments under laboratory and outdoor conditions validated the accuracy and efficiency of the improved ZNSSD algorithm in practice, demonstrating that it extracts vibration signals with high accuracy and efficiency.

Disclosure statement
No potential conflict of interest was reported by the author(s).