Single-pixel imaging of a translational object

Image-free tracking methods based on single-pixel detectors (SPDs) can track a moving object at a very high frame rate, but they can rarely image such an object at the same time. In this study, we propose a method for simultaneously obtaining the relative displacements and images of a translational object. Four binary Fourier patterns and two differential Hadamard patterns are used to modulate each frame of the object, and the modulated light signals are then obtained by an SPD. The relative displacements and the image of the moving object are obtained progressively as the detection proceeds. The proposed method does not require any prior knowledge of the object or its motion. The method has been verified by simulations and experiments, achieving a frame rate of 3332 Hz for acquiring the relative displacements of a translational object at a spatial resolution of $128 \times 128$ pixels using a 20000-Hz digital micro-mirror device. The proposed method can broaden the application of image-free tracking methods and provide spatial information about moving objects.


Introduction
Tracking and imaging fast-moving objects have significant application prospects in navigation, biomedicine, computer vision, and other fields. The two main causes of degraded imaging quality for a fast-moving object are blurring caused by motion and the low signal-to-noise ratio caused by high-frame-rate shooting. The high-speed camera [1] was invented to capture moving objects at a very high frame rate and relatively high signal-to-noise ratio, but it is expensive and its data flux is very high. Various tracking and imaging methods for moving objects based on spatial light modulators (SLMs) and single-pixel detectors (SPDs), which offer a wide spectral response, have been proposed.
Among these methods, image-free methods can track a moving object at a very high frame rate. Zhang et al. proposed a real-time image-free tracking method based on Fourier basis patterns and achieved a temporal resolution of 1666 frames per second (fps) with a 10000-Hz digital micro-mirror device (DMD) [5]. Deng et al. extended the method to realize three-dimensional trajectory tracking of a fast-moving object at 1666 fps [6]. Zha et al. proposed a fast-moving-object tracking method based on geometric moment patterns and realized a frame rate of 7400 Hz [7]. Zha et al. then also proposed a complementary measurement scheme, which increased the frame rate of the method to 11.1 kHz [8]. However, the above methods cannot image a moving object while tracking it at a high frame rate.
Single-pixel imaging (SPI) [26] based on an SPD requires many modulation patterns to image an object, and the operating rate of the SLM used for modulation is limited, resulting in a conflict between the sampling time and the reconstructed image quality. For a moving object, the sampling time allocated to a single moving frame is very short, and combining multiple moving frames in the calculation results in motion blur. To address this problem in SPI, a moving object can be imaged by estimating its moving speed with an algorithm [10,11], by choosing proper modulation patterns or increasing the speed of the SLM to shorten the sampling time [12][13][14][15][16][17], or by estimating motion information from low-resolution images [18][19][20]. In recent years, methods that estimate the motion information of the moving object have been commonly used. Zhang et al. proposed a method for imaging a uniformly moving object by modifying patterns and velocity parameters during reconstruction [10]. Jiao et al. proposed a method for estimating the motion parameters of a moving object under the assumption that the type of object motion is known [11]. In addition, many methods for obtaining the motion information of an object have been proposed, such as calculating the cross-correlation [18,20] or low-order moments [19] of images, using laterally shifting patterns [21], and projecting two-dimensional projective patterns [22]. Even so, the frame rate of these methods is significantly lower than that of the image-free tracking methods.
Inspired by the above methods, a concept for tracking and imaging a moving object emerges naturally: we can first determine the object's motion information using the image-free method and then transform the spatial-coding patterns of the object using that motion information; when there is a sufficient number of modulation patterns, the image of the moving object can be reconstructed using a compressed sensing [27][28][29] algorithm. Similar ideas have been used in the most recent research. Guo et al. combined geometric moment patterns and Hadamard patterns to obtain the relative displacements and images of a moving object at a frame rate of 5.55 kHz [24]. Xiao et al. achieved tracking and imaging of a fast-rotating object using Hadamard patterns and low-order geometric moment patterns [25].
In this study, we design a new pattern sequence to achieve a high frame rate of relative displacement detection and imaging of a translational object. Four binary Fourier patterns and two differential Hadamard patterns are used to modulate one frame of the object, and then the modulated light signals are obtained by SPD. The displacement of the moving object for each moving frame can be determined by these six detection values. Based on the determined displacements and patterns, we can recalculate the reconstruction matrix and reconstruct the moving object image. The frame rate of obtaining the relative displacements of a moving object using this pattern sequence can match that of the image-free method in Ref. [5]. The proposed method is verified through both simulations and experiments.

Method
In Fourier SPI (FSPI) [30], the spatial information of an object is encoded by an SLM using Fourier basis patterns, and the series of modulated total light intensities is detected by an SPD. Each Fourier basis pattern is described by a pair of spatial frequencies and an initial phase. A Fourier basis pattern $P_{\varphi_0}(x, y; f_x, f_y)$ with spatial frequency pair $(f_x, f_y)$ and initial phase $\varphi_0$ can be written as

$$P_{\varphi_0}(x, y; f_x, f_y) = a + b \cos(2\pi f_x x + 2\pi f_y y + \varphi_0), \tag{1}$$

where $a$ represents the average intensity of the Fourier basis pattern, $b$ represents the contrast of the basis pattern, and $(x, y)$ corresponds to the two-dimensional spatial coordinates of the basis pattern. The modulated total light intensity obtained by using such a pattern to modulate the illumination light or the detection area is

$$D_{\varphi_0}(f_x, f_y) = \iint I(x, y)\, P_{\varphi_0}(x, y; f_x, f_y)\, \mathrm{d}x\, \mathrm{d}y, \tag{2}$$

where $I(x, y)$ represents the object image. Because the SPD responds linearly to the light intensity within its effective detection range, the modulated light intensity can be replaced by the value measured by the SPD, and the Fourier coefficients of the corresponding Fourier domain are obtained from these measured values. The four-step and three-step phase-shifting methods are the two methods commonly used to obtain Fourier coefficients in FSPI [12]. The four-step phase-shifting method requires four Fourier basis patterns with the same spatial frequency but different initial phases, denoted $P(f_x, f_y, 0)$, $P(f_x, f_y, \pi/2)$, $P(f_x, f_y, \pi)$, and $P(f_x, f_y, 3\pi/2)$, with corresponding single-pixel values $D_0$, $D_{\pi/2}$, $D_{\pi}$, and $D_{3\pi/2}$. The corresponding Fourier coefficient is then given by Eq. (3):

$$C(f_x, f_y) = (D_0 - D_{\pi}) + j\,(D_{\pi/2} - D_{3\pi/2}). \tag{3}$$

Note that the pattern $P(f_x, f_y, 0)$ is the inverse of the pattern $P(f_x, f_y, \pi)$, and $P(f_x, f_y, \pi/2)$ is the inverse of the pattern $P(f_x, f_y, 3\pi/2)$. Similarly, the three-step phase-shifting method requires three Fourier basis patterns with the same spatial frequency but different initial phases, denoted $P(f_x, f_y, 0)$, $P(f_x, f_y, 2\pi/3)$, and $P(f_x, f_y, 4\pi/3)$, with corresponding single-pixel values $D_0$, $D_{2\pi/3}$, and $D_{4\pi/3}$. The corresponding Fourier coefficient is given by Eq. (4):

$$C(f_x, f_y) = \left(2D_0 - D_{2\pi/3} - D_{4\pi/3}\right) + \sqrt{3}\, j \left(D_{2\pi/3} - D_{4\pi/3}\right). \tag{4}$$

The spatial light modulator commonly used in SPI is a DMD. Fourier basis patterns are grayscale and cannot be loaded directly onto the DMD, so binarization is typically required when these patterns are used for modulation. Generating Fourier basis patterns via temporal dithering or signal dithering [31] comes at the expense of temporal resolution in DMD-based FSPI. The spatial dithering strategy proposed by Zhang et al. can increase the speed of FSPI by two orders of magnitude compared with the temporal dithering method [12]. This high temporal resolution is important for imaging a fast-moving object, so the grayscale patterns used in this study are binarized using the upsampling scheme and the Floyd-Steinberg dithering method [32], as in Ref. [12]. After binarization, the pattern $P(f_x, f_y, 0)$ plus the pattern $P(f_x, f_y, \pi)$ equals the all-one pattern; the pattern $P(f_x, f_y, \pi/2)$ plus the pattern $P(f_x, f_y, 3\pi/2)$ equals the all-one pattern as well.
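As a concrete illustration, the binarization step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the upsampling stage of Ref. [12] is omitted, the helper names are hypothetical, and the phase-$\pi$ pattern is taken directly as the binary complement so that the complementary (all-one) property holds exactly.

```python
import numpy as np

def fourier_pattern(n, fx, fy, phi0, a=0.5, b=0.5):
    """Grayscale Fourier basis pattern a + b*cos(2*pi*(fx*x + fy*y) + phi0)."""
    y, x = np.mgrid[0:n, 0:n]
    return a + b * np.cos(2 * np.pi * (fx * x + fy * y) + phi0)

def floyd_steinberg(gray):
    """Binarize a grayscale pattern in [0, 1] by Floyd-Steinberg error diffusion."""
    img = gray.astype(float).copy()
    h, w = img.shape
    out = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            out[i, j] = 1.0 if img[i, j] >= 0.5 else 0.0
            err = img[i, j] - out[i, j]          # diffuse the quantization error
            if j + 1 < w:
                img[i, j + 1] += err * 7 / 16
            if i + 1 < h:
                if j > 0:
                    img[i + 1, j - 1] += err * 3 / 16
                img[i + 1, j] += err * 5 / 16
                if j + 1 < w:
                    img[i + 1, j + 1] += err * 1 / 16
    return out

n = 128
p0 = floyd_steinberg(fourier_pattern(n, 2 / n, 0, 0))  # binary P(fx, 0, 0)
p_pi = 1 - p0  # binary P(fx, 0, pi) taken as the complement: p0 + p_pi is all-one
```

Taking the complement directly (rather than dithering the phase-$\pi$ pattern independently) guarantees that each complementary pair sums to the all-one pattern, which the method relies on later to recover the total light intensity.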
In the Fourier transform, every point in the spatial domain contributes to each coefficient in the Fourier domain; as a result, a displacement in the spatial domain directly affects the Fourier coefficients. Based on the linear phase-shift property of the Fourier transform, Zhang et al. proposed a method for detecting the object motion trajectory using two Fourier coefficients per frame [5]. The principle is that a displacement $(\Delta x, \Delta y)$ of the object image $I(x, y)$ in the spatial domain produces a phase shift in the Fourier domain:

$$I(x - \Delta x, y - \Delta y) = \mathcal{F}^{-1}\left\{ C(f_x, f_y) \exp\left[-j 2\pi (f_x \Delta x + f_y \Delta y)\right] \right\}, \tag{5}$$

where $(f_x, f_y)$ denotes the spatial-frequency coordinates in the Fourier domain, $C(f_x, f_y)$ denotes the Fourier spectrum of the original image $I(x, y)$, and $\mathcal{F}^{-1}$ represents the inverse Fourier transform. The relative displacement of the object can be calculated by measuring the phase-shift term $\Delta\varphi = -2\pi (f_x \Delta x + f_y \Delta y)$ of each frame. Finally, the displacements $\Delta x$ and $\Delta y$ are calculated by obtaining $C(f_x, 0)$ and $C(0, f_y)$ of each frame [5]:

$$\Delta x = -\frac{\arg\left\{ C(f_x, 0)\, C_{\mathrm{bg}}^{*}(f_x, 0) \right\}}{2\pi f_x}, \qquad \Delta y = -\frac{\arg\left\{ C(0, f_y)\, C_{\mathrm{bg}}^{*}(0, f_y) \right\}}{2\pi f_y}, \tag{6}$$

where $\arg\{\cdot\}$ denotes the argument operation, $*$ denotes the complex conjugate, $C_{\mathrm{bg}}(f_x, 0)$ and $C_{\mathrm{bg}}(0, f_y)$ represent the two Fourier coefficients obtained at the initial position before the object starts moving, and $C(f_x, 0)$ and $C(0, f_y)$ represent the two Fourier coefficients obtained at the current moving frame. With the three-step phase-shifting method, six binary Fourier basis patterns per frame can realize real-time tracking of moving-object trajectories, as verified in Ref. [5].

As shown in Fig. 1, the proposed pattern sequence also consists of six patterns for each frame: two differential Hadamard basis patterns that encode the object's spatial information are embedded after every four Fourier basis patterns, so that the high frame rate of Ref. [5] is retained and the moving object can finally be imaged.

Fig. 1. Schematic of pattern design. Each motion frame corresponds to six patterns, of which four are binary Fourier basis patterns and two are differential Hadamard basis patterns. The Fourier patterns of all frames are the same, and the corresponding phases are 0 and $\pi/2$, respectively. The Hadamard patterns corresponding to different motion frames are sorted according to the total variation (TV) ordering method [33].

The first four binary Fourier basis patterns of all motion frames are identical, corresponding to the Fourier basis patterns $P(f_x, 0, 0)$, $P(f_x, 0, \pi/2)$, $P(0, f_y, 0)$, and $P(0, f_y, \pi/2)$, respectively. The spatial frequencies $f_x$ and $f_y$ in this study are both $2/n$, where $n \times n$ represents the spatial resolution of the image. The two differential Hadamard patterns $H_k^{+}$ and $H_k^{-}$ are calculated from the $k$-th Hadamard pattern $H_k$:

$$H_k^{\pm} = \frac{1 \pm H_k}{2}, \tag{7}$$

so $H_k^{+}$ plus $H_k^{-}$ also equals the all-one pattern. The Hadamard pattern differs from frame to frame and is selected according to the total variation (TV) ordering method [33]. For the six patterns of each frame, the corresponding single-pixel values are $D_{x,0}$, $D_{x,\pi/2}$, $D_{y,0}$, $D_{y,\pi/2}$, $D_{H^+}$, and $D_{H^-}$, respectively. Because the complementary patterns give $D_{\pi} = D_T - D_0$, where $D_T = D_{H^+} + D_{H^-}$ is the total light intensity of the frame, the four-step phase-shifting method yields the two Fourier coefficients

$$C(f_x, 0) = \left(2D_{x,0} - D_T\right) + j\left(2D_{x,\pi/2} - D_T\right), \qquad C(0, f_y) = \left(2D_{y,0} - D_T\right) + j\left(2D_{y,\pi/2} - D_T\right), \tag{8}$$

and the Hadamard coefficient can be calculated by

$$S_k = D_{H^+} - D_{H^-}. \tag{9}$$

The displacement of the current frame can then be determined from Eq. (6). In addition, the four binary Fourier basis patterns, the Hadamard basis pattern, and the corresponding five single-pixel values of each frame are retained for the final imaging procedure. We normalize the measured values by the total light intensity of each frame to mitigate the influence of non-uniform illumination during the object movement. For the six patterns of each frame, the corresponding single-pixel detection values are denoted $D_i$ ($i = 1, 2, \ldots, 6$); $D_5$ and $D_6$ are the intensities of the two differential Hadamard patterns, and their sum is the total intensity of the frame.
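To make the per-frame computation concrete, the following Python sketch recovers a known displacement from the six single-pixel values of one frame via the Eq. (6)-style phase comparison. The helper names, the grayscale patterns, the checkerboard stand-in for a TV-ordered Hadamard pattern, and the circular shift standing in for the object motion are all illustrative assumptions.

```python
import numpy as np

def frame_coefficients(d):
    """d = [Dx0, Dx_pi/2, Dy0, Dy_pi/2, DH+, DH-]: the six single-pixel values
    of one frame. Returns C(fx, 0), C(0, fy), and the Hadamard coefficient."""
    d_total = d[4] + d[5]  # H+ + H- is the all-one pattern: total frame intensity
    cx = (2 * d[0] - d_total) + 1j * (2 * d[1] - d_total)
    cy = (2 * d[2] - d_total) + 1j * (2 * d[3] - d_total)
    return cx, cy, d[4] - d[5]

def displacement(c_bg, c_now, f):
    """Displacement from the phase shift between the background-frame and
    current-frame Fourier coefficients (Eq. (6)-style)."""
    return -np.angle(c_now * np.conj(c_bg)) / (2 * np.pi * f)

# --- demo on a toy 128x128 frame ---
n = 128
fx = fy = 2 / n
yy, xx = np.mgrid[0:n, 0:n]
obj = np.zeros((n, n))
obj[40:60, 40:60] = 1.0  # toy object

def measure(im):
    """Six single-pixel values of one frame (grayscale patterns for clarity)."""
    h = (xx + yy) % 2  # checkerboard stand-in for a TV-ordered Hadamard pattern
    pats = [0.5 + 0.5 * np.cos(2 * np.pi * fx * xx + p) for p in (0, np.pi / 2)]
    pats += [0.5 + 0.5 * np.cos(2 * np.pi * fy * yy + p) for p in (0, np.pi / 2)]
    pats += [h, 1 - h]
    return [float(np.sum(im * p)) for p in pats]

cx0, cy0, _ = frame_coefficients(measure(obj))         # background frame
moved = np.roll(obj, (3, 5), axis=(0, 1))              # shift by (dy, dx) = (3, 5)
cx1, cy1, _ = frame_coefficients(measure(moved))
dx = displacement(cx0, cx1, fx)
dy = displacement(cy0, cy1, fy)
```

With a circular shift the recovered values are exact (dx = 5, dy = 3); in practice noise and the object leaving the field of view perturb the phase estimate.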
The normalized value $\tilde{D}_i$ can be expressed as

$$\tilde{D}_i = \frac{D_i}{D_5 + D_6}, \quad i = 1, 2, \ldots, 6. \tag{10}$$

The displacement of the object during pattern modulation is equivalent to the object remaining static while the pattern moves in the direction opposite to the object's motion. The object image can therefore be reconstructed from the recorded single-pixel values and the transformed patterns, provided that sufficiently many transformed patterns are available. The process of transforming the patterns is shown in Fig. 2: each pattern of the moved object is translated in reverse according to the calculated relative displacement. The total variation augmented Lagrangian alternating direction algorithm (TVAL3) [34] is an efficient and widely used compressed sensing [27][28][29] algorithm, and the TVAL3 solver is employed to reconstruct the object from the transformed pattern sequence and the corresponding single-pixel values.
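The pattern-transformation step can be sketched as follows, paired here with non-iterative differential ghost imaging (DGI) [36] as a lightweight stand-in for the TVAL3 solver (a MATLAB package). The random binary patterns, the circular shifts, and the helper names are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def transform_pattern(pat, dx, dy):
    """Translate a pattern opposite to the object's motion so the object appears
    static (circular shift for simplicity; in practice the border is padded)."""
    return np.roll(pat, (-int(round(dy)), -int(round(dx))), axis=(0, 1))

def dgi(patterns, values):
    """Non-iterative differential ghost imaging reconstruction."""
    p = np.asarray(patterns, float)
    v = np.asarray(values, float)
    s = p.sum(axis=(1, 2))  # total transmittance of each pattern
    return (v[:, None, None] * p).mean(0) \
        - (v.mean() / s.mean()) * (s[:, None, None] * p).mean(0)

rng = np.random.default_rng(0)
n = 32
obj = np.zeros((n, n))
obj[10:20, 12:22] = 1.0

pats, vals = [], []
for t in range(3000):
    dx, dy = t % 5, (2 * t) % 7                   # known per-frame displacement
    moved = np.roll(obj, (dy, dx), axis=(0, 1))   # object position in frame t
    p = rng.integers(0, 2, (n, n)).astype(float)  # stand-in modulation pattern
    vals.append(float(np.sum(moved * p)))
    pats.append(transform_pattern(p, dx, dy))     # reverse-translate the pattern

rec = dgi(pats, vals)  # reconstructs the object at its initial position
```

Because each measurement of the moving object equals the measurement of the static object with the reverse-translated pattern, any linear single-pixel reconstruction applied to the transformed patterns recovers the object at its initial position.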

Simulations
In the simulation, an object with two trajectories is simulated to verify the proposed method.
The object image is shown in Fig. 3(a), and the resolution is $128 \times 128$ pixels. The total number of moving frames is 1666, and each frame corresponds to six patterns, meaning that a total of 9996 patterns are used in the simulation. Guo et al.'s method based on geometric moment patterns [24] is also simulated for comparison. The geometric moment patterns used in the simulation were binarized using the Floyd-Steinberg dithering algorithm [32] with an upsampling ratio of 2. We studied the relationship between the upsampling ratio of spatial dithering and the relative-position accuracy of our method; it was found that our method is not sensitive to the upsampling ratio, so we chose an upsampling ratio of 1, which does not sacrifice the spatial resolution of the image (see Supplement 1). The total number of patterns for Guo et al.'s method [24] is 6664, corresponding to 1666 moving frames. Note that Guo et al.'s method [24] obtains the centroid of the object in each moving frame, so the displacements of the object can be calculated by subtracting the centroid coordinates of the first frame. For the conventional SPI method, Hadamard patterns are chosen to modulate and reconstruct the image of the moving object, and 9996 differential Hadamard patterns are used according to the TV order [33] for improved quality. Gaussian white noise with a standard deviation of $\sigma = 0.1$ is added to the measurements to simulate real noisy experiments. The mean square error (MSE) is introduced to evaluate the accuracy of the reconstructed relative displacements; between the reconstructed relative displacement coordinates $(\Delta \hat{x}_t, \Delta \hat{y}_t)$ and the original coordinates $(\Delta x_t, \Delta y_t)$, it is defined as follows:

$$\mathrm{MSE} = \frac{1}{T} \sum_{t=1}^{T} \left[ (\Delta x_t - \Delta \hat{x}_t)^2 + (\Delta y_t - \Delta \hat{y}_t)^2 \right], \tag{11}$$

where $T$ represents the total number of frames. The smaller the MSE, the closer the reconstructed relative displacement is to the original relative displacement. The peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [35] are introduced to evaluate the quality of the reconstructed images.
The PSNR between the original image $X$ and the reconstructed image $Y$ is defined as follows:

$$\mathrm{PSNR}(X, Y) = 10 \log_{10} \frac{M^2}{\mathrm{MSE}(X, Y)}, \tag{12}$$

where $\mathrm{MSE}(X, Y)$ represents the mean square error between $X$ and $Y$, and $M$ is the maximum value of the image data type. The larger the PSNR value, the higher the reconstruction quality. The SSIM between the original image $X$ and the reconstructed image $Y$ is defined as follows:

$$\mathrm{SSIM}(X, Y) = \frac{(2\mu_X \mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}, \tag{13}$$

where $\mu_X$ and $\mu_Y$ represent the means of $X$ and $Y$, $\sigma_X^2$ and $\sigma_Y^2$ represent their variances, $\sigma_{XY}$ represents the covariance of $X$ and $Y$, and $c_1$ and $c_2$ are constants. The value range of SSIM is [0, 1]. The larger the SSIM value, the higher the structural similarity between the two images.
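The three metrics above can be implemented directly. This is a Python sketch: `ssim_global` uses a single global window with the commonly used default constants (an assumption here), whereas library implementations average SSIM over local windows.

```python
import numpy as np

def mse_displacement(true_xy, rec_xy):
    """MSE between true and reconstructed per-frame (dx, dy) displacements,
    averaged over the T frames."""
    t, r = np.asarray(true_xy, float), np.asarray(rec_xy, float)
    return float(np.mean(np.sum((t - r) ** 2, axis=1)))

def psnr(x, y, max_val=1.0):
    """Peak signal-to-noise ratio in dB; max_val is the data-type maximum."""
    m = np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return 10 * np.log10(max_val ** 2 / m)

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window (global) SSIM between two images."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

For identical images, `ssim_global` returns exactly 1, and `psnr` diverges, so in practice PSNR is only evaluated between distinct images.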
The simulations with different noise realizations were repeated five times. Figure 3 illustrates the results of the corresponding methods. Figure 3(h) and Figure 3(i) compare the relative displacements reconstructed by the two methods with the original relative displacements. For the type-I trajectory in Fig. 3(h), the images reconstructed by conventional SPI, Guo et al.'s method [24], and the proposed method are depicted in Fig. 3(b), Fig. 3(c), and Fig. 3(d), respectively. For the type-II trajectory in Fig. 3(i), the images reconstructed by conventional SPI, Guo et al.'s method [24], and the proposed method are depicted in Fig. 3(e), Fig. 3(f), and Fig. 3(g), respectively. Table 1 shows the mean MSEs, PSNRs, and SSIMs of these methods; for a more detailed comparison under different noise levels, see Supplement 1. From the above results, Guo et al.'s method [24] exhibits a significant deviation in the reconstructed relative displacements, indicating that the geometric moment patterns are more sensitive to noise than the Fourier patterns. The images reconstructed by the conventional SPI method are blurred and degraded because of the motion of the object during the measurement process, whereas the proposed method can effectively reconstruct the images of the moving object with higher quality than Guo et al.'s method [24]. The influence of the number of moving frames on the imaging quality should also be considered, by reconstructing the image while sequentially reducing the number of frames. The reconstructed images obtained while moving along the type-II trajectory were investigated in this simulation. Ten groups of data were selected, including 10%, 20%, ..., 90%, and 100% of the total 1666 frames, respectively, to compute the reconstructed images. The corresponding PSNRs and SSIMs are shown in Fig. 4(a). Figures 4(b-e) depict the reconstructed images with 333, 666, 1000, and 1333 frames, respectively.
The results indicate that, considering the influence of noise, when the number of sampling frames is greater than a certain frame number (e.g., 666 frames), the moving object image with good quality can be reconstructed by our method.

Experiments
The proposed method is verified through experiments. The experimental system consists of a light-emitting diode (LED) source with a maximum power of 5 W, a linear motorized stage (KA400Z, Zolix), a DMD (Texas Instruments DLP7000), a photomultiplier tube (PMT, H10682-210, Hamamatsu Photonics), and a data acquisition board, as depicted in Fig. 5. The modulation patterns are preloaded into the DMD for modulation. The transmissive object is imaged onto the DMD after being illuminated by the light source. After modulation by the DMD, the light is collected by the PMT and converted into measurement values by the data acquisition board. The DMD operated at a high refresh rate of 20000 Hz. The size of the modulation patterns on the DMD was $256 \times 256$ pixels, with every $2 \times 2$ pixels merged into one super pixel, so the image size of the moving object was $128 \times 128$ pixels. The Fourier basis patterns used in the method were binarized using the Floyd-Steinberg dithering algorithm [32] with an upsampling ratio of 1. In each experiment group, 9996 patterns were used, corresponding to 1666 frames, and the total measurement time was 0.4998 s at a modulating rate of 20000 Hz. Guo et al.'s method [24] was also applied for comparison. The geometric moment patterns used in the experiments were binarized using the Floyd-Steinberg dithering algorithm [32] with an upsampling ratio of 2. The total number of patterns for Guo et al.'s method [24] was 6664, corresponding to 1666 moving frames, and the total measurement time was also 0.4998 s at a modulating rate of 13333 Hz.

Background-free situation
In this section, the moving object was a transmissive object "B" with a size of 2.5 mm × 3.2 mm. In the first experiment group, the object moved straight along the diagonal of the image at a constant speed of 25 mm/s. In the second experiment group, the object moved straight along the diagonal of the image with an initial speed of 25 mm/s and an acceleration of 50 mm/s². The ground-truth image was obtained by imaging the static object using the conventional SPI method, as depicted in Fig. 6(a). To obtain a clear static image, the DMD was operated at a modulating rate of 100 Hz, and a total of 19992 differential Hadamard patterns were used, corresponding to a sampling ratio of 61.01%. The static object image at a modulating rate of 20000 Hz, depicted in Fig. 6(b), illustrates the image degradation under high-frame-rate sampling. The real relative displacement of the object was determined by imaging the static object along the displacement axis of the motorized stage: three static object images reconstructed using the conventional SPI method were combined to obtain the real relative displacement. The images of the moving object reconstructed using the conventional SPI method are illustrated in Fig. 6(c) and Fig. 6(d); they are blurred due to motion. The images reconstructed using Guo et al.'s method [24] are illustrated in Fig. 6(e) and Fig. 6(f); they are also blurred, owing to the larger deviation in the reconstructed relative displacements. Two images of the moving object with good quality were calculated using the proposed method, as shown in Fig. 6(g) and Fig. 6(h). Figure 6(i) and Figure 6(j) compare the real relative displacements with those calculated using Guo et al.'s method [24] and the proposed method. The relative displacements reconstructed using the proposed method are closer to the real relative displacements, indicating that our method is more robust to noise than Guo et al.'s method [24].
Because the compressed sensing algorithm requires a long calculation time, we also used the non-iterative differential ghost imaging (DGI) [36] algorithm to reconstruct the object image in Supplement 1. Although there is large background noise in the images, our method can still distinguish the object, while the other two methods cannot.
The influence of the number of moving frames on the imaging quality was investigated in this experiment. Ten groups of measured data were selected, including 10%, 20%, ..., 90%, and 100% of the total 1666 frames, respectively, to compute the reconstructed images. The corresponding PSNRs and SSIMs are depicted in Fig. 7(a). The results are similar to the simulation results: when the number of sampling frames is greater than a certain frame number, our method can reconstruct the moving object image with high quality. The reconstructed images with 333, 666, 1000, and 1333 frames (corresponding to measurement times of 0.0999 s, 0.1998 s, 0.3000 s, and 0.3999 s, respectively) are depicted in Fig. 7(b-e). Visualization 1 demonstrates the motion of the object and the reconstructed image using different numbers of frames. To study the influence of the frame rate of the obtained relative positions on the image quality, we compared the reconstructed image quality at different moving speeds. The image quality of our method is better than that of Guo et al.'s method [24] in all cases, demonstrating the superiority of our method. Details are described in Supplement 1.

Situation with a static background

In this experiment, the moving target was again the transmissive object "B", while the stationary background consisted of the transmissive letters "I" and "T". Light could be transmitted through both the object "B" and the background "I" and "T" while the object moved straight along the diagonal of the image with an initial speed of 25 mm/s and an acceleration of 50 mm/s². When the object began to move, the detection module began counting until the object had completely moved out of the field of view. We calculated the relative displacement and image of the moving object by subtracting the background measurements from the measured data. The ratio between the total intensity of the object and the total intensity of the background is denoted R.
We performed four experiments under different background intensities (R = 8.60, 3.25, 1.60, and 1.04, respectively) to evaluate the influence of background intensity on the reconstructed relative displacements and image. Four clear images of the static object with different background intensities were obtained at a modulating rate of 100 Hz, as depicted in Fig. 8(a). The reconstructed images and relative displacements using our method are shown in Fig. 8(b) and Fig. 9, respectively. High-quality images are reconstructed under a weak background. As the background intensity increases, the quality of the reconstructed images and relative displacements gradually decreases. To mitigate the impact of noise, we apply a fifth-order mean filter to the calculated Fourier coefficients during reconstruction. The reconstructed images and relative displacements using the filtered data, shown in Fig. 8(c) and Fig. 9, respectively, demonstrate the improvement in relative-displacement accuracy and imaging quality.
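The filtering step above can be sketched as a moving average over the per-frame complex Fourier coefficients. This is a minimal Python sketch; the authors' exact filter (in particular its edge handling) may differ, and the function name is an assumption.

```python
import numpy as np

def mean_filter(coeffs, order=5):
    """Fifth-order moving average over the per-frame complex Fourier
    coefficients, smoothing noise-induced fluctuations before the
    displacements are extracted from the coefficient phases."""
    c = np.asarray(coeffs, complex)
    kernel = np.ones(order) / order
    # filter real and imaginary parts; 'same' keeps one value per frame
    return np.convolve(c.real, kernel, 'same') \
        + 1j * np.convolve(c.imag, kernel, 'same')

coeffs = np.full(11, 2.0 + 3.0j)      # toy constant coefficient sequence
filtered = mean_filter(coeffs)
```

Averaging the complex coefficients before taking the argument is generally more robust than averaging the extracted phases, since it avoids phase-wrapping artifacts.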

Discussion
The key feature of the proposed method is that the object motion information is determined using the image-free tracking method. The higher the frame rate and the more accurate the calculated relative displacements, the more accurate the reconstructed image of the moving object. In our settings, six modulation patterns were used for each motion frame; thus, the maximum frame rate of our method is the refresh rate of the DMD divided by six. Using a DMD with a refresh rate of 20000 Hz, we obtained the relative displacements at a frame rate of 3332 Hz and finally imaged the moving object in the experiments. A higher frame rate can be achieved by using an SLM with a higher refresh rate. Although a higher frame rate of obtaining the relative displacements can also be achieved using geometric moment patterns, the simulation indicates that binary geometric moment patterns [24] lack differential measurements and require a higher upsampling ratio when binarized, resulting in a loss of spatial resolution and poor robustness to noise. The four-step phase-shifting method typically exhibits better noise resistance than the three-step phase-shifting method for obtaining a Fourier coefficient in FSPI [37]. To achieve a better trade-off between relative-displacement accuracy and image quality, six measured values are used to calculate the Fourier coefficients in this study rather than the eight values required by the full four-step phase-shifting method. Nevertheless, our method can reconstruct the image of a translational object with high quality without sacrificing spatial resolution, in contrast to the method based on geometric moment patterns. More binary Fourier patterns could be added to obtain more accurate relative displacements; however, this would decrease the maximum frame rate of obtaining the relative displacements. To mitigate the impact of noise, a mean filter can be used to reduce the fluctuation of the reconstructed relative displacements and obtain high-quality images.
The influence of the number of moving frames on imaging quality is also considered in this study. The simulated and experimental results indicate that with an increase in the number of measurement frames, the quality of the reconstructed image will increase rapidly and then tends to be stable. The result also indicates that the total detection time of 0.4998 s is unnecessary for imaging such an object.
We also acknowledge that the proposed method has several limitations. First, owing to the characteristics of the Fourier transform, it can only image a translational object with a simple background in the field of view. When the object is rotating or deforming, our method becomes invalid. Methods based on geometric moment patterns can obtain the rotation state of an object by adding low-order geometric moment patterns, such as the method in Ref. [25], and thus achieve tracking and imaging of a rotating object. Second, our method can only obtain the relative positions of the translational object. Without prior knowledge of the initial position of the moving object, the absolute trajectory cannot be obtained. This limitation could be mitigated by introducing a small number of geometric moment patterns at the beginning of the tracking process to determine the centroid coordinates, so that the absolute trajectory can be approximately recovered from the measured centroid coordinates and relative displacements. Third, our method cannot reconstruct the image if the object stays in the field of view for too short a time to obtain sufficient useful measurements. Compared with our method, the method based on geometric moment patterns can achieve a higher position frame rate; when the object moves at a high speed, obtaining the object's position at such a high frame rate may be advantageous. Considering its poor noise robustness, however, it may give good results only when combined with our method. In addition, the image cannot be reconstructed in real time because of the use of an iterative algorithm. The non-iterative DGI algorithm is also used in Supplement 1, but there is large background noise in the restored image. In future work, reconstruction algorithms based on deep learning [38,39] may be a good choice.

Conclusion
A single-pixel detection method for obtaining the relative displacements and image of a translational object is proposed in this study. The displacement of the moving object in each moving frame can be determined from four Fourier patterns together with two differential Hadamard patterns. Based on the determined displacements and patterns, we can recalculate the reconstruction matrix and reconstruct the image of the moving object. This method does not require any prior knowledge of the object or its motion. It has been verified by simulations and experiments, achieving a frame rate of 3332 Hz for acquiring the relative displacements of a translational object at a spatial resolution of $128 \times 128$ pixels using a 20000-Hz DMD. Future studies can focus on improving the frame rate for acquiring relative displacements and accelerating the reconstruction process to finally realize real-time imaging.
Funding. Beijing Institute of Technology Research Fund Program for Young Scholars (Grant No. 20212012).

Disclosures. The authors declare no conflicts of interest.
Data availability. Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.