Dynamic 3-D measurement based on fringe-to-fringe transformation using deep learning

Fringe projection profilometry (FPP) has become increasingly important in dynamic 3-D shape measurement. In FPP, it is necessary to retrieve the phase of the measured object before shape profiling. However, traditional phase retrieval techniques often require a large number of fringes, which may generate motion-induced errors for dynamic objects. In this paper, a novel phase retrieval technique based on deep learning is proposed, which uses an end-to-end deep convolutional neural network to transform one or two fringes into the fringes required for phase retrieval. When the object's surface lies within a restricted depth range, the presented network requires only a single fringe as input; for an unrestricted depth range, two fringes are required. The proposed phase retrieval technique is first theoretically analyzed, and then numerically and experimentally verified for its applicability to dynamic 3-D measurement.


Introduction
Fringe projection profilometry (FPP) has been widely used in reverse engineering, security, and bio-medicine [1][2], etc. In FPP, the 3-D shape is reconstructed from the phase modulated by the object's surface, which is retrieved in two steps: phase calculation and phase unwrapping [3].
The phase calculation usually uses transform-based [4] or phase-shifting algorithms [5]. Because only a single fringe is necessary, the transform-based algorithm can be used for dynamic 3-D measurement, but it has difficulty preserving shape edges [6]. The phase-shifting algorithm achieves high accuracy but requires at least three patterns to calculate the phase [7]. Recently, deep learning has been introduced into the phase calculation for FPP, which can calculate an accurate phase by using only one fringe [8]. Deep learning calculates the phase with accuracy similar to a phase-shifting algorithm with a large number of phase steps (e.g., 12-step) while preserving shape edges better than the transform-based algorithm [8].
As we know, the calculated phase is often discontinuous and wrapped into the range (−π, π] [9]. In order to remove discontinuities in the calculated phase, a phase unwrapping process is necessary [10].
The phase unwrapping can be classified into two categories: spatial phase unwrapping [11] and temporal phase unwrapping [12]. Spatial phase unwrapping is based on an optimal path strategy to obtain the absolute phase, which often fails for a complex surface [7]. In real measurement, temporal phase unwrapping is commonly used due to its high robustness, achieved by temporally projecting a series of additional fringe images; the main variants include gray-code [13], multi-frequency [14][15], and phase-coding methods [16], etc. For example, the multi-frequency method uses several sets of phase-shifting sinusoidal fringes with different fringe frequencies [17]. Deep learning has been introduced to reduce the number of fringes required by the multi-frequency method; however, three fringes are still necessary for correct phase unwrapping [18].
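The hierarchical two-frequency unwrapping step used by the multi-frequency method can be sketched as follows. This is a minimal NumPy sketch, not the paper's implementation; the function name and the assumption that the lower-frequency phase is already absolute are ours:

```python
import numpy as np

def unwrap_with_reference(phi_high, phi_ref, freq_ratio):
    """Two-frequency temporal phase unwrapping (hierarchical scheme).

    phi_high:   wrapped phase of the high-frequency fringes, in (-pi, pi].
    phi_ref:    absolute phase of a lower-frequency fringe set
                (already unwrapped, e.g., a single-period fringe).
    freq_ratio: f_high / f_ref.

    Scaling the reference phase by the frequency ratio predicts the
    absolute high-frequency phase; rounding the gap to the nearest
    multiple of 2*pi yields the fringe order k(x, y).
    """
    k = np.round((freq_ratio * phi_ref - phi_high) / (2 * np.pi))
    return phi_high + 2 * np.pi * k
```

The rounding step is what makes the method robust pixel-wise: as long as the scaled reference phase is within ±π of the true absolute phase, the recovered fringe order is exact.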
As illustrated above, the two steps of phase retrieval always require more than one fringe. In this paper, a novel deep-learning-based phase retrieval technique is proposed that uses only a single fringe. The proposed technique differs from traditional techniques [19][20] in that it fully utilizes the powerful ability of deep learning to extract features and realize conversion between different images. Both the fringes desired for phase calculation and those desired for phase unwrapping can be predicted by one fringe pattern prediction network (FPPnet). The FPPnet requires only a single image and a single network, which simplifies the training process compared with previous works requiring multiple inputs and multiple networks [8,18].
The rest of the paper is organized as follows. Section 2 introduces the principle of FPP and the FPPnet.
Section 3 shows the experiment results. Section 4 summarizes the paper.

Fringe projection profilometry
In a typical FPP system, the n-th fringe of an N-step phase-shifting sequence can be represented as [21]

I_n(x, y) = a(x, y) + b(x, y)·cos[φ(x, y) − 2πn/N], n = 0, 1, …, N − 1,

where a(x, y) is the background intensity, b(x, y) is the modulation, and φ(x, y) is the phase to be retrieved. It is worth mentioning that b(x, y) can be calculated by using the following formula [7]

b(x, y) = (2/N)·√{ [Σ_n I_n·sin(2πn/N)]² + [Σ_n I_n·cos(2πn/N)]² }.

The phase can be calculated by using a least-squares algorithm [22]

φ(x, y) = arctan[ Σ_n I_n·sin(2πn/N) / Σ_n I_n·cos(2πn/N) ],

which is usually discontinuous and wrapped into the range (−π, π]. The continuous absolute phase can be obtained by

Φ(x, y) = φ(x, y) + 2π·k(x, y),

where k(x, y) is the fringe order determined by phase unwrapping [23]. It should be noted that b(x, y) can be used to filter out low-reflectance regions (most of which are areas where no fringes exist) [7]. Therefore, we can set a threshold on the value of b(x, y) and generate a mask. To make the training converge more easily and yield better results, this mask is applied during the training process.
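The least-squares phase calculation and the modulation formula above can be sketched in a few lines of NumPy. This is a minimal sketch (the function name is ours), not the paper's code:

```python
import numpy as np

def wrapped_phase_and_modulation(frames):
    """Least-squares phase from N-step phase-shifted fringe images.

    frames: array of shape (N, H, W), the captured images I_n with
            I_n = a + b*cos(phi - 2*pi*n/N).
    Returns the wrapped phase in (-pi, pi] and the modulation b(x, y).
    """
    N = frames.shape[0]
    n = np.arange(N).reshape(-1, 1, 1)
    # Numerator and denominator of the least-squares estimator.
    s = np.sum(frames * np.sin(2 * np.pi * n / N), axis=0)
    c = np.sum(frames * np.cos(2 * np.pi * n / N), axis=0)
    phase = np.arctan2(s, c)              # wrapped phase in (-pi, pi]
    b = (2.0 / N) * np.sqrt(s**2 + c**2)  # modulation
    return phase, b
```

A mask for low-reflectance regions then follows directly, e.g. `mask = b > threshold` for some empirically chosen threshold.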

Fringe pattern prediction network
In order to achieve phase retrieval from a single fringe, we propose the FPPnet. Its structural design refers to the Efficient Residual Factorized Network (ERFnet) [24][25]. ERFnet proposes factorized convolution and designs an encoder-decoder network for semantic segmentation, which saves much computational cost and improves efficiency. We use this structure to extract powerful features for modeling the fringe pattern distribution. The network is designed as shown in Figure 1, and Table 1 lists the network structure details, i.e., the ConvNet part of Figure 1. The network is trained with a masked L1 loss

Loss = (1/m)·Σ |I_pred − I_gt|·mask,

where m is the number of pixels in the mask.
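The masked L1 training loss can be sketched as follows. This is a minimal NumPy sketch under our reading of the loss (per-pixel absolute error, averaged over valid mask pixels only); the function name is ours:

```python
import numpy as np

def masked_l1_loss(pred, target, mask):
    """Masked L1 loss: mean absolute error over valid pixels only.

    pred, target: predicted and ground-truth fringe images, shape (H, W).
    mask: boolean array, True where the modulation b(x, y) exceeds the
          threshold (i.e., the fringe actually exists).
    """
    m = mask.sum()  # number of pixels inside the mask
    return np.abs((pred - target) * mask).sum() / m
```

Restricting the loss to masked pixels keeps dark, fringe-free background from dominating the average and, per the text, helps the network converge faster.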
To improve the output, we adopt online hard example mining (OHEM) [26], which can be expressed as

L_OHEM = (1/N)·Σ_{i : D_i > t} D_i,

where t is a threshold, N is the number of retained samples, and D is the loss function adopted by the network; in this work, D is the L1 loss. Since the FPPnet's prediction at textures and details is weaker than in flat areas, errors are introduced into the phase calculation. OHEM sorts the loss of each sample and focuses on the samples with higher loss to improve the results on those samples.
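The OHEM selection step can be sketched as follows. This is a minimal NumPy sketch; the `keep_ratio` parameter and function name are our assumptions, since the paper only specifies a threshold-and-sort scheme:

```python
import numpy as np

def ohem_loss(sample_losses, keep_ratio=0.25, t=0.0):
    """Online hard example mining over per-sample losses.

    sample_losses: 1-D array with one (e.g., L1) loss value per sample.
    keep_ratio:    fraction of hardest samples to average (assumed).
    t:             minimum loss threshold; easier samples are ignored.
    """
    losses = np.sort(sample_losses)[::-1]        # hardest first
    n_keep = max(1, int(len(losses) * keep_ratio))
    kept = losses[:n_keep]
    kept = kept[kept > t]                        # apply threshold t
    if kept.size == 0:
        return 0.0
    return kept.mean()
```

Averaging only the hardest samples concentrates the gradient on textured and detailed regions where the network's prediction is weakest.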

Experiment
In the experiments, fringes are projected by a projector (DLP6500, Texas Instruments) and captured by a CMOS camera (Basler acA800-510um) of resolution 800×600 with a lens of 12 mm focal length. In order to make full use of the camera's field of view and improve the computing efficiency of the FPPnet, a 496×496 region is used in practice. The dataset is collected in the same experimental environment. It contains a training set of 68 different scenes, a validation set of 10 different scenes, and a test set of 10 different scenes (the following results are obtained on the test-set scenes). These scenes are projected with 12-step phase-shifting fringes of different periods. The mask corresponding to each scene is used as an input to help the FPPnet converge quickly.
The experiments are performed on the platform of an Nvidia Titan V graphics card.

Different period results with the FPPnet
In the first experiment, results for different fringe periods are demonstrated with the FPPnet. We choose a fringe of the first frequency f1 = 13 as the input, as shown in Figure 2(a). Twelve phase-shifted fringes of f1 are predicted by the FPPnet, as shown in Figure 2. The FPPnet uses its learning ability to predict output fringes of both the same period and different periods. It is worth mentioning that, for the same-period case, we also tried predicting the other eleven fringes (excluding the input) and combining the output with the input to calculate the phase. However, the phase results obtained in this way are worse than with the strategy adopted in this paper. The reason is that, even though the input has a more accurate grayscale distribution, it is somewhat inconsistent with the fringes predicted by the FPPnet, and this inconsistency introduces a large error into the phase calculation. Figure 3(c) shows the error between the two phase calculation results (for objective evaluation, the 2π jump error near the period boundaries is subtracted; this situation exists only at the period junctions and does not cause wide-range unwrapping errors). Its mean phase error is 0.054 rad. The phase results corresponding to f2 are shown in Figure 3(d-f), with a mean phase error of 0.038 rad. This approach achieves phase unwrapping with a single image, as shown in Figure 3(g), and it can still retrieve the phase of discontinuous regions very well. Figure 3(g) uses the multi-frequency method to recover the absolute phase. The performance can be further improved if wrapped phases of more different periods are used.
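The 2π-jump correction used when reporting the phase errors above can be sketched as follows: wrapping the phase difference back into (−π, π] removes the full-period jumps at period boundaries before averaging. A minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def mean_phase_error(phase_a, phase_b):
    """Mean absolute phase error with the 2*pi jump ambiguity removed.

    Near period boundaries, two otherwise-close phase maps can differ
    by a full 2*pi; mapping the difference onto (-pi, pi] via the
    complex exponential removes that jump before averaging.
    """
    diff = np.angle(np.exp(1j * (phase_a - phase_b)))
    return np.mean(np.abs(diff))
```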

Results with different phase-shifting steps
In the second experiment, we verify the impact of different numbers of phase-shifting steps. The input is a fringe of the first frequency f1 = 13. Through the FPPnet, we predict f2 = 16 fringes for 4-step, 6-step, and 12-step phase shifting. Subsequently, the phase is calculated from the network output and compared with the 12-step phase-shifting ground truth. The mean errors are presented in Table 2.
It can be seen that, no matter how many fringes are predicted, the grayscale loss of the images shows no obvious change. Since each predicted image carries a similar error, the higher the number of phase-shifting steps, the smaller the influence of that error on the estimate, and the more accurate the resulting phase. This conclusion is confirmed by the mean phase errors in Table 2. Thus, we can choose higher-step phase-shifting fringes as the ground truth in our datasets for more accurate phase retrieval.
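The averaging effect claimed above can be checked with a small Monte-Carlo simulation: adding i.i.d. Gaussian noise (an assumed noise model, not the paper's) to each phase-shifted intensity and measuring the spread of the least-squares phase estimate shows the error shrinking roughly as 1/√N. A minimal sketch with hypothetical parameter values:

```python
import numpy as np

def phase_std_under_noise(n_steps, sigma=2.0, b=50.0, trials=20000, seed=0):
    """Std of the least-squares phase estimate under intensity noise.

    Simulates N-step phase-shifted intensities for a fixed true phase,
    perturbs each with Gaussian noise of std sigma (assumed model),
    and returns the standard deviation of the recovered phase.
    """
    rng = np.random.default_rng(seed)
    phi = 0.7                                    # arbitrary true phase
    n = np.arange(n_steps).reshape(-1, 1)
    I = 100 + b * np.cos(phi - 2 * np.pi * n / n_steps)
    I = I + rng.normal(0.0, sigma, size=(n_steps, trials))
    s = np.sum(I * np.sin(2 * np.pi * n / n_steps), axis=0)
    c = np.sum(I * np.cos(2 * np.pi * n / n_steps), axis=0)
    return np.std(np.arctan2(s, c) - phi)
```

Under this model the phase variance is approximately 2·sigma²/(N·b²), so 12-step prediction should show roughly √3 times less phase noise than 4-step, consistent with the trend in Table 2.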
The advantage of the FPPnet is that, although we obtain more phase-shifting fringes, acquiring them only requires projecting one fringe, so the measurement speed is not affected.

Results with different cross-period
In the process of phase retrieval, we hope to select the appropriate fringe frequency according to the working distance, environment, equipment parameters, and other factors. Thus, in the third experiment, the ability of cross-period prediction is explored. A fringe of frequency f_L = 13 is again set as the input.
We use the 12-step phase-shifting fringes of different frequencies as the output. The FPPnet loss curves are shown in Figure 4(a). It can be seen from Figure 4 that cross-period prediction near the input fringe frequency achieves an accurate result, because the characteristics of fringes at adjacent frequencies are close.
Under this condition, deep learning can achieve more accurate image prediction. Therefore, when predicting high-frequency fringes, if the frequency difference from the input fringe is large, the result is not ideal. However, this has little influence on low-frequency fringes, because the grayscale of a low-frequency image changes more gently, so the phase error caused by the same grayscale error is smaller than at high frequency.
On the other hand, for fringe inputs at different frequencies, it is clear that under the high-frequency condition f_H, the network's prediction is much stronger than under the low-frequency condition f_L.
In both cases, the training loss is close and very low, indicating that the network has sufficient capacity and fits different inputs easily. The test loss differs because, when the input image contains limited information, i.e., in the brightest and darkest areas of the fringe, valid information is easily lost, and structural information is more easily destroyed in the low-frequency case. Therefore, the generalization ability of the network on low-frequency fringes is weak, and its adaptability to different scenes is poor. If background images are added to the inputs, the overfitting can be alleviated.

Conclusion
In this work, a novel deep-learning-based phase retrieval technique for FPP is proposed that uses only a single fringe. The FPPnet is designed to predict fringes of different periods by using one single image and one single network. Thus, both the phase calculation and the phase unwrapping can be achieved. Theoretical analysis and experiments are provided to verify its performance. The proposed technique shows great potential for dynamic 3-D measurement.