Label enhanced and patch based deep learning for phase retrieval from single frame fringe pattern in fringe projection 3D measurement

We propose a label enhanced and patch based deep learning phase retrieval approach which can achieve fast and accurate phase retrieval using only a few fringe patterns as the training dataset. To the best of our knowledge, this is the first time that the advantages of label enhancement and the patch strategy for deep learning based phase retrieval have been demonstrated in fringe projection. In the proposed method, enhanced labeled data are used in the training dataset so that the deep neural network (DNN) learns the mapping between the input fringe pattern and the output enhanced fringe part. Moreover, the training data are cropped into small overlapped patches to expand the training samples for the DNN. The performance of the proposed approach is verified by experimental projection fringe patterns with applications in dynamic fringe projection 3D measurement.


Introduction
Fringe projection, as a non-contact and whole field three dimensional (3D) shape measurement technology with high speed, high resolution, and low cost, has been widely employed in diverse fields, including biomedical, industrial and scientific, kinematics, and biometric identification applications [1][2][3]. The principle of this method is to measure the deformation of a projected fringe pattern modulated by the height of the tested object. The height information is related to the phase in the deformed fringe pattern, and the phase is recovered by a phase retrieval operator. Phase retrieval is a key and difficult problem in the fringe projection measurement technique [1,4]. Phase retrieval methods are mainly divided into two categories: methods that use a single frame fringe pattern and phase shift methods. Phase shift methods usually require multiple fringe patterns captured at different moments [5,6]. When measuring objects in fast motion or in a temporally unstable environment, it is difficult or costly to capture several projection fringe patterns in an extremely short period of time. In contrast, single frame methods require only one fringe pattern from a single shot, which makes them less susceptible to interference from the external environment and more suitable for 3D measurement of dynamic objects [5][6][7].
However, phase retrieval from a single frame fringe pattern is a challenging problem in fringe projection 3D measurement, especially for objects with edges or abrupt changes in depth, and it has attracted wide attention. Numerous methods have been proposed, such as the well-known Fourier Transform (FT) method, the Windowed Fourier Transform (WFT) method, the Wavelet Transform (WT) method, the Shearlet Transform (ST) method, and more effective methods such as the Empirical Mode Decomposition (EMD) method and the more recently proposed variational image decomposition (VID) and variational mode decomposition (VMD) methods [8][9][10][11][12][13][14][15]. Although extensive research efforts have been made, it is hard to achieve phase retrieval that is both accurate and fast, because traditional methods trade accuracy against computational efficiency. For instance, the FT method is simple but does not work well for objects with edges, while more effective methods such as VID and VMD incur a large computational cost.
Recently, discriminative learning such as deep learning has proved successful in many areas, ranging from computer vision tasks such as image recognition, image denoising, and image super-resolution to optical imaging such as digital microscopy and digital holography [16][17][18][19]. Inspired by the success in those areas, Feng, Zuo et al. recently introduced deep learning into fringe pattern analysis and proposed a deep neural network (DNN) to conduct phase retrieval from a single frame fringe pattern [20,21]. The process of phase retrieval is learned by the DNN from the input data and the output labeled data in the training dataset. Their work demonstrates that the deep-learning-based technique can provide high accuracy phase retrieval results in a short time. Owing to the ability of deep learning to learn the mapping between the input data and output labeled data, DNNs can also be applied to problems such as phase unwrapping and 3D mapping in fringe projection 3D measurement [22][23][24].
As a data driven approach, deep learning-based phase retrieval depends on both the quality and the quantity of the training data. Abundant and accurate labeled data from real fringe projection are important but difficult to acquire. Hence, using fewer samples is desirable for deep learning-based phase retrieval, provided the learning and prediction performance is maintained. In addition, fringe patterns that contain evident noise decrease the accuracy of phase retrieval results. So far, these two issues have not been addressed in existing deep learning-based phase retrieval methods for fringe projection 3D measurement. In this paper, we develop a new phase retrieval method based on the recently proposed DnCNN model to tackle both the noise problem and the training sample problem [25]. In the proposed method, we use the fringe pattern and the enhanced fringe part as the input data and output labeled data in the training dataset of the DNN, which learns their mapping to implement data driven phase retrieval. Since the labeled data are enhanced, the proposed method is expected to deal with noisy fringe patterns without pre-processing or post-processing. Moreover, the proposed method needs fewer samples because the samples can be expanded by cropping the original samples into many small overlapped patches. Phase retrieval was performed with the proposed method, and its performance was verified by experimental results. The contributions of our work are as follows:
(1) We propose to use the denoised and enhanced fringe part as the labeled data in the training stage. In this way, the DNN can learn the denoised and enhanced fringe part from a noisy fringe pattern, so it simultaneously achieves fringe part extraction and enhancement and does not require filtering as pre-processing or post-processing in phase retrieval. For simulated data, the output labeled data are known in advance. However, for a real fringe pattern, the labeled data are not exactly known. Therefore, we use the phase shift method and Shearlet transform filtering to produce the enhanced labeled data. The applicability and advantages of label enhancement are demonstrated in fringe projection.
(2) We propose the patch strategy to expand the training dataset by cropping the input fringe pattern and output labeled data into small overlapped patches. In this way, the samples are expanded, which addresses the difficulty of acquiring training datasets for a traditional DNN. Meanwhile, the small patches decrease the size of the network computation, leading to noticeable reductions in running time and memory requirements. Note that although the patch strategy has been employed in image super-resolution, denoising, etc., this is the first time it has been introduced in phase retrieval for fringe projection 3D measurement. The advantages of the patch strategy are demonstrated with real fringe patterns.

The proposed method
In fringe projection 3D measurement, the intensity distribution of a fringe pattern can be expressed as
$$I(x, y) = a(x, y) + b(x, y)\cos\big(\varphi(x, y) + 2\pi f_0 x\big) + \mathrm{noise}(x, y) \tag{1}$$
where a(x, y) is the background, b(x, y) and φ(x, y) are the modulation intensity and the optical phase, f_0 is the carrier frequency, and noise(x, y) denotes the noise in I(x, y). Phase retrieval can be implemented by separating the fringe part b(x, y) cos(φ(x, y) + 2πf_0 x) from the background a(x, y) and the noise part [13]. However, due to the discontinuous edges of objects and the effect of noise, the fringe part and the other parts are not well separated. Deep learning has been proposed to separate the fringe part from the other parts by learning the mapping between the input fringe pattern and the output fringe part with a DNN. However, previous works on phase retrieval using deep learning do not deal with separating the fringe part from noise, and the effect of noise in the labeled data has not received attention. Also, in order to train the DNN effectively, scores of fringe patterns with labeled data must be prepared [20]. In this paper, we propose to extract the fringe part from both the background part and the noise part with fewer samples, using deep learning in a new manner as follows.
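As an illustration of the intensity model above, the following minimal sketch generates a synthetic fringe pattern together with its noise-free fringe part. The Gaussian-bump phase, the constant background and modulation values, and the function name `simulate_fringe` are assumptions chosen for illustration only; they are not the simulation settings of this paper, apart from the noise variance of 0.2 mentioned later in the text.

```python
import numpy as np

def simulate_fringe(size=512, f0=0.05, noise_var=0.2, seed=0):
    """Return (pattern, fringe_part, phase) for I = a + b*cos(phi + 2*pi*f0*x) + noise."""
    rng = np.random.default_rng(seed)
    y, x = np.mgrid[0:size, 0:size].astype(float)
    c = (size - 1) / 2.0
    # Hypothetical smooth test phase (radians): a Gaussian bump centred in the image.
    phase = 3.0 * np.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * (size / 6.0) ** 2))
    a = np.full((size, size), 0.5)  # background a(x, y), assumed constant
    b = np.full((size, size), 0.4)  # modulation b(x, y), assumed constant
    fringe_part = b * np.cos(phase + 2.0 * np.pi * f0 * x)
    noise = rng.normal(0.0, np.sqrt(noise_var), (size, size))
    pattern = a + fringe_part + noise
    return pattern, fringe_part, phase
```

In a training pair, `pattern` would play the role of the network input and `fringe_part` the role of the label.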

The design of DNN for phase retrieval
The proposed phase retrieval method is based on extracting the fringe part from the fringe pattern, using a DNN to learn the extraction process. The DNN implements fringe part extraction in two steps: the training step and the testing step. In the training step, the DNN is trained to learn the mapping between the input data (fringe pattern) and the output labeled data (fringe part); in the testing step, the trained DNN predicts the output fringe part given the input fringe pattern. Figure 1 shows the diagram of the fringe part extraction. The DnCNN model is used in our study because it combines residual learning and batch normalization, which benefit from each other, and their integration is effective in speeding up training and boosting denoising performance [25]. As shown in Fig. 1, in order to make the network acquire the features of the image more efficiently, the input fringe pattern and the output labeled fringe part of 512×512 pixels are divided into overlapped patches of 40×40 pixels using a fixed-size window. The input patches and output label patches are used to train the DNN to learn the mapping between the fringe pattern and the output fringe part. Once the DNN is trained, it is used to predict the fringe part from a tested fringe pattern. The DNN model shown in Fig. 1 contains convolutional layers (Conv), batch normalization (BN) and ReLU. DnCNN applies a residual learning formulation to learn a mapping function, and it uses batch normalization to accelerate the training procedure while improving the denoising results.
Based on Eq. (1), the input fringe pattern for the DNN can be rewritten as p = f + r, where f and r are the fringe part and the rest of the fringe pattern p, respectively. r contains the background part and the noise part, and is referred to in this paper as the generalized residual of the fringe pattern p. Our goal is to use the DNN with residual learning to separate f and r [25]. The residual learning method trains the network to produce a nonlinear map R(p) = r, so that the residual part r = R(p) is predicted and the fringe part f = p − R(p) can be extracted. The loss function of the DNN is
$$\ell(\theta) = \frac{1}{2N}\sum_{i=1}^{N}\left\| R(p_i;\theta) - (p_i - f_i) \right\|_F^2 \tag{2}$$
which represents the mean square error between the expected residual (p_i − f_i) and the network predicted residual R(p_i; θ). Here θ represents the weights and biases of the network, which are updated by back propagation, and {(p_i, f_i)}, i = 1, ..., N, are the N pairs of fringe pattern and corresponding fringe part. After the DNN is trained, the predicted fringe part f can be obtained.
With the predicted fringe part f(x, y), the wrapped phase distribution with carrier is calculated by applying the Hilbert transform and the arctangent operator to the fringe part as follows:
$$\varphi(x, y) + \phi_c(x, y) = \arctan\frac{\mathrm{Im}\{H[f(x, y)]\}}{\mathrm{Re}\{H[f(x, y)]\}} \tag{3}$$
where H denotes the Hilbert transform, Re{} and Im{} respectively denote the real and imaginary parts, and ϕ_c(x, y) is the carrier, which should be removed to produce a pure phase. In this paper, the unwrapped phases were obtained by the quality guided phase unwrapping algorithm [14]. To obtain the pure unwrapped phases without the carrier term, the carrier was removed from the unwrapped phases using the Fourier carrier removal method. To sum up, Fig. 2 shows the diagram of the deep learning based method for phase retrieval, which is composed of the above-mentioned DNN training and prediction, wrapped phase retrieval, phase unwrapping and carrier removal.
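The wrapped-phase step described above (Hilbert transform followed by an arctangent) can be sketched as follows. This is a minimal NumPy illustration, not the implementation used in this work. It assumes that the carrier runs along the x direction (axis 1), and that the Hilbert operator is understood as returning the complex analytic signal, as common numerical routines such as `scipy.signal.hilbert` do. The helper names `analytic_signal` and `wrapped_phase` are hypothetical.

```python
import numpy as np

def analytic_signal(f, axis=-1):
    """Analytic signal of a real array along one axis, built with the FFT."""
    n = f.shape[axis]
    spectrum = np.fft.fft(f, axis=axis)
    # One-sided weights: keep DC (and Nyquist), double positive frequencies,
    # zero out negative frequencies.
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    shape = [1] * f.ndim
    shape[axis] = n
    return np.fft.ifft(spectrum * h.reshape(shape), axis=axis)

def wrapped_phase(fringe_part):
    """Wrapped phase (including the carrier) of a background-free fringe part."""
    z = analytic_signal(fringe_part, axis=1)
    return np.arctan2(z.imag, z.real)
```

The result lies in (−π, π] and still contains the carrier; phase unwrapping and carrier removal remain separate steps, as described in the text.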

The label enhancement and patch strategy
The simulated fringe pattern is generated according to Eq. (1), with Gaussian random noise of variance 0.2 added in the simulation. Figure 3 shows the simulated training dataset with labeled data. For the simulated data, the labeled data are known. However, for the experimental data, the labeled data are not exactly known. To obtain the ground-truth data as the output labeled data in real fringe projection, we exploit the four-step phase shifting method to produce the labeled data [5,14]. In real 3D measurement, the fringe pattern captured by the CCD or CMOS camera is corrupted by noise, which introduces errors into the 3D reconstruction result. To eliminate these errors, denoising algorithms are usually applied to the fringe pattern as pre-processing or post-processing. In this paper, we propose to train the DNN using the fringe pattern captured from a real scene by a CMOS camera and the corresponding denoised fringe part (label enhancement), so that the trained network learns the mapping between the input fringe pattern and the enhanced fringe part and can consequently predict the enhanced fringe part of a given fringe pattern, which avoids the denoising steps. The Shearlet transform method with soft threshold shrinkage is employed to effectively denoise the labeled fringe part L while preserving the details [26]. Figure 4 shows the experimentally obtained training dataset. In Fig. 4, we give the training datasets of the two strategies, both of which use the same fringe patterns but use the noisy fringe part and the denoised fringe part as the labeled data, respectively. As seen from the enlarged local areas of the wood doll and human face images, the denoised fringe part is clearer in detail than the noisy fringe part.
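To make the label generation step more concrete, the sketch below shows one standard way a fringe part can be obtained from four phase-shifted frames, together with the soft threshold shrinkage rule. It assumes phase shifts of 0, π/2, π and 3π/2, which is the textbook four-step arrangement and may differ from the exact formula used in this work. The Shearlet transform itself is not implemented here; `soft_threshold` only illustrates the shrinkage that would be applied to transform coefficients. Both function names are hypothetical.

```python
import numpy as np

def four_step_label(i0, i1, i2, i3):
    """Fringe part and wrapped phase from four frames I_n = a + b*cos(theta + n*pi/2).

    Assumed shifts: 0, pi/2, pi, 3*pi/2 (an assumption, not confirmed by the text).
    """
    fringe_part = (i0 - i2) / 2.0   # equals b*cos(theta); the background a cancels
    quadrature = (i3 - i1) / 2.0    # equals b*sin(theta)
    wrapped = np.arctan2(quadrature, fringe_part)
    return fringe_part, wrapped

def soft_threshold(coeffs, t):
    """Soft shrinkage: reduce magnitudes by t and set small coefficients to zero."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)
```

Because the background cancels in the difference of opposite frames, the label contains only the fringe part plus residual noise, which is what the subsequent denoising is meant to suppress.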
In the training stage, overlapping patches are densely cropped from the original fringe patterns and the corresponding labeled fringe parts, and these small overlapped patches are used as the training data for the DNN. In the testing stage, the input fringe pattern can likewise be cropped into small overlapped patches as the testing data; these patches are predicted by the DNN, and the overlapping reconstructed patches (fringe part patches) are then aggregated to produce the final output (a fringe part with the same size as the input fringe pattern). Alternatively, the input fringe pattern can be fed directly to the DNN without cropping in the testing stage, because a fully convolutional network can receive input data of arbitrary size. In this paper, the patches are set to 40×40 pixels and overlapped with 10 pixels in each direction, and the input fringe pattern without cropping is adopted in the testing stage for its simplicity of implementation.
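The cropping and aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the window is moved with a fixed step (`stride`) of 10 pixels, which is one possible reading of the overlap setting, and overlapping predictions are merged by simple averaging. The function names are hypothetical, and the sketch does not attempt to reproduce the exact patch count reported in this paper, which depends on cropping details not given in the text.

```python
import numpy as np

def extract_patches(img, size=40, stride=10):
    """Crop overlapping size x size patches on a regular grid; return patches and corners."""
    h, w = img.shape
    coords = [(top, left)
              for top in range(0, h - size + 1, stride)
              for left in range(0, w - size + 1, stride)]
    patches = np.stack([img[t:t + size, l:l + size] for t, l in coords])
    return patches, coords

def aggregate_patches(patches, coords, shape):
    """Rebuild an image by averaging overlapping patches at their original positions."""
    acc = np.zeros(shape)
    count = np.zeros(shape)
    size = patches.shape[1]
    for patch, (t, l) in zip(patches, coords):
        acc[t:t + size, l:l + size] += patch
        count[t:t + size, l:l + size] += 1
    # Pixels never covered by a patch stay at zero instead of dividing by zero.
    return acc / np.maximum(count, 1)
```

With these assumed settings, a 512×512 frame yields 48×48 = 2304 patches, so even a handful of frames provides thousands of training samples. The same pair of functions applies to the input fringe patterns and to the labeled fringe parts, so that input and label patches stay aligned.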

Results and discussion
In this section, the DNN based on the DnCNN network is implemented in Python with the PyTorch framework (version 0.4.1) on a PC with an Intel(R) Core(TM) i5-7500H CPU (3.40 GHz), 16 GB of RAM, and a GeForce GTX 1080 (NVIDIA). The DNN model shown in Fig. 1 contains 17 convolutional layers. The kernel size of the 1st to 16th layers is 3×3×64 and that of the last layer is 3×3×1; the convolution stride is 1, and zero-padding is used so that the output data have the same spatial size as the input data. From the 2nd layer to the 16th layer, batch normalization is used to speed up the network training and improve the training precision, with a numerical stability constant eps of 0.0001 and a momentum of 0.95. The Adam optimization algorithm is used to train the neural network with a learning rate of 0.001. The ReLU activation function is applied at the output of the convolutional layers to better fit the nonlinear mapping. Training takes about 17 hours for the simulated dataset and about 12 hours for the experimental dataset, each for 1000 epochs. Both simulations and real experiments were conducted in this work. In the simulation, fringe patterns of 512×512 pixels are simulated as the input data of the DNN, as shown in Figs. 3 and 5, with Gaussian random noise of variance 0.2 added to the fringe patterns [14]. Since DnCNN is usually tailored to a specific noise level, training and testing datasets with the same noise level are used in this study. In the real experiment, the fringe patterns were captured from the tested object by a CMOS camera with 8-bit pixel depth and a resolution of 512×512 pixels, while a projector (DLP LightCrafter 4500) projected the fringe pattern onto the object.
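The layer configuration above fixes two quantities that help explain the 40×40 patch size: the receptive field of the network and the number of convolution weights. The following back-of-the-envelope sketch computes both. It assumes a single-channel input and counts convolution weights only (biases and batch normalization parameters are ignored); the function names are hypothetical.

```python
def receptive_field(num_layers, kernel=3):
    """Receptive field (in pixels) of stacked stride-1 convolutions."""
    return num_layers * (kernel - 1) + 1

def conv_weight_count(depth=17, channels=64, in_channels=1, out_channels=1, kernel=3):
    """Total convolution weights of a DnCNN-style stack (biases and BN excluded)."""
    layers = ([(in_channels, channels)]
              + [(channels, channels)] * (depth - 2)
              + [(channels, out_channels)])
    return sum(kernel * kernel * cin * cout for cin, cout in layers)
```

For 17 layers of 3×3 kernels the receptive field is 35×35 pixels, so a 40×40 patch is large enough to cover the full context seen by one output pixel, and the stack holds 554112 convolution weights under the stated assumptions.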

Patch based strategy validation
To validate the effectiveness of the proposed patch strategy, the DNN with the patch strategy (DNN1) was compared with the DNN without the patch strategy (DNN2). The DNN1 and DNN2 algorithms for phase retrieval, implemented as shown in Fig. 2, are the same except for the training dataset. Figures 5(a-1)-5(a-3) respectively show the simulated fringe pattern, true fringe part and true phase. Figure 6 shows the fringe parts extracted by DNN1 and DNN2. In Fig. 6, the networks trained with and without the patch strategy use the same network parameters. For the training dataset with the patch strategy, although the number of training samples is only 8 frames, the number of small overlapped patches (with patch size 40×40 pixels) obtained from the original fringe patterns is as large as 45824. Figures 6(a-1) and 6(a-2) respectively show the fringe parts extracted by the DNN1 and DNN2 approaches, with their errors shown in Figs. 6(b-1) and 6(b-2), respectively. Similarly, Figs. 7(a-1) and 7(a-2) respectively show the unwrapped phases obtained by the two methods, with their phase errors shown in Figs. 7(b-1) and 7(b-2). The MSEs of the fringe parts are 1.95×10⁻³ and 8.32×10⁻³ for DNN1 and DNN2, and the MSEs of the unwrapped phases are 2.77×10⁻² and 5.32×10⁻², respectively. Compared with DNN2, DNN1 extracts fringe parts with smaller error and thus preserves more details at the edges of the phase, as shown in Figs. 7(b-1), 7(b-2) and 7(c). The proposed patch strategy for phase retrieval was also validated with real fringe patterns in fringe projection. Figure 8 shows the extracted fringe parts and phase results from one real fringe pattern by DNN1 and DNN2. The results from the four-step phase shift method were used for reference. As seen in Fig. 8, the results by DNN2 contain more artifacts in the fringe parts and phase results than those by DNN1. There are some ripples in the phase results for both DNN1 and DNN2 because of the low quality of the fringe pattern of the measured object, which is subject to inhomogeneous reflection. To sum up, the proposed method with the patch strategy extracts fringe parts with better performance.

Comparisons with other methods and application in dynamic object measurement
To further show the performance of the proposed phase retrieval method, we compare it with the FT method and the four-step phase shift method using two real experimental fringe patterns. Figures 12(a) and 12(b) respectively show the captured fringe patterns for a human face and a plastic box. Figure 13 shows the processed results for Fig. 12, where the unwrapped phases for the four-step phase shift method, the FT method and the proposed method are shown. In detail, Figs. 13(a-1)-13(a-3) respectively show the phase retrieval results for Fig. 12(a) with the four-step phase shift method, the FT method and the proposed method, while Figs. 13(b-1)-13(b-3) respectively show the unwrapped phase results for Fig. 12(b) with the three methods. Figures 13(c-1) and 13(c-2) give the plots of the unwrapped phase data at the 255th row for Fig. 12(a) and at the 255th column for Fig. 12(b), respectively. The plots in Figs. 13(c-1) and 13(c-2) show that the phase results of the proposed method are closer to those of the phase shift method than the results obtained by the FT method. As shown in the inset of Fig. 13(c-2), the proposed method preserves more details of the phase, especially for objects with abrupt changes such as the edges of the plastic box. We can conclude from Fig. 13 that the proposed method produces a phase closer to that of the phase shift method than the FT method does, and that it gives better visual quality for the 3D human face. As to the computation time, the FT method, although simple to implement, needs 0.49 s to achieve the fringe part extraction, while the proposed method requires only 0.05 s once the model is loaded, which is about ten times faster than the FT method.
To further investigate the performance of the proposed method in dynamic fringe projection 3D measurement, experimental fringe patterns of a hand in motion were captured by a CMOS camera at a frame rate of 100 Hz and tested. The phase of this set of fringe patterns is retrieved by the proposed method and the FT method, respectively. The trained network is the same as that used for Fig. 13. Figure 14 shows the experimental fringe patterns of the hand under motion at 6 different times. Figure 15 shows the phases reconstructed from the fringe patterns at the 6 different times by the proposed method and the FT method, respectively. As seen, the proposed method preserves more details of the phase results than the FT method, and its results are more reasonable than those of the FT method, for example at the fingers of the hand. As to the computational efficiency, once the model has been loaded, it can predict a set of images without repeated loading time, so it achieves the fringe part extraction in 0.05 s per fringe pattern of 512×512 pixels. In contrast, traditional methods such as the FT method require the computation to be repeated for each image in the set. Therefore, the proposed method is more suitable for phase retrieval in dynamic fringe projection 3D measurement.
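For readers unfamiliar with the FT baseline used in this comparison, the following is a minimal sketch of the classical Fourier transform approach: a row-wise FFT, a band-pass window around the carrier lobe, an inverse FFT, and an arctangent. It is not the implementation timed above. The rectangular window, the parameter names `k0` (carrier frequency bin) and `half_width` (half-width of the band-pass window in bins), and the function name are assumptions for illustration.

```python
import numpy as np

def ft_wrapped_phase(pattern, k0, half_width):
    """Wrapped phase by the Fourier transform method, carrier along x (axis 1).

    k0 is the carrier frequency bin and must exceed half_width so that the
    window excludes the zero-frequency background term.
    """
    n = pattern.shape[1]
    spectrum = np.fft.fft(pattern, axis=1)
    # Rectangular band-pass window that keeps only the positive carrier lobe.
    window = np.zeros(n)
    window[k0 - half_width:k0 + half_width + 1] = 1.0
    lobe = np.fft.ifft(spectrum * window, axis=1)
    # Shift the lobe back to baseband to remove the carrier, then take the angle.
    carrier = 2.0 * np.pi * k0 * np.arange(n) / n
    return np.angle(lobe * np.exp(-1j * carrier))
```

The fixed window is where the method loses detail: sharp edges spread energy outside the retained band, which is consistent with the behaviour of the FT method at object edges reported above.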

Conclusion
Phase retrieval from a single frame fringe pattern remains one of the most challenging open problems in fringe projection 3D measurement. In this paper, a label enhanced and patch based deep learning phase retrieval approach is proposed to achieve good performance in both accuracy and computational efficiency by learning the mapping between the input fringe pattern and the desired output fringe part with a DNN. Different from previous work, this method can effectively denoise and enhance the fringe part to improve the accuracy of phase extraction results for objects with edges. In the proposed method, the real fringe pattern and the corresponding denoised fringe part serve as the input data and output labeled data of the DNN, so that the trained network can predict an enhanced fringe part of a given fringe pattern. More importantly, we demonstrate for the first time the advantage of the patch strategy, in which cropping the original fringe patterns into many overlapped patches expands the samples. Experimental results demonstrate that the proposed DNN with the patch strategy can extract the fringe part with a training dataset of only a few fringe patterns. The effectiveness of the proposed method is verified by experimentally obtained fringe patterns of a human face and a hand in motion, and the results show that, compared with the FT method, the proposed method can effectively preserve the edges of 3D objects in phase retrieval with fast and accurate results. This work may benefit other phase retrieval problems, such as those in digital holography, where labeled data are difficult to obtain. In addition, since our work contributes a training data strategy for phase retrieval, it can be extended to other DNNs such as U-Net and ResNet to improve phase retrieval results.
Since our method learns the mapping between the input fringe pattern and the output labeled fringe part obtained from experiments, its performance could be improved in future work by removing the nonlinear error in the labeled data. The performance could also be improved by denoising the labeled data with a more effective filtering method, in order to deal with cases where high-level speckle noise exists in the fringe pattern or where objects have abrupt changes and tiny details.

Fig. 1. The diagram of the fringe part extraction.

Fig. 2. The diagram of the proposed deep learning phase retrieval.

Figures 9(a) and 9(b) respectively show the experimental fringe patterns of a human face and a hand, neither of which is used in the training dataset. Figure 10 shows the results for these fringe patterns using the two different network training strategies. In detail, Figs. 10(a-1) and 10(a-2) are the fringe parts extracted from Fig. 9(a) by the DNN without label enhancement and with label enhancement, while Figs. 10(b-1) and 10(b-2) respectively show the fringe parts extracted from Fig. 9(b) by the two approaches. The corresponding phase results are shown in Figs. 11(a-1) and 11(a-2) for Fig. 9(a) and in Figs. 11(b-1) and 11(b-2) for Fig. 9(b), respectively. The 255th column of Figs. 11(a-1) and 11(a-2) and the 200th column of Figs. 11(b-1) and 11(b-2) are plotted in Figs. 11(c-1) and 11(c-2), respectively. As seen, the DNN realizes the automatic extraction of the fringe parts by learning the mapping between the input fringe pattern and the output fringe part. From the comparison we can see that Figs. 11(a-1) and 11(b-1) contain more noise than Figs. 11(a-2) and 11(b-2), respectively, indicating that the DNN with label enhancement enhances the fringe part automatically. It can also be seen from Figs. 11(c-1) and 11(c-2) that the prediction results of the network using label enhancement are smoother than those without label enhancement. Overall, the above results show that an enhanced fringe part can be extracted from the noisy fringe pattern, and that the accuracy of the phase extraction result is improved by learning the mapping between the noisy fringe pattern and the denoised and enhanced fringe part.

Fig. 14. The experimental fringe patterns of a hand under motion at 6 different times.

Fig. 15. Phase results of a hand under motion at 6 different times by the FT method and the DNN method.