Phase unwrapping in optical metrology via denoised and convolutional segmentation networks

The interferometry technique is commonly used to obtain the phase information of an object in optical metrology. The obtained wrapped phase is subject to a 2π ambiguity. To remove the ambiguity and obtain the correct phase, phase unwrapping is essential. Conventional phase unwrapping approaches are time-consuming and noise sensitive. To address those issues, we propose a new approach, where we transfer the task of phase unwrapping into a multi-class classification problem and introduce an efficient segmentation network to identify classes. Moreover, a noise-to-noise denoised network is integrated to preprocess noisy wrapped phase. We have demonstrated the proposed method with simulated data and in a real interferometric system. © 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement


Introduction
Phase measurements are of considerable interest for many applications such as optical metrology, medical diagnostics, and 3D imaging. However, by an inverse trigonometric computation, the measured phase is normally wrapped onto the range of (−π, +π], which does not reflect the true phase values. To address this problem, lots of phase unwrapping approaches [1][2][3][4][5][6][7][8][9][10][11] have been proposed. The discontinuity arisen at residues is firstly identified in the Goldstein's branch-cut algorithm [1] and it is identified based on that the closed-path integral of phase gradient is not zero. Quality-guided algorithms [2][3][4] rely on quality map to guide the integration path rather than identifying residues. Approaches [5,6] are designed to minimize the discontinuities in the unwrapped surface. For methods [7,8], the phase unwrapping problem is formulated as a generalized minimum-norm sense to minimize the difference of local derivatives between true and measured phases. Unwrapping algorithms [9][10][11] based on the transport of intensity equation (TIE) were proposed to obtain the absolute phase map directly from intensity information. The TIE unwrapping method is easy to implement and path independent.
Recently, deep learning has been successfully applied in the fields such as image classification and restoration and many models based on deep convolutional neural network (CNN) achieve promising results. This advanced technology has also been adopted in the field of optical metrology. A multilayer perceptron neural network [12] was proposed to identify phase discontinuities. The residual neural network was adopted in [13] to address phase unwrapping and the network was trained on simulated phases with steep spatial gradients. However, the unwrapped phases shown in [13] were not correct for some regions. In 2018, we filed a provisional patent application on phase unwrapping using convolutional segmentation network and denoising network [14]. A similar method using convolutional segmentation network was also demonstrated with simulated data only in 2019 [15]. However, our work is different with [15] as followings: (1) Smaller filter size, dynamic filter number, and deeper layers make a wider network and enlarge the non-linear of our network. (2) Since the unwrapped phase is reconstructed by adding integral multiple n of 2π to the wrapped phase, denoised network for noisy wrapped phases is integrated into our phase unwrapping network.
In this paper, we propose a new approach based on CNN to unwrap phase, where we transfer the task of phase unwrapping into a multi-class classification problem. A noisy wrapped phase is firstly denoised by our proposed noise-to-noise denoised network. Then, it is fed into a segmentation network to generate integral multiple and a post-processing is used to correct the integral multiple. Finally, denoised wrapped phase and corrected integral multiple are combined to generate the unwrapped phase. Simulated data and experiments with a real interferometric system have demonstrated the effectivity of our proposed method.

Phase unwrapping via a convolutional segmentation network
The phase unwrapping problem is to add the integral multiple n of 2π at each pixel of wrapped phase ϕ w for obtaining the unwrapped phase ϕ unw as: where (x, y) is a position, ϕ w , ϕ unw ∈ R H×W and n ∈ {0, ±1, ±2, ±3, . . .}. As formulated in Eq.
(1), phase unwrapping aims to determine the integral multiple n. From the other perspective, phase unwrapping is a multi-class classification problem, where an integral multiple represents a class. Thus, we can use a classification network to figure out this problem. To identify classes accurately, we introduce an efficient segmentation network [16] that is used for image semantic segmentation. To solve our task, the network is designed as shown in Fig. 1. The goal of phase unwrapping is to determine the integral multiple n and then add 2nπ at each pixel of wrapped phase. Thus, our network outputs the integral multiple with wrapped phase as inputs. Moreover, the network includes an encoder path and a decoder path, followed by a final pixel-wise classification layer. A constant kernel size of 3-by-3 over all convolutional layers is chosen. Each encoder layer in the encoder path consists of operations: convolution (Conv), followed by batch normalization (BN) [17] and rectified linear unit (Relu, f (x) = max(0, x)) [18]. Following that, max-pooling with a 2-by-2 window and stride 2 is used to reduce the dimension of feature maps and achieve translation invariance over small spatial shifts. Moreover, the max-pooling indices are memorized for subsequent up-sampling operations. The encoder path consists of 13 encoder layers which correspond to the first 13 convolutional layers in the VGG16 network [19] designed for object classification. Each encoder layer has a corresponding decoder layer, hence the decoder path has 13 decoder layers. An up-sampling operation is done on feature maps before they go through a decoder layer. This operation is to up-sample input feature maps using the memorized max-pooling indices from the corresponding encoder location. Finally, the high dimensional feature representation is fed into a convolutional layer and a soft-max classifier to produce a N-channel image of probability, where N is the number of classes. The output n corresponds to the class with maximum probability at each pixel. Actually, the output n is nonnegative because of soft-max operation. There is a constant difference C between n and n. In practice, the constant C can be chosen to make sure that C ≥ − min(n). To train the network, the number of categories N need to be fixed and it can be set: N = max(n) + C + 1. Too small category limits the range of fringe pattern. Too large N needs more training data to obtain a well-trained network. After we get the integral multiple n, the unwrapped phase can be reconstructed as: For training, given a training dataset where M is the number of the training dataset, the network aims to learn the best end-to-end mapping F(·), which minimizes the difference between prediction n k = so f tmax(p k = F(ϕ (k) w )) and ground-truth The item p k ∈ R H×W ×N represents the intermediate output before going through soft-max operation which produces the position with maximum probability at each pixel. To measure the difference, cross entropy loss function [20] is used since it is commonly used in the field of image classification. Thus, the loss function can be formulated as: where p k,t is a predicted probability belongs to class t = n k .
To train the network, we built datasets (simulated and real datasets) which consist of training and testing set, and these two sets were not intersected. Besides, we initiated the weights by the method described in [21] and used the Adam optimizer [22] with the mini-batch size of 3. The learning rate was initially set to 0.0001 and exponentially decayed with a rate of 0.99. The max epoch number was set to 1000. For quantitative measure, we calculated the pixel classification accuracy on the testing set. The number of pixel in class i predicted to belong to class j is denoted as s i j (includes s ii ) and the total number of pixels in class i is denoted as q i = j s i j . Thus, the pixel classification accuracy can be calculated as: i s ii / i q i . We also calculated the root mean square error (RMSE, || ϕ unw − ϕ unw || 2 /(H × W)) between reconstructed unwrapped phase ϕ unw and ground-truth ϕ unw .
To generate the simulated data set, a list of Zernike coefficients up to 35 terms was randomly generated to represent the surface ϕ unw . The interferograms were generated by Eq. (4): where A and B are constant, a ∈ {0, 1, 2, 3} is used to generate phase shift fringe and noise is zero for clean data. Then a four step phase shifting algorithm was used for calculating the wrapped phase as: For the experiment on simulated clean data, 10000 pairs of data (wrapped and unwrapped phases) with size of 400-by-400 were generated, where 9500 pairs were used for training and the rest was used for testing. Besides, the class number N and the constant C were set to 29 and 7, respectively. The average pixel classification accuracy and the average RMSE on the testing data are 94.62% and 0.0022, respectively. The experimental result on simulated clean data is shown in Fig. 2. The wrapped phase in Fig. 2(a) was fed into the network as a input. The network output and reconstructed unwrapped phase are shown in Figs. 2(b) and 2(d), respectively. Comparing with ground-truth, we can see that our network produces pretty good results. The difference between reconstructed unwrapped phase and the corresponding ground-truth is shown in Fig.  2(f). One can see that the difference is so small and a majority of pixel value is zero, which demonstrates that our network works well on clean data for phase unwrapping.   However, as shown in Figs. 3(c1) and 3(c2), there are corrupted regions in unwrapped phases due to misclassification. To improve the classification accuracy, a post-processing operation was done on the network output (integral multiple n). We firstly identified phase discontinuity locations since they played a key role in phase unwrapping [23]. Fortunately, the identification of phase discontinuity can be done by our network because the phase discontinuity task can be considered as a two-class classification problem. We just changed the final classification layer to predict two-class problem, correspondingly, binary images of phase discontinuity were treated as ground-truth to train the network. The locations of phase discontinuity of wrapped phase Figs. 3(a1) and 3(a2) are shown in Figs. 4(a) and 4(c), respectively. Next, connected regions were labeled based on the extraction result of phase discontinuity, as shown in Figs. 4

(b) and 4(d).
Finally, based on the labeling results, we corrected the integral multiple n according to a principle that same label corresponds to same value of integral multiple. The post-processed unwrapped phases are shown in Figs. 3(d1) and 3(d2) and they more closely resemble the ground-truth ( Figs. 3(b1) and 3(b2)). The last row in Fig. 3 shows differences, where Figs. 3(e1) and 3(e2) represent the differences between reconstructed unwrapped phases and ground-truth, Figs. 3(f1) and 3(f2) represent the differences between post-processed unwrapped phases and ground-truth. We can see that the difference is much smaller after post-processing. Besides, after post-processing, the average pixel classification accuracy and the average RMSE on the testing data are 96.88% and 0.0014, respectively, which demonstrates that the post-processing is benefit to unwrapping precision.

Phase unwrapping via a convolutional segmentation and denoised networks
To validate the effectivity of our proposed approach, an experiment was carried out on the noisy wrapped phase. Since the unwrapped phase is reconstructed by adding integral multiple n of 2π to the wrapped phase for our method, denoising is necessary on noisy wrapped phase. In practice, clean data is difficult to obtain.
To address this problem, a noise-to-noise denoised strategy [24] was integrated into a modified version of U-Net [25]. The network architecture of denoising noisy wrapped phase is shown in Fig. 5. Since image details were very important for our task, we removed pooling layers and up-sampling convolutions. In the U-Net, the output image was smaller than the input, which was unsuitable for our task. Thus, zero padding was used to keep the sizes of all feature maps (including the output image) the same. A constant kernel size of 3-by-3 and a constant filter number of 64 over all convolutional layers were chosen. Besides, we used two observations (for the same fringe pattern, two images with different random noise) of noisy wrapped phase to train our network. The observation 1 was treated as input and the observation 2 was used to calculate the loss. We generated two observations of noisy wrapped phases for each group and 500 × 2 groups of data were cropped into small patches with a size of 40-by-40 to train the network. The denoised result on wrapped phase is shown in Fig. 6. The noisy wrapped phase shown in Fig. 6(a) was generated as followings: a combination of Possion and Salt-Pepper random noise was added to interferograms I a , then noisy wrapped phase was obtained according to Eq. (5). The SNR(Signal-to-noise ratio) of the noisy wrapped phase is 4.0 dB. The denoised wrapped phase and the corresponding ground-truth are shown in Figs. 6(b) and 6(c), respectively. We can see that the proposed denoised network reconstructs almost clean result even for badly corrupted wrapped phase. What's more, the result was obtained by only using noisy data (not clean wrapped phase) to train our network. After we obtained the denoised wrapped phase, it was fed into our unwrapping network to produce the integral multiple n. Then, the post-processing and smooth constraint were used to reconstruct unwrapped phase due to noisy contours of wrapped phase and the reconstructed unwrapped phase is shown in Fig. 6(d). The difference between the reconstructed unwrapped phase and corresponding ground-truth (Fig. 6(e)) is shown in Fig. 6(f) and it is very small. The more badly corrupted data (SNR= 0.6 dB) was also used to test our network. The same results are shown in Fig. 7. One can see that, by integrating noise-to-noise denoised network, our unwrapping network still works well on noisy wrapped phase. The average pixel classification accuracy and the average RMSE on the testing noisy data are 90.04% and 0.0034, respectively.  Moreover, we compared our proposed method with Goldstein's branch cut algorithm [1] and Quality-guided path-following method [26] on noisy wrapped phases. For the noisy wrapped phases shown in Fig. 6(a) and Fig. 7(a), the unwrapping results by these two methods are expressed in Fig. 8. The results shown in Figs. 8(a) and 8(e) are unwrapped by Goldstein's branch cut algorithm, and the corresponding differences are displayed in Figs. 8(b) and 8(f), respectively. The unwrapped phased by Quality-guided path-following method are shown in Figs. 8(c) and 8(g), and the differences are expressed in Figs. 8(d) and 8(h), respectively. Compared with our results (Fig. 6(d) and Fig. 7(d)), one can see that our method produces the best results and the differences are the smallest. We also compared the running time and the RMSE results, as shown in Table 1. The running time was evaluated on a machine with 3.4 GHz Intel(R) Core(TM) i3-2130 CPU (8G RAM). We also evaluated the running time of our network on a machine with a 3.5 GHz CPU and a Titan X GPU and the image size is 400-by-400. The value on the left of "/" represents the RMSE result of unwrapped phase on Fig. 6(a), and the right value is on Fig. 7(a). From Table 1, we can see that our network achieves the highest precision and the running speed is the fastest. Besides, the running time on GPU makes our network available for real-time applications.

Experimental demonstration
Finally, we tested the network on real data. A HeNe laser source with wavelength of 632.8 nm was used as light source. An AlpaoDM with 97 actuators and 13.5 mm aperture diameter was placed in test arm to generate different fringe patterns, as shown in Fig. 9. To achieve snapshot measurement, a pixelated polarization camera PolarCam from 4D Technology, Inc. was used to capture 4 interferograms simultaneously [27]. Up to 15 terms of Zernike coefficients were random generated and applied to DM during the experiment. Because there was no ground-truth of unwrapped phase, the reconstructed unwrapped phases by a modified Goldstein's algorithm (followed by denoising wrapped phase, MG) were used as the ground-truth to train the network. Totally, 1500 groups of data (wrapped and unwrapped phases) were obtained, where 1300 groups were used for training and the rest was used to test. As show in Fig. 10, the first column represents wrapped phases obtained by our setup. The second and third columns show the reconstructed unwrapped phases by our network and MG, respectively. The differences between these two are shown in the last column. From the results, we can see that our network still works well on real data. Fig. 10. Unwrapping results on real data. From left to right are: wrapped phases (input, (a), (e)), reconstructed unwrapped phases by our network ((b), (f)) and MG ((c), (g)), and differences ((d), (h)).

Conclusion
In conclusion, we propose a new approach based on CNN for phase unwrapping. The phase unwrapping issue is transferred into a multi-class classification problem and an efficient segmentation network is introduced to identify classes. Besides, this network can be used to identify phase discontinuity locations and a post-processing operation is adopted to improve the performance. Moreover, a noise-to-noise denoised network is integrated to preprocess noisy wrapped phase. Since our network is fully convolutional, it also works on other image sizes (different with training image size). Simulated and experimental data have demonstrated the effectivity of our approach. Our current networks were trained with and work well with continuous wrapped phase maps which are typical cases in interferometric optical metrology. We are working on more complex cases with discontinuous wrapped phases and will report the new approaches in the near future.