Phase unwrapping based on a residual en-decoder network for phase images in Fourier domain Doppler optical coherence tomography.

To solve the phase unwrapping problem for phase images in Fourier domain Doppler optical coherence tomography (DOCT), we propose a deep learning-based residual en-decoder network (REDN) method. In our approach, we reformulate the task of obtaining the true phase as determining, by semantic segmentation, the integer multiple of 2π to be added at each pixel. The proposed REDN architecture provides recognition performance with pixel-level accuracy. To address the lack of noise-free and wrapping-free phase images from DOCT systems for training, we used simulated images synthesized with DOCT phase image background noise features. An evaluation study was performed on simulated images, DOCT phase images of milk flowing in a plastic tube phantom, and DOCT phase images of a mouse artery. A comparison study was also performed against the recently proposed deep learning-based DeepLabV3+ and PhaseNet methods for signal phase unwrapping and the traditional modified network programming (MNP) method. Both visual inspection and quantitative evaluation based on accuracy, specificity, sensitivity, root-mean-square error, total variation, and processing time demonstrate the robustness, effectiveness and superiority of our method. The proposed REDN method will benefit accurate and fast DOCT phase image-based diagnosis and evaluation when the detected phase is wrapped, and will enrich the deep learning-based image processing platform for DOCT images.


Introduction
Fourier domain optical coherence tomography (FD-OCT) is a high-resolution, non-invasive 3D imaging modality that has been widely used for biomedical research and clinical studies. The measurement of blood flow with Doppler OCT (DOCT) systems plays an important role in both disease diagnosis and clinical intraoperative evaluation [1][2][3][4]. Phase images from DOCT, which can provide important underlying biophysical information, often suffer from phase wrapping when the detected phase exceeds the dynamic range of 2π.
Phase unwrapping is a basic signal processing problem: resolving the true phase value from the detected phase value wrapped into the range (-π, π]. Ideally, phase unwrapping can be achieved by adding an appropriate integer multiple of 2π to each pixel based on the phase difference between adjacent pixels. However, phase unwrapping becomes very challenging when severe noise, shadow effects and phase discontinuities are present. In particular, DOCT phase images carrying the feature information of blood flow suffer from severe speckle noise and random noise. The high noise level in DOCT phase images increases the complexity of the phase unwrapping procedure. Many ingenious phase unwrapping methods have been proposed over the past years. The first category comprises path-following methods, which include the quality-guided algorithm [5,6], the branch-cut algorithm [7,8], and the minimum discontinuity algorithm [9,10]. These algorithms are effective but not robust to severe noise. The second category is based on the minimum-norm framework. The least-squares approach is the most representative algorithm [11,12], using fast Fourier transform and discrete cosine transform operations. An L1-norm envelope sparsity theorem has been proposed [13], which gives a sufficient condition for achieving phase unwrapping. The modified network programming (MNP) algorithm [14,15] assumes that the mapping between the wrapped phase and the true phase is not necessarily an integer multiple of 2π. The limitation of minimum-norm methods is that the output depends on the computation path, and they may introduce distortion in regular, noise-free regions. The third category features phase noise filtering, such as the unscented Kalman filter [16,17] and the recursive phase unwrapping (RPU) system [18][19], which can achieve accurate and noise-immune unwrapping but tends to over-smooth the phase image.
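To make the basic add-a-multiple-of-2π idea concrete, here is a minimal numpy sketch of a generic path-following (Itoh-style) 1D unwrapper. It illustrates the principle only; it is not the MNP algorithm or any method evaluated in this paper, and, as noted above, this scheme fails under severe noise:

```python
import numpy as np

def unwrap_1d(wrapped):
    """Itoh-style 1D phase unwrapping: wrap each neighbor difference back
    into (-pi, pi], then integrate the corrected differences."""
    wrapped = np.asarray(wrapped, dtype=float)
    diffs = np.diff(wrapped)
    corrected = (diffs + np.pi) % (2 * np.pi) - np.pi
    return np.concatenate(([wrapped[0]], wrapped[0] + np.cumsum(corrected)))

# A smooth ramp wrapped into (-pi, pi] is recovered exactly in the
# noise-free case.
true_phase = np.linspace(0, 6 * np.pi, 200)
wrapped = np.angle(np.exp(1j * true_phase))
recovered = unwrap_1d(wrapped)
```

Because the correction depends only on neighboring differences, a single noisy sample can propagate an erroneous 2π offset to all subsequent samples, which is why the noise-robust methods cited above are needed.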
Recently, deep learning methods based on convolutional neural networks (CNNs) have been proposed to address signal phase unwrapping. The idea is to transform it into semantic segmentation, also referred to as "pixel-wise classification," which aims to classify each pixel into one of a set of pre-determined classes [20][21][22]. The use of hundreds of hidden layers comprising millions of parameters makes it possible for CNN methods to discover intricate structures in large datasets [23][24][25][26]. PhaseNet, based on the U-net architecture, was proposed to address simulated phase data built from Gaussian functions for general-purpose study [27]. Deep learning based phase unwrapping (DLPU), also based on the U-net architecture, was further evaluated on holographic interferometry imaging [28]. Almost contemporaneously, Zhang et al. used the DeepLabV3+ architecture to address phase unwrapping for interferometric metrology [29].
Currently, no deep learning based methods have been reported or evaluated for DOCT phase image unwrapping. By incorporating contextual information and adding short-term memory to each layer, residual neural network architectures have demonstrated superior performance in semantic segmentation [30][31][32][33][34][35]. To solve the phase unwrapping problem in DOCT phase images, we propose a noise-immune and robust residual en-decoder network (REDN) that unites multi-scale context with pixel-level accuracy. The REDN includes a residual stream and a pooling stream. The residual stream carries feature maps at full image resolution, which can be combined with the classical residual block (RB) and the full resolution residual block (FRRB) from the pooling stream. The pooling stream performs the pooling operations and plays an important role in capturing high-level information through the network. The two streams are concatenated at the full image resolution. Without additional pre-processing or post-processing, our method obtains accurate phase unwrapping results for DOCT phase images. Compared to the DeepLabV3+, PhaseNet and MNP methods, our method demonstrates its superiority.
The remainder of this paper is organized as follows. Details of our method are given in Section 2. Experimental results and comparisons with other methods on simulated images, phantom plastic tube flow images and mouse artery flow images are presented in Section 3. The discussion is given in Section 4. Finally, the main conclusions are presented in Section 5.

Methods
Phase unwrapping is the process of recovering the true phase value from the detected phase value wrapped into the range (-π, π]. When the detected phase exceeds the range of 2π, phase wrapping becomes an issue. The relation between the true phase and the wrapped phase is

φ(x, y) = ϕ(x, y) + 2π k(x, y),   (1)

where φ(x, y) is the true phase, ϕ(x, y) is the wrapped phase, (x, y) denotes the spatial coordinates of a pixel, and k(x, y) is the integer multiple of 2π, referred to as the phase jump count, that must be added to the wrapped phase to obtain the true phase. In our framework, the phase unwrapping process is regarded as a semantic segmentation task learned by an architecture that takes ϕ(x, y) as input and outputs the jump count k(x, y). The ground truth for the training process can be computed using Eq. (2):

k(x, y) = round[(φ(x, y) - ϕ(x, y)) / 2π].   (2)

One of the basic requisites for deep learning is a sufficient amount of labeled training data. For DOCT phase images, training data are limited because of the complexity of the in-vivo experiments needed to acquire sufficient sample images. In addition, ground-truth DOCT phase images with no wrapping are hard to obtain. Thus, simulated data are an alternative approach. In this work, we generated the training and validation datasets using the definitive relation between the wrapped phase image and the true phase image, as shown in Fig. 1. It is worth mentioning that noise must be considered to make the dataset faithful to phase images from DOCT. A typical DOCT phase image is shown in Fig. 1(a). We extracted the noise, consisting mainly of shot noise, from the background of the DOCT phase image, as shown in Fig. 1(b). Abundant simulated data were generated using linear combinations of multiple Gaussian phase distributions located at different positions with both forward and backward flow directions, as shown in Fig. 1(c). The mean and variance of each Gaussian phase distribution were varied randomly.
This enables the network to learn phase continuity for broad general shapes rather than limiting it to certain definitive patterns. Figure 1(b) and Fig. 1(c) are combined to obtain the noisy phase image shown in Fig. 1(d). The wrapped image in Fig. 1(e), whose minimum value is shifted to zero, is used as the input to the neural networks. The ground-truth jump-count map of the simulated data, obtained according to Eq. (2), is shown in Fig. 1(f). The background region without target information is thresholded with an absolute value of 0.05 to avoid feature interference and save training time.
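The training-pair generation described above can be sketched as follows. The blob count, the Gaussian width and amplitude ranges, and the use of synthetic Gaussian noise in place of the extracted DOCT background noise are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_blob(h, w, cx, cy, sx, sy, amp):
    """A 2D Gaussian phase lobe; the sign of amp models forward/backward flow."""
    y, x = np.mgrid[0:h, 0:w]
    return amp * np.exp(-((x - cx) ** 2 / (2 * sx ** 2)
                          + (y - cy) ** 2 / (2 * sy ** 2)))

def make_training_pair(h=256, w=256, n_blobs=3, noise_sigma=0.3):
    """Return (true phase, wrapped input, jump-count label)."""
    phase = np.zeros((h, w))
    for _ in range(n_blobs):  # random linear combination of Gaussian lobes
        phase += gaussian_blob(h, w, rng.uniform(0, w), rng.uniform(0, h),
                               rng.uniform(10, 40), rng.uniform(10, 40),
                               rng.uniform(-5, 5) * np.pi)
    phase += rng.normal(0, noise_sigma, (h, w))  # background-noise stand-in
    wrapped = np.angle(np.exp(1j * phase))       # wrap into (-pi, pi]
    k = np.round((phase - wrapped) / (2 * np.pi)).astype(int)  # Eq. (2)
    return phase, wrapped, k
```

The wrapped image is the network input and k is the per-pixel class label; the minimum-value shift and background thresholding described above would then be applied on top of this.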

Experimental data
To validate the performance and robustness of the REDN method on real sample phase images, two types of experimental data were used: a transparent plastic tube (referred to as the phantom; 0.5 mm inner diameter, 0.9 mm outer diameter) with milk flowing at different velocities, and a mouse artery.
The phantom images were obtained from a home-built spectral domain OCT system with a central wavelength of 1300 nm and a bandwidth of 60 nm. The system ran at 70 fps with 1000 A-scans per frame and had a measured axial resolution of 14 µm and an imaging range of 6.7 mm. The sensitivity and phase stability of the OCT system were 92 dB and 70 mrad, respectively. The Doppler flow imaging speed range is thus calculated to be from 0.316 mm/s to 14.2 mm/s in both directions parallel to the scanning beam, with an adjacent A-scan time difference of 14.2 µs [15].
The mouse artery was imaged with a MEMS-based handheld probe. The system ran at 36 fps with a frame size of 1000 (lateral)×512 (axial) pixels, using a swept source with a central wavelength of 1310 nm and a tuning range from 1260 nm to 1360 nm. The axial and lateral resolutions of the OCT system were 12.6 µm (in air) and 17.5 µm, respectively. The system had a sensitivity of 84 dB and a sensitivity roll-off of 5.7 dB/mm over an imaging range of 5 mm. The phase stability of the OCT system was 70 mrad. The Doppler flow imaging speed range is thus calculated to be from 0.363 mm/s to 16.3 mm/s in both directions parallel to the scanning beam, with an adjacent A-scan time difference of 20 µs [36].

Network architecture
Our REDN architecture, illustrated in Fig. 2, is derived from the very deep architecture described in [37]. The input is a 256×256-pixel image. The output is the predicted jump-count map obtained by softmax. We adopt two threads to combine high-level features for recognition with low-level features for localization. In the pooling thread, we reduce the size of the features and increase the receptive field of the network with max pooling layers. The residual thread computes residuals at full image resolution with FRRBs and RBs, allowing high-level features to propagate through the network [38][39][40]. Meanwhile, it has been shown that a residual mapping is much easier to optimize than the original plain network [41]. The full pre-activation RB was adopted in our network architecture, as shown in Fig. 3(a). The 1×1 convolution layer is essentially a linear projection onto a space of the same dimensionality, with an additional non-linearity introduced by the rectification function [42]. The FRRB, acting as a residual unit for the residual thread, has two inputs and two outputs. Figure 3(b) shows the detail of the full resolution residual block. First, Input1 from the residual thread passes through pooling layers and is concatenated with Input2 from the pooling thread. The concatenated features then flow into two convolution blocks. Before each 3×3 convolution layer, we add a batch normalization layer and a ReLU activation function as the non-linear transformation. The second convolution block both forms Output2 for the next FRRB and is followed by a 1×1 convolution layer and an up-sampling layer for concatenation with the residual thread.

Results
We compared our results with the recently proposed deep learning based DeepLabV3+ and PhaseNet methods for signal phase unwrapping and the traditional modified network programming (MNP) method. First, the quantitative evaluation metrics are presented. Then, we exhibit the phase unwrapping results on the simulated image, the phantom tube image and the mouse artery image.

Quantitative evaluation
The evaluation metrics applied to assess both segmentation and classification performance were sensitivity (SE), specificity (SP) and accuracy (AC). We computed the average over the test dataset to get the final results. The evaluation metrics are defined as:

SE = N_tp / (N_tp + N_fn),
SP = N_tn / (N_tn + N_fp),
AC = (N_tp + N_tn) / (N_tp + N_tn + N_fp + N_fn),

where N_tp, N_tn, N_fp and N_fn represent the numbers of true positives, true negatives, false positives and false negatives, defined at the pixel level. A predicted "jump" pixel is regarded as a true positive if its ground truth is "jump"; otherwise, it is regarded as a false positive. A predicted "non-jump" pixel is considered a true negative if its ground truth is "non-jump"; otherwise, it is regarded as a false negative.
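As a minimal sketch, the pixel-level metrics can be computed from predicted and ground-truth jump-count maps; treating "jump" as k ≠ 0 and "non-jump" as k = 0 is an assumed binary reading of the definitions above:

```python
import numpy as np

def seg_metrics(pred_k, true_k):
    """Pixel-level SE, SP and AC for the jump / non-jump decision."""
    p, t = (pred_k != 0), (true_k != 0)
    n_tp = np.sum(p & t)    # predicted jump, truly jump
    n_tn = np.sum(~p & ~t)  # predicted non-jump, truly non-jump
    n_fp = np.sum(p & ~t)   # predicted jump, truly non-jump
    n_fn = np.sum(~p & t)   # predicted non-jump, truly jump
    se = n_tp / (n_tp + n_fn)
    sp = n_tn / (n_tn + n_fp)
    ac = (n_tp + n_tn) / (n_tp + n_tn + n_fp + n_fn)
    return se, sp, ac
```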
For the simulated images, the root mean square error (RMSE) describes the similarity between the predicted phase image P(k, j) and the true phase image T(k, j):

RMSE = sqrt( (1 / (m n)) Σ_{k=1}^{m} Σ_{j=1}^{n} [P(k, j) - T(k, j)]^2 ),

where k and j denote the pixel position and m and n are the height and width of the image, respectively. However, since the true phase image of real DOCT experimental data cannot be obtained, RMSE is not applicable there. Instead, we calculated the total variation (TV) of the predicted phase image to evaluate the unwrapping effectiveness. The TV is defined as the integral of the magnitude of the image gradient:

TV = ∫∫_{D_u} sqrt(u_x^2 + u_y^2) dx dy,

where u_x = ∂u/∂x, u_y = ∂u/∂y, and D_u is the image domain. TV serves as a reference parameter that measures the phase discontinuity of the reconstructed phase image, based on the fact that a correctly unwrapped phase has smooth phase transitions.
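Both measures can be sketched in a few lines of numpy; the forward-difference discretization of the TV integral is an assumed choice:

```python
import numpy as np

def rmse(pred, true):
    """Root mean square error between predicted and true phase images."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return np.sqrt(np.mean((pred - true) ** 2))

def total_variation(u):
    """Discrete TV: sum of gradient magnitudes via forward differences."""
    u = np.asarray(u, float)
    ux = np.diff(u, axis=1)[:-1, :]  # horizontal differences
    uy = np.diff(u, axis=0)[:, :-1]  # vertical differences (matching shape)
    return np.sum(np.sqrt(ux ** 2 + uy ** 2))
```

A correctly unwrapped image has no residual 2π steps, so its TV is lower than that of the same content with wrapping artifacts, which is what makes TV usable as a reference parameter here.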

Training procedure
The training dataset consists of 12000 images of size 256×256; the validation dataset consists of 3000 images. The wrapped phase image is used as the input to the architecture, and the jump counts of the training images are distributed in the range of -5 to 5, comprising 11 classes in total. Cross-entropy loss and stochastic gradient descent with momentum 0.9 and an initial learning rate of 10^-4 were used to train the REDN model. Cross-entropy loss with the Adam optimizer and an initial learning rate of 10^-4 were used to train the PhaseNet model. Cross-entropy loss and stochastic gradient descent with momentum 0.9 and an initial learning rate of 10^-6 were used to train the DeepLabV3+ model. To reduce overfitting, we used dropout ratios of 0.25, 0.2 and 0.2 for the REDN, PhaseNet and DeepLabV3+ architectures, respectively, during training. In Fig. 4, the convergence of the learning curves is illustrated in terms of training loss and validation loss over the epochs.

Experiments on simulated data
The unwrapping results of the simulated wrapped image for the different methods are shown in Fig. 5. Figure 5(a) is the wrapped image and Fig. 5(b) is the corresponding true phase image. The first row of Fig. 5(c) shows the predicted unwrapped phase images from our REDN method, DeepLabV3+, PhaseNet and MNP, respectively. The second row of Fig. 5(c) shows the residual error between each predicted unwrapped phase image and the true phase image in Fig. 5(b). Both our REDN method and the MNP method achieve good performance even though the wrapped points are difficult to identify with the naked eye. The DeepLabV3+ method cannot recover the discrete jump pixels at the edge of the wrapped region. From the residual error, we can see that the PhaseNet method performs better than DeepLabV3+ but worse than our method and MNP. A quantitative comparison of AC, SE, SP, RMSE and time consumption over 400 simulated test images is shown in Table 1. Note that AC, SE and SP are not applicable to the MNP method. From Table 1, we can see that among the deep learning methods, our method outperforms PhaseNet, which in turn outperforms DeepLabV3+ on simulated data. The RMSE value of MNP is very close to that of our method; however, its time consumption is much longer than that of the deep learning based methods, because the deep learning methods are GPU accelerated while MNP runs on the CPU only.

Experiment on real data
To compare the phase unwrapping capability of the four methods on real DOCT images, Fig. 6 shows four phantom images with different flux levels. It is clear from Fig. 6 that our method and the MNP method achieve satisfactory results with good continuity and contrast. These comparison results illustrate that our method outperforms the other two deep learning methods on DOCT phantom images. When the flow velocity is high, the predictions of DeepLabV3+ are more accurate than those of PhaseNet: PhaseNet cannot unwrap the phase image precisely at the central pixels, while DeepLabV3+ has low accuracy near the tube wall. However, as the flow velocity decreases, PhaseNet performs better than DeepLabV3+. This implies that the DeepLabV3+ model can recognize the general region of the wrapped phase but lacks accurate pixel-level localization, whereas PhaseNet can localize the wrapped phase pixel by pixel but with insufficient accuracy. Note that motion caused by movement of the sample or instrument, assuming it varies continuously over a single phase image, changes the detected flow profile; however, phase wrapping boundaries, whose values are integer multiples of 2π, are independent of this motion effect. Bulk motion correction can be applied to the phase images after unwrapping when necessary. For the mouse artery images (Fig. 7), all four methods can unwrap the real DOCT blood flow images to some extent. Notably, the noise increases with imaging depth due to blood scattering: the bottom of the vessel is blurred with severe noise, and the blood flow information there cannot be obtained clearly. The MNP method and our proposed method both achieve similarly good unwrapping results over most of the flow region; large differences arise in high-noise regions. The proposed REDN method outperforms the MNP method on the vessel phase image in high-noise regions, especially at the bottom of the vessel.
Meanwhile, the other two deep learning methods cannot obtain accurate results near the vessel wall. Figure 7(e) plots the wrapped phase profile and the reconstructed flow profiles of the different methods along the dashed red line marked in Fig. 7(d). The differences among the four methods are clearly visible. The unwrapping results of our method are largely consistent with those of MNP, and our result looks more reasonable intuitively, while DeepLabV3+ and PhaseNet generate erroneous results with large discontinuities.
To evaluate the performance of the four methods on real DOCT phase images quantitatively, we calculated the TV values listed in Table 2. Our method is comparable in effectiveness to MNP: MNP has the smallest TV value on the phantom image, while our method has the smallest value on the vessel image. Moreover, our method outperforms the other two deep learning methods. Note that the larger TV value of the phantom image compared with the vessel image is due to the larger number of cross-section pixels in the phantom image. TV should serve as a reference parameter rather than a definitive one, and should be combined with visual inspection of the images.

Discussions
In the model training process, the noise level of the training data is fixed, and the network was trained under this fixed noise level. In practice, it is impractical to train a separate set of parameters for every noise level. Here, we compare phase unwrapping performance under different noise levels on simulated test data. Taking the noise level of the training data as 1, we added noise to the same simulated data at ratios from 0.1 to 1.2, corresponding to SNRs from 35.9 dB down to 0.24 dB based on SNR (dB) = 10 log10(P_signal / P_noise). Figure 8 shows the mean RMSE values and their standard deviations. As expected, the mean RMSE decreases as the SNR increases. The MNP method performs best in the high-SNR region, while our method outperforms MNP in the low-SNR region. As the SNR decreases, the RMSE values of the PhaseNet and DeepLabV3+ methods grow quickly, whereas the maximal RMSE value of our method remains small, showing that our method is more robust than the other two deep learning methods.
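The SNR definition used above, and how scaling the noise by a ratio maps to a dB shift, can be sketched as:

```python
import numpy as np

def snr_db(signal, noise):
    """SNR in dB: 10 * log10(P_signal / P_noise), with mean-square power."""
    p_signal = np.mean(np.asarray(signal, float) ** 2)
    p_noise = np.mean(np.asarray(noise, float) ** 2)
    return 10 * np.log10(p_signal / p_noise)

# Scaling the noise amplitude by a ratio r shifts the SNR by -20*log10(r) dB,
# which is how a sweep of noise ratios maps onto a range of SNR values.
```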

Conclusions
We proposed a deep learning based REDN phase unwrapping method for DOCT phase images by casting phase unwrapping as a semantic segmentation task. To address the insufficiency of qualified training data, we trained the REDN model on simulated data synthesized with DOCT phase image noise. We combined the residual thread and the pooling thread at two resolution levels to achieve consistently robust performance, confirmed by experiments on simulated, phantom and vessel phase images. Comparisons with the recently proposed deep learning based DeepLabV3+ and PhaseNet methods and the traditional modified network programming method show the superiority of our method. The proposed REDN method can retrieve unwrapped phase information from DOCT systems for accurate quantitative diagnosis and evaluation when the detected phase is wrapped, and it could be integrated with other deep learning-based image processing models for future OCT image analysis.

Disclosures
The authors declare that there are no conflicts of interest.