Three-dimensional imaging through turbid media using deep learning: NIR transillumination imaging of animal bodies

Using near-infrared (NIR) light with 700–1200 nm wavelength, transillumination images of small animals and thin parts of a human body such as a hand or foot can be obtained. They are two-dimensional (2D) images of internal absorbing structures in a turbid medium. A three-dimensional (3D) see-through image is obtainable if one can identify the depth of each part of the structure in the 2D image. Nevertheless, the obtained transillumination images are blurred severely because of the strong scattering in the turbid medium. Moreover, ascertaining the structure depth from a 2D transillumination image is difficult. To overcome these shortcomings, we have developed a new technique using deep learning principles. A fully convolutional network (FCN) was trained with 5,000 training pairs of clear and blurred images. Also, a convolutional neural network (CNN) was trained with 42,000 training pairs of blurred images and corresponding depths in a turbid medium. Numerous training images were provided by the convolution with a point spread function derived from diffusion approximation to the radiative transport equation. The validity of the proposed technique was confirmed through simulation. Experiments demonstrated its applicability. This technique can provide a new tool for the NIR imaging of animal bodies and biometric authentication of a human body. Society of America under the terms of the OSA Open Access Publishing Agreement


Introduction
In medical and biometric applications, the three-dimensional (3D) structure of blood vessel networks provides crucial information for diagnosis, treatment evaluation and personal authentication. As examples, this information is extremely helpful for evaluating cancer invasion depth, raising robot surgery precision, and stepping up vein authentication from 2D to 3D. Useful imaging techniques available in medical fields such as X-ray CT, MRI, and PET can provide high-quality 3D images, but they require hazardous radiation or large-scale equipment. Recently, an acousto-optic imaging technique has been used to visualize the 3D blood vessel structure of a human body [1][2][3][4][5][6]. This technique is safe and useful, but it requires both light and ultrasound. This requirement not only makes the system complicated; it also makes contact on the lesion unavoidable.
Optical transillumination imaging techniques are other candidates to visualize blood vessel networks. Using these techniques, non-contact measurements can be taken using simple, compact, and safe equipment. Major veins become visible under visible light illumination if the subcutaneous vein lies at a few millimeters depth under the skin [7][8][9][10][11]. Nevertheless, the image is not clear. The deeper blood vessels are not visible. When using near-infrared (NIR) light, the vein image can be visualized better because less scattering and absorption occurs than with visible light. Some instruments are available to provide a vessel network pattern using NIR light [12][13][14][15][16][17][18]. However, the captured vein image before image processing is blurred severely because of light scattering in the interstitial tissue between the vein and the skin. Diffuse scattering at the body surface also degrades the vein image in non-contact measurement. These effects degrade the transillumination image quality and make visualization of the 3D structure difficult.
One can imagine a clear image from a blurred one. Similarly, one can estimate an object's depth when it is immersed in a turbid medium, much as one can infer the approximate depth of a fish in muddy water from a blurred view. This ability seems to derive from many earlier experiences. Therefore, a clear image might be obtained from a blurred one, and the depth of an absorber might be estimated, using a well-trained neural network on a computer. Similar ideas for deblurring and depth estimation have been reported [19][20][21]. However, these techniques cannot be applied to the blurred image in a turbid medium. Sabir et al. presented a CNN-based technique to estimate the bulk optical properties (absorption and scattering coefficients) of a highly scattering medium such as biological tissue in diffuse optical tomography (DOT) [21]. Similarly, Yoo et al. proposed a technique that uses a CNN to obtain the distribution of optical anomalies for DOT [22]. Both used a DOT system that required many optical fibers to obtain a large number of input and output signals, as well as extensive calculation to solve inverse problems. Our technique requires only a wide illumination device and a single camera with relatively simple video-capture software. In addition, few reports have been found on the combination of deblurring and depth estimation using deep learning, particularly for transillumination images. With a view toward better visualization of the blood vessel network, we propose a new technique to obtain a clear 3D structure from a blurred 2D transillumination image. The validity of the proposed technique was examined in simulation. Its applicability was tested through experimentation.

Training data generation
The proposed technique is based on deblurring and depth estimation using a neural network trained for blurred images. A neural network (NN) designed for deep learning was used for this study. To train the NN for deblurring, we feed many pairs of clear and corresponding blurred images to a computer system. To train the NN for depth estimation, we feed many pairs of the depth of the absorber in a turbid medium and the blurred image of the absorber.
Generally, better NN performance can be expected with a greater number of training pairs, up to the overfitting limit. In our system, the number of training pairs ranged from several hundred to a few thousand in a single epoch. It is unrealistic to prepare so many training pairs by practical measurement. Therefore, we generated blurred images by the convolution of original images with a point spread function (PSF). The PSF based on the model presented in Fig. 1 is given as Eq. (1) [23], where $\kappa_d = [3\mu_a(\mu_s' + \mu_a)]^{1/2}$. Here $C$, $\mu_s'$, $\mu_a$, and $d$ respectively represent a constant with respect to $\rho$ and $d$, the reduced scattering coefficient, the absorption coefficient, and the absorber depth.
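As an illustration of how a depth-dependent PSF can be tabulated on a pixel grid, the following minimal sketch evaluates a diffusion-approximation PSF of the form PSF(ρ) = C[(µs′ + µa) + (κd + 1/r)(d/r)] exp(−κd r)/r with r = (ρ² + d²)^(1/2). This closed form is an assumption written from the symbols defined above and should be checked against [23]. The function name, kernel size, and pixel pitch are hypothetical choices for illustration only.

```python
import math


def psf_kernel(depth_mm, mu_s=1.0, mu_a=0.01, half_size=15, pixel_mm=0.5):
    """Return a normalized PSF kernel as a list of rows.

    ASSUMPTION: the closed form below is our reading of the
    diffusion-approximation PSF; verify it against the cited reference.
    mu_s is the reduced scattering coefficient and mu_a the absorption
    coefficient, both in 1/mm.
    """
    if depth_mm <= 0:
        raise ValueError("depth_mm must be positive")
    kappa_d = math.sqrt(3.0 * mu_a * (mu_s + mu_a))
    kernel = []
    for iy in range(-half_size, half_size + 1):
        row = []
        for ix in range(-half_size, half_size + 1):
            rho_sq = (ix * pixel_mm) ** 2 + (iy * pixel_mm) ** 2
            r = math.sqrt(rho_sq + depth_mm ** 2)
            value = ((mu_s + mu_a) + (kappa_d + 1.0 / r) * depth_mm / r)
            value *= math.exp(-kappa_d * r) / r
            row.append(value)
        kernel.append(row)
    # Normalizing to unit sum plays the role of the constant C.
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


shallow = psf_kernel(1.0)
deep = psf_kernel(5.0)
```

With the tissue-like parameters used in this paper, the kernel for the shallow depth is more sharply peaked than the kernel for the deeper one, which is the property the depth estimation relies on.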
This PSF was derived originally for the light intensity distribution on the surface of a turbid medium for a point light source as presented in Fig. 1 [23]. In contrast, the transillumination image is a blurred shadow of an absorber in a turbid medium. It has been verified that this PSF is applicable to the blur in transillumination imaging regarding the image as a collection of point absorbers [24]. Using this PSF we can obtain the blurred images for specific depth in a turbid medium. The calculated images were good for training the neural network. In this calculation, the background of an image is assumed to be homogeneous. In practice, however, the medium is often inhomogeneous in scattering and absorption coefficients. The image blur is much more dependent on scattering than absorption. In macroscopic imaging of animals, the target of the imaging is often the absorption distribution, and the scattering coefficient does not vary much in the viewing area. Therefore, this PSF can be used to simulate the blur in practical variations. The applicability of this PSF in practice was reported before [23]. Figure 2 presents examples of training pairs obtained using PSF convolution with different depths. The original and the blurred images were used as a training pair for the fully convolutional network (FCN) to deblur the image. Depth d and the blurred image were used as a training pair for the convolutional neural network (CNN) to estimate the absorber depth.
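The generation of training pairs described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the PSF expression is an assumed form of Eq. (1), the image is treated as an absorber map (1 inside the absorber, 0 elsewhere), and the names `blur` and `make_training_pairs`, the kernel half-width, and the pixel pitch are hypothetical.

```python
import math


def psf(rho_mm, depth_mm, mu_s=1.0, mu_a=0.01):
    """Unnormalized PSF value (ASSUMED form of Eq. (1))."""
    kappa_d = math.sqrt(3.0 * mu_a * (mu_s + mu_a))
    r = math.sqrt(rho_mm ** 2 + depth_mm ** 2)
    return ((mu_s + mu_a) + (kappa_d + 1.0 / r) * depth_mm / r) * math.exp(-kappa_d * r) / r


def blur(image, depth_mm, half=6, pixel_mm=0.5):
    """Convolve an absorber map with the PSF; pixels outside the image count as 0."""
    offsets = range(-half, half + 1)
    weights = {(dy, dx): psf(pixel_mm * math.hypot(dx, dy), depth_mm)
               for dy in offsets for dx in offsets}
    norm = sum(weights.values())
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for (dy, dx), w in weights.items():
                yy, xx = y + dy, x + dx
                if 0 <= yy < height and 0 <= xx < width:
                    acc += w * image[yy][xx]
            out[y][x] = acc / norm
    return out


def make_training_pairs(clear, depths_mm):
    """Return (blurred, clear) pairs for the FCN and (blurred, depth) pairs for the CNN."""
    fcn_pairs, cnn_pairs = [], []
    for d in depths_mm:
        blurred = blur(clear, d)
        fcn_pairs.append((blurred, clear))
        cnn_pairs.append((blurred, d))
    return fcn_pairs, cnn_pairs


# A vertical bar absorber in a 21 x 21 image, blurred at two depths.
bar = [[1.0 if x == 10 else 0.0 for x in range(21)] for _ in range(21)]
fcn_pairs, cnn_pairs = make_training_pairs(bar, [1.0, 5.0])
```

Because the kernel is radially symmetric, the correlation written in `blur` is identical to a convolution. For full-size images an FFT-based convolution would be the practical choice.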

Deblurring with FCN
We can expect to obtain a blur-less image as an output of NN for a blurred image input if we train NN with many pairs of images before and after blurring. We used a NN developed for deep learning. In deep learning, CNN is commonly used for classification, detection and segmentation of an image. In our application, the NN output should be a modified image. Therefore, we used FCN for which the last fully connected layer of NN was replaced by a convolutional layer. For the FCN, we used an NN based on U-net with skip connections [25,26] to improve the image processing accuracy. Figure 3 presents the concept of deblurring with FCN.

Depth estimation with CNN
In transillumination imaging of animal bodies, images of absorbers such as blood vessel networks are blurred by strong light scattering at the body tissue. The degree of the blur is dependent on the absorber depth in a turbid medium. As the depth increases, the transillumination image of the absorber becomes more blurred. Therefore, if we train NN with many pairs of a blurred image and a corresponding depth, then the NN will output the depth for a new blurred image input. This is a common task of classification in deep learning. Figure 4 portrays the concept of depth estimation with CNN.

Clear 3D image from blurred 2D image
From transillumination imaging of an animal body, one obtains a 2D blurred image of an absorbing structure in the body. With FCN and CNN, one can deblur the 2D image and obtain the absorber depth. After dividing the image into small parts and obtaining the depth for each part, one can reconstruct a clear three-dimensional image of the absorbing structure. If part of an absorber overlaps with another absorber in a single 2D image, the depth of the rear absorber cannot be obtained in the overlapping part. In such a case, 2D images should be taken from a few orientations, and the processes presented above should be repeated.
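The patch-wise reconstruction described above can be sketched as follows. This is an illustration under stated assumptions: `estimate_depth` is a hypothetical callable standing in for the trained CNN, the deblurred image is assumed to be an absorber map with values near 1 inside the absorber, and the patch size and threshold are arbitrary.

```python
def reconstruct_3d(deblurred, blurred, estimate_depth, patch=4, threshold=0.5):
    """Return a list of (x, y, z) points for absorber pixels.

    The deblurred image supplies the lateral (x, y) shape; each patch of
    the blurred image supplies one depth z through estimate_depth, a
    hypothetical stand-in for the trained depth network.
    """
    height, width = len(deblurred), len(deblurred[0])
    points = []
    for y0 in range(0, height, patch):
        for x0 in range(0, width, patch):
            ys = range(y0, min(y0 + patch, height))
            xs = range(x0, min(x0 + patch, width))
            absorber = [(x, y) for y in ys for x in xs
                        if deblurred[y][x] >= threshold]
            if not absorber:
                continue  # no absorber in this patch, so no depth is needed
            tile = [row[x0:x0 + patch] for row in blurred[y0:y0 + patch]]
            z = estimate_depth(tile)
            points.extend((x, y, z) for x, y in absorber)
    return points


# Toy input: a horizontal line absorber and a stub estimator returning 3.0 mm.
clear_img = [[1.0 if y == 2 else 0.0 for _ in range(8)] for y in range(8)]
blurred_img = [[0.5 for _ in range(8)] for _ in range(8)]
cloud = reconstruct_3d(clear_img, blurred_img, lambda tile: 3.0)
```

Overlapping absorbers are not handled in this sketch; as noted above, they require images from additional orientations.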

Deblurring with FCN
As described in Sec. 2.2, we can expect to obtain a deblurred image as an output of FCN. The feasibility of this technique was examined through simulation. For FCN, we used the U-net with skip connections [25,26]. The skip connection in the U-net connects the coding network of a blurred image with the decoding network for a clear image such that the features of the sampling layer in the coding network can be transmitted directly to the sampling layer in the decoding network, which makes the location of the pixels in the network more accurate.
To train the FCN, we generated 5,000 pairs of clear and blurred images. The original images were 10 patterns made artificially to simulate images of the subcutaneous blood vessel network. The blurred images were generated from the original clear images by convolution with the PSF given in Eq. (1). The optical parameters were those of general human body tissue: $\mu_s' = 1.0$ /mm and $\mu_a = 0.01$ /mm. These parameters were used in all simulations described hereinafter. PSFs for 10 depths were applied to the 10 patterns, and the images were rotated in 50 orientations, producing 10 × 10 × 50 = 5,000 pairs of training data. These 5,000 training pairs were then fed into the FCN for training with a batch size of 10, a filter size of 3 × 3, and 100 epochs.
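The composition of the FCN training set can be checked by enumerating the combinations. In this sketch the specific depth values (1 to 10 mm in 1 mm steps) and the evenly spaced rotation angles are assumptions; the text specifies only the counts.

```python
from itertools import product


def training_configs(n_patterns=10, depths_mm=None, n_orientations=50):
    """Enumerate (pattern index, depth, rotation angle) for every training pair."""
    if depths_mm is None:
        # ASSUMPTION: the text gives only "10 depths"; these values are illustrative.
        depths_mm = [float(d) for d in range(1, 11)]
    # ASSUMPTION: orientations are evenly spaced over a full turn.
    angles_deg = [360.0 * k / n_orientations for k in range(n_orientations)]
    return list(product(range(n_patterns), depths_mm, angles_deg))


configs = training_configs()
batch_size = 10
batches_per_epoch = len(configs) // batch_size
```

With a batch size of 10, the 5,000 pairs correspond to 500 batches per epoch.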
Together with the original clear patterns, the blurred images constituted the 5,000 training pairs for the FCN. To test the FCN, 400 testing images were generated from four original patterns that differed from the 10 original patterns used for training. The training was done on a workstation (Intel Core i7-7700 K CPU; 3.00 GHz; 32 GB memory). The FCN was run in Python on a workstation equipped with a graphics processing unit (GTX 1080Ti; GeForce). Figure 5 presents examples of a training pair, an input test image, an output image from the FCN, and an original image before blurring. Using the trained FCN, we were able to restore the clear original image from the badly blurred image. Much as a brain does, the trained FCN accommodated new blurred patterns with different absorber depths well. To analyze the deblurring effects, the quality of the output image from the FCN was evaluated by correlation analysis. Figure 6 presents the correlation between the output image and the original image before blurring. As the absorber depth increased, the blur became more severe and the deblurred image quality became worse. In Fig. 6, the FCN performance is compared with that obtained by training of different types. The decrease of image quality with the absorber depth was considerable with fewer training data. With one fifth of the training data, the image quality decreased rapidly with the absorber depth. For training, more depths seemed to produce better results than more image orientations. However, this difference was much smaller than that caused by the number of training pairs. These results show that we can get a clear image for an absorber as deep as several to 10 mm in a turbid medium. In this study, we did not add noise to the training data. Nevertheless, using this trained system, we obtained a clear noise-free image with few defects even for an input image with random noise. We also confirmed that an image without such defects can be obtained with the system trained with Gaussian noise.
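The correlation analysis used to evaluate image quality can be illustrated with a Pearson correlation coefficient computed over all pixels. The text does not state which correlation measure was used, so the Pearson form here is an assumption, and the function name is hypothetical.

```python
import math


def image_correlation(img_a, img_b):
    """Pearson correlation coefficient between two equally sized images."""
    a = [v for row in img_a for v in row]
    b = [v for row in img_b for v in row]
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant image")
    return cov / math.sqrt(var_a * var_b)


original = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
inverted = [[1.0 - v for v in row] for row in original]
same = image_correlation(original, original)
opposite = image_correlation(original, inverted)
```

A value near 1 indicates that the deblurred output reproduces the original pattern; the measure is insensitive to a uniform change of brightness or contrast.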
These analyses verified the capability of the FCN to deblur an image, given sufficient training and an appropriate choice of training data. If images captured in a practical environment are used for training, an improvement in performance is expected. However, it is often not easy to capture a sufficient number of training images.

Depth estimation by CNN
As described in Sec. 2.3, we can expect to obtain the absorber depth as an output of CNN. The feasibility of this technique was examined through simulation. For the NN of deep learning, we used a CNN based on ResNet, which was introduced by He et al. in 2015 and achieved high top-5 accuracy [27]. ResNet is a classification model that uses a very deep neural network. High accuracy can be expected in practice with a sufficient number of training data, but the accuracy drops as the training data decrease. To overcome this training difficulty and to make the CNN applicable to our specific tasks, we used the PSF given in Eq. (1) to generate training data. For the training and test data, we generated 60,000 blurred images with known absorber depths: 10 original absorber patterns, each rotated in 60 orientations, at depths of 0.1-10.0 mm (0.1 mm step), giving 10 × 60 × 100 = 60,000 images. Each image was blurred by convolution using Eq. (1) with the specified depth. Training was performed with default settings using stochastic gradient descent with momentum for better optimization. The typical values of the batch size, the learning rate, and the number of epochs were, respectively, 32, 10^-4, and 10. We tried different values and found this condition to be appropriate for computational time and stability in our system. To reduce the learning rate gradually, we set the learning rate schedule to "piecewise". The data were shuffled every epoch, which helps ResNet learn more representative features effectively.
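Two details of this training setup can be sketched in a few lines: treating the 0.1 mm depth steps as classes, and a piecewise (step-wise) learning rate schedule. The initial learning rate of 10^-4 and the 10 epochs come from the text; the drop factor and drop period are hypothetical values, since the text does not give them, and the function names are illustrative.

```python
def depth_to_class(depth_mm, step_mm=0.1):
    """Map a depth on the 0.1 mm grid to a class index (0.1 mm -> 0, 10.0 mm -> 99)."""
    return round(depth_mm / step_mm) - 1


def class_to_depth(index, step_mm=0.1):
    """Inverse mapping from class index back to depth in mm."""
    return (index + 1) * step_mm


def piecewise_lr(epoch, initial_lr=1e-4, drop_factor=0.5, drop_period=3):
    """Learning rate held constant within a period, then multiplied by drop_factor.

    ASSUMPTION: drop_factor and drop_period are illustrative, not from the paper.
    """
    return initial_lr * drop_factor ** (epoch // drop_period)


n_images = 10 * 60 * 100  # patterns x orientations x depths
schedule = [piecewise_lr(epoch) for epoch in range(10)]
```

With these illustrative settings the learning rate is halved after every third epoch, so it falls from 10^-4 to 1.25 × 10^-5 over the 10 epochs.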
These images were split randomly into two subsets: 70% (42,000 images) for training and 30% (18,000 images) for testing. The training was performed on the same workstation as described in Sec. 3.1. After training the CNN, we fed it test data to which it had never been exposed. Then, we obtained the estimated depth as an output of the CNN. Figure 7 portrays examples of input images and output depths of the trained CNN. Figure 8 presents a comparison between the given and the estimated depths. Error bars show the mean and standard deviation of the estimated depths for 10 images. The given and estimated depths agreed well, within 2% average error, up to 10 mm depth. From this result, we can expect a depth resolution of about 1 mm and 2 mm at depths of 5 mm and 10 mm, respectively. The lateral resolution and the signal-to-noise ratio of the output image are close to those of the original image because of the high correlation coefficient between the original and the output images of the FCN. This result suggests the feasibility of estimating the depth of an absorber in a blurred transillumination image using CNN.
Figure 9(e) shows the 3D image reconstructed using Fig. 9(d) and the depth distribution obtained from the trained CNN. This result suggests the feasibility of obtaining a clear 3D structure from the blurred 2D transillumination image.
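The random 70%/30% split and the average percentage error used to compare given and estimated depths can be sketched as follows. The function names and the fixed random seed are hypothetical; the exact error definition used in the study is not stated, so the mean absolute relative error here is an assumption.

```python
import random


def split_dataset(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples and split them into training and test subsets."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def mean_percent_error(true_depths, estimated_depths):
    """Average of |estimated - true| / true, expressed in percent."""
    errors = [abs(est - true) / true * 100.0
              for true, est in zip(true_depths, estimated_depths)]
    return sum(errors) / len(errors)


train_set, test_set = split_dataset(range(60000))
example_error = mean_percent_error([5.0, 10.0], [5.1, 9.8])
```

Applied to 60,000 samples, the split yields 42,000 training and 18,000 test samples with no overlap between the two subsets.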

Transillumination imaging system
Applicability of the proposed technique was examined in experiments. Figure 10 shows the outline of the transillumination imaging system. The light source was an array of 50 LEDs (810 nm wavelength, 50 × 1 mW optical power, OSLUX IR, PowerStar; EMSc UK Ltd.). A black-painted Y-shaped absorber (3.0 mm diameter, 75 mm height) was fixed in a rectangular acrylic container (40 × 100 × 60 mm³, internal size) filled with a turbid medium with tissue-equivalent optical parameters. Intralipos suspension (Otsuka Pharmaceutical Co. Ltd.) was mixed with pure water to produce the turbid medium ($\mu_s' = 1.0$ /mm, $\mu_a = 0.00536$ /mm) [28][29][30][31]. One side of the container was illuminated with the light source. A transillumination image was recorded with a cooled CCD camera (ORCA-R2 C10600; Hamamatsu Photonics KK) from another side of the container. The absorber depth was varied from 1.00 to 10.0 mm from the observation surface of the container using a mechanical translation system.

Suppression of background inhomogeneity
Training data for FCN and CNN were generated on the assumption that the light illumination was uniform over a sufficient area around the absorbing object. In practical transillumination imaging, this assumption can hardly be satisfied because of the finite size of the light source. Figure 11(a) shows a typical transillumination image obtained in the experiment with a bar absorber in a turbid medium. The effect of the non-uniform illumination appears in the background of the absorber image. We can eliminate the effect of the non-uniform illumination by dividing the observed transillumination image by the background image. In the experiment with a model phantom, it is not difficult to obtain the background image without the target absorber. However, in practical applications such as transillumination imaging of animal bodies, we cannot take a target absorber out of the body. Therefore, in such a case, we calculate the background image as a convolution of the light distribution at the illumination side of the turbid medium and the point spread function of Eq. (1), with the depth set to the total thickness of the medium, as
$$I_b(x, y) = I_s(x, y) * \mathrm{PSF}(x, y)\big|_{d = t}, \tag{2}$$
where $I_b$, $I_s$, $d$, and $t$ respectively denote the background light distribution, the source light distribution at the illuminated surface of the turbid medium, the depth of the absorber in the turbid medium, and the turbid medium thickness. Because $I_s(x, y)$ and $t$ are measurable from outside the body, the background light distribution $I_b(x, y)$ can be obtained irrespective of the target absorber at unknown depth in the body. The validity of this technique was tested using measured transillumination images. Figure 11 presents results of the background elimination. Figures 11(a)-11(e) respectively depict an observed transillumination image of a bar-shaped absorber in a turbid medium, a background measured after removing the absorber from the medium, a background calculated with Eq. (2), the result of image division (a)/(b), and the result of image division (a)/(c).
Figure 12 presents a comparison of the intensity profiles along the central horizontal lines in Figs. 11(b) and 11(c), and in Figs. 11(d) and 11(e). They agreed well. These results suggest that we can eliminate the effect of the inhomogeneous illumination by calculating the background image using the light source distribution and the outer thickness of the turbid medium. The calculation requires the reduced scattering coefficient and the absorption coefficient of the medium. These values are available from the literature or from separate measurement. The former value does not change much in normal physiological variation. The dependence of the PSF on the latter value is much smaller than on the former. For the following experiments, this background elimination technique was applied to the measured transillumination images.
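The image division used for background elimination can be sketched as a pixel-wise ratio. The function name and the small floor `eps`, which guards against division by zero in dark background pixels, are assumptions added for illustration.

```python
def remove_background(observed, background, eps=1e-6):
    """Divide the observed image by the background image, pixel by pixel."""
    if len(observed) != len(background) or any(
            len(o) != len(b) for o, b in zip(observed, background)):
        raise ValueError("images must have the same shape")
    return [[o / max(b, eps) for o, b in zip(o_row, b_row)]
            for o_row, b_row in zip(observed, background)]


# Non-uniform illumination with one pixel attenuated to half by an absorber.
background_img = [[1.0, 2.0], [4.0, 8.0]]
observed_img = [[1.0, 2.0], [2.0, 8.0]]
flattened = remove_background(observed_img, background_img)
```

After division, pixels unaffected by the absorber become 1 regardless of the illumination gradient, and the absorber appears as a relative transmittance below 1. The background may be either measured or calculated from the source distribution as described above.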

Deblurring by FCN
The applicability of the proposed technique to obtain a clear transillumination image from a blurred image using FCN was tested in experiments. In the container presented in Fig. 12, we placed an absorber made of 3-mm-diameter black plastic wire. For reference, a transillumination image was taken after the container was filled with clear water. Then the water was replaced by the tissue-simulating turbid medium ($\mu_s' = 1.0$ /mm, $\mu_a = 0.00536$ /mm), and a transillumination image was taken again. After the background elimination, the blurred image was fed to the FCN, which had been trained with the simulated images described in Sec. 3.1. For additional analysis, output from the FCN was compared with the reference image taken through clear water. Figure 13 presents examples of these images. The effectiveness of this technique was evaluated using correlation analysis. Figure 14 presents the correlation between the output image from the FCN and the corresponding image through clear water for 30 absorbers. As the absorber depth increased, the transillumination image was more blurred and the image recovery became more difficult. However, with our technique using the skip connection of FCN, a correlation coefficient of more than 0.94 was attained even at 10.0 mm depth. This result verified the applicability of FCN to deblur transillumination images of absorbers as deep as 10.0 mm. For comparison, the correlation of the output images from the FCN without skip connection was analyzed. The correlation coefficient decreased and its variation increased with the depth. The fluctuation in the variation at depths of more than 5 mm was irregular. The difference in the correlation coefficients with and without the skip connection was apparent. This result demonstrates the degree of the effectiveness of the skip connection.

Depth estimation by CNN
The applicability of the proposed technique to obtain the absorber depth from a 2D transillumination image using CNN was tested in experiments. Figure 15 shows the structure of the absorber with varying depth and its transillumination image. The background inhomogeneity was removed using Eq. (2). The blurred image was fed to the CNN trained with the simulated images described in Sec. 3.2. Figure 16 presents the result of depth estimation. As the depth increased, the estimation error increased. However, the average error was within 3.5%. High correlation between the given and estimated depths was confirmed. This result demonstrated the applicability of CNN to estimate the absorber depth at depths up to at least 10 mm.

Clear 3D imaging from blurred 2D image
The applicability of the proposed technique to obtain a clear 3D structure in a turbid medium from a blurred 2D transillumination image was tested through experimentation. Figure 17 presents the absorber structure in clear water and in the turbid medium, together with the transillumination image. The absorber depth varied from one place to another. Figure 18 shows the 3D transillumination image obtained using the clear image from FCN and the depth distribution from CNN. This result verified the applicability of the proposed technique to obtain a clear 3D image from a single blurred 2D transillumination image.

Conclusions
To expand the usefulness of transillumination imaging through turbid medium with NIR light, a technique was developed to obtain a clear 3D image from a blurred 2D transillumination image. The severe blur caused by the turbid medium was clarified using FCN trained with 5,000 training images in deep learning. The absorber depth in a turbid medium was estimated using CNN in deep learning with 42,000 training pairs. The difficulty of obtaining numerous training data was solved using convolution with a point spread function derived from the diffusion approximation to the equation of transfer. The problem posed by inhomogeneous illumination was resolved through background elimination using measurable quantities from outside the turbid medium. The feasibility of the proposed technique was confirmed in simulation. Its validity was verified through experimentation. The effectiveness of the proposed technique was demonstrated for the absorbing structure at a depth from several to 10 mm in the tissue-simulated turbid medium with 40 mm thickness. There have been many attempts to sharpen blurred images, but few have been able to make transillumination imaging useful in medical practice. The poor flexibility for different depths of multiple targets in a turbid medium is one of the reasons. The proposed technique can solve this problem. The tradeoff of this technique compared with others is the requirements for a large number of training data and large computational power. However, they can be solved with the use of an appropriate PSF and current progress of computers. In this study, we examined the feasibility of deep learning to clarify the blurred image and to estimate the absorber depth with FCN and CNN. It would be useful if we can combine them into a single network. It is a future task to implement these functions into one.
Results suggest that this technique is useful to observe the subcutaneous structure of the blood vessel network and to identify its depth distribution as deep as several millimeters. Because this technique requires only optics, with no need for complicated contact, ultrasound, or other supplements, it can provide a new tool for diagnosis in dermatology, various cancers, vascular diseases, and tissue metabolism. It can also step up vein authentication from 2D to 3D. The pursuit of application of the proposed technique to animal tissue should be continued.