Deep convolutional neural network-based scatterer density and resolution estimators in optical coherence tomography

We present deep convolutional neural network (DCNN)-based estimators of the tissue scatterer density (SD), lateral and axial resolutions, signal-to-noise ratio (SNR), and effective number of scatterers (ENS, the number of scatterers within a resolution volume). The estimators analyze the speckle pattern of an optical coherence tomography (OCT) image to estimate these parameters. The DCNN is trained with a large number (1,280,000) of image patches that are fully numerically generated by OCT imaging simulation. Numerical and experimental validations were performed. The numerical validation shows good estimation accuracy: the root mean square errors were 0.23%, 3.58%, 3.79%, 3.65%, and 6.15% of the estimation ranges for SD, lateral and axial resolutions, SNR, and ENS, respectively. The experimental validation using scattering phantoms (Intralipid emulsion) shows reasonable estimations. Namely, the estimated SDs were proportional to the Intralipid concentrations, and the average estimation errors of lateral and axial resolutions were 1.36% and 0.68%, respectively. The scatterer density estimator was also applied to an in vitro tumor cell spheroid, and a reduction in the scatterer density during cell necrosis was found.


Introduction
Speckle is an inevitable phenomenon in the image formation of optical coherence tomography (OCT) [1]. The speckle is an information carrier that conveys the properties of tissues, such as the sub-resolution distribution of the scatterers [2] and sub-resolution translation [3,4]. In addition, speckle carries properties of the imaging system itself, such as the optical resolution [1]. We refer to these properties as "fundamental parameters" in this paper. This information-carrying nature of the speckle motivates us to measure/estimate the fundamental parameters through the speckle patterns of an OCT image.
However, the relationship between these parameters and the speckle pattern is complicated, and it is thus not straightforward to estimate these parameters from the OCT image. This complexity comes from the image formation process. In principle, OCT imaging processes can be understood as a combination of two sequential processes. The first process is a forward process in which the tissue property is encoded into an interference signal. The second process is the backward process in which the interference signal is processed to form an OCT image, and the OCT image is then analyzed to estimate the fundamental parameters. The principle is exemplified by attenuation coefficient estimation [5][6][7], signal intensity estimators [8,9], and polarization parameter estimations of polarization sensitive OCT [10][11][12][13][14].
In general, the forward process is well defined, can be easily modeled, and can be numerically simulated with relative ease [15][16][17]. On the other hand, the backward process is complicated and hard to perform. Despite its ease of simulation, the forward process is a physical process and does not always need to be simulated. In contrast, despite its difficulty, the backward process must be performed numerically for accurate and quantitative OCT measurement.
Our fundamental idea is to use a deep convolutional neural network (DCNN) to solve the complicated backward process. In recent years, deep learning [18,19] has been demonstrated to be a powerful tool in the field of computer vision. Its application in OCT, such as in image segmentation [20][21][22][23], classification [24,25], and denoising [26,27] has been successful. The DCNN is one of the most established realizations of deep learning, which automatically learns a hierarchy of increasingly complex features related to the training data sets [28][29][30]. The DCNN is a highly nonlinear method and can therefore potentially solve the backward process.
In our approach, the DCNN takes a local (on the order of tens of micrometers) speckle pattern of an OCT image as an input and outputs fundamental parameters, including the scatterer density and optical resolution. Here the scatterer density denotes the number of scatterers per unit volume. However, the DCNN must be trained using a huge training data set, and it is not realistic to acquire such a huge data set in experiments. We solve this issue by performing simple numerical simulations of the forward process. By employing a simple model of the OCT imaging, a huge number of local speckle patterns are created for several predefined resolutions and scatterer densities, i.e., the fundamental parameters. These sets of parameters and the generated speckle patterns are used to train the DCNN. Our strategy is therefore summarized as follows: the backward process (i.e., estimation of parameters from a local speckle pattern) is performed by the DCNN, and the DCNN is trained using a fully numerically generated data set obtained by simulating the forward process. These estimators are validated with numerically generated OCT images, experimentally obtained scattering phantom images, and in vitro tumor spheroid images.

Three-dimensional speckle pattern generator
We generate small volumetric OCT speckle patterns in a simulation based on a simple Fourier imaging model. To generate the speckle pattern, a three-dimensional (3D) scatterer distribution map, S(x, y, z), is first generated stochastically. Here x and y are lateral positions and z is the depth position. The 3D scatterer distribution map is a complex 3D numerical array with 128 × 128 × 128 pixels, where the pixels with a scatterer (i.e., scatterer pixels) have an amplitude of unity and a random phase, whereas the pixels without a scatterer have zero values. The scatterer pixels are randomly selected such that the expectation of the scatterer density of the distribution map is a particular set value. The random phases of the scatterer pixels represent the randomly distributed sub-wavelength positions of the scatterers. This 3D scatterer distribution map is then numerically convolved with a 3D complex point spread function (PSF), PSF(x, y, z), to yield a simulated complex OCT signal G(x, y, z) as
$$G(x, y, z) = S(x, y, z) * \mathrm{PSF}(x, y, z), \qquad (1)$$
where G(x, y, z) is the 3D speckle pattern and * indicates the 3D convolution. The convolution is computed by a fast-Fourier-transform-based method. The PSF is numerically generated as a 3D Gaussian function with the same physical dimensions and pixel numbers as the scatterer distribution map. Its widths are set by ∆x and ∆y, the lateral resolutions defined as the 1/e²-width of the squared amplitude of the PSF, and by ∆z, the axial resolution defined as the full width at half maximum (FWHM) of the squared amplitude of the PSF. The lateral and axial resolutions are randomly selected, but the two lateral resolutions ∆x and ∆y are set to be identical. The axial and lateral resolutions are among our estimation targets. Additionally, the magnitude of the PSF is randomly selected to simulate the uncontrolled probe beam power and unknown scattering intensity.
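The generation steps described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation. The function names, the real-valued Gaussian envelope (the text describes a complex PSF), the reading of the 1/e²-width and FWHM as full widths, and the circular boundary handling of the FFT-based convolution are all assumptions made here.

```python
import numpy as np

def gaussian_psf(n, field_um, dx_um, dz_um):
    """Gaussian PSF amplitude on an n^3 grid (assumed real-valued envelope).

    dx_um: lateral 1/e^2 full width of |PSF|^2 (identical for x and y).
    dz_um: axial FWHM of |PSF|^2.
    """
    fx, fy, fz = field_um
    x = (np.arange(n) - n // 2) * (fx / n)
    y = (np.arange(n) - n // 2) * (fy / n)
    z = (np.arange(n) - n // 2) * (fz / n)
    X, Y, Z = np.meshgrid(x, y, z, indexing="ij")
    # Amplitude exponents are half of the squared-amplitude exponents.
    lateral = np.exp(-4.0 * (X**2 + Y**2) / dx_um**2)
    axial = np.exp(-2.0 * np.log(2.0) * Z**2 / dz_um**2)
    return lateral * axial

def speckle_volume(n, field_um, density_per_um3, dx_um, dz_um, rng):
    """Return the scatterer map S and the complex speckle field G = S * PSF."""
    voxel_um3 = np.prod(field_um) / n**3
    p = density_per_um3 * voxel_um3          # probability that a voxel holds a scatterer
    mask = rng.random((n, n, n)) < p
    phase = rng.uniform(0.0, 2.0 * np.pi, (n, n, n))
    S = mask * np.exp(1j * phase)            # unit amplitude, random phase
    psf = gaussian_psf(n, field_um, dx_um, dz_um)
    # Circular 3D convolution via FFT; ifftshift moves the PSF center to index 0.
    G = np.fft.ifftn(np.fft.fftn(S) * np.fft.fftn(np.fft.ifftshift(psf)))
    return S, G

# Small example grid (the paper uses 128 x 128 x 128 pixels).
rng = np.random.default_rng(0)
S, G = speckle_volume(32, (31.2, 31.2, 115.8), 0.0645, 9.0, 14.0, rng)
```

Drawing each voxel independently with probability p makes the expected scatterer density equal to the set value, which matches the stochastic selection described in the text.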
After computing the convolution [Eq. (1)], the 3D numerical field is down-sampled from 128 × 128 × 128 pixels to 16 × 16 × 16 pixels while keeping the original physical field size. After the down-sampling, we add complex Gaussian noise to each pixel to achieve a randomly but specifically selected signal-to-noise ratio (SNR), and then compute the intensity (squared magnitude) of the 3D OCT speckle pattern.
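A minimal sketch of the down-sampling and noise-addition step is given below. The text does not state how the field is down-sampled or how the SNR is defined, so the simple decimation and the definition of SNR as the ratio of mean signal power to noise power are assumptions, as are the function names.

```python
import numpy as np

def downsample(G, factor=8):
    # Assumption: simple decimation, keeping every `factor`-th voxel per axis.
    return G[::factor, ::factor, ::factor]

def add_noise_and_detect(G, snr_db, rng):
    """Add circular complex Gaussian noise for a target SNR, return intensity."""
    signal_power = np.mean(np.abs(G) ** 2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    sigma = np.sqrt(noise_power / 2.0)       # per real and imaginary component
    noise = rng.normal(0.0, sigma, G.shape) + 1j * rng.normal(0.0, sigma, G.shape)
    return np.abs(G + noise) ** 2

# Example with a placeholder field of constant amplitude.
rng = np.random.default_rng(0)
G_full = np.ones((128, 128, 128), dtype=complex)
intensity = add_noise_and_detect(downsample(G_full), snr_db=30.0, rng=rng)
```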
The generated 3D speckle pattern covers a 31.2-µm (x) × 31.2-µm (y) × 115.8-µm (z) volume with 16 × 16 × 16 pixels. So, the physical size of a pixel is 1.95 µm (x) × 1.95 µm (y) × 7.24 µm (z), while that of the original 128 × 128 × 128-pixel field is 0.24 µm (x) × 0.24 µm (y) × 0.90 µm (z). The pixel separations of the down-sampled speckle field are selected to be identical to those in our experiment described in Section 2.4. The inputs for the training of the DCNN-based estimator described in the next section are 2D cross-sectional OCT images, which are the x-z cross-sections of the 3D numerically generated speckle pattern.

Speckle data set
The data set for DCNN training is generated by the 3D speckle pattern generator described in the previous section. The true parameters are randomly selected within specific ranges of 3 to 30 µm for the axial and lateral resolutions, 0 to 50 dB for the SNR, and 0 to 0.0645 scatterers/µm³ (0 to 1.775 scatterers/pixel³) for the scatterer density. The maximum scatterer density corresponds to effective numbers of scatterers (ENS) of 33.7 and 76.7 for the high- and low-NA objectives, respectively. Here the ENS is the number of scatterers within a 3D resolution volume; see Section 2.4 for the details of the high- and low-NA objectives. The speckle patterns are normalized by intensity before being input to the DCNN, and the PSF magnitude is thus not a significant parameter. Each datum in the data set is the combination of a speckle pattern and the set fundamental parameters.
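The parameter ranges and the unit conversion above can be written down as a short sketch. The text only says the parameters are "randomly selected", so the uniform distribution used here is an assumption, and the function and key names are hypothetical.

```python
import random

# Pixel separations of the down-sampled speckle field, in micrometers.
PIXEL_UM = (1.95, 1.95, 7.24)

def per_um3_to_per_pixel3(density_per_um3, pixel_um=PIXEL_UM):
    """Convert scatterers/um^3 to scatterers per down-sampled pixel volume."""
    vx, vy, vz = pixel_um
    return density_per_um3 * vx * vy * vz

def sample_parameters(rng):
    # Assumption: each parameter is drawn uniformly within its stated range.
    return {
        "lateral_res_um": rng.uniform(3.0, 30.0),
        "axial_res_um": rng.uniform(3.0, 30.0),
        "snr_db": rng.uniform(0.0, 50.0),
        "density_per_um3": rng.uniform(0.0, 0.0645),
    }

params = sample_parameters(random.Random(0))
max_density_per_pixel3 = per_um3_to_per_pixel3(0.0645)   # about 1.78
```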
The training data set consists of 1,280,000 2D cross-sectional speckle patterns extracted from 80,000 3D speckle patterns. The validation data set, which is used to compute the loss during the training, comprises 160,000 2D speckle patterns extracted from 10,000 3D speckle patterns.

Network architecture and training
Our DCNN model architecture is shown in Fig. 1. The DCNN takes the 2D cross-sectional speckle patterns as input, automatically learns the local features of the speckle pattern, and estimates the underlying parameters. The whole network is subdivided into two parts, namely the feature extraction and the estimation parts.
The input to the feature extraction part is a 2D cross-sectional speckle image with a size of 16 × 16 pixels [input image in Fig. 1]. The input image intensity is normalized to a [0, 1] range before being input. The first convolution layer (CL1) convolves the image with 32 different filters having a kernel size of 2 × 2 to obtain a feature map (FM1; 16 × 16 pixels × 32 filters). Here, the rectified linear unit (ReLU) is used as an activation function. A max-pooling layer (MP1) with a stride of 2 is applied after the convolution, which fuses nearby spatial information to reduce the size of the feature map. The second convolution layer (CL2) produces the third feature map (FM3; 8 × 8 pixels × 64 filters) using 2 × 2-pixel kernels and ReLU activation. After applying a max-pooling layer with a stride of two (MP2), the third convolutional layer (CL3) generates the fifth feature map (FM5; 4 × 4 pixels × 128 filters). Finally, the third max-pooling layer (MP3) makes the final feature map (FM6; 2 × 2 pixels × 128 filters), and feeds the subsequent estimation part.
The estimation part consists of two fully connected layers, FC1 and FC2, which have 512 and 256 neurons, respectively. The final output layer consists of a single neuron, and its output is an estimated fundamental parameter.
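The feature-map sizes quoted above can be cross-checked with a small shape trace. The sketch below assumes "same" padding in the convolution layers and that the final 2 × 2 × 128 feature map is flattened before FC1. Neither is stated in the text, and the resulting parameter count is only indicative.

```python
def conv_same(shape, filters, kernel=2):
    """Output shape and parameter count of a 'same'-padded convolution."""
    h, w, c = shape
    params = (kernel * kernel * c + 1) * filters   # weights + biases
    return (h, w, filters), params

def max_pool(shape, stride=2):
    h, w, c = shape
    return (h // stride, w // stride, c)

def dense(n_in, n_out):
    return n_out, (n_in + 1) * n_out

def trace_network():
    shape, total, maps = (16, 16, 1), 0, []
    for filters in (32, 64, 128):                  # CL1, CL2, CL3
        shape, p = conv_same(shape, filters)
        total += p
        maps.append(shape)                         # FM1, FM3, FM5
        shape = max_pool(shape)                    # MP1, MP2, MP3
        maps.append(shape)                         # FM2, FM4, FM6
    n = shape[0] * shape[1] * shape[2]             # flattened FM6
    for units in (512, 256, 1):                    # FC1, FC2, output neuron
        n, p = dense(n, units)
        total += p
    return maps, total

feature_maps, n_params = trace_network()
```

Under these assumptions the trace reproduces the stated sizes of FM1 (16 × 16 × 32), FM3 (8 × 8 × 64), FM5 (4 × 4 × 128), and FM6 (2 × 2 × 128).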
We train our neural network models to minimize the mean squared error (MSE) between the ground-truth parameters and the network outputs. The network is trained using an Adam optimizer [31] with a learning rate of 10⁻⁴. The batch size of the training is 32. Five independent networks with the same network architecture are trained for the five fundamental parameters, namely the scatterer density, the lateral and axial resolutions, the SNR, and the ENS. It is noteworthy that the SNR is randomized and the signal strength is normalized, and hence the DCNN is unlikely to use this information to estimate the scatterer density and ENS.
The models are constructed in Python 3.7 with Keras 2.3.1 on the TensorFlow back end. The training of each network with a graphics processing unit (GPU; Nvidia GTX1080) takes about 5-6 min per epoch. The training is terminated if the loss has not decreased for 10 consecutive epochs, and the weights with the minimum loss are selected for the estimator. The minimum loss is obtained at the 30th, 28th, 27th, 15th, and 28th epochs for the scatterer density, lateral and axial resolutions, SNR, and ENS, respectively.
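The stopping rule described above can be illustrated with a small framework-independent sketch. It assumes that the monitored quantity is the per-epoch loss on the validation data set. The function name is hypothetical, and this is not the Keras code used in the study.

```python
def early_stopping(losses, patience=10):
    """Return (best_epoch, stop_epoch), both counted from 1.

    Training stops once the loss has not improved for `patience`
    consecutive epochs; the weights of `best_epoch` would be kept.
    """
    best_epoch, best_loss = None, float("inf")
    for epoch, loss in enumerate(losses, start=1):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(losses)

# Example: the minimum is reached at epoch 3, so training stops at epoch 13.
example = early_stopping([1.0, 0.5, 0.4] + [0.45] * 20)
```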

Validation methods
The DCNN-based estimators are validated using numerically generated OCT signals, experimentally measured OCT images of the scattering phantom, and an in vitro spheroid sample. The details are as follows.

Numerical validation
We evaluated the estimation performances of the DCNN-based estimators using 100 numerically generated OCT speckle patterns. The speckle patterns are generated with the same method used to generate the training data set, but they are not included in the training or validation data set. Each speckle pattern was extracted from a different simulated volume, so all speckle patterns are independent of each other and have different fundamental parameters. Here, the estimation performances for the scatterer density, ENS, the lateral and axial resolutions, and SNR are evaluated. The true values are known for this evaluation, and we can thus obtain the estimation accuracies.

Validation by scattering phantom measurement
The DCNN-based estimators were validated experimentally by measuring scattering phantoms made of Intralipid emulsion. In the field of biomedical optics, 20% Intralipid emulsion (IL-20; I141, Sigma-Aldrich) is frequently used as an optical phantom to mimic the optical properties of tissue [32]. Six different dilutions were made by mixing the IL-20 with purified water to obtain concentrations of 1%, 2%, 4%, 6%, 8%, and 10%. Each dilution was then placed in a Petri dish. Black mending tape was attached to the bottom of the Petri dish to avoid strong specular reflection from the glass surface. Three samples were made for each concentration, i.e., 18 samples were made in total.

Human breast cancer spheroid
The scatterer density estimator was also applied to an in vitro sample. The sample was a human breast adenocarcinoma spheroid made from the MCF7 cell line. The scatterers in the cells are cell nuclei and cell organelles, and we thus anticipated that the scatterer density was strongly related to tissue functions.
After 15-day cultivation, spheroids with a size of a few hundred micrometers had formed. The spheroid was extracted from the culturing environment and placed in a room-temperature culture medium without CO₂ supply. The cells might have been gradually dying because of a lack of nutrients. We performed OCT measurements every 2 hours for up to 28 hours. Note that this experiment was originally performed for the study described in Ref. [36], and the same data set was used in the present study.

OCT setup and measurement protocol
Swept-source OCT with a 1.310 µm wavelength probe beam and a measurement speed of 50,000 A-lines/s was conducted in the experiments [37]. Although the system was polarization-sensitive, we did not use its polarization functionality. The OCT image used in this study is the coherence composition of multiple polarization channels (Section 2.3.1 of Ref. [37] and Section 3.8 of Ref. [38]), which is almost equivalent to a standard OCT image.
Two objectives were used in this study. One has an effective focal length of 18 mm (LSM02, Thorlabs Inc., NJ). The beam diameter incident on the objective is 3.49 mm, and the effective numerical aperture (NA) is thus 0.097. This results in a diffraction-limited lateral resolution (spot size) [39] of 8.6 µm, whereas the lateral resolution numerically simulated by OpticStudio (Zemax) is 8.9 µm. We refer to this objective as the high-NA objective. The other objective, the low-NA objective (LSM03, Thorlabs Inc., NJ), has an effective focal length of 36 mm. Its effective NA of 0.048 results in a diffraction-limited lateral resolution of 17.2 µm, whereas the numerically simulated resolution is 18.1 µm.
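The NA and diffraction-limited spot sizes quoted above are consistent with paraxial Gaussian-beam relations, as the following sketch shows. Which exact expression Ref. [39] uses is an assumption here. The sketch takes NA = D/(2f), a 1/e² spot diameter of 2λ/(π NA), and a depth of focus equal to twice the Rayleigh range.

```python
import math

def effective_na(beam_diameter_mm, focal_length_mm):
    # Paraxial approximation: NA = D / (2 f).
    return beam_diameter_mm / (2.0 * focal_length_mm)

def spot_size_um(wavelength_um, na):
    # 1/e^2 diameter of a focused Gaussian beam: 2 w0 = 2 lambda / (pi NA).
    return 2.0 * wavelength_um / (math.pi * na)

def depth_of_focus_um(wavelength_um, na):
    # Twice the Rayleigh range: 2 pi w0^2 / lambda, in air.
    w0 = spot_size_um(wavelength_um, na) / 2.0
    return 2.0 * math.pi * w0**2 / wavelength_um

WAVELENGTH_UM = 1.31
na_high = effective_na(3.49, 18.0)   # about 0.097
na_low = effective_na(3.49, 36.0)    # about 0.048
spot_high = spot_size_um(WAVELENGTH_UM, na_high)
spot_low = spot_size_um(WAVELENGTH_UM, na_low)
dof_high = depth_of_focus_um(WAVELENGTH_UM, na_high)
dof_low = depth_of_focus_um(WAVELENGTH_UM, na_low)
```

With these relations the spot sizes come out close to 8.6 µm and 17.2 µm, and the depths of focus close to the theoretical values of 88.7 µm and 355.0 µm quoted in the phantom results.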
Each B-scan comprises 512 A-lines covering a lateral scanning range of 1 mm, and the lateral pixel separation is 1.95 µm. The axial resolution of our system is 14.1 µm and the axial pixel separation is 7.24 µm in tissue. These axial and lateral pixel separations are identical to those of the numerically generated speckle patterns (Section 2.1).
The measurements were performed for several values of probe beam attenuation, namely 0 dB, −5.4 dB, and −11.76 dB (for round trip). The probe power on the sample was 12 mW with 0-dB attenuation.

Numerical validation
All DCNN-based parameter estimators show high consistency between the set and estimated parameters, as shown in Fig. 2. Figures 2(a)-2(e) are for the scatterer density, ENS, lateral and axial resolutions, and SNR, respectively. The horizontal axes represent the set parameters whereas the vertical axes show the estimates. The root mean square errors (RMSEs) of the estimations are 0.113 scatterers/µm³ for the scatterer density, 3.06 scatterers for the ENS, 1.78 µm for the lateral resolution, 1.88 µm for the axial resolution, and 1.82 dB for the SNR. These RMSEs correspond to 0.23%, 6.15%, 3.58%, 3.79%, and 3.65% of the estimation ranges, respectively. Here, the estimation ranges are the ranges of each parameter in the speckle generation. The R² values of the model "estimate = true value" are 0.955 for the scatterer density, 0.985 for the ENS, 0.947 for the lateral resolution, 0.951 for the axial resolution, and 0.985 for the SNR. The estimators therefore give reasonable estimates of the parameters.
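The accuracy metrics used above can be computed as in the following sketch. The exact R² definition used in the study is not given, so the form below (residuals taken against the identity line, total variance taken over the estimates) is an assumption, as are the function names.

```python
import numpy as np

def rmse(true, est):
    true, est = np.asarray(true, float), np.asarray(est, float)
    return float(np.sqrt(np.mean((est - true) ** 2)))

def rmse_percent_of_range(true, est, lo, hi):
    """RMSE expressed as a percentage of the parameter range [lo, hi]."""
    return 100.0 * rmse(true, est) / (hi - lo)

def r2_identity(true, est):
    """R^2 of the fixed model 'estimate = true value' (no fitted coefficients)."""
    true, est = np.asarray(true, float), np.asarray(est, float)
    ss_res = np.sum((est - true) ** 2)
    ss_tot = np.sum((est - est.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy example on a 0-50 dB range with alternating errors of +/- 1 dB.
true_snr = np.array([0.0, 10.0, 20.0, 30.0, 40.0, 50.0])
est_snr = true_snr + np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
toy_rmse = rmse(true_snr, est_snr)
toy_percent = rmse_percent_of_range(true_snr, est_snr, 0.0, 50.0)
toy_r2 = r2_identity(true_snr, est_snr)
```

As a consistency check on the reported values, an RMSE of 1.82 dB over the 0 to 50 dB SNR range corresponds to about 3.64% of the range.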

Scattering phantom
Scatterer density estimation
Figure 3 shows example OCT B-scans of the scattering phantom with a concentration of 0.227%(v/v), where Figs. 3(a) and 3(b) are obtained with the high- and low-NA objectives, respectively. We processed these images using the scatterer density estimator with a sliding window of 16 × 16 pixels and computed the average scatterer density within the Intralipid region. Figure 4 shows the average scatterer density at each Intralipid concentration for each of the three probe-beam attenuations. Figures 4(a) and 4(b) show the results for the high-NA and low-NA objectives, respectively. Three phantoms were measured for each concentration, and the error bars in the plots represent the standard deviations among the measurements of the three samples. It is found that the results are consistent for the high- and low-NA objectives. It is also noteworthy that the estimations are not affected by the probe beam attenuation, i.e., the probe power. This suggests that the estimator does not estimate the scatterer density from the OCT signal strength.
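The sliding-window processing described above can be sketched as follows. The `estimator` argument is a hypothetical callable standing in for a trained network. The min-max normalization to [0, 1] and the window step are assumptions, since the text does not specify either.

```python
import numpy as np

def normalize_patch(patch):
    # Assumption: min-max scaling of each patch to the [0, 1] range.
    lo, hi = patch.min(), patch.max()
    if hi == lo:
        return np.zeros_like(patch, dtype=float)
    return (patch - lo) / (hi - lo)

def sliding_window_map(image, estimator, window=16, step=1):
    """Apply `estimator` to every window x window patch of a B-scan."""
    rows = (image.shape[0] - window) // step + 1
    cols = (image.shape[1] - window) // step + 1
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            r, c = i * step, j * step
            patch = image[r:r + window, c:c + window]
            out[i, j] = estimator(normalize_patch(patch))
    return out

# Example with a placeholder estimator (patch mean) on a random image.
rng = np.random.default_rng(0)
bscan = rng.random((32, 40))
density_map = sliding_window_map(bscan, estimator=lambda p: p.mean(), step=8)
```

The average over a region of interest, such as the Intralipid region, can then be taken with a boolean mask on the resulting map.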
We can compute the scatterer size from the slopes of the relation between the estimated scatterer density and the concentration (Fig. 4), as discussed in detail in Section 4.1. The intercepts of the plots are found not to be zero, although the ideal intercept is zero.
To evaluate the depth dependency of the scatterer density estimation, the estimated scatterer densities are averaged along the lateral direction (red lines in Fig. 5). Here, the data correspond to those in Fig. 3 and the concentration is 0.227%(v/v). The blue lines indicate the average OCT intensity, and the region between the dashed lines is the region of the Intralipid.
It is found that the estimated scatterer densities are almost constant along the depth, and they are consistent between the two NAs. It is also found that the estimated scatterer densities are independent of the average OCT intensity. The mean scatterer densities in the Intralipid region are 1.37 scatterers/µm³ and 1.35 scatterers/µm³ for the high- and low-NA objectives, respectively, i.e., the difference is only 1.5%.
Attenuation coefficients, or the signal attenuation rates along the depth, are frequently used to evaluate the scattering property of a sample. The attenuation rates of these data are computed through linear regression of the OCT intensity as −24.8 dB/mm (for the high-NA) and −26.8 dB/mm (for the low-NA). This relatively large inconsistency (8%) is accounted for in part by the different focus positions and the different depths of focus of the two measurements.

Resolution estimation
Figure 6 shows the resolution estimation results of the scattering phantom. The data correspond to those of Fig. 3. The blue regions indicate the region of Intralipid. The average estimates of the lateral resolution over the Intralipid region were 9.6 µm and 18.8 µm for the high- and low-NA objectives, respectively [Figs. 6(a) and 6(b)]. These estimates are close to the numerically simulated in-focus lateral resolutions of 8.9 µm (high NA) and 18.1 µm (low NA). It is also noteworthy that the lateral resolution of the high-NA objective shows significant depth dependency. This can be accounted for by the short depth of focus (88.7 µm in theory and 95.0 µm by Zemax simulation, in air), whereas the depth of focus of the low-NA objective is relatively long (355.0 µm in theory and 392.8 µm by Zemax simulation, in air).
The average axial resolution estimates are 14.1 µm and 13.9 µm, respectively for the high-and low-NA objectives [ Fig. 6(c) and 6(d)]. These estimates are close to the true depth resolution of 14.1 µm.

Time-course evaluation of a tumor spheroid
The scatterer density estimator is applied to the time-lapse images of an in vitro human breast cancer spheroid measured up to 28 hours after the extraction from the cultivation environment (Fig. 7). The first and second rows [Figs. 7(a) and 7(b)] show the conventional OCT and scatterer density images, respectively. It is noted that the estimation window is relatively large, i.e., 31.2 µm (lateral) × 115.8 µm (depth), in comparison with the spheroid size (around 400 µm in diameter), and the estimates at the spheroid periphery are thus not fully reliable. Figures 7(c) and 7(d) show the tissue dynamics images [36] for reference. Figure 7(c) presents the "logarithmic intensity variance" (LIV), which is primarily sensitive to the magnitude of the signal intensity fluctuation, i.e., it is sensitive to the magnitude of the intracellular motility. Figure 7(d) presents the "OCT correlation decay speed at early delay time" (OCDS e), which reflects the speed of the signal decorrelation. OCDS e has a large value if there is rapid motion in the cell. These images are reprinted from our previous publication [36].
The region with high scatterer density (green) rapidly becomes small over the first 8 hours [Fig. 7(b)]. Additionally, the average scatterer density [Fig. 8(b)] rapidly reduces over the first 8 hours. The low LIV region (red) and high OCDS e region (yellow to green) become large with time [Figs. 7(c) and 7(d)]. Additionally, the average LIV gradually reduces with time [Fig. 8(c)], but this reduction is slower than that of the scatterer density [Fig. 8(b)].
The low LIV and high OCDS e are thought to indicate cell necrosis [36,40]. The main scatterers in the cell are cell nuclei and cell organelles, and the rapid reduction of the scatterer density therefore suggests the destruction of nuclei and organelles during necrosis.

Scatterer diameter of the Intralipid
The scatterers of the Intralipid are the particles of soybean oil and egg lecithin [33][34][35]. Assuming the scatterer particles are spherical, the scatterer density (the number of scatterers per unit volume) of the Intralipid dilution can be computed as
$$\mathrm{SD} = \frac{\sigma}{\pi d^{3}/6} = \frac{6\sigma}{\pi d^{3}},$$
where σ is the volume concentration of the scattering medium and d is the diameter of the scatterer particle. Using this equation, the diameter of the scatterer particle can be computed from the slope of the relation between the scatterer density and σ as
$$d = \left(\frac{6}{\pi \cdot \mathrm{slope}}\right)^{1/3}.$$
In our Intralipid phantom measurements (Section 3.2.1), the slope of the relation between the scatterer density and σ is found to be 0.5533 ± 0.0086 µm⁻³/%(v/v), or equivalently 55.33 ± 0.86 µm⁻³/(v/v ratio) (mean ± standard deviation over three attenuations and two NAs). The diameter of the scatterer particle is thus estimated to be 325.6 nm. Previously reported scatterer particle diameters of Intralipid vary in the literature, e.g., 25 to 473.9 nm [35], 50 to 400 nm with a 225.7-nm mean diameter [33], and 360 nm [34]. The diameter estimated in the present study is well within the range of reported diameters.
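The diameter estimate can be reproduced with a few lines of arithmetic. Under the spherical-particle assumption stated above, each particle occupies a volume of πd³/6, so the number density per unit volume concentration is 6/(πd³). The function names below are hypothetical.

```python
import math

def scatterer_density(sigma, d_um):
    """Number of spherical particles per um^3 at volume fraction sigma."""
    return sigma / (math.pi * d_um**3 / 6.0)

def diameter_from_slope(slope_per_um3):
    """Particle diameter (um) from the slope d(SD)/d(sigma), sigma as a v/v ratio."""
    return (6.0 / (math.pi * slope_per_um3)) ** (1.0 / 3.0)

slope = 55.33                              # um^-3 per (v/v ratio), from the text
d_nm = 1000.0 * diameter_from_slope(slope)  # close to the reported 325.6 nm
```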

Benefits of numerically synthesized training data set
The present DCNN-based estimators were trained with a numerically synthesized data set. The numerical generation of the training data set has several advantages over experimentally collecting a training data set.
First, a massive amount of training data can be obtained at negligible cost. In the present study, 1,280,000 2D speckle patterns extracted from 80,000 3D synthesized OCT volumes were used for training (Section 2.2.1).
Second, the training data can have a high diversity of ground-truth fundamental parameter values. For instance, all 80,000 synthesized OCT volumes were generated with different fundamental parameters in the present study. This might make the DCNN-based estimator robust. Achieving the same diversity of ground-truth parameters with an experimental data set would require 80,000 accurately fabricated phantoms, which is not realistic.
Third, some of the fundamental parameters are not really controllable in the experiment. For example, the lateral and axial resolutions are defined by the OCT device, and using a large number of OCT devices to prepare a training data set with a variety of resolutions is not realistic.
Finally, it is noteworthy that some of the fundamental parameters are hard to control accurately in the experiment. As pointed out by Kübler et al., the same phantom recipe does not always give the same scattering property [41]. In addition, the resolutions are affected by aberrations, dispersion, wavelength-dependent absorption, and so on. Hence, it is hard to control or know the accurate resolutions in the experiment.

Rationality of the estimation of the signal-strength-independent scatterer density
The speckle patterns and OCT images are normalized by intensity before being input to the DCNN-based estimator, and the estimator therefore cannot use the overall signal strength to estimate the scatterer density. This is consistent with the results shown in Fig. 4, where the scatterer density estimates are not sensitive to the probe beam attenuation.
This signal-strength-independent estimation process is reasonable for the following reasons. First, Hillman et al. showed that the contrast of a local speckle pattern has a unique and monotonic relationship with the ENS, i.e., the number of scatterers per coherence volume, where the coherence volume is equivalent to the 3D resolution volume [2]. This suggests that the ENS can be estimated from the speckle contrast in principle.
In addition, if we know the size of the coherence volume, we can estimate the scatterer density (i.e., the number of scatterers per unit volume) from the ENS. The size of the coherence volume can be estimated if the axial and lateral resolutions are known. Kurokawa et al. showed that these resolutions can be estimated through the spatial correlation analysis of a local speckle pattern [3].
That is to say, we can estimate both the ENS and the coherence volume size from the local speckle pattern. Hence, estimation of the scatterer density without information of the overall signal strength is possible in principle.
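The in-principle link between speckle contrast and the ENS can be illustrated with a toy Monte Carlo model. This is neither the expression of Hillman et al. nor the method of the present study. It assumes that all scatterers in a resolution volume contribute unit-amplitude phasors with uniformly random phases and that their number is Poisson distributed, for which the squared intensity contrast is 1 + 1/ENS.

```python
import numpy as np

def simulate_contrast_sq(mean_ens, trials, rng):
    """Squared intensity contrast, var(I) / mean(I)^2, of a random phasor sum."""
    n = rng.poisson(mean_ens, size=trials)            # scatterers per realization
    n_max = max(int(n.max()), 1)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(trials, n_max))
    active = np.arange(n_max)[None, :] < n[:, None]   # keep the first n phasors
    field = np.sum(active * np.exp(1j * phases), axis=1)
    intensity = np.abs(field) ** 2
    return float(intensity.var() / intensity.mean() ** 2)

def ens_from_contrast_sq(contrast_sq):
    # Inversion of contrast^2 = 1 + 1/ENS, valid only for this toy model.
    return 1.0 / (contrast_sq - 1.0)

rng = np.random.default_rng(0)
c2 = simulate_contrast_sq(mean_ens=5.0, trials=100_000, rng=rng)
ens_estimate = ens_from_contrast_sq(c2)
```

In this toy model the contrast decreases monotonically toward unity as the ENS grows, which is why the inversion becomes sensitive to noise for large ENS values.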
In the present study, the DCNN-based estimators successfully estimate the ENS and the resolutions (Sections 3.2.1 and 3.2.2), which further supports the rationality of the estimation.
The foregoing discussion suggests that the scatterer density can be computed by combining the speckle contrast analysis of Hillman et al. and the correlation analysis of Kurokawa et al. However, it is difficult to accurately compute the speckle contrast and correlation because of noise in the speckle pattern. One strength of our DCNN-based estimator is that it can be trained to be robust against noise by adding random noise to the speckle patterns of the training data set.

Limitation of the current estimators
The current estimation framework has limitations. First, the speckle generator does not consider the size of the scatterers. The speckle generator uses a 3D numerical field in which the pixel size is 0.24 µm × 0.24 µm × 0.90 µm (lateral × lateral × depth), and a single scatterer corresponds to a single pixel. This limited accuracy of the scatterer representation can reduce the accuracy of the estimation.
The second limitation is that the speckle generator does not account for depth-dependent signal attenuation. That is to say, the signals in the deeper region are not affected by the scattering or absorption in the shallower region. Although we expect this effect to be small as the depth of the generated speckle pattern is only 115.8 µm, the estimator accuracy could be further improved by taking this effect into account.
Third, the sensitivity roll-off, depth-dependent variation of the noise floor, and confocal effect [42] were not accounted for in our speckle generator. These factors can affect the estimation accuracy as they alter the SNR within the estimation window. However, for our small estimation window, whose depth size is 115.8 µm, the effects of these factors might be limited. The estimation accuracy can be further improved by accounting for these factors in the speckle generation process.
Fourth, in the speckle pattern generator, the scatterers are assumed to be static. Therefore, the random movement of the Intralipid scatterers due to Brownian motion is not accounted for. In addition to the above two limitations, this discrepancy would partially explain the non-zero intercept in Fig. 4.
Fifth, our study only considers fully developed speckle patterns and does not account for partially developed speckles. Although this might limit the accuracy and the applicability of the estimators, we believe our current estimators are still reasonable. This is because several previous studies of OCT are based on the assumption of fully developed speckles, and they have given reasonable results [1,2,43].
We expect further improvement of the speckle generator will increase the accuracy of the estimators.

Effects of wavelength dispersion and polarization mode dispersion
In addition to the factors discussed in Section 4.4, wavelength dispersion and polarization mode dispersion (PMD) are also not considered in our speckle generator. It is known that the wavelength dispersion can be corrected numerically with image-based metrics [44,45]. We can therefore minimize its undesirable effect by applying the numerical dispersion correction to the OCT image before it is input to the estimators.
The PMD is known to deteriorate the OCT signal if the device is equipped with a very long optical fiber and/or improper optical components such as some types of circulators. Its undesirable effect is particularly significant for endoscopic polarization-sensitive (PS-) OCT [46]. On the other hand, its effect is not significant for non-PS-OCT and non-endoscopic PS-OCT as long as correct optical components are used. Hence, the present study might not be significantly affected by the PMD. To apply our estimators to OCT with significant PMD, one of the several PMD correction methods [47][48][49][50] can be applied prior to the estimation.

Conclusion
We demonstrated DCNN-based estimators for the scatterer density, ENS, lateral and axial resolutions, and SNR. The DCNN was trained using fully numerically generated OCT images (i.e., speckle patterns) and could therefore be trained easily using an extremely large training data set comprising 1,280,000 images. Numerical validation showed the good performance of the estimators. Additionally, validation with the scattering phantoms showed the feasibility of the estimators in experiments. The scatterer density estimator was also applied to an in vitro tumor spheroid, and the reduction of the scatterer density during cell necrosis was visualized.
In conclusion, the DCNN-based estimators successfully extract fundamental parameters from a local speckle pattern. The scatterer density estimator can be used to quantify a cell microstructure smaller than the OCT resolution. Furthermore, the resolution estimators can be used as optimization metrics of image-based adaptive optics or computational aberration correction and also for the calibration of OCT systems, such as in highly accurate measurements of the attenuation coefficient.