Optical-coherence-tomography-based deep-learning scatterer-density estimator using physically accurate noise model

We demonstrate a deep-learning-based scatterer density estimator (SDE) that processes local speckle patterns of optical coherence tomography (OCT) images and estimates the scatterer density behind each speckle pattern. The SDE is trained using large quantities of numerically simulated OCT images and their associated scatterer densities. The numerical simulation uses a noise model that incorporates the spatial properties of three types of noise, i.e., shot noise, relative-intensity noise, and non-optical noise. The SDE’s performance was evaluated numerically and experimentally using two types of scattering phantom and in vitro tumor spheroids. The results confirmed that the SDE estimates scatterer densities accurately. The estimation accuracy improved significantly when compared with our previous deep-learning-based SDE, which was trained using numerical speckle patterns generated from a noise model that did not account for the spatial properties of noise.


Introduction
Optical coherence tomography (OCT) is a non-invasive imaging modality that is used to provide high-resolution structural information about biological tissues [1][2][3].The anatomical images provided by OCT are used widely in clinical diagnosis [4].
In addition to anatomical investigations, OCT-based assessments of the optical properties of tissue have also been studied and have provided useful biomarkers.One commonly investigated optical property of biological tissues is the attenuation coefficient (AC) [5,6], and the AC is considered to be related to the tissue density.AC measurements are useful in a wide variety of applications, including investigation of tumor spheroid necrosis [7,8] and distinguishing between normal and cancerous tissues [9][10][11].However, AC measurements are strongly influenced by the measurement configuration and conditions, which including the system confocality, the depth position of the focus, and the presence of aberrations [12][13][14], and these factors can limit the accuracy and reliability of the AC measurements.Although several methods have been proposed to compensate for these effects, they generally require hard-wired assumptions to be made [12] or multiple measurements to be performed [15].
Rather than use the AC to assess biological tissue density, we have proposed a deep-learning based method that estimates the scatterer density of the tissue directly [16][17][18].In this work, we denote this method as the scatterer density estimator (SDE).The SDE analyzes the local spatial patterns of an OCT intensity image, i.e., the speckle pattern, using a convolutional neural network (CNN) and then estimates the scatterer density.The CNN was trained using fully numerically simulated OCT speckle patterns that were generated by a simple scalar-optics-based OCT simulator.This approach provides significant amounts of training data, and the data sets reflect the variety of the parameters involved in OCT imaging, including the resolution, the signal-to-noise ratio (SNR), and the sample scatterer density.
In our previous study [18], we noted that the CNN can estimate two primary quantities: the speckle contrast and the resolution.Hillman et al. showed that the local speckle contrast of OCT has a monotonic and negative relationship with the number of scatterers contained with in a three-dimensional (3D) resolution volume, which is designated the effective number of scatterers (ENS) [19].In addition, Kurokawa et al. demonstrated that the axial and lateral resolutions of OCT can be estimated based on a local speckle pattern [20].Therefore, we have anticipated that the CNN will primarily estimate the speckle contrast and the resolutions, and have then estimated the scatterer density based on these quantities.
This previous SDE has provided promising results, but its scatterer density estimation accuracy was limited when it was applied to real experimental data, particularly if the scatterer density or the SNR is low [18].By considering the possible mechanism for the estimator described above, we hypothesized that the low estimation accuracy for both the speckle contrasts and the resolutions may be limiting factors for the scatterer density.In addition, the estimation accuracy of the scatterer density falls when the SNR is low, i.e., in cases in which the noise forms a significant component of the OCT image.As a result, we hypothesized that our simulation-generated training data (i.e., the speckle patterns) did not actually reflect the physical properties of the noise accurately.This inaccuracy in the noise modeling process may have then caused the inaccuracies in both the speckle contrast and the resolution estimations.
In this paper, we introduce a new physically realistic noise model and demonstrate improved accuracy in the SDE.This new noise model incorporates the spatial properties of the OCT noises including shot noise, relative intensity noise (RIN), and non-optical noise.The accuracy of the SDE is then validated using numerically generated OCT images, along with OCT images of two types of scattering phantoms and in vitro tumor spheroids.Our results demonstrate significant improvement in the scatterer density estimation accuracy, particularly under low SNR conditions.

Neural network-based scatterer density estimator
Our CNN-based SDE estimates the scatterer density from a small spatial pattern of an OCT image, i.e., from a local speckle pattern with an image size of 16 × 16 pixels.To train the CNN model, we used numerical speckle patterns that were generated by a simple scalar-optics-based OCT simulator, which is designated a "speckle generator."The speckle generator generates a speckle pattern from an arbitrary parameter set that consists of the scatterer density, the axial and lateral resolutions, and the SNR.The speckle pattern is generated by convolving a 3D scatterer distribution map with a 3D complex point spread function and subsequently adding complex Gaussian noise.The details of the speckle generator are presented in Section 3.1.
In our previous study, the noise was assumed to be fully spatially decorrelated.In other words, the noise signals at the individual pixels are assumed to be fully independent of each other.In the new SDE, we introduce a more physically accurate noise model that accounts for the spatial correlation properties of the noise, as will be described in the next section.

Noise Model
Similar to our previous noise model, the new noise model has a complex Gaussian distribution, i.e., both the real and imaginary parts of the noise are normally distributed with a zero mean.However, our new noise model accounts for the different spatial correlation properties of the three types of OCT noise, comprising shot noise, RIN, and non-optical noise.These three noise types have different spatial extents, and these differences originated from the different dependencies of these noise types on the light source spectrum.
As elucidated in prior works [21,22], the energies of shot noise and RIN are proportional to light intensity and the square of light intensity, respectively.Consequently, the amplitudes of shot noise and RIN are proportional to the square root and the first-root of light intensity, respectively.Note that the first root of light intensity is the light intensity itself.Conversely, both the energy and amplitude of non-optical noise remain invariant with respect to changes in light intensity.Thus, the amplitudes of shot noise and RIN are proportional to the square root and the first-root of the light intensity, respectively.Note that the first-root of the light power is the light power itself.On the other hand, both the energy and amplitude of non-optical noise are independent of light intensity.And hence, the amplitudes of the shot noise and the RIN are proportional to the square-root and the first-root of the light intensity, respectively.Note that the first-root of the light power is the light power itself.On the other hand, both the energy and amplitude of non-optical noise is independent from the light intensity.And hence, when the light source spectral intensity is denoted by (), the noise within the wavenumber domain can be described as where  is the wavenumber, and  sh ,  RIN , and  no are normally distributed random variables that correspond to the shot noise, the RIN, and the non-optical noise, respectively.All these variables have a zero mean and the same standard deviations, but they are independent of each other.In addition, the constants 1, 2, and 3 are proportionality constants.
As the equation shows, the amplitude expectation of the shot noise (i.e., the first term on the right-hand side) is a function of , and is proportional to the square root of the spectral intensity at each value of .This is because the shot noise energy, which is the squared power of the amplitude, is proportional to the spectral intensity.Similarly, the amplitude expectation of the RIN (i.e., the second term) is also a function of , and is also proportional to the spectral intensity.This corresponds to the fact that the RIN energy is proportional to the squared power of the spectral intensity.In contrast, the non-optical noise (i.e., the third term) is independent of the light source spectrum and its amplitude expectation remains constant over the range of .
The spectral-intensity dependence of the noise causes a spatial correlation of the noise along the depth direction.The noise in the depth domain can be determined by taking the Fourier transform of Eq. (1) as follows where  is the depth and is a Fourier pair of .F −1 [ ] denotes an inverse Fourier transform and  ′ sh (),  ′ RIN (), and  ′ no () are the Fourier transforms of  sh (),  RIN (), and  no (), respectively.Because  sh (),  RIN (), and  no () are all normally distributed Gaussian noise,  ′ sh (),  ′ RIN (), and  ′ no () become normally distributed complex Gaussian noise.As the equation shows, the shot noise is convolved with the Fourier transform of √︁ (), and the result is the spatial correlation of the shot noise along the depth direction.Similarly, the RIN is convolved with the Fourier transform of () and this operation also results in a spatial correlation along the depth direction.
Notably, because √︁ () can be wider than (), F −1 √︁ () () can then be narrower than F −1 [()] ().This suggests that the spatial correlation distance of the shot noise is shorter than that for the RIN.In contrast to the shot noise and the RIN, i.e., the optical noise types, the non-optical noise (the third term on the right-hand side) shows no spatial correlation.The spatial correlation distances for the three types of noise contained in the new noise model and in the old noise model used in our previous SDE [18] are summarized in Table 1, in addition to the correlation distance of the OCT signal.
Table 1.Summary of spatial correlation distance characteristics of three noise types and the OCT signal.In the new noise model, each type of noise has different spatial correlation properties, whereas these properties [18] are identical for all noise types in the old noise model.In this study, we consider scanning OCT, rather than a full-field OCT.Because each A-line of the scanning OCT is acquired at a different time, none of the three noise types have spatial correlation along the lateral direction.We expect that the different spatial properties of the noises and the OCT signal will be used by the CNN to estimate the speckle contrast and the resolutions accurately.This will then enable more accurate estimation of the scatterer density than our old CNN model, as will be shown in Section 4.

Speckle generation and CNN-model training
The first step in the CNN model training process is the generation of the OCT speckle patterns, i.e., the training datasets.This step is identical to that described in Section 2.1 of [18], with the exception of the noise model.Because the details of this numerical speckle pattern generation process have been described elsewhere, we describe the process only briefly here.In this process, we first generate the 3D numerical fields using randomly distributed scatterers [(a) in Fig. 1].Here, the field has a size of 128 × 128 × 128 pixels (31.2 m × 31.2 m for the lateral directions and 115.8 m for the axial direction), and the pixels with a scatterer have an amplitude of unity but a random phase.This random phase represents the sub-wavelength depth position of the scatterer.The pixels without a scatterer have zero values.The 3D field of scatterers is then convolved with a 3D complex point spread function of the OCT [(b) in Fig. 1], which is assumed to have a 3D Gaussian distribution computed from the following equation: where Δ and Δ are the lateral resolutions defined as the 1/ 2 -width of the squared amplitude of PSF, and Δ is the axial resolution defined as the full-width-half-maximum (FWHM) of the  Initially, a 3D random scattering field is generated (a), which is then convolved with a 3D Gaussian PSF that is computed from Eq. ( 3).The resulting high-resolution speckle pattern is downsampled (c).Subsequently, three types of noise, i.e., shot noise, RIN, and non-optical noise, are added to obtain the final OCT speckle pattern which is used for training the CNN.squared amplitude.The two lateral resolutions are assumed to be identical.The resolutions are randomly selected as described in the next paragraph.The 3D numerical field is then down-sampled from 128 × 128 × 128 pixels to 16 × 16 × 16 pixels, in keeping with the original physical field size [(c) in Fig. 1].The pixel size after down-sampling is then 1.95 m for the lateral directions and 7.24 m for the axial direction.After down-sampling of the field, we added the shot noise, the RIN, and the non-optical noise which follow the noise model described in Section 2.2 [(d) in Fig. 1].
To train the CNN model, we generated 80,000 3D speckle patterns and then extracted 1,280,000 2D cross-sectional speckle patterns with dimensions of 16 × 16 pixels.Each 3D speckle pattern was generated with different and randomly selected values for the scatterer density, the resolutions, and the noise energies of the shot noise, the RIN, and the non-optical noise.The axial and lateral resolutions here are independent of each other and range from 3 to 30 m.The noise energies for each type of noise are randomly selected as the SNR with respect to each noise type in the range from 0 to 100 dB, where 0 dB indicates a scenario in which the signal power equals the noise power.The scatterer densities range from 0 to 0.2387 scatterers/m 3 (0 to 6.5723 scatterers/pix 3 ).Note that here we have randomized the resolutions instead of using the precise resolutions of a specific OCT system.This approach was adopted because practical measurements can exhibit varying resolutions due to uncontrolled factors such as defocus, aberrations, and etc.
The CNN consists of three convolutional and max-pooling layer pairs followed by two fully connected layers; this structure is identical to that described in Section 2.2.2 of Ref. [18].The model was trained to minimize the mean squared error (MSE) between the ground-truth scatterer density values and the network outputs by using the Adam optimizer [23] at a learning rate of 10 −4 .The batch size of the training was set at 32.The Input image was represented on a linear scale (i.e., not dB-scale) and was normalized into a [0, 1] range before being input into the model.The validation set used for the training consists of 100 2D speckle patterns that were extracted from 100 3D speckle patterns; these patterns were generated independently from the training dataset.

Evaluation method
The performances of the SDE were validated both numerically and experimentally.The numerical validation used numerically generated OCT speckle patterns, whereas the experimental validations used two types of scattering phantom (comprising Intralipid phantoms and microsphere phantoms) and in vitro tumor spheroid samples.For comparison, we also evaluated our previous SDE [18], which is identical to the new SDE with the exception of the noise model used to generate the training dataset.The process details are described in the following sections.

Numerical validation
For the numerical validation, we used 100 2D OCT speckle patterns that were generated using the same method that was used to generate the training dataset.Each of these 100 2D speckle patterns was extracted from different 3D speckle patterns.Therefore, all the 2D speckle patterns are independent of each other and are based on different resolutions, SNR values, and scatterer densities.Note that the noise model used for this generation process is the physically accurate noise model that was described in Section 2.2.Because we know the true density, we can determine the estimation accuracy directly via this numerical validation process.

Validation using scattering phantoms
For the phantom-based experimental validations, we used two scattering phantom types, i.e., Intralipid phantoms and microsphere phantoms.
The Intralipid phantoms are composed of Intralipid solutions with various concentrations of 1%, 2%, 4%, 6%, 8%, and 10% (v/v concentration).These Intralipid solutions are made from a 20% stock solution of Intralipid (IL-20, Sigma-Aldrich I141).Three phantoms were fabricated for each concentration, giving a total of 18 phantoms.Note that these phantoms are identical to the phantoms used in our previous study [18], and the same measurement datasets that were used in the previous study were also reused in this study.
The microsphere phantoms are formed using a mixture of polystyrene microspheres and agarose gel.When compared with the Intralipid droplets, the polystyrene microspheres have well controlled and known sizes and refractive index values, and thus these microspheres are frequently used as phantoms in the biomedical optics field [24].In our study, the microsphere-based phantom is particularly useful because we can compute the true scatterer densities from the product-specific particle concentrations and diameter in each case.
The mixture was poured into an acrylic container with a thickness of 1 mm.The container was then refrigerated for 1 h at 5 ℃.Note that these phantoms were fabricated by following the protocol described in [25].Three phantoms were made with each concentration, and thus 18 phantoms were fabricated in total.
Both the Intralipid phantoms and the microsphere phantoms were measured using a sweptsource OCT with a probe beam wavelength of 1.3 m.(The OCT system is described in detail in Section 3.2.5.)For the measurements, we intentionally attenuated the probe beam by applying a variable neutral density (ND) filter to alter the SNR.Each phantom was measured with three different round-trip attenuations, which were 0, -5.4, and -11.8 dB for the Intralipid phantoms, and 0, -5.6, and -12.0 dB for the microsphere phantoms.The probe power on the sample was 12 mW with 0-dB attenuation.

Human breast cancer spheroid
Tumor spheroids composed of human breast adenocarcinoma (MCF-7 cell line) were used to evaluate the SDEs from a biomedical application perspective.The cells were cultured for 15 days and spheroids with a size of approximately 500 m were formed.The scatterers within the cells are believed to consist of cell nuclei and organelles.For the measurements, each spheroid was extracted from the culturing environment and then placed in a room-temperature culture medium without CO 2 supply.
We performed two types of measurements.The first type involves longitudinal hourly measurements for up to 28 h, where the measurements were performed using a 2D cross-sectional scan protocol.Each cross-sectional image (i.e., B-scan) consists of 512 A-lines.Note that this experiment was performed initially for the study described in Ref. [26], and that the same data set was used in our previous SDE study [18].
The second type is a 3D measurement taken at two longitudinal time points of 0 h and 20 h.A 3D datasets consists of 512 × 128 A-lines.This measurement was originally performed for the study detailed in Ref. [27].

Statistical analysis
For the numerical validation and the polystyrene microsphere phantom experiment, the agreement between the estimates and the ground truth was evaluated statistically via intraclass correlation (ICC) [28].ICC is a statistical measure employed to evaluate the level of agreement between two datasets, typically denoted as  and  .It assesses the extent to which these datasets conform to the  =  line, examining both their linear relationship and equality.And a higher (i.e., closer to 1.0) ICC indicates a better estimate.This statistical analysis was performed using the intraclass_corr( ) function in the statistics library of Python (Pingouin 0.5.3) in Python 3.7.

OCT devices and measurement protocol
A polarization-sensitive Jones matrix swept-source OCT system [29,30] was used to perform the experimental validations.The probe wavelength was 1.3 m and the measurement speed was 50,000 A-lines/s.The axial optical resolution and the pixel separation were 14.1 m and 7.24 m (both in the tissue), respectively, while the lateral optical resolution and the pixel separation were 18.1 m and 1.95 m, respectively.Note that the pixel separations were identical to those of the numerically generated speckle patterns.The OCT image used in this study is a coherent composition of multiple polarization channels that is nearly identical to a standard non-polarization-sensitive OCT image.

Numerical validation
Figure 2 shows the numerical validation results, where the estimated scatterer densities are plotted versus the set (ground truth) scatterer densities.The SDE that was trained using the physically accurate noise model (the new SDE, red circles) shows high consistency between the set and estimated scatterer densities.In contrast, the previous SDE [18], which was trained using a spatially-uncorrelated noise model (the old SDE, blue triangles), shows higher variation among the estimates, and the estimated values are also downshifted significantly from the ground truth.The ICCs between the estimates and the ground truth were computed to be 0.975 for the new SDE and 0.722 for the old SDE.The higher ICC of the new SDE is a quantitative demonstration of the superior performance of the new SDE.

Intralipid phantom
The results of the Intralipid phantom validation are summarized in Fig. 3.In this figure, the estimated scatterer densities are plotted versus the volume concentrations.The circles and triangles represent the estimates from the new and old SDEs, respectively.The colors of the plots indicate the probe-beam attenuations in each case.Specifically, each color corresponds to a different SNR.It should be noted that each cross-sectional OCT B-scan is processed by each estimator using a sliding kernel with a size of 16 × 16 pixels to obtain a scatterer density image.The mean scatterer density of the scattering phantom region is then computed.Since there are three samples, three mean scatterer densities were obtained under each measurement condition.These three mean scatterer densities were further averaged and plotted.The error bars represent the standard deviations among the three measurements of the three phantoms and the lines are linear regression lines that were computed from the data.
Both SDEs showed reasonably small variations (standard deviations) among the phantoms, and shoewd highly linear relationships between the estimates (on the vertical axis) and the volume concentrations (on the horizontal axis).In addition, the estimated values were not sensitive to probe attenuation.Specifically, both SDEs were unaffected by the SNR of the OCT image.However, the intercept of the old SDE was as high as 0.05 m −3 at the 0% volume concentration, whereas that of the new SDE was close to zero (0.01 m −3 ).Because the scatterer density at the 0% volume concentration might be 0 m −3 , we can conclude that the new SDE provides significantly better estimation accuracy when compared with the old SDE.Fig. 3. Scatterer density estimation results for Intralipid phantoms with several volume concentrations and several probe-beam attenuations.The circles and the triangles represent the estimates obtained when using the new and old SDEs, respectively.The colors represent the probe beam attenuations that correspond to the SNR of the OCT image.The error bar shows the standard deviation for three measurements of the three phantoms.Both SDEs show high repeatability (i.e., a small standard deviation in each case) and consistent estimates among the different probe-beam attenuations.However, the old SDE shows a significantly large intercept that indicates the low fidelity of the old SDE, particularly at low concentrations.The lines are linear regression lines computed from the data.

Polystyrene microsphere phantom
The estimation results for the microsphere phantoms are plotted versus the theoretically computed scatterer density in Fig. 4. Here, the plotted values represent the average scattering densities among three phantoms, obtained using the same method as that employed for the Intralipid phantom (refer to the first paragraph of Section 4.2 for details).The theoretical scatterer density was calculated from the volume concentration of the phantom using Eq. ( 4).Similar to Fig. 3, the circles and the triangles represent the estimates from the new and old SDEs, respectively, and the colors indicate the probe-beam attenuation levels.The error bars indicate the standard deviations among the three measurements of the three phantoms, but the error bars are nearly invisible here because of the very high measurement repeatability (leading to very small standard deviations).In addition, both SDEs are nearly perfectly independent of the SNRs, as indicated by the fact that the data points for the different SNRs are nearly perfectly overlapping each other.
The estimates from both SDEs show highly linear relationships with the theoretically predicted scatterer density.However, the estimates from the old SDE (triangles) show significant departures from the ideal estimation line (i.e., the solid black line).In contrast, the new SDE (circles) shows very close agreement with the theoretical predictions.The ICCs between the estimateds values and the theoretical scatterer density values were very high at 0.982 (0-dB attenuation), 0.982 (-5.6-dB attenuation), and 0.981 (-12-dB attenuation) for the new SDE, whereas those for the old SDE were 0.467 (0-dB attenuation), 0.467 (-5.6-dB attenuation), and 0.460 (-11.8-dBattenuation).The higher ICCs indicate the superior performance of the new SDE.

Tumor spheroids
Figure 5 shows hourly time-lapse images of the MCF-7 spheroid.The individual rows show the OCT B-scans, the scatterer density images obtained using the new SDE, and those obtained using the old SDE (in order from top to bottom).The color scale (hue) in the scatterer density images represents the estimated scatterer density, while the brightness corresponds to the OCT intensity.Note that the images from the old SDE are identical to those presented in Ref. [18].In Fig. 4. Scatterer density estimation results for the microsphere phantoms.The vertical axis represents the estimated scatterer density and the horizontal axis represents the theoretically predicted scatterer density as computed from the volume concentration of the phantom using Eq. ( 4).The error bars represents the standard deviations among the three estimates of the three samples.The black line represents the ideal estimate, while the other lines are linear-regression lines that were computed from the data.The error bars are almost unrecognizable because of the high repeatability of the estimates.The new SDE (circles) provides estimates that are very close to the theoretical predictions (black line).In contrast, the estimate of the old SDE show significant departures from the theoretical predictions.In addition, the estimation results are unaffected by the probe-beam attenuation.Specifically, although the attenuations are represented by different colors, the plots and the regression lines with the different attenuations overlap significantly and are not really distinguishable.Fig. 6.Mean scatterer density within the spheroid region.Both the new (red circles) and old (blue rectangles) SDEs showed a reduction in the mean scatterer density over time.For the time points after 10 h, the old SDE gives constant scatterer density estimates, whereas the new SDE shows a continuous reduction in the scatterer density.As discussed in Section 5.1, the results from the new SDE are more plausible in this case.
addition, the mean scatterer density was computed within manually segmented spheroid regions and plotted versus time, as shown in Fig. 6.In the figure, the red circles and the blue rectangles correspond to the new and old SDEs, respectively.Both SDEs showed a reduction in the scatterer density over time, but distinctive differences between the two SDEs can be seen at the later time points after 10 h.At these time points, the new SDE showed a continuous reduction in the scatterer density, whereas the estimates from the old SDE remained constant.The constant estimation from the old SDE may be caused by the relatively large error of low SDE at lower scatterer densities, as shown in the numerical and phantom-based validations presented in Sections 4.1, 4.2, and 4.3.Here, we remind that the low scatterer density may cause a lower OCT signal intensity and thus lead to the higher noise dominance.Therefore, the scatterer density estimation can be more helpful in maintaining the noise model accuracy for the lower scatter density cases.The continuous reduction in the scatterer density found by the new SDE is more consistent with to our dynamic OCT findings than the results from the old SDE; this will be addressed later in the discussion section (Section 5.1).
Figure 7 shows cross-sectional (first and third rows) and en face (second and fourth rows) images obtained from a volumetric dataset of a tumor spheroid.The en face images are the slices acquired around the equator of the spheroid.The left images (a-d) are the OCT intensity images, the middle images (e-l) are the scatterer density images acquired using the new SDE, and the right (m-t) are those acquired when using the old SDE.Each scatterer density image is shown with two colormaps since the optimal display range for new-and old-SDE images are not the same.And hence, we show each image with two different colormaps.Namely, the first and third rows show the cross-sectional images in two different colormaps, similarly the second and fourth rows show en face images in the two colormaps.
The cross-sectional images showed similar appearances to the corresponding images in Fig. 5.However, the en face images illustrate that the scatterer density images are more informative than the OCT intensity images.For example, the OCT intensity at zero-hour time point [Fig.7(c)] only shows a homogeneous appearance.In contrast, the scatterer density image from the new SDE [Fig.7(g, k)] revealed that the center region has a lower scatterer density than the peripheral regions.This may be an indication of the well-known necrotic core of MCF-7 spheroids [27,31,32].Notably, the core is not visible when using the old SDE [Fig.7(o, s)].This may occur because the estimator accuracy of the old SDE is too low for the low scatterer density  in this case.
The en face OCT intensity images at the two time points [Fig.7(c) and (d)] do not show clear differences, whereas the scatterer density images [Fig.7(g, k) versus (h, i) and Fig. 7(o, s) versus (p, t)] show clear reductions in the scatterer density at the late time point (20 h).The reduction in the scatterer density at the later time point is reasonable because the dominant scatterers in these cells are the nuclei and cell organelles, and they are resolved by cell death.Some discussions related to this point can be found in Section 5.1.It should also be noted that the scatterer density from the old SDE at the later time point [Fig.7(p, t)] is higher than that obtained from the new SDE [Fig.7(h, l)].This difference also can be attributed to the low accuracy of the old SDE for low scatterer densities.
Additionally, we can see that the scatterer densities of the upper and lower parts of the spheroid are not symmetrical.Specifically, the upper hemisphere shows a higher scatterer density than the lower hemisphere.This issue is discussed in greater detail in Section 5.2.

Time course reduction and scatterer density
Both the 2D and 3D spheroid imaging results presentrd in Section 4.4 showed time-course reductions in the scatterer densities.As shown in Fig. 6, the new SDE indicates a continuous reduction in the scatterer density over 28 h, while the scatterer density estimated using the old SDE becomes constant after 10 hours.
These reductions in the scatterer density can be understood more easily when using dynamic OCT imaging.In our previous paper, the same datasets used in Figs. 5 and 6 were analyzed using two dynamic OCT methods, i.e., the logarithmic intensity variance (LIV) method and the late OCT correlation decay speed (OCDS  ) method.The mean LIV and mean OCDS values within the entire spheroid region were found to be decreasing continuously over 28 hours.These reductions in the dynamic OCT signals indicate the progression of necrosis within the spheroid [27].
The necrotic cell death causes destruction of the nuclei and the cell organelles.Because the nuclei and the organelles are the dominant scatterers within the cell, the progression of the Fig. 8. Numerical validation of the new SDE was conducted for each noise type.In the plot, the color and shape of the points represent the noise type, while the black line indicates the ideal estimation.It was observed that the estimates approach the ideal line as the SNR increases.However, no significant dependency on noise type was noted.
necrosis causes a reduction in the scatterer density.
According to the dynamic OCT analysis, the necrosis has progressed continuously over 28 h; this may also suggest that the reduction in the scatterer density also may progress over this time period.Therefore, the results from the new SDE, which showed a continuous reduction in the scatterer density, are more plausible than the results from the old SDE, in which the scatterer density becomes constant after 10 h.

Asymmetric appearance of the scatterer density in spheroids
Cross-sectional scatterer density images of the spheroid, where Fig. 5 (b) and (c) show asymmetric appearances between the upper and lower parts of the spheroid.This asymmetry may partially be attributed to the occurrence of multiple scattering in the deeper region of the tissue.Because our speckle generator does not incorporate the multiple scattering effect, the estimation accuracy of the SDE may decrease in the deeper region.
Several possible solutions have been proposed to address this issue.One approach is to apply one of the available multiple-scattering rejection methods [33][34][35][36].Another possible solution is to improve the speckle generator to allow it to account for the multiple scattering effect.Several models and simulation methods have been proposed that can account for multiple scattering [37][38][39][40][41].Although these current methods are computationally intensive and rather too slow to generate the massive amount of training data required, future investigations of the theoretical model and associated simulation methods may enable development of a speckle generator that can account for multiple scattering with reasonable computation speeds.

The impact of noise types on the estimation accuracy
Since we distinguished three types of the noise to train the new estimator, it was anticipated that the estimation accuracy may depend on the dominant noise type of the input image.This issue was investigated by using numerically generated speckle patterns, and it was found that the estimation accuracy was not significantly affected by the dominant noise types.The details of this study are as follows.
A total of 180 3D speckle patterns were generated, each with resolutions and scatterer density randomly selected from the same ranges as those used in the training-set generation (Section 3.1).From each of these 180 3D speckle patterns, one 2D speckle pattern was extracted, resulting in 180 fully independent 2D speckle patterns.To each 2D speckle pattern, only one type of noise, i.e., shot-noise, RIN, or non-optical noise, was added.The SNR was set to 3, 20, or 50 dB.Finally, 20 noisy 2D speckle patterns were generated for each combination of SNR and noise type.
The speckle images were processed by the new SDE, and the estimation results are summarized in Fig. 8.In this plot, each color and shape of the plot points indicate the type of noise, while the black line represents the ideal estimation line.It was found that the estimates are closer to the ideal line with higher SNR for all noise types.However, no marked dependency on noise type was observed across all SNRs.

Open issue: Physical mechanism used in the SDE
As we have described in the introduction, the development of the improved noise model was motivated by our anticipation about the possible working principle of the CNN-based SDE.Specifically, we suspected that the SDE might internally estimate speckle contrast and resolutions from speckle patterns.Speckle contrast was expected to aid in estimating the number of scatterers in the resolution volume (ENS), while resolution volume could be derived from estimated resolutions (as discussed in Section 4.3 of Ref. [18]).This principle could work effectively if speckle contrast were sensitive to ENS, particularly for small ENS values However, we found that our SDE could function not only in this regime but also for cases with large ENS values.For instance, Fig. 9 depicts the speckle contrast of OCT speckles as a function of ENS, where speckles were numerically generated with the same parameter ranges of resolutions and scatterer density as the validation dataset used in Section 4.1, and noise was added to the speckle pattern.Speckle contrast is defined as the standard deviation of signal amplitude over the 3D speckle pattern divided by the mean amplitude.The ENS is defined in the same manner as described in Ref. [19], where it represents the number of scatterers within a volume of Δ × Δ × Δ.Here, Δ and Δ denote the lateral resolutions, defined as the 1/ 2 -width of the intensity, while Δ represents the axial resolution, defined as the FWHM of the intensity.As illustrated in this plot, the speckle contrast remains almost constant across most of the ENS ranges.
During our numerical validation and phantom experiments, the new SDE provided accurate estimates even for ENS values around a few hundred.Namely, the SDE worked even for high scatterer density samples where the speckle contrast remains nearly constant.
We suspect that the SDE estimator leverages additional properties of speckle beyond speckle contrast and resolutions.One such property might be the spatial shape or statistics of the speckles.Given that this study demonstrated improved estimation accuracy by considering the spatial properties of noise and signal, this suspicion appears plausible.
Our results affirm the SDE's strong estimation performance, yet further analysis of the trained model and investigation into potentially exploited physical principles remain important.

Computation time of the speckle generator and the SDE
The overall training process of the CNN-based SDE can be split into two sub-processes, i.e., speckle generation and CNN training.The generation of 80,000 3D speckle patterns with dimensions of 16 × 16 × 16 pixels takes approximately 3 hours when using a desktop PC equipped with an Intel Core i7-6900K CPU and a graphics processing unit (GPU; GeForce RTX 3090 Ti, NVIDIA).The main memory of the PC is 128 GB in size, and the GPU has 10752 cores, a 1.86 GHz boost clock, and 24 GB of memory.The speckle generator is written in Python (version 3.7) with a GPU-compatible numerical computation library (CuPy 12.0.0).Notably, the speckle generator is more than 11 times faster than our previous version, which did not use a GPU.Therefore, usage of a GPU is essential when generating large training datasets.
Subsequently, training of the CNN model takes approximately 1 h, where the CNN is written in Python 3.7 with Keras 2.3.1 based on the TensorFlow backend.Therefore, the entire training process takes approximately 4 hours.
Scatterer density estimation (i.e., inference) of a 512 × 402 pixel cross-sectional image using a 16 × 16 pixels sliding window took around 1 minute, and thus the estimation process for a volumetric image that consists of 128 B-scans takes approximately 2 hours.The inference can be faster by using more optimized network architecture and/or implementation, and this would be a future research topic.

Compatibility of the trained model and the scanning protocol
Because the speckle generator generates a speckle pattern with a specific pixel size, each training dataset (i.e., each set of speckle patterns) is specific to a single scan protocol.Therefore, speckle patterns should be generated for each different scan protocol and a specific SDE model should then be trained using these specific speckle patterns.
The other compatibility issue that must be considered is the minimum separation required between adjacent A-lines (i.e., the lateral pixel size).We suspect that our new SDE uses the different lateral spatial properties of the noise and OCT signals to distinguish them, as summarized in Table 1.Specifically, we have assumed that the noise has fully decorrelated between the adjacent A-lines, as described in Section 2.2.However, the OCT signals are mutually correlated along the lateral direction with the correlation distance around the lateral resolution.The different lateral spatial properties of the noise and the OCT may have been used by the SDE model to discriminate the OCT signal from the noise.However, the different lateral extents only become pronounced if the A-line separation is smaller than the lateral resolution.Therefore, the OCT scan protocol to be used with our SDE should have an A-line separation that is smaller than the lateral resolution.
It is also notable that, according to Eq. ( 2), the RIN may have the same axial extent as the OCT signal.Thus, if the CNN model cannot use the different lateral properties of the signal and the noise, as in the case where the A-line separation is far greater than the lateral resolution, the CNN model cannot discriminate the RIN from the OCT signal.This further emphasizes the need to consider the minimum separation required for the A-lines.

Conclusion
In this study, we demonstrated a CNN-based SDE that is trained using speckle patterns generated by a numerical speckle generator.This speckle generator uses a physically realistic noise model that takes the spatial properties of shot noise, RIN, and non-optical noise into account.The SDE was examined using numerically generated OCT speckle patterns along with the experimentally obtained OCT images of two types of scattering phantom and in vitro tumor spheroid samples.In these examinations, the SDE's estimation performance was compared with that of our previous CNN-based SDE, which was trained with speckle patterns that did not account for the spatial properties of the noise.All the validation experiments demonstrated the superior estimation accuracy provided by the new SDE.In particular, the validation using microsphere phantoms demonstrated the excellent agreement of the estimated scatterer density with the ground truth.
In future work, this new SDE can be used in a variety of biomedical application to assess the sub-resolution structural properties of tissues and cells.

Fig. 1 .
Fig.1.The flow diagram illustrates the OCT speckle generation process.Initially, a 3D random scattering field is generated (a), which is then convolved with a 3D Gaussian PSF that is computed from Eq. (3).The resulting high-resolution speckle pattern is downsampled (c).Subsequently, three types of noise, i.e., shot noise, RIN, and non-optical noise, are added to obtain the final OCT speckle pattern which is used for training the CNN.

Fig. 2 .
Fig.2.Numerical validation of the new SDE (red circles) and the old SDE (blue triangles).The horizontal and vertical axes correspond to the set scatterer density (the ground truth) and the estimated densities, respectively.The black solid lines represents the perfect estimate.The new SDE gives reasonable estimates, whereas the estimates from the old SDE are downshifted significantly from the ground truth.

Fig. 5 .
Fig. 5. Time-lapse images of a tumor spheroid.The individual rows show (a) the OCT intensity images, and the scatterer density images obtained by the (b) new and (c) old SDEs.Both SDEs show a reduction in the scatterer density over time.

Fig. 7 .
Fig. 7. 3D time-course images of MCF-7 spheroid.(a-d) show OCT intensity images,and (e-l) and (m-t) are scatterer density images acquired when using the new and old estimators, respectively.The first and third rows represent the cross-sectional images, while the second and fourth rows represent en face images, respectively.At the zero-hour time point, the OCT intensity did not show a clear structure within the spheroid, whereas the en face scatterer density images showed a clear low-density core at their center.In addition, the en face scatterer density images showed clear difference between two time points, while this stark contrast was not observed in the en face OCT intensity images.The scale bar represents 200 m.

Fig. 9 .
Fig. 9.The simulated speckle contrasts as a function of ENS, where the speckle patterns were numerically generated with the same resolution-and scatterer-density ranges with the validation dataset of Section 4.1.The speckle contrast becomes almost constant if the ENS is larger than a few scatterers.(b) provides a zoomed-up view of the red box in (a).