Maximum-likelihood estimation in optical coherence tomography in the context of tear film dynamics

Understanding tear film dynamics is a prerequisite for advancing the management of Dry Eye Disease (DED). In this paper, we discuss the use of optical coherence tomography (OCT) and statistical decision theory to analyze the tear film dynamics of a digital phantom. We implement a maximum-likelihood (ML) estimator to interpret OCT data based on mathematical models of Fourier-domain OCT and the tear film. With the methodology of task-based assessment, we quantify the tradeoffs among key imaging system parameters. Under the assumption that the broadband light source is characterized by circular Gaussian statistics, we find ML estimates of 40 nm ± 4 nm for an axial resolution of 1 μm and an integration time of 5 μs. Finally, the estimator is validated with a digital phantom of tear film dynamics, which yields estimates of nanometer precision.


Introduction
Dry Eye Disease (DED) is a multifactorial disease of the tears and the ocular surface. Symptoms include discomfort, visual disturbance, and tear film instability, with damage to the ocular surface in some cases [1]. DED, a serious public health problem that affects more than 100 million people worldwide, severely degrades quality of life [2,3]. However, therapies for DED remain elusive for two reasons: there is no quantitative, highly repeatable, objective diagnostic method, and there is little or no correlation between signs and symptoms [4]. In this work, we hypothesize that tear film instability is a key indicator of DED and can be quantified by tear film dynamics, specifically the thinning rate of the tear film across the cornea.
Optical coherence tomography (OCT) is a biomedical imaging method that has emerged over the last two decades [5]. In OCT, one collects the back-reflected, or scattered, light from tissue. The method is analogous to ultrasound, but it operates in the near-infrared region of the spectrum. OCT has been used extensively in ophthalmology, especially for imaging the retina. In this work, we propose to use OCT to monitor tear film dynamics. Given the speed at which tear film thickness changes, Fourier-domain OCT (FD-OCT) is well suited to this challenge.
Statistical decision theory has been widely used in various biomedical fields in which random processes associated with light emission and detection must be accounted for. Statistical decision theory was introduced into time domain OCT for classification tasks [6,7]. We recently reported on key results of using statistical decision theory to estimate tear film thickness [8]. In this paper, we present the detailed mathematical framework that underlies these results, and we further report on the limit of the estimation. To our knowledge, this work represents the first time a maximum-likelihood (ML) estimator has been used to interpret OCT data.
In section 2 of this paper, we present a mathematical model of an FD-OCT system in the context of the ML estimation. We present a model of the tear film in section 3. Next, in section 4, we detail the generation of simulated spectra that serve as inputs to the ML estimator. The principle of the ML estimator is then detailed in section 5, while its performance is investigated in section 6. In the last section of the paper, the ML estimator is validated using a digital phantom.

Mathematical model of FD-OCT in the context of ML estimation
In this work, we consider a free-space FD-OCT setup, shown schematically in Fig. 1. It is based on a Michelson-like interferometer geometry, in which the detector is a spectrometer. The broadband light source emits an electric field that can be regarded as a superposition of plane waves; the electric field of a plane wave with angular frequency ω can be written accordingly. These plane waves are split at the beam splitter: one of the two resulting waves travels into a reference arm, the other into a sample arm. The electric field propagating in the reference arm accumulates a phase delay φ_1(ω), which accounts for the optical path length, the dispersion, and any other phase shifts from the reference arm. We let α(ω) represent the amplitude loss caused by reflection, absorption, and the loss at the beam splitter. The electric field propagating in the sample arm similarly experiences a phase delay φ_2(ω). We let β(ω) represent the amplitude loss and phase shifts caused by reflection, beam splitting, and interaction with the specimen. These two modulated electric fields recombine at the beam splitter and propagate to the detector. The detector in FD-OCT is a complementary metal-oxide-semiconductor (CMOS) or charge-coupled device (CCD) camera-based spectrometer, whose output is an array of numbers. Each element of the array is proportional to the number of electrons accumulated at the associated pixel during one integration time.
The electric field on the x-th pixel of the detector can be written as in Eq. (3). For simplicity, we introduce a compact notation so that Eq. (3) may be rewritten, where the angular frequencies of the incident waves on the x-th pixel span from ω_x − Δω_x to ω_x. The number of electrons, N_g(x,Δt), accumulated at the x-th pixel during one detector integration interval Δt can be written as

N_g(x,Δt) = N_p(x,Δt) + N_d(x,Δt),

with N_p(x,Δt) being the number of electrons generated by photons and N_d(x,Δt) being the number generated by dark current. The ML estimator requires information about both the first-order and second-order statistics of the output, so we now formulate the ensemble average and covariance of N_g(x,Δt). To establish a model benchmark, we characterize the broadband source by circular Gaussian statistics, and we characterize the photon inter-arrival times as a Poisson process. We will discuss the characterization of the detector's dark noise below. The ensemble average and covariance of N_g(x,Δt) may be related to those of N_p(x,Δt) and N_d(x,Δt), where the subscripts G, P, and D denote circular Gaussian statistics, Poisson statistics, and dark noise, respectively. Since N_p(x,Δt) and N_d(x,Δt) are independent, the covariance K_Ng(x,x';Δt) is the sum of the covariances of N_p(x,Δt) and N_d(x,Δt), where x and x' denote any two pixels in the array. A statistical characterization of the dark noise is typically provided in the camera manufacturer's manual, but this characterization may rest on assumptions, such as operation at a specific temperature, that are not consistent with the experimental environment. Thus, we characterize the detector's dark noise experimentally: we block the incident beam, take several readings, and calculate the mean and the variance of the measured data, both of which are required by the model. Next we calculate the statistical quantities of N_p(x,Δt).
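The experimental dark-noise characterization just described (block the beam, take repeated readings, compute the per-pixel mean and variance) can be sketched as follows. The array shapes and count values are hypothetical; a real characterization would read frames from the camera:

```python
import numpy as np

def characterize_dark_noise(dark_frames):
    """Per-pixel mean and variance of detector dark noise, estimated
    from repeated readouts taken with the incident beam blocked."""
    dark_frames = np.asarray(dark_frames, dtype=float)
    mean_dark = dark_frames.mean(axis=0)        # estimate of <N_d(x, dt)>
    var_dark = dark_frames.var(axis=0, ddof=1)  # unbiased variance estimate
    return mean_dark, var_dark

# Synthetic stand-in for blocked-beam readings (hypothetical numbers):
rng = np.random.default_rng(0)
frames = rng.normal(loc=100.0, scale=5.0, size=(200, 1024))
mean_dark, var_dark = characterize_dark_noise(frames)
```

Both outputs enter the model directly: the mean contributes to the expected readout and the variance to the diagonal of the dark-noise covariance.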
The number of photoelectrons accumulated during one detector integration interval can be written as

N_p(x,Δt) = ∫_t^{t+Δt} N(x,t') dt',

where N(x,t') represents the generation rate of photoelectrons. Thus the first term on the right-hand side of Eq. (7) may be rewritten as Eq. (10). If the conditional mean ⟨N(x,t')⟩_{P|G} is denoted N̄(x,t'), then the ensemble average of N(x,t) may be related to the incident electric field by Eq. (11), where A is the area of one pixel, R(x) is the pixel's responsivity, η_0 is the impedance of air, and e is the charge of the electron. It is shown in [6] that the field-dependent term can be expressed through S(ω), the power spectral density of the source (Eq. (12)). Thus the ensemble average in Eq. (11) may be rewritten accordingly; by substituting this expression into Eq. (10), we obtain the ensemble average of N_p(x,Δt). Next we calculate the second-order statistics of N_p(x,Δt). Given that ⟨⟨N_p(x,Δt)⟩⟩ does not depend on time, different measurements are independent, and we only need to consider the covariance between different pixels within the same integration interval, which is expressed in Eq. (15). The first term on the right-hand side of Eq. (15) can be expressed as in [9], so that K_Np(x,x';Δt) may be written as Eq. (17). According to the definition of covariance, we have Eq. (18). Together with Eq. (11), applying the complex Gaussian moment theorem [10] to the first term on the right-hand side of Eq. (18) yields Eq. (20). It follows from Eq. (5) and Eq. (12) that the terms in Eq. (20) can be expressed explicitly; by substituting these expressions into Eq. (20), we rewrite it as Eq. (23). By combining Eq. (17) and Eq. (23), we can express the covariance K_Np(x,x';Δt).

Tear film model
The normal tear film consists of three layers: the lipid layer, the aqueous layer, and the mucin layer. Because the aqueous layer has the largest volume in the tear film, here we consider only the aqueous layer [11]; we take this as a first approximation, to be refined in future investigations. We model the tear film as a one-layer structure with two interfaces, the air/tear-film interface and the tear-film/cornea interface; these are shown schematically in Fig. 2, where n_1, n_2, and n_3 are the refractive indices of the air, tear film, and cornea, respectively. In the figure, l is the thickness of the tear film, and r_1 and r_2 are the reflection coefficients of the two interfaces. The tear-film/cornea interface is considered rough (and is illustrated as such), as it is known that the corneal surface has microplicae and glycocalyx [11]. In our model, we use the root mean squared deviation in the height of the surface, σ, to characterize the surface roughness. We assume that deviations of the corneal surface from a smooth plane can be described by a stochastic process with a Gaussian distribution, and that the incident light strikes the corneal surface at a right angle. Using the perturbation theory developed in [12], the following expression has been established [13]:

r_2 = [(n_2 − n_3) / (n_2 + n_3)] exp(−2 k² ε_2 σ²),

where ε_2 is the dielectric constant of the tear film. At the air/tear-film interface, which we consider to be smooth, the reflection coefficient r_1 is given by Fresnel's law of reflection at normal incidence:

r_1 = (n_1 − n_2) / (n_1 + n_2).

The total response m(ω) from the OCT system can then be written in terms of these coefficients.
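As an illustration, the two reflection coefficients can be computed as below. This is a sketch under our reconstruction of the roughness-damped expression; the refractive indices and wavelength are assumptions for illustration, not values from the paper:

```python
import numpy as np

def fresnel_r(n_a, n_b):
    """Normal-incidence Fresnel amplitude reflection coefficient."""
    return (n_a - n_b) / (n_a + n_b)

def rough_interface_r(n2, n3, k, sigma):
    """Tear-film/cornea reflection coefficient damped by Gaussian
    surface roughness; sigma is the RMS surface height and
    eps2 = n2**2 the dielectric constant of the tear film.
    (Reconstruction of the expression from [12,13]; an assumption.)"""
    eps2 = n2 ** 2
    return fresnel_r(n2, n3) * np.exp(-2.0 * k ** 2 * eps2 * sigma ** 2)

# Illustrative (assumed) values:
n1, n2, n3 = 1.0, 1.337, 1.376   # air, tear film, cornea
k = 2 * np.pi / 600e-9           # vacuum wavenumber at an assumed 600 nm
r1 = fresnel_r(n1, n2)
r2 = rough_interface_r(n2, n3, k, sigma=150e-9)
```

Note how the roughness term only attenuates the Fresnel coefficient: |r_2| is always smaller than its smooth-surface value.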

Generation of simulated OCT imaging data sets
To generate one simulated spectrum for a tear film of thickness l, we assume that one instance of the output at the x-th pixel follows a normal distribution with the mean and variance given by the model of section 2. Thus we use a normal random number generator to simulate a spectrum N_lg, a vector representing the output along the camera's pixel array. It is important to note that we simulate spectra to benchmark the framework; in a real measurement, N_lg will be a measured spectrum, which will be the input to the ML estimator detailed in section 5.
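The spectrum simulation amounts to one normal draw per pixel. A minimal sketch, with hypothetical per-pixel means and variances standing in for the outputs of the model in section 2:

```python
import numpy as np

def simulate_spectrum(mean_counts, var_counts, rng):
    """Draw one simulated readout N_lg: each pixel is an independent
    normal sample with the mean and variance predicted by the model."""
    return rng.normal(loc=mean_counts, scale=np.sqrt(var_counts))

# Hypothetical per-pixel model outputs for a 1024-pixel line camera:
mean_counts = np.full(1024, 500.0)
var_counts = np.full(1024, 600.0)
spectrum = simulate_spectrum(mean_counts, var_counts,
                             np.random.default_rng(1))
```

In the full framework the means and variances vary across pixels according to the interference fringes and the noise model; the constant values here are placeholders.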

Principle of the ML estimator
On the assumption that all pixels are independent of each other, the probability that a particular spectrum N_lg is generated by a tear film of thickness d can be expressed as a multivariate normal density, where M is the number of pixels and Det denotes the determinant. The ML estimator estimates the thickness by maximizing P(N_lg|d) with respect to d, which is equivalent to minimizing the negative log-likelihood; the estimate from one data set N_lg is given by the minimizer of this expression.

Fig. 3. Principle of the ML estimator illustrated with different axial resolutions of 1 μm, 2 μm, and 4 μm.

Figure 3 illustrates the principle of our ML estimation. We consider spectra that are Gaussian-shaped, with 1 μm, 2 μm, and 4 μm axial resolutions, respectively. Axial resolution, in this paper, is conventionally defined as the half width at half maximum of the axial point spread function [7,14]. The source power is set at 1 mW to comply with the ANSI standard for eye safety in this spectral region [15]. The integration time is 5 μs. The ground truth tear film thickness is set to 0.5 μm, and we assume a root mean squared height of the corneal surface of 150 nm [16] to characterize the corneal roughness.
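A minimal sketch of this grid-search ML estimation follows. The forward model here is a toy thickness-dependent fringe with fixed per-pixel variance, not the paper's FD-OCT model; with independent pixels the covariance is diagonal, so the negative log-likelihood reduces to a per-pixel sum:

```python
import numpy as np

def ml_estimate(spectrum, thickness_grid, model_mean, model_var):
    """Grid-search ML estimate: minimize the negative log-likelihood
    (per-pixel log-variance plus normalized squared residual)."""
    def nll(d):
        mu, var = model_mean(d), model_var(d)
        return np.sum(np.log(var) + (spectrum - mu) ** 2 / var)
    nlls = np.array([nll(d) for d in thickness_grid])
    return thickness_grid[np.argmin(nlls)]

# Toy forward model (assumed for illustration): a fringe whose
# frequency encodes the thickness, with constant readout variance.
pix = np.arange(256)
model_mean = lambda d: 500.0 + 50.0 * np.cos(0.02 * pix * d)
model_var = lambda d: np.full(pix.shape, 25.0)

rng = np.random.default_rng(2)
truth = 0.5
data = rng.normal(model_mean(truth), np.sqrt(model_var(truth)))
grid = np.linspace(0.1, 1.0, 91)
d_hat = ml_estimate(data, grid, model_mean, model_var)
```

The grid spacing bounds the quantization of the estimate; in practice one would refine the search near the coarse minimum.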
The plots in the red frames of Fig. 3 show the negative log of the conditional probability, which characterizes the probability that one observed spectrum N_lg was generated by a tear film of thickness d. Simulations for three different optical axial resolutions are shown. The results show that the conditional probability oscillates as the candidate thickness varies. The distance between two adjacent peaks is half of the center wavelength in the sample, or 300 nm (in the tear film). For a tear film whose thickness is 0.5 μm, the three resolutions of 1, 2, and 4 μm yield estimates of 505 nm, 485 nm, and 545 nm, respectively, where each estimate is determined by the position of the minimum in the corresponding conditional probability plot in Fig. 3. We characterize the precision of this measurement in section 6.1. Note that with this method we can obtain estimates with precision much finer than the optical axial resolution of the system. It is also clear from Fig. 3 that at low resolution (i.e., 4 μm) the estimator can miss the true minimum because nearby thicknesses yield values comparable to the minimum.

Impact of detector integration time
Imaging speed is a key factor in the performance of the system. Two parameters may affect the imaging speed: the repetition rate of the pulsed source and the detector integration time. In OCT, a first requirement is that the source repetition rate be at least as high as the detector's readout rate; therefore, we study the impact of the detector integration time. To quantify how the system performance changes with integration time, we set up a simulation in which we consider a fixed tear film thickness of 0.5 μm and repeat the estimation 5000 times. We then compute the root mean squared error (RMSE) to evaluate the performance for different optical axial resolutions as a function of integration time. We plot the resulting RMSE as a function of integration time in Fig. 4. These results show that, for a fixed axial resolution, the RMSE decreases exponentially with increasing integration time, eventually reaching a plateau in the nanometer range. For the same integration time, finer axial resolution yields higher precision. There are two costs to consider in pushing the system to the highest axial resolution: the cost of the light source and a possible reduction in imaging depth. Indeed, as the axial resolution becomes finer, the spectral bandwidth increases, but it may still be collected on an array of the same size. The spectral resolution therefore typically decreases unless a more costly detector is used; if the same detector is used, the imaging depth is typically reduced.
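The RMSE evaluation procedure can be sketched as follows. The estimator here is a toy stand-in whose error shrinks as 1/√Δt (an assumption for illustration; the paper's full simulation eventually plateaus at a noise floor):

```python
import numpy as np

def empirical_rmse(estimate_once, truth, n_trials, rng):
    """Root mean squared error of an estimator over repeated trials."""
    estimates = np.array([estimate_once(rng) for _ in range(n_trials)])
    return float(np.sqrt(np.mean((estimates - truth) ** 2)))

# Toy stand-in estimator: unbiased, with noise shrinking as
# 1/sqrt(integration time), mimicking shot-noise-limited behaviour.
truth = 0.5
rng = np.random.default_rng(3)
rmse_by_dt = {
    dt: empirical_rmse(
        lambda r, dt=dt: truth + r.normal(0.0, 0.01 / np.sqrt(dt)),
        truth, n_trials=2000, rng=rng)
    for dt in (1.0, 4.0, 16.0)
}
```

The paper uses 5000 trials per operating point; the trial count here is reduced only to keep the sketch light.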

Limit of the estimation
In estimating thickness, it is important to identify the limit of precision of the ML estimation. To study this limit, we vary the ground-truth thickness in the simulation and calculate the relative error, defined as the ratio of the RMSE to the tear film thickness. In this simulation, the integration time is fixed at 5 μs and the RMSE is calculated from 5000 simulated data sets. Results are shown in Fig. 5. For axial resolutions of 1 μm and 4 μm, the relative error diminishes as the tear film thickness increases. For a maximum relative error of 10% (shown as a red dashed line in Fig. 5), the ML estimator can estimate thicknesses as low as 0.04 μm and 1.3 μm for axial resolutions of 1 μm and 4 μm, respectively, where each limit corresponds to the intersection of the red dashed line with the relative-error curve.
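Reading off the minimum measurable thickness from a relative-error curve can be sketched as follows; the curve values here are hypothetical, not the paper's data:

```python
import numpy as np

def min_measurable_thickness(thicknesses, rel_errors, threshold=0.10):
    """Smallest thickness whose relative error (RMSE / thickness)
    meets the threshold, via linear interpolation of the crossing."""
    thicknesses = np.asarray(thicknesses, dtype=float)
    rel_errors = np.asarray(rel_errors, dtype=float)
    below = rel_errors <= threshold
    if not below.any():
        return None                      # curve never meets the threshold
    i = int(np.argmax(below))            # first index meeting the threshold
    if i == 0:
        return thicknesses[0]
    # Interpolate the crossing between points i-1 and i:
    t0, t1 = thicknesses[i - 1], thicknesses[i]
    e0, e1 = rel_errors[i - 1], rel_errors[i]
    return t0 + (threshold - e0) * (t1 - t0) / (e1 - e0)

# Hypothetical relative-error curve, decreasing with thickness (in um):
t = np.array([0.02, 0.04, 0.08, 0.16, 0.32])
e = np.array([0.30, 0.10, 0.05, 0.03, 0.02])
t_min = min_measurable_thickness(t, e)
```

Applied to the paper's simulated curves, this is the operation that yields the 0.04 μm and 1.3 μm limits quoted above.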

Validation of the ML estimator with a digital phantom
Maki et al. were the first to develop a comprehensive mathematical model of tear film dynamics on an eye-shaped domain. They studied the relaxation of the human tear film after a blink on a stationary eye-shaped domain (corresponding to a fully open eye) using a mathematical model based on lubrication theory [17]. This model provides the ground truth for our simulations. We evaluate the performance of the ML estimator assuming an optical axial resolution of 1 μm and integration times of 5 μs and 20 μs, respectively; the results are plotted in Fig. 6. The results show that the ML estimator is unbiased, as expected. We further observe that as the integration time increases by a factor of 4, the precision improves by only a factor of 2, consistent with the expected square-root dependence of precision on integration time. The uncertainties in the estimates are less than 10 nm and 5 nm for integration times of 5 μs and 20 μs, respectively.

Conclusion
We have developed and presented a mathematical model of FD-OCT together with a single-layer (aqueous) tear film model on a rough corneal surface. To benchmark the framework of ML estimation in FD-OCT, we implemented an ML estimator to interpret simulated OCT data sets generated from the mathematical models. We quantified the tradeoff between the detector integration time and the precision of the estimation. Results show that the combination of a 1 μm axial resolution (defined conventionally as the half width at half maximum of the axial point spread function) and a 5 μs integration time allows estimates in the nanometer range (e.g., 40 nm). Finally, the estimator was validated with a digital phantom of tear film dynamics; the results reveal estimates of nanometer precision.
In future studies we will further develop the theoretical framework presented in this paper to account for the two-layer structure of the tear film, including both the lipid layer and the aqueous/mucin layer. We will also consider the statistics of the real sources to be used in our future experiments and compare the resulting theoretical performance against the benchmarks established in this paper. To validate the theoretical framework experimentally, physical phantoms will also need to be fabricated with high accuracy. The theoretical foundation presented in this paper, exercised on a digital phantom, is guiding, step by step, the development of an optimal OCT system to quantify tear film dynamics. Key theoretical results for an OCT system optimized with this framework will then be fully validated in experiments.