Implementation of the extended Kalman filter for determining the optical and geometrical properties of turbid layered media by time-resolved single distance measurements.

In this article we propose an implementation of the extended Kalman filter (EKF) for the retrieval of optical and geometrical properties in two-layered turbid media assuming a dynamic setting, where absorption of each layer was changed in different steps. Prior works implemented the EKF in frequency-domain with several pairs of light sources and detectors and for static parameters estimation problems. Here we explore the use of the EKF in single distance, time-domain measurements, together with a corresponding forward model. Results show good agreement between retrieved and nominal values, with rather narrow analytical credibility intervals, indicating that the recovery process has low uncertainty, especially for the absorption coefficients.


Introduction
During the past thirty years, Near Infrared Spectroscopy (NIRS) has become an increasingly interesting field in Biomedical Optics, due to the low absorption that NIR light presents in biological tissues [1][2][3][4][5][6]. Among these, one of the most important organs is the human brain, where chromophores concentration in blood (such as haemoglobin) is modified by external stimuli or internal lesions [7][8][9]. In this sense, brain haemodynamics can be studied by means of the analysis of changes in light absorption when light signals are injected from the outside. Hereby photons must propagate through a layered system consisting of scalp, skull and cerebrospinal fluid (CSF)-among others-before reaching the cerebral cortex. It is important to translate these absorption changes into changes in the content of chromophores in the blood. That can be done by analyzing the distributions of time of flight (DTOFs) of photons that are affected by variations in the optical properties in the studied system, which are the absorption coefficient (µ a ) and the reduced scattering coefficient (µ s ).
Several methods have been introduced in order to solve the inverse problem of retrieving the optical properties of layered media from a single measurement or a set of measurements [10][11][12][13][14]. Most of these methods are based on the Maximum Likelihood Estimator (MLE) approach. There are, at least, two disadvantages in using such techniques: first, depending on the configuration of the solvers, it might be necessary to keep track on the initial points, and propose ad hoc criteria for the selection of the corresponding initial point for each measurement separately. This approach was considered in [12], where the authors use a fixed initial distribution (characterized by an initial mean and covariance) for every single measurement. Second, we may waste prior information on the dynamics as well as any other information we may have collected prior to the measurements. For example, if after a given set of recoveries, all of them showed an increasing behavior, we might expect the next recovery to increase as well.
In this publication we present an algorithm based on the Extended Kalman Filter for retrieving the optical as well as the geometrical properties of two-layered media, using time-resolved, single-distance measurements. This algorithm is a modified version of the one presented in Ref. [15], where the recovery of optical properties and thicknesses of a four-layered turbid medium in a multi-distance setup using a frequency-domain approach was performed, through the proposal of a static estimation problem. The main improvements are the use of a single-distance approach, a more straightforward deconvolution process that does not require calibration of the corresponding hyperparameters, and the discard of the calibration process, which implies less complexity and computation times. Other changes regarding the forward model were also implemented, which strongly impact the efficiency of our method.
This work is structured as follows. Section 2 introduces the analytical model and the experimental setup. Section 3 describes the extended Kalman Filter together with the corresponding improvements in the methodology. Section 4 deals with the tools used to reduce the complexity and the time computation of the forward model. Section 5 gives details on the particular implementation on the EKF from the statistical point of view for the different studied situations. Section 6 presents the main results, which are finally analyzed in Section 7, also concluding with a brief summary of the whole work, together with some ideas for future improvements.

Theoretical model
The situation to be modelled is presented in Fig. 1. A laser beam impinges on the center of the top surface of an N-layered cylinder of radius R, where layer j = 1, . . . , N has optical properties µ a, j and µ s, j (so that the diffusion coefficient is defined as D j = 1/(3µ s, j )), refractive index n j and thickness l j , except for the last layer, which is taken as semi-infinite [16], i.e, l N = ∞. According to the diffusion approximation, scattering becomes isotropic at a depth z 0 = 1/(µ s,1 ) [1], so the actual position of the isotropic source is r = (0, 0, z 0 ). The reflectance R (ρ, t) with optode separation ρ at time t at the surface z = 0 can be obtained as follows: where J 0 and J 1 are the Bessel functions of the first kind and orders zero and one, respectively; A = A(n) is a factor that depends on the refractive index mismatch between the first layer and the surrounding medium [3], R EB is an extrapolated radius given by R EB = R + z b,1 (with extrapolation distance z b,1 = 2AD 1 ); and s n is the n-th scaled order zero Bessel function root such that J 0 (s n R EB ) = 0. The Green's function G 1 (s n , z, ω) for the first layer has the form: , being α j = µ a, j D j + s 2 n + iω D j c j and c j the speed of light in layer j. The factors β N and γ N depend on the optical properties and thicknesses of the remaining layers, except for the two-layered case, where β N = γ N = 1.

Experimental setup
Experimental data were taken from single-distance, time-resolved reflectance measurements conducted on a two-layered liquid phantom, as illustrated in Fig. 2. The experiments were performed with the time-domain NIRS instrumentation described in Refs. [17,18]. It was based on picosecond diode lasers (Sepia, Picoquant GmbH, Berlin, Germany), fast single-photon detectors, and time-correlated single-photon counting (TCSPC) modules (SPC-134, Becker & Hickl GmbH, Berlin, Germany). The measurements were part of a performance assessment of time-domain optical brain imagers according to the "nEUROPt Protocol" [19] in which several systems and configurations were compared. The dataset used in the present work as well as in Ref. [20] corresponds to configuration "PTB 1" in Ref. [19]. It was acquired with a 797 nm laser head and with a R7400U-02 photomultiplier tube (Hamamatsu Photonics, Japan). Data from the same two-layer phantom experiment but recorded by another detector were analyzed in Ref. [12]. The picosecond laser pulses were guided to the surface of the phantom by a multimode optical fiber (core diameter 200 µm). At a distance of ρ = 30 mm diffusely scattered photons were collected by a fiber bundle (diameter 4 mm, length 1.5 m, NA 0.54) connected to the PMT. The phantom that is described in detail in Ref. [21] consisted of a container made of black polyvinyl chloride. The front plate (thickness 2 mm) was equipped with two plexiglass windows (diameter 7 mm) for the source fiber and the detector fiber bundle. A Mylar foil of 30 µm thickness was used as a separator to realize the two-layer structure. The thickness of the first layer was l 1 = 9mm, the thickness of the second layer l 2 = 61 mm. Both layer volumes in the container were filled with liquid solutions made of water, Intralipid-20% to adjust the scattering coefficient, and black India ink to adjust the absorption. The dependence of the (nominal) optical properties on the Intralipid and India ink concentration was known with high accuracy from previous measurements within a multicenter study [17]. The procedure of the preparation of the liquid was described in detail in Ref. [20].
A total of 24 measurements with different absorption coefficients µ a,1 and µ a,2 of the two layers was done, while the reduced scattering coefficients of both layers remained constant (µ 0, s, 1 = µ 0, s, 2 = 1mm −1 ) throughout the study. In the first twelve measurements, the absorption coefficient µ a,1 of the first layer was increased almost linearly, with smaller steps in the range of the smallest changes. In the second twelve measurements, µ a,2 was stepwise increased after µ a,1 was set back to the start value of measurement 1. Figure 3 illustrates the nominal values of the parameters of interest that are considered as unknown parameters in the following analysis. The photon count rate was adjusted to 1 × 10 6 s −1 in the homogeneous case (measurements # 1 and # 13), and the filter settings were not changed when adding absorption. The lowest count rate in the series was 2 × 10 5 s −1 . The DTOFs were recorded with a collection time of 1 s,  with 100 repetitions in each measurement. After each measurement the instrumental response function was recorded. Due to the small time gap to the measurement, long term drifts could be excluded in this way. Figure 4 shows the DTOF for measurement # 1 (homogeneous case) after summation over the 100 repetitions, together with the normalized IRF. Furthermore, the DTOFs for measurement # 12 (largest absorption in layer 1) and measurement # 24 (largest absorption in layer 2) are plotted for comparison. The DTOFs have been normalized to the maximum of DTOF # 1. The IRF exhibits a small shoulder due to the shape of the laser pulse and its halfwidth is approximately 500 ps.

The extended Kalman filter
The Extended Kalman Filter (EKF) [15,22] is a non-linear, non-optimal generalization of the Kalman Filter which is suitable for problems where the forward model is non-linear. In our specific application we will name x to the vector containing the optical properties µ a and µ s (for the corresponding layer), the geometrical parameter l and t 0 (an optional time shift to account for an imperfect correction of the delay due to different fiber arrangements in IRF and phantom measurements), and y to the measured DTOF. The subscript t will indicate the respective measurement in the sequence described above, i.e. t = 1, . . . , 24. Considering an evolution operator F, an observation operator H, the Extended Kalman Filter can be used to solve problems written in the form where the covariance matrices Q t and R t stand for the process noise and the measurement noise, respectively. This means that the uncertainty of the evolution of the variable x t follows a Normal distribution with zero mean and covariance matrix Q t (denoted as N(0, Q t )), and the uncertainty in the measurement follows a Normal distribution with mean 0 and covariance R t . The variables η and ν are realizations of those random variables. Assuming independence between F and η and between H and ν, it can be seen that the residuals follow a Normal distribution as described above. The EKF advances in time by updating information provided by two operators: the evolution operator and the observation operator. The predicted distribution is constructed using the information of the prior step and the evolution operator; then, the data is compared with the measurement using the observation operator and the discrepancies are corrected. The result of this procedure is the updated distribution, and it can be summarized in the following iterative process: considering the initial distribution N(x 0 , Γ 0 ), in step t + 1 the mean and covariance x t+1 , Γ t+1 follow the upgrade equations Prediction step: Update step: where H = ∂H ∂x (x t+1 |t ) is the Jacobian matrix of the observation operator and K t is the Kalman gain matrix. The output of the EKF is the updated distribution after considering, at each step, the prediction according to our evolution proposal, and correction after seeing the measurement, i.e. N(x t+1 |t+1 , Γ t+1|t+1 ).
Although the output of the EKF is a distribution, we may be interested in particular point estimates which may be useful to analyze particular cases, such as the mode, the median, the mean, etc. In this work we will use the Maximum A Posteriori (MAP) estimate, which corresponds to the mode of the resulting distribution after an appropriate reparametrization [15].

Model acceleration
The model presented in Section 2.1 requires the calculation of the G 1 function for many frequencies ω and for many Bessel zeros. In order to ensure stability and convergence of the model, it is necessary to perform many calculations which can be superfluous. In order to accelerate the calculations we propose two improvements which give substantial gain in computation time.

GPU implementation
As the method involves an inverse Fourier Transform and an inverse discrete Hankel Transform, the computational cost is of O((sw) 2 log(w)) where s and w are the number of Bessel zeroes and frequencies, respectively. Although a vectorized implementation can be used, the amount of operations can be huge. The impact of this can be observed in the inverse problem time, i.e. on an Intel i7 3630QM processor with 16 GB RAM at 1600 MHz the evaluation time is about 2.15 seconds. In an iteration of the EKF we may need some hundreds of evaluations per iteration, leading to very large iteration times. The first proposal is to use a tailored GPU parallel implementation. To this end, we applied a NVIDIA GeForce GT 760M graphics card. In Fig. 5 the ratio between CPU and GPU times is plotted, showing that we can get an up to 4x gain with this GPU.

Model reduction
The model defined in Section 2.1 is constructed in the frequency-domain and taken to the timedomain through a Fourier transform. A possible reduction of the number of frequencies in the model can be obtained by seeing how much information can be retrieved through deconvolution processes. Given that the measurement is the convolution of the DTOF and the instrument response function, it is necessary to extract the DTOF from the measurement in order to compare the information provided by the DTOF with our models and this is done by deconvolution. As the measurement is noisy, as well as the IRF (because they are realizations of random variables), we may encounter problems in such process. Here, we will show two types of deconvolution: a plain deconvolution and a Tikhonov regularized deconvolution [23].
The plain deconvolution can be stated in a very natural form and is obtained by performing: where g is the Instrument Response Function (IRF) and f measured is the collected data in timedomain. This implementation is not reliable because both, f measured and g are contaminated by noise and we cannot guarantee that the quotient F(f measured ) lies in the Schwarz space and, thus, may not converge to a useful solution. Nevertheless, some useful information may still be recovered in the frequency-domain at low frequencies before noise dominates, as Fig. 6 shows in the highlighted region of interest. The Tikhonov regularized deconvolution [23] can be stated as: where * stands for complex conjugation and λ is a tunable hyperparameter which, in this work, was obtained through the L-curve criterion [24]. It can be shown that the Tikhonov regularized deconvolution is well defined and can be used to reconstruct the underlying DTOF [25]. However, our interest is the region where both approaches share the same kind of information when comparing them with the corresponding model and we need to see the usable information in frequency-domain, not in time-domain. This is also necessary because our model is constructed in the frequency-domain and then transformed to the time-domain. In Fig. 6 we can see a region which allows us to devise a reduced model. Instead of using the full information provided by the model using all the frequencies in the Fourier space, we can use the information provided by the model evaluated in a subset of the frequency-domain. Using that subset, we reduce the computational effort of computing function G 1 from O(sw) to O(s) once we have chosen an appropriate amount of frequencies, which will be fixed (the total cost for computing the reduced model is O(s 2 log(s)) which can be dramatically lower than the full model). For the rest of the frequencies, we set G 1 (s n , z, ω) = 0. In Fig. 7 a comparison of the full model (where we calculate G 1 (s n , z, ω) without truncating) with the reduced model is shown (both after convolution with the same IRF). In log-scale there is a good agreement except at very early and very late times. However, this disagreement will be treated in the next Section as we will not consider those regions where there is a large relative error. The EKF uses an observation operator which is not the model compared directly against the measurements, but its convolution with the IRF instead, i.e. H(x) = g * R(x) . Such convolution must be compared using both, full and reduced model, to see how much information is lost. In order to quantify the information gain of using the full vs. the reduced model, we use the Kullback-Leibler divergence [26], which measures how much information is lost when we replace the full model with the reduced one.
According to this test, the obtained value of the divergence is 5.4852×10 −4 , which is the same information lost as in the case of a standard Gaussian distribution, considered as the full model, replaced by a Gaussian distribution of zero mean and standard deviation of 0.9770, which suggests that the information lost is negligible. The Kullback-Leibler divergence is also used as a measure of discrepancy between probability distributions, so this interpretation is also appropriate because the DTOFs can be regarded as probability distributions. Figure 8 shows the speed gain between the full model implemented in GPU and the reduced model. The total speed improvement is almost 60 times.

Implementation of the EKF
The implementation of the EKF is straightforward once the inputs are specified. However, the main assumption of the method is the normality of the measurement error, i.e., if the measurement y obtained from a sample with optical properties µ a, j , µ s, j and layer thickness l j , given an appropriate model H evaluated in the same properties, we have: (H(µ a, j , µ s, j ), R) (11) y − H(µ a, j , µ s, j ) ∼ N (0, R) Which means that the samples, after appropriate scaling, are independent and identically distributed samples of a standard normal distribution (D is the matrix obtained after Cholesky factorization [27] of R, i.e. DD T = R). This allows to develop a routine that can verify normality of the data. First, the covariance matrix R must be provided (this should be the same covariance matrix that will be provided to the EKF, Γ t ) and its Cholesky factorization, D, must be calculated. Second, using the model and the corresponding optical properties, calculate z = D −1 (y − H(µ a, j , µ s, j )). Finally, perform a hypothesis test on the normality of the resulting sample to decide whether the method is applicable or not. Also, considering Poisson statistics, it is possible to apply the same reasoning as above by considering the following transformation when the expected value of counts is large [28]: (y − H(µ a, j , µ s, j )) √ y ∼ N(0, 1) (16) and the procedure described above holds for R a diagonal matrix with the square root of the measurement. As our model is based on the diffusion approximation, care must be taken when applying the method described above, especially for early times, where the assumed diffusive regime fails [1,29]. An example of this situation can be seen in Fig. 9 where, within the first nanosecond, the residuals are not in the range [−3, 3] where the samples should lie with 99% probability, suggesting that our discrepancy is not only caused by noise but also because of the model which is not appropriate in that region and we get a histogram with way too heavy tails to follow a Gaussian distribution. The proposal here is similar to previous works [30]; the DTOFs and the model (reduced or not) are cropped and the region of interest selected follows a Gaussian distribution. In Fig. 10 the DTOF is cropped between 1 and 5 ns, where the resulting residuals histogram follows a Gaussian distribution as expected. In this particular case the employed test was the Kolmogorov-Smirnov [28] test, which uses the empirical cumulative distribution to infer whether the sample comes from another proposed distribution, in our case, a standard Normal one. After cropping the resulting data, the test fails to reject the normality hypothesis, allowing us to apply the EKF over this data.

Results
In this Section we discuss the results of two particular situations using the data acquired as explained in Section 2.2. First, we retrieve the parameters of interest µ a,1 and µ a,2 considering the rest of the parameters fixed (namely µ s,1 , µ s,2 , l and t 0 ). The second retrieval is by allowing the entire set of parameters to vary. The data was pre-processed in the way explained in the previous Section starting by the leftmost 5% percent of the maximum of the DTOF and ending in the rightmost 0.5% of the maximum of the DTOF, where we ensure to have a Gaussian distribution of the statistical uncertainty. In this way, we can get information from the lower layer by collecting late photons. As explained above, after we ensure that we can apply the EKF, we must provide the initial distributions in order to apply it. For the parameter distributions of µ a, j , µ s, j and l, we use the same change of variables as in [15], but not for t 0 which we assume that follows a Gaussian distribution, as it can be either positive or negative. We assume that the parameters are independent a priori and, using a 3σ-confidence interval around the mean we obtain the values in Table 1.
In Fig. 11 the MAP for each absorption coefficient is shown, together with their corresponding limits of the 99% credibility intervals, where the behavior is consistent with the nominal values as well as the reported results in prior works [12,20], which make use of the same experimental data. It is possible to see that these intervals are very narrow, giving high confidence to the recovery. In particular, the interval for µ a,2 is narrower than the interval for µ a,1 ; the reason for this might rely in the fact that the DTOFs were cropped at rather late times only, gathering in this way a large fraction of information from late photons which should have travelled mostly through the second layer. As the method evolves through the iterations, we make use of information of past measurements which can lead to higher uncertainty as can be seen in Fig. 11. This phenomenon is well-known and can be reduced using memory-fading variants [31].  Figure 12 presents the recovery results for all six parameters. Note that in this case the retrieved values of µ a,1 and µ a,2 show qualitatively the same behaviour as those obtained when fixing the other parameters, i.e., the algorithm correctly predicts the expected changes in both absorption coefficients throughout all the measurements (except, for example, in measurement 21); however, the credibility intervals are wider, being this effect particularly evident for µ a,1 . We can also observe that µ s,1 remains almost unaltered along the measurements, as expected, although the interval for µ s,2 is wider than the interval for µ s,1 . The reason for this behavior is that large variations in the scattering of the second layer do not change dramatically the shape of the DTOFs [16], resulting in high uncertainties for this particular parameter. The uncertainty increases also because of the correlation between the effects produced by different parameters and their impact in the model, i.e. each parameter does not have a unique effect in the shape of the DTOF (masked also because of the IRF) and this increases the plausibility region.
Although the only parameters that evolve in the experiment are µ a,1 and µ a,2 and the rest are constant, the latter must also be estimated. We expect the parameters to converge to the corresponding constant and then stay there. That behavior can be seen in all the constant parameters, showing the robustness of our method. The thickness of the first layer, which starts at 11 mm, takes some more iterations to reach a value closer to the nominal one of 9 mm. This is a reasonable behaviour, considering that the corresponding variance was set rather low, implying good a priori knowledge and, consequently, high difficulty in presenting large changes through iterations. Finally, the retrieved value for t 0 , which was initially considered to be 0.006 ns, instantly increases up to almost 0.025 ns which is about two or three channels in our experimental resolution, and remains stable for the rest of the measurements. Figure 13 shows the marginalized distribution for each parameter, as opposed to Fig. 12, where we present point estimates. As explained in Ref. [15], the result of the EKF is a multivariate distribution which can be too complex to analyze, in order to study each parameter separately, is is possible to marginalize every parameter by integrating the distribution with respect to all the parameters but the one of interest. This gives us information of the uncertain of the parameter of interest after considering all the possible values of the other ones (according to our full distribution). In this case an asymmetry of the rescaled parameters can be observed, which is the result of using log-normal distributions. Since we are dealing with distributions, it can also be seen that values closer to the MAP present higher probability of being correct than those far away from it, even when lying inside the 99% credibility interval.
In Fig. 14 we can see the good agreement between the recovery and the collected data for the particular case of measurement 2 and the Reduced-χ 2 estimate between the recovery and the measurement, which shows a good agreement. The window used is shown in the lower plot of Fig. 15 which is the result of the analysis performed previously in Section 5. The resulting time window is [1.1238, 3.9822] ns and it was used for all measurements. It is important to note that the change in frequency actually changes the temporal resolution; by interpolation it is possible to perform the required calculation within the chosen interval. Figure 15 summarizes the reduced χ 2 estimate for the analysis of all measurements. Although we are minimizing two terms (measurement and evolution) together, the EKF is capable of obtaining a good estimate according to the error variance.

Discussion and conclusions
In this work we present an implementation of the Extended Kalman Filter for determining the optical parameters as well as the thicknesses of layered media through single-distance, time-resolved reflectance measurements. This implementation incorporates several different improvements with respect to the algorithm introduced by some of the authors in a previous publication [15], by simplifying the data acquisition using just a single pair of light source and detector and avoiding a calibration procedure. It is also implemented in the time-domain, leaving aside the need of further data processing to take it to the frequency-domain. The present algorithm was validated with measurements on a two-layered liquid phantom, which have already been used in previous works [20], suggesting that this method could be enhanced in order to implement it in true dynamic situations. The algorithm is a probabilistic approach whose output returns a distribution fully characterized by its mean and covariance, and it can be analyzed using statistics tools.
We have introduced two possible approaches for the acceleration of the forward model. They are useful to reduce the recovery time which is directly related to the forward model computation times. The GPU approach led to a 4x improvement, but a better implementation can obtain a higher reduction. This is also technology-dependent which means that a better GPU may also obtain better times. The second approach is a model reduction which loses some numerical precision in order to gain speed. We have shown that, for the studied cases, this loss in negligible. The gain in this case was about 15x against our GPU implementation, which means a total gain of 60x compared to a full CPU implementation. A proper numerical analysis of the loss can be performed so the error can be bounded.
We have also performed an analysis of the DTOF that justifies its crop, needed by the EKF. As the measurements are described by Poisson statistics, if we take many measurements, we can apply the Central Limit Theorem [28] to describe the noise statistics of the mean through a Normal Distribution. However, there are some troubles in the tails of the error distribution, especially if we use a reduced model because the main loss is in that region, making the crop necessary. This was tested using the Kolmogorov-Smirnov test that failed to reject our hypotheses.
In Ref. [15], a calibration procedure was implemented to make measurements and theory each other compatible. In this work, we have shown that this calibration procedure was not necessary, reducing some uncertainties introduced by it; namely, the regularized deconvolution is unnecessary, leading to a simpler and faster method but also less prone to errors, in the region of interest where we have worked. This method can be applied to the multi-distance approach, as well as the analysis performed in Section 5, to enhance the recovery of parameters, i.e. the true values lies within our a posteriori credibility intervals, and those were to be narrow. Since the advantage of the multi-distance approach is the availability of perspectives through different layers, as the distances are related to the light penetration in the media, the combination of approaches of this work and [15] might result in a better performance, in terms of speed and certainty.
Despite that the set of measurements was not meant to be used for a dynamic setup, it could be used artificially to that purpose. The objective of using the extended Kalman Filter instead of performing a recovery from each measurement separately was to incorporate information from past measurements, given that they are not "far" from each other, since separated recoveries would mean start over and waste information recently acquired. It could be argued that the use of a recent recovery and used as an initial point of a subsequent iteration, would be analogous to the proposed approach, but this would neglect the information provided by the updated covariance matrix Γ t+1 |t+1 . This is the idea of using prior information, to use something obtained before (or even assumed) in the retrieval of information in a further step. The extended Kalman Filter works for Normal distributions, but other approaches are available when this is not true, i.e. the Unscented Kalman Filter, the Particle Filter, the Gaussian Sum, etc. [31].
In a future work we propose to establish a comparison in the performance of this new implementation of the EKF between the time-domain and the frequency-domain approaches.