Comparison of source localization techniques in diffuse optical tomography for fNIRS application using a realistic head model

: Functional near-infrared spectroscopy (fNIRS) is a non-invasive imaging technique that elicits growing interest for research and clinical applications. In the last decade, efforts have been made to develop a mathematical framework in order to image the effective sources of hemoglobin variations in brain tissues. Different approaches can be used to impose additional information or constraints when reconstructing the cerebral images of an ill-posed problem. The goal of this study is to compare the performance and limitations of several source localization techniques in the context of fNIRS tomography using individual anatomical magnetic resonance imaging (MRI) to model light propagation. The forward problem is solved using a Monte Carlo simulation of light propagation in the tissues. The inverse problem has been linearized using the Rytov approximation. Then, Tikhonov regularization applied to least squares, truncated singular value decomposition, backprojection, L1-norm regularization, minimum norm estimates, low resolution electromagnetic tomography and Bayesian model averaging techniques are compared using a receiver operating characteristic analysis, blurring and localization error measures. Using realistic simulations (n = 450) and data acquired from a human participant, this study depicts how these source localization techniques all with a single compared to multiple clusters. of sensitivity. rLSQR computing concurrent specificity localization in real fNIRS data of a visual experiment, showed results in concordance with retinotopic organization of the visual previous fMRI and fNIRS studies. We this a use these in the fNIRS context, accurate image better of fNIRS and multimodal studies in


Introduction
Functional near-infrared spectroscopy (fNIRS) is a non-invasive neuroimaging technique growing in popularity both in the research and clinical realms due to several distinct advantages. First, the ability to interact directly with the participant during recording is a major advantage, especially in pediatric populations or with individuals with significant cognitive and behavioral impairments. Second, recent development of portable fNIRS systems makes cognitive and clinical experiments easier and less costly than functional magnetic resonance imaging (fMRI) [1]. Although the temporal resolution of fNIRS is higher than it is for fMRI, its spatial resolution is relatively low (~5 to 10 mm) with a depth of ~2 to 3 cm into the cortical surface due to the limited penetration of diffuse light sources into tissues [2]. fNIRS measurements can be used to infer images in 3D, reflecting those areas where the light has undergone changes in intensity, a technique called diffuse optical tomography (DOT). The head tissues are highly scattering medium, and light propagation can be described by the diffusion approximation (DA), which is an approximation of the radiative transfer equation (RTE) [3]. Analytical solution exists for the DA in simple geometry but solving it in complex medium such as brain tissues is non-trivial. In the case of complex and inhomogeneous geometries, such as brain tissues, an accurate way to solve the DA is to use a discrete model based on the optical properties of each tissue (i.e., scalp, skin, cerebrospinal fluid, grey matter, and white matter). Discrete approaches, such as finite element modeling (FEM) or numerical Monte Carlo simulation, can then be applied [4][5][6][7][8]. A discretized solution using finite element modeling (FEM) for realistic head mesh geometry has been added to the NIRFAST [7] and TOAST + + [8] software programs. Modeling light transmission from sources to sensors across the head is known as the forward problem in DOT. In most cases, standard brain templates are sufficiently effective to solve this forward problem [9,10]. However, the use of brain templates may lead to significant errors in forward model geometry under certain circumstances, notably in clinical settings when dealing with patients who have had brain surgery or present CSF or skull asymmetries [2]. The forward problem has been solved for realistic geometries using a Monte Carlo approach [5,6,9], which simulates the absorption, scattering, reflectance and anisotropy of each tissue in individual 3D head geometry. This approach involves few assumptions on the medium geometry which may reduce the localization error in DOT reconstruction [11,12]. The localization of the absorption changes along the way of the diffuse light is possible using the relationship between the boundary measurements on the scalp and laws of light propagation. This leads to an underdetermined inverse problem, i.e. to an ill-posed problem that does not have a unique solution. In fact, the boundary measurements might be explained by a linear variation resulting primarily from a hemodynamic change within the head volume. Scattering and absorption are possible to identify using nonlinear iterative reconstruction methods based on the Newton conjugate gradient or Gauss-Newton approaches and considering phase and intensity measurement using Frequency-domain system. This study will emphasize only on reconstruction using light intensity variations measured by continuous wave fNIRS systems, which can be easily applied in multiple clinical scenarios given their relatively low cost and the ease with which they can be miniaturized [13]. Due to the non-uniqueness of the solution, several approaches have been proposed to solve this inverse problem, such as the least squares method with and without Tikhonov regularization as implemented in NIRFAST [7], and the back-projection (BP) algorithm used in PMI toolbox (PMI Laboratory, Athinoula A. Martinos Center for Biomedical Imaging, Charlestown, MA, USA). These methods, together with the truncated singular value decomposition (tSVD) [14] usually lead to smooth solutions, thus yielding blurry reconstructed images rather than focused activations [15,16]. Therefore, a variety of sparse image reconstruction methods, such as those based on minimizing the L1-norm of the solution, have been introduced in DOT [17][18][19][20][21]. In addition, intensive research to solve similar ill-posed linear problems has been conducted in magnetoencephalography (MEG) and electroencephalography (EEG), leading to the development of various methods for source localization [22]. Based on this knowledge, several alternative approaches can be adapted to DOT, such as: minimum norm estimates (MNE) [23], low resolution electromagnetic tomography (LORETA) [24] and Bayesian model averaging (BMA) [25]. To our knowledge, only very few papers have used the Bayesian approach to solve this type of fNIRS inverse problem [26][27][28]. One group has tried a hierarchical Bayesian model for DOT in human brains [26,27] in which different Gaussian priors are used for the blood flow in the scalp (smoother) and the cerebral blood flow. Miyamoto et al. (2015) proposed the same variational Bayes procedure using the typical prior of automatic relevant determination for the activations, which is similar to minimum norm estimates but using a different hyperparameter to control sparsity of the activation in each voxel [28]. These two methods do not use the BMA approach as this implies evaluating the goodness of different models to solve the inverse problem. The method based on the Bayesian formalism that we use in this paper has been shown to be an effective method to improve spatial resolution and decreasing ghost sources in electrophysiological brain imaging [25].
Similar to the EEG/MEG inverse problem, in the fNIRS context there is no clear way to know the exact true distribution of sources inside the brain in a particular state, making it impossible to decide which of the inverse solutions proposed in the literature is the correct one. The best approach is then to compare different methods using a large set of simulations covering all possible realistic tomographic configurations (complying or not with the assumptions made by each method) by using some measures of the quality of the reconstruction. So far, very few comparisons of these reconstruction techniques using fNIRS data have been published, leading to uncertainty about the best technique to solve this illposed and ill-conditioned problem. Using a brain template and a semi-infinite medium, Habermehl and colleagues compared several inverse solutions (tSVD, MNE family, and beamforming) [10]. Other validations were also performed using a complex structure in a phantom material [15]. At the same time, there is not a unique set of quality measures to assess the adequacy of the reconstruction, since complex configurations with more than one active source of different extent are very difficult to compare. In the EEG literature, it has been shown that simple linear measures such as Pearson correlation are not useful as the main attributes to explore are the capability of locating the main activation and estimating the extent of the sources [29,30]. In the last years, more general and nonlinear quality measures based on ROC analysis and the earth mover's distance [31][32][33] have been added to the more traditional measures of localization error and blurring when comparing inverse problem methods [34,35].
In this study, we calculated the forward model using realistic individual 3D head geometry [6,11,12], and compared seven different reconstruction techniques. ROC curve analysis, blurring and localization error measures in presence of noise and deep sources were used as quality criteria to compare image reconstructions obtained from 450 simulated fNIRS data sets. The following reconstruction techniques for DOT were compared: Tikhonov regularization applied to least squares regression (rLSQR), truncated singular value decomposition (tSVD), back-projection (BP), L1-norm based regularization (L1), minimum norm estimates (MNE), low resolution electromagnetic tomography (LORETA) and Bayesian model averaging (BMA). Finally, these algorithms were also applied to experimental data acquired during a visual task to illustrate the validity of these DOT methods during a real fNIRS recording in a human participant.

Linearization-forward model
As the biological tissue is a highly scattering medium, the photon transport equation can be described as a diffusion process. In this condition, the DA equation depicts ( , ) where N is the refractive index and c 0 is the speed of light in the vacuum. In a continuous wave equipment, the frequency ω = 0, whereas using a frequency domain instrument, the light is modulated by frequency ω≠0. In the fNIRS context and with both types of instruments, we measure light intensity to estimate the hemodynamic variations occurring a few seconds after a stimulation. Although the DA equation is nonlinear, in the context of fNIRS it could be linearized by using the Rytov approximation to achieve image reconstruction [3]. The first Rytov approximation estimates the logarithmic perturbation between the optical densities in two states ( Φ and 0 Φ , the latter corresponding to an initial state at time 0 t ), also known as delta optical density (dOD): In this case, the scattering coefficient variations are negligible relative to chromophore absorption [36] thus the scattering coefficient may be assumed to be constant. As the absorption perturbation Δµ a (proportional to hemoglobine variation) is small compared to absorption in tissues themselves, the forward problem can be discretized and linearized as a function of time (t), position of the source (r s ), position of the detector (r d ) and light wavelength ( λ ) which define the tissues optical properties: where G represents the Green's function of Eq. (1) (solution of the homogeneous equation). An analytical closed form for this function is very difficult to obtain in a head model with realistic geometry and optical properties, therefore these functions are computed by using a Monte Carlo simulation approach. Basically, numerical Green's functions are computed as the normalized photon density (photon fluency) between each boundary measurement (defined by a pair of source and detector) and each voxel inside the brain. Details of the Monte Carlo procedure using the diffusion, absorption and scattering coefficients of the tissues and a realistic geometry of the brain is described in the next section. In Eq. represents the relative contribution of absorption in voxels inside the brain to each of the boundary measurements. Each element of this matrix is calculated by combining the Green's functions involved in Eq. (3), and each i-th column can then be interpreted as the delta optical density that would be measured if a variation in absorption of unit magnitude occurs only in the corresponding i-th voxel in the brain.

Light propagation -Monte Carlo simulation
Photon fluency in the initial condition, i.e. without perturbation of absorption due to brain activity, was computed by using the diffusion equation to simulate photon migration with a Monte Carlo (MC) procedure [6]. Monte Carlo is a statistical simulation method in which the paths of photons are traced as they are scattered and absorbed within the medium. The medium was defined using magnetic resonance imaging of an individual anatomical 1.5-T T1-weighted scan (Magnetom Vision, Siemens Electric, Erlangen, Germany), with an isotropic voxel size of 1 mm 3 to identify tissue geometry. Skin, skull, cerebrospinal fluid (CSF), and grey and white matters were segmented. Automatic segmentations were computed on CSF, white and grey matters (SPM8, Wellcome Department of Imaging Neuroscience, University College, London, UK) [37]. Skin and skull were segmented using Brainsuite14C [38]. Visual inspection was performed to ensure the quality of segmentation. Positions of sources, detectors, and fiducial markers (nasion, left and right preauricular points) were digitized on a subject's head with a stereotactic system (BrainsightTM Frameless 39, Rogue Research, Canada). Anatomical MRI and position measurements were co-registered using a rigid body transformation. Eight sources and eight detectors were placed over the occipital cortex, for a total of 64 boundary measurements (i.e. source-detector pair). All measurements with a distance higher than 5 cm were excluded from analyses since they had a pSNR lower than 10dB in experimental data. Measurements with a sourcedetector distance between 3 and 5 cm, which are the optimal distances for fNIRS [39], were kept for analyses, for a total of 54 measurements ( Fig. 1(A)). For each of them, the Monte  [15][16][17] were solved using the commercial software Neuronic Source Localizer [46]. In this work, we are considering the linear underdetermined inverse problem of the form: where y represents the data vector with m elements, x the unknown vector of n elements and A is the sensitivity m n × matrix, for the case in which m n  . As the measurements are always affected by external sources of noise, this problem needs to be solved taking into account the error or noise term ξ . The methods we use and compare here are described as follows.

Regularized least squares by QR decomposition (rLSQR), truncated singular value decomposition (tSVD) and back-projection (BP)
The regularized least squares approach (rLSQR) is based on classical Tikhonov regularization by adding a penalty or regularization term to the least squares function that needs to be minimized [43,47]. The regularization term is a constraint or penalty on the solution that represents "a priori" information about the system and usually improves the estimated solution and reduces its sensitivity to noise. In our case, this term consisted in the squared Frobenius norm of the difference between the solution and the initial condition (i.e. the constant value of the brain tissues without perturbation). The solution is then found as: The influence of the penalty term is adjusted by the regularization parameter α . The choice of this parameter will fundamentally affect the reconstructed image. To adjust this parameter, we used an automatic procedure based on the L-curve criteria. This consists in exploring the L-curve, i.e. the plot of the norm of the residual error for a particular regularized solution  y Ax( ) α − versus the norm of the regularized solution  x( ) α . Assuming that the optimal regularization parameter α is a trade-off between these two magnitudes, its value is selected as the one corresponding to the point with the highest curvature for a logarithmic range of α values [48]. A simpler approach, the tSVD, attempts to compute the pseudo-inverse of the sensitivity matrix without taking into account the smallest singular values that lead to the numerical unstability. In this way, the solution will only reflect the main contribution of the sensitivity matrix [14]. Given the singular value decomposition of A as:  If A is full rank, both M and M t represent the number of measurements. Otherwise, the sum in Eq. (6) is "truncated" to discard the terms corresponding to singular values that are zero or very small. To find an optimal value for the number M t of singular values to consider, the Lcurve criteria was also employed, i.e. M t corresponded to the point of highest curvature in the L-curve [48]. An even simpler approach, the BP technique, consists of back-projecting the boundary measurements in the sensitivity matrix as follows [49,50]: This approach can be interpreted as assuming that the sensitivity matrix is orthogonal in a general sense (e.g. A T being an estimate of its pseudo-inverse). Although it usually leads to an overestimation of the amplitude in the presence of overlapping measurements, this method has been used in the literature [51,52] and included in the PMI toolbox (PMI Laboratory, Athinoula A. Martinos Center for Biomedical Imaging, Charlestown, MA, USA) to perform basic image reconstruction.

L1-norm regularization applied to least squares regression (L1)
When the changes in absorption for a particular experiment are expected to occur in a small area inside the brain, the reconstructed DOT images should be spatially sparse (i.e. show only nonzero values in a few voxels). This information can be added in the regularization approach by using as a penalty term the L1-norm of the solution, in the form: The In this work, we will use a simple L1 least squares procedure, based on a truncated Newton interior-point method described in [45] and publicly available online (L1-LS Toolbox, http://www.stanford.edu/~boyd/l1_ls/). Although it is designed to work faster when the sensitivity matrix is sparse, which is not our case, we chose this method as it can handle high-dimensional data with an acceptable computational speed [20,45]. Like the other methods compared here, the regularized solution depends on the choice of the regularization parameter α . In previous works this has been usually selected ad hoc in a range of 0.1-0.01, depending on experimental noise levels [20]. However, we decided to use the L-curve method to select an optimal parameter in the same way as we did with the rLSQR and tSVD solutions, in order to carry out a fair comparison.

Minimum norm estimates (MNE) and low resolution electromagnetic tomography (LORETA)
MNE and LORETA can also be formulated as instances of the regularization approach. They both use a regularization term consisting in the squared L2-norm of the solution. In the case of MNE, the rationale is to assume that the best solution is the one with minimum overall energy, i.e. the one with the smallest L2-norm [23]. Therefore, the solution is found as the one that minimizes a cost function as: This formulation allows to simply derive an equation for the solution, which depends on the regularization parameter: where I n is the n n × identity matrix. Note that the effect of α is precisely to regularize the (quasi-) singular matrix A A T , by adding this constant value to every diagonal element. Although this equation includes the inverse of a very large matrix, the computational cost can be significantly reduced by using the singular value decomposition of the sensitivity matrix, making it a very fast and simple estimation. Typically, the main activations estimated by this solution are biased to be very close to the position of the measurement sensors. This is explained by the fact that lights coming from deeper sources are more attenuated, and this is reflected in the sensitivity matrix, which means that the solutions will tend to have the highest values closest to the sensors.
With a very similar formulation, LORETA finds the smoothest solution by introducing a spatial roughness operator L in the penalty term [24]. This roughness operator is a discrete version of the Laplacian operator (i.e. second spatial derivatives), so asking for a minimal second derivative will end up with very spatially smooth solutions. The LORETA solution is found from: The Laplacian operator can incorporate in practice the information about each voxel's neighborhood and implicitly considers boundary conditions, i.e. how to penalize edge voxels with fewer neighbors than deeper ones. This type of condition usually penalizes the edges and the main activations tend to "move" to locations with many neighbors. Thus, LORETA typically outperforms MNE in terms of both bias towards surface sources and localization error, but it does not completely solve these problems. These two solutions, as well as the rLSQR, tSVD and BP, are known as linear inverse methods as the estimated solution is a linear combination of the measurements (see Eqs. (6), (7), (10), (12)). These two solutions strongly depend on the choice of the regularization parameter α and suffer from an additional problem that is common to all distributed linear inverse methods: the existence of ghost activations, i.e. when using simulated sources, these methods tend to show more sources than the simulated ones [24,57]. In both cases, MNE and LORETA, the regularization parameter can be chosen by heuristic methods such as the L-curve, or the minimization of information criteria such Akaike's, Bayesian's or of the generalized crossvalidation function (GCV) [58]. For computing these solutions we obtained the optimal regularization parameter by minimizing the generalized cross-validation function using 50 values in a logarithmic range, as implemented in Neuronic Source Localizer software [46].

Bayesian model averaging (BMA)
Although BMA is usually referred to as a statistical method itself, this general framework can also be used in conjunction with any of the methods described before. In our case, we will use the original implementation of the BMA applied to the solution of the EEG/MEG inverse problem using a version of LORETA solution constrained to anatomical models [25]. Mathematically, this approach uses a full Bayesian formulation of the problem, which starts by assuming a normal probability density function (pdf) for the measurement noise that in turn defines the pdf of the data y given the unknown parameters x: This term is known as likelihood and θ represents all other parameters involved in the distribution (hyperparameters), for instance the variance, which can be treated as known or unknown. The estimation of the parameters (inverse solution) is based on the Bayes theorem which offers a way to compute its "a posteriori" pdf as: In this equation we have used θ again for representing the hyperparameters, now including also those coming from the "a priori" pdf ( ) x θ, k p H . We have also included explicitly a variable k H for representing the k-th model assumed, i.e. any other particular choice for the model, such as the spatial correlation matrix or the subset of elements to be used in the estimation. This equation allows what is known as the first level of inference, i.e. estimating the solution with highest posterior probability by finding the value of the vector x that maximizes the numerator of the equation. The denominator can be regarded as a normalization constant (it does not depend on x and therefore it does not affect the estimation of the solution) and computed by integrating the numerator over x. This magnitude is also called the "evidence" of the hyperparameters, interpreted as the probability of the data for a particular given set of hyperparameters θ. If the hyperparameters are assumed to be unknown, the second level of inference can be written with the Bayes theorem again to find the posterior pdf for θ: From this equation, we can estimate the most probable values of the hyperparameters which are equivalent to the regularization parameters in the classical regularization approaches previously explained. In this sense, instead of evaluating the solution for many possible values and deciding heuristically which one is best, the Bayesian formalism allows us to obtain update equations and use iterative algorithms to estimate both parameters and hyperparameters. Then, again, the denominator of this equation can be computed by integrating the numerator over the hyperparameters. This pdf is also called the "evidence" for the model k H , i.e. the probability of the data given a chosen model, which can be interpreted as a measure of how good is the chosen model to explain the measured data. The third level of inference is then to find the posterior pdf of the models using the Bayes theorem again: Finally, the Bayesian model averaging consists in marginalizing over all models to find a posterior pdf of the solution, independently of the chosen model: which will favor models that receive more support from the data and penalize those with low evidence. It has been demonstrated that this implementation of BMA can address the bias towards cortical sources that affects the linear inverse solutions, allowing for the estimation of deeper sources with greater accuracy. Additionally, it provides solutions with significantly less localization error and higher resolution as compared to traditional methods. As with the original implementation, here we use the first and second levels of inference (for each model) with the mathematical priors that lead to a solution equivalent to LORETA but using the Bayesian updates for the regularization parameters instead of the L-curve or the GCV heuristic methods. However, in the case of fNIRS, the light does not reach all parts of the brain and it does not make sense to use all regions of the brain as models. Moreover, when using too many models it is impossible to compute all of them and a very computationally expensive Markov Chain Monte Carlo algorithm is needed to explore models in a probabilistic way. Therefore, to test the conceptual performance of the BMA in the context of fNIRS inverse problem, we follow here a simple preliminary application of the method in which anatomical prior models are obtained by dividing the brain area reached by the light, mainly the visual cortex, into four quadrants. This means that we will have 2 4 = 16 possible models and therefore they can all be computed, avoiding the use of the Markov Chain Monte Carlo algorithm. Future work could be devoted to studying other anatomical or spatial models that more closely reflect the biophysical aspects of the fNIRS measurements.

Simulation data and statistical analysis
A total of 450 activations were simulated. Each was a set of binary activated voxels at different positions over the region of interest. For each method, a combination of single (150 simulations) and multiple (150 simulations) square-like active clusters, with volumes from 3 to 20 mm 3 , were randomly placed in a selected anatomical model in order to cover a wide variability of cortical activation scenarios in the first ROI excluding CSF, with a light intensity of 0.001%, in a grid of 4032 isotropic 2x2x2 mm 3 voxels (Fig. 1(B)). Additional simulations were prepared to compare the performance of the methods in the presence of noise and deep activations. Gaussian noise was added to simulated data in order to have a peak Signal-to-Noise ratio of pSNR = 20dB. This is a typical value which was in fact measured in average in our in-vivo raw data by using the intensity recorded in boundary measurements with a very long source-detector distance (around 7.5 cm) as a measure of the noise. The pSNR is defined as the ratio between the maximum value of the signal (dOD) and the standard deviation of the Gaussian noise in dB units: Finally, to study the performance of the methods when estimating deep activations, 150 additional square-like clusters of 20 mm 3 in size, were simulated with 25 to 50 mm depth from the skin surface, in a larger second ROI with a new grid of 35115 voxels with light intensity higher than 0.0001%. Several quality metrics are used to discriminate the methods. Since reconstruction is a continuous variable evaluated for each voxel, the statements of an active or inactive voxel will change based on the threshold selection. Receiver operating characteristic (ROC) graphs have the ability to evaluate a binary classifier as its discrimination threshold varies [60]. These graphs have been previously used to measure the quality of the reconstruction in inverse problem [22,32,61,62]. ROC graphs are plots of the sensitivity vs. (1-specificity) for all possible thresholds of the reconstructed image. In our case, the sensitivity is the ratio of voxels active in the simulation (true activations) that were also active in the reconstruction, while specificity is the ratio of all non-active voxels in the simulation that were estimated also as non-active. As the selection of an optimum threshold T is usually difficult and depends on the problem at hand, a typical measure of the overall ability for reconstruction is the area under the ROC curve (AUC), which is 1 for a perfect reconstruction and around 0.5 for a random reconstruction. In practice, a threshold must be defined to establish a particular solution. Therefore, several methods have been proposed to derive an optimal threshold from the ROC curves obtained for large data sets, which can then be applied (generalized) to new data where there is no gold standard. In this work, we select the threshold that maximizes the Youden index (sensitivity + specificity -1) [63,64]. We averaged the thresholds corresponding to this index for all the simulations to select a common overall threshold to be applied to all inverse solutions for a fair comparison. Using the common threshold, we report the values of sensitivity, specificity and accuracy, which is the ratio of correct classifications (either active or not active voxels) over all voxels.
A gradient-based measure of blurring and localization error were used to evaluate the performance of the different reconstruction algorithms. A common measure of image quality evaluates the blurring of the reconstructed sources by estimating the root mean squared (RMS) of the gradient [65]: A high value indicates a blurred image, while a low value indicates an image with a sharp contour. The absolute difference between the values of this measure for the estimated solution and for the initial simulated cluster is reported as a relative measure of blurring. Localization error consists in the Euclidean distance between the center of the simulated target and the location of the maximum peak of the estimated solution and can be calculated in all cases where only one cluster was active: Finally, a non-parametric Friedman rank test was used to compare the performance of the methods across all simulations [66]. Post hoc Tukey analysis was applied to report the methods that were statistically identical to the best-ranked method. The median of the localization error and the associated Friedman rank test are reported in presence of noise and deep cluster simulations.

Experimental data acquisition and analysis
It is always of interest to see how reconstruction algorithms work with real data. The same configuration of 8 sources (690 and 830 nm wavelengths) and 8 detectors was used to record a healthy participant with an Imagent Tissue Oximeter (ISS Inc., Champaign, Ill, USA). This equipment is a frequency domain system. However, in the current study, we consider only light variation intensity (DC intensity) to measure hemoglobin concentration using a formulation of the Beer-Lambert law [3]. Therefore, the intensity signal is comparable to a signal provided by a continuous wave instrument. A healthy right-handed 25-year-old woman was exposed to a visual stimulation. Each of the ten blocks of stimulation included a 15-seconds fixation cross period, followed by a 30-seconds stimulation and a 15-seconds fixation cross period. Each stimulus was composed of a high-contrast black and white checkerboard presented in either the left or right lower corner of the screen at a reversing frequency of 2 Hz (Fig. 4(A) and (E), bottom). This experiment was chosen based on the well-known cytoarchitecture of the human visual cortex, which is characterized by a functional organization map of a point-to-point inversed projection from retina to cortex (retinotopic effect). Based on previous studies [67,68], the lower right corner visual stimulation is expected to evoke a superior left visual cortex response, whereas the lower left stimulus should induce a superior right visual cortex hemodynamic response. Using this paradigm, we aimed to investigate the effectiveness of the reconstruction algorithms to spatially localize individual data and show the retinotopic effect.
Following data acquisition, measurement distances between 3 and 5 cm were kept for the analysis. Distances larger than 5 cm were all contaminated by noise (pSNR < 10dB) and excluded from the simulation and in vivo analysis. Movement artifacts are characterized by a high frequency and very abrupt change in light intensity; these periods were removed from the analysis [69]. The dOD was calculated to measure the light variation between the prestimulus period of 5 seconds Eq. (2). A low-pass filter of 0.1 Hz was applied to remove cardiac and breathing artifacts. The ten blocks were averaged over a period of 25 s (from 10 to 35 s from the beginning of the stimulation period) to obtain the boundary measurement for two wavelengths λ 1 = 690 nm, λ 2 = 830 nm. The inverse problem was applying on HbO and HbR directly [70] using the formulation of the Beer-Lambert law as: where [HbO] Δ and [HbR] Δ are the concentration variations of the chromophores oxyhemoglobin and deoxyhemoglobin in the blood, respectively. The parameter ε is the extinction coefficient for each wavelength and each chromophore [71].

Comparison of source localization methods
Image reconstruction methods were compared based on several image quality criteria: AUC, sensitivity, specificity, accuracy, gradient-based blurring, and localization error. Friedman analysis classifies methods according to each of these quality criteria, such that the highest rank (7) means the best performance and lowest rank (1) means the worst performance. Since the results of the Friedman classifications were similar for both wavelengths (690 and 830 nm), only results from 830 nm wavelength simulations are reported. Table 2 reports Friedman average ranks and the mean values for each of these quality measures. The post hoc Tukey-Kramer analysis identified the methods that are equivalent to the best rank method (highlighted in gray in Table 2). Overall, BMA showed relatively good and stable ranks across all different quality measures (4.5 or higher), while rLSQR and L1 method showed the best ranks in particular measures. The good performance of BMA is based on a small localization error (mean of 8.6 mm, rank 5.1) and a low blurring (15.9, 5.0), together with acceptably high specificity (95.9%, 4.8), and accuracy (92.7%, 4.8). In the few cases BMA offered an incorrect localization, the simulated clusters were located in two or more different areas of the prior atlas segmentation of BMA (see Fig. 5 in the Appendix). rLSQR showed the best average ranks among all methods on the AUC (mean of 94.1%, rank of 6.2) and sensitivity (73.6%, 5.7), which means that it is always finding activations around the real active sources. However, the source distribution is usually more blurred and spread out than the original simulation (38.4, 2.9). As expected, L1 method showed the best performance for specificity (97.6%, 6.9) and accuracy (94.2%, 6.9) -at the expense of the worst sensitivity (1.7%, 1.3)-as it provided the sparsest solutions, which also make it similar to BMA in providing the best relative blurring (3.3, 5.8) i.e. the most focus activation. To better nuance the limitations of each algorithm, single and multiple activated clusters were analyzed separately. For all criteria, the classification in terms of average ranks was the same when using a single and multiple clusters, and all methods showed better performance with a single cluster compared to multiple clusters. p-value < 10 -5 < 10 -5 < 10 -5 < 10 -5 < 10 -5 < 10 -5 a The best methods according to the Friedman rank were identified by post-hoc Tukey-Kramer analysis and highlighted in gray (significantly different from the others but not among them).
Typical solutions obtained from the 90, 70, 30 and 10 percentiles of the average AUC distribution, are shown in Fig. 2 as examples of the reconstructed images; the highest percentiles being the more accurate results, regardless the method. Note that rLSQR and tSVD accurately depicted the simulated cluster with some blurring. BMA reconstruction is sharper, i.e., better in terms of specificity; however, it fails to reconstruct some clusters (localization error higher than 1 cm in 9% of the simulations). The L1 solution always showed the sparsest images as estimated activations consisted of one to three voxels maximum (smaller than real activations), but it did not always placed the activations in the right location. BP overestimated the amplitude in many voxels leading to high blurring and low specificity, while MNE and LORETA presented the worst performance in terms of AUC over simulations.
Additional simulations were performed to investigate the effect of additive Gaussian noise and depth activation on the localization error. The median localization error across the different types of simulations are presented in Fig. 3. The Friedman ranks for this simulation study are reported in Table 3 and again BMA showed the more stable performance along all conditions (highest average rank of 4.5). With the Gaussian noise, LORETA has the highest Friedman rank followed by the BMA and L1 methods. In addition, the Friedman rank increased in noisy data with respect to noise-free simulations for L1, MNE and LORETA probably due to an effective regularization, while it decreases for BMA, rLSRQ, tSVD and remain similar for BP. When reconstructing simulations with deep sources, LORETA and BMA also showed the highest ranks. Nevertheless, for simulations deeper than 25 mm, the median error is higher than 10 mm for all methods, suggesting that none of the methods is highly reliable for estimation of deep sources (Fig. 3). In Fig. 6 in the Appendix, we show a scatter plot of the distribution of localization error vs. the depth of activation for all simulations for noise-free and noisy data.  and reconstructions of HbO and HbR variation using BMA (C, G) and rLSQR (D, H). Those methods were selected because they showed the highest average ranks across all criteria in the previous evaluation (see Table 2 and Table 3). As expected, a cluster of activated voxels in the left occipital lobe (Talairach coordinates: −20, −90, −4) for the right visual stimulation was found for HbO and HbR. Similarly, a cluster of activated voxels was found in the right occipital lobe in response to the left visual stimulation (Talairach coordinates:14, −95, −4). These results are in agreement with retinotopic organization of the visual cortex as demonstrated by previous fMRI and fNIRS studies [67,68].

Discussion
In addition to a reliable estimation of the location and size of the main activations, the best reconstruction method should combine high sensitivity (ability to find the active sources) and specificity (ability to avoid ghost sources, i.e., estimated sources located where there is no activity). Also, in simulations, the AUC is a good general measure to show how well reconstructed are active and non-active voxels in an image independently of an arbitrary threshold selection. Our results across 450 different simulation scenarios suggest that the rLSQR and BMA methods are the most flexible methods and therefore the most appropriate for the analysis of real fNIRS data. On the one hand, rLSQR is a fast computation method, which has been widely used in DOT reconstruction [7,72]. Moreover, it is stable, always identifies sources close to the actual ones and is less sensitive to noise than BP and tSVD if the regularization parameter is adequately selected. However, the source distribution is usually more blurred and spread out than the original simulation (Fig. 2). On the other hand, among all techniques, BMA provided the best overall results (highest average ranks), mainly due to a high specificity and a small blurring and localization error, even in the presence of noise and deep activations.
A major problem with classical methods for DOT is that they often provide heavily blurred images [15,16]. Our results, in accordance with previous studies, showed that the L1 method has the ability to provide sparser reconstructed images and reduce the blurry effect on the images, at the cost of higher computational burden. However, it does not totally compensate for the depth of the sources and not always provide accurate localization [14,20]. Even when this method showed similar localization errors than the other blurred estimates, such sparse images show a clear separation between estimated and real sources while blurred images show activations that cover the position of the real source even if the maximum activation is not exactly located in the right voxel. It is worth to mention that in our study we have used an L1 least squares algorithm for computing the L1 solution base on a truncated Newton interior-point method [45]. This is a simple approach that allowed fast computations to make the comparison using hundreds of simulations feasible, (recall that we needed to compute 50 solutions for different regularization parameters to select an optimal value using the L-curve method). This method seemed to work well as localization errors did not increase in the presence of noise. However, the truncated Newton interior-point method can work less efficiently if the regularization parameter is too small, and/or the columns of the sensitivity matrix are highly correlated [45]. Although we believe that our comparison results reflect the general behavior of L1-based sparse solutions [56], we do not discard the possibility that other variants using tailored algorithms can lead to better performance in the type of simulations explored here.
Using BMA, the sources correctly reconstructed are visually close in size and localization from the simulated ones. To our knowledge, BMA is not commonly used in fNIRS inverse problem reconstruction. Therefore, it would be interesting to further investigate the potential of this method with regard to its excellent specificity, combined with a good localization. Although many simulations achieved the best reconstruction using BMA, this method failed to reconstruct the source at the correct localization in a few cases (less than 10%) similarly to other methods. Some refinement in the original BMA method [25] might improve the performance in a DOT context. In this study, we used a simple non-realistic approach for the anatomical atlas, which consisted in dividing the occipital cortex into four large regions. We observed that most of the cases where BMA offered an incorrect localization, the simulated clusters were located in two or more different areas of the prior atlas segmentation of BMA. This could be explained by the fact that BMA behaves like a model selection method, such that usually only one model survives in the averaging processing. As further exploration, a more realistic anatomical or different prior segmentation might also be considered in the future using BMA.
According to our simulations, tSVD can also be considered a good option for DOT, as it generated stable results similar to rLSQR. However, BP, MNE, and LORETA are not the best candidates for estimating sources in fNIRS data. The back-projection (BP) technique is conceptually simple but tends to overestimate the cerebral response where multiple boundary measurements overlap. MNE and LORETA are not the best methods according to the ranks obtained with the Friedman test (Table 2) in the case of noise-free data, but they perform much better in the presence of additive noise with respect to the localization error. In most simulations, the quality measures obtained using these two solutions were just behind those given by the other methods, except for some simulations where MNE estimated less blurred solutions. It should also be pointed that these methods were computed using the Neuronic Source Localizer software, specifically oriented to the estimation of EEG/MEG sources. For instance, this software uses the generalized cross-validation function [58] adjusted for EEG and MEG applications for selecting the regularization parameter, which might not be optimal in the fNIRS context.
As a general point, in this work we computed the normalized reconstructed images (i.e. after estimation each image was divided by its maximum value), following a common practice in DOT [12]. In this context, researchers using these methods recommend focussing on evaluating the ability of correct localization of the maximum values and spatial distribution of sources, as we did here with the quality measures analyzed in Table 2. However, the study on the ability to correctly estimate the magnitude of variation of absorption should be carried out in the near future. Another interesting point to consider while performing source localization analyses is the computational efficiency. Although the Monte Carlo simulation might be costly in computational time, the duration of the calculation has been reduced by a factor of 300 compared to the original version by using the Accelerated Graphics Processing Units [5,6]. For the inverse solution calculation, all techniques used in the current study are quite fast at computing, except for L1 method and the BMA that has a variable computational time. First, it varies according to the size of the grid. For example, the simulation using the BMA method takes a few seconds with a grid of 4032 voxels compared to a few minutes with a grid of 35112 voxels. The calculation time will also be affected by the number of models to explore.
Theoretically, DOT estimation entails its own limitations, which cannot be overcome by the use of any of the specific reconstruction techniques evaluated here. One limitation of this study is that the exact values of the optical parameters of the tissues (µ a0 , µ s , g, N) are unknown in this specific human participant. For each tissue, standard values from the literature [40] were used rather than measuring the optical properties of the participant's tissues. In addition, this study only investigates changes in blood oxygenation measured using the attenuated light and the modified Beer-Lambert law and a linearization of the DA equation with the Rytov approximation. This method provides a linear relationship to fNIRS data and assumes the scattering coefficient variations are negligible relative to chromophore absorption [3,36]. Another limitation is that DOT resolution is limited by the number of boundary measurements at the surface. We used 54 measurements to cover the occipital cortex and the average localization errors were less than 1 cm close to the surface and 2 cm in deep tissues using BMA reconstruction. This geometric configuration was chosen to maximize the spatial sensitivity over the region of interest [62]. This configuration appeared to be more sensitive than the regular "star" montages (sources surrounded by detectors) due to a higher number of overlapping measurements. In order to reconstruct images within a 1 mm 3 scale with a highly scattering medium, the number of overlapping measurement sensors must be very high (e.g., over 10 7 ), which is not achievable in practical clinical recordings [15]. In this work, the participant's MRI was carefully segmented instead of using a template brain atlas. The anatomical MRI helped to improve spatial localization of DOT by providing improved geometric modeling of the inhomogeneity in brain tissues and use of cortical spatial constraint [11]. Accurate modeling of the light in the volume could further minimize the effects of the forward problem assumptions on the localization error and the other quality measures [12]. In a clinical context, because it is not always possible to meet every technical requirement to perform DOT, a quick projection of the measurements onto a template or an individual MRI-derived scalp-as presented in Fig. 4(B) and (F)-may adequately regionalize the activity without the spatial accuracy of DOT. This representation is interesting, especially when there is too little overlap in the sensors to perform the inverse estimation with a good resolution.
In the current study, the simulations were designed to cover different sizes of activations and different numbers of clusters, not necessarily complying with the typical constraints imposed by the methods, to reveal the limitations of the reconstruction algorithms. We created these realistic simulations instead of using phantom tissues to obtain a large number of scenarios. Recent developments of geometrically complex 3D-printed phantom using a standard 3D printer will allow us to test, in the future, inverse methods with precisely known properties and geometries for complex configurations [73].

Conclusions
In conclusion, our findings suggest that the reconstruction technique must be chosen carefully in order to obtain an accurate localization. The performance of most of the methods declined when multiple regions of the brain were active, as well as in presence of noise or deep sources. Simulation results suggested that BMA provides images with less blurring and lower localization error, even in the case of noisy data and deep sources. The BMA algorithm, initially designed for EEG and MEG, showed promising results in DOT and may be further improved using rLSQR or L1-norm regularization instead of LORETA in computing the solution for each anatomical model. rLSQR is robust to additive Gaussian noise and offers the best AUC and sensitivity, but more blurred reconstructed images than BMA. L1 method provides the sparsest solutions, but this leads to a poor performance in terms of sensitivity. Based on these simulations, rLSQR approaches proved to be appropriate for computing DOT and its concurrent use with BMA might improve specificity and reduce blurred images. Source localization in real fNIRS data of a visual stimulation experiment, also showed results in concordance with retinotopic organization of the visual cortex as demonstrated by previous fMRI and fNIRS studies. We hope this study encourages a wider use of these techniques in the fNIRS context, given that an accurate image reconstruction is essential for a better understanding of fNIRS and multimodal studies in clinical or research purpose.