Direct approach to compute Jacobians for diffuse optical tomography using perturbation Monte Carlo-based photon “replay”

: Perturbation Monte Carlo (pMC) has been previously proposed to rapidly recompute optical measurements when small perturbations of optical properties are considered, but it was largely restricted to changes associated with prior tissue segments or regions-of-interest. In this work, we expand pMC to compute spatially and temporally resolved sensitivity proﬁles, i.e. the Jacobians, for diﬀuse optical tomography (DOT) applications. By recording the pseudo random number generator (PRNG) seeds of each detected photon, we are able to “replay” all detected photons to directly create the 3D sensitivity proﬁles for both absorption and scattering coeﬃcients. We validate the replay-based Jacobians against the traditional adjoint Monte Carlo (aMC) method, and demonstrate the feasibility of using this approach for eﬃcient 3D image reconstructions using in vitro hyperspectral wide-ﬁeld DOT measurements. The strengths and limitations of the replay approach regarding its computational eﬃciency and accuracy are discussed, in comparison with aMC, for point-detector systems as well as wide-ﬁeld pattern-based and hyperspectral imaging systems. The replay approach has been implemented in both of our open-source MC simulators - MCX and MMC (http://mcx.space)


Introduction
Diffuse optical tomography (DOT) and fluorescence molecular tomography (FMT) with nearinfrared light have been active research areas for both clinical and pre-clinical applications for over three decades. The appeal of these tomographic techniques resides in their high sensitivity, relatively low cost and unique utilization of intrinsic and exogenous functional biomarkers in charactering optically-thick tissue samples noninvasively [1]. These methods aim to recover the 3D distributions of either fluorophores or intrinsic absorption and scattering contrasts inside turbid media by employing measurement data collected on the surface of tissues and match them with an accurate light propagation forward model through solving an inverse problem [2]. Over the past years, DOT and FMT have been successfully applied to imaging human breast cancers [3], prostate cancers [4], muscle functions [5], joint conditions [6], peripheral vascular diseases [7], brain functions [8], small animals [9], and have also found applications in imaging ex vivo biological samples [10] as well as engineered tissues [11]. Regardless of the application, building an accurate and efficient forward model remains crucial to the quantitative imaging performance of inverse model-based techniques.
It has been widely agreed that the radiative transfer equation (RTE) provides an accurate model for light propagation in complex bio-tissues but it is known to be difficult to solve. Therefore, the diffusion equation (DE), a low-order expansion to the RTE, is usually adopted as an alternative to generate approximated solutions [12]. With the introduction of more sophisticated imaging strategies over the past decade, such as mesoscopic imaging [13], time-domain imaging with early photons [14] and whole-body imaging with structured light [15,16], DE becomes increasingly limited in both accuracy and flexibility. In these cases, the Monte Carlo (MC) method, a stochastic solver to the RTE, has gained increasing popularity.
Because MC solves for the RTE directly, in many studies, it was considered as the "gold standard" for modeling photon migration in biological tissues [17,18]. Nevertheless, MC was not deemed suitable for providing the forward solutions in DOT/FMT due primarily to two limitations. First, only limited types of domain configurations, such as layered and voxelated space, are supported by public MC packages, restricting their use in modeling complex tissues. This limitation was addressed by the recent development of mesh-based MC methods [19,20]. Second, MC has been associated with high computational expenses when simulating large number of photons especially in 3D applications. Thanks to the wide availability of modern accelerators such as general purpose graphics processing unit (GPGPU), the simulation time of MC has been reduced by orders of magnitude over the past years [21]. Additionally, efficient MC formulations such as the perturbation Monte Carlo (pMC) have been proposed to speed up the forward computation of MC [22,23]. Hence, the combination of efficient MC formulations with increased computation power has led to a rejuvenated interest in using MC to compute the forward model for DOT/FMT [24,25].
A Jacobian matrix, also known as the sensitivity matrix, is a central element in model-based DOT/FMT image reconstruction. It represents a map associating the changes of localized optical properties with the changes of measurements from selected source-detector pairs. Traditionally, the adjoint Monte Carlo (aMC) is adopted as an efficient method to calculate the Jacobians, especially when optodes are point-sources and point-detectors [25,26]. The adjoint method works by multiplying the light fields of a forward simulation, computed by propagating photons from the source position, and an adjoint simulation, computed by propagating photons from an imaginary source at the detector position [27]. However, with the rise of more advanced DOT imaging architectures, such as those leveraging structured light illumination or detection [28], the aMC method becomes cumbersome with the rapid increase in data dimensionality via spatial, temporal and spectral multiplexing. The investigations to these expanded data dimensions have been driving the development of next generation DOT/FMT systems and resulted in significantly improved 3D imaging performances [29,30]. In many of these studies, pMC was found to be well suited as it benefits greatly from rescaling strategies that do not require relaunching MC forward simulations. However, studies utilizing pMC have been primarily focused on analyzing domains with known segmentations; the perturbations being estimated are restricted to known tissue segments or regions-of-interest, and are not spatially distributed. Extending pMC to support spatially resolved image reconstructions can greatly benefit DOT in exploring additional measurement dimensions.
In this paper, we propose an approach to efficiently build the Jacobians via perturbation Monte Carlo with photon numerical "replay" , where the trajectory information of detected photons is first recorded from a "baseline" simulation and then utilized to build sensitivity profiles. By storing the pseudo random number generator (PRNG) seeds and recreating the photon trajectories in an MC simulation [31], we can avoid the prohibitive memory cost of storing photon "biographies" and efficiently generate spatially and temporally resolved Jacobians with only a small overhead. Additional optimization strategies are investigated for hyperspectral imaging and single-pixel detection to further accelerate the Jacobian calculation.
The remaining sections of the paper are organized as follows. In Section 2, we will briefly review perturbation Monte Carlo principles and derive spatial/temporal Jacobians for both absorption or scattering perturbations from pMC formulations. We will also provide a brief review on adjoint Monte Carlo. In Section 3, we will detail the "photon replay" algorithm for direct Jacobian constructions. We will particularly discuss two scenarios -camera-based wide-field detection and hyperspectral imaging. In Section 4, we validate the Jacobians from replay with those from the adjoint method -first provide a simple in silico DOT example as a demonstration, and then another example in solving an in vitro hyperspectral wide-field DOT problem. Detailed discussions and comparisons between the replay and adjoint methods are provided, followed by conclusions and future improvements in Section 5.

Forward and inverse perturbation Monte Carlo
When a photon is launched from a source in a Monte Carlo simulation, it undergoes multiple scattering events before reaching a detector (or exits without being detected), and its weight decreases exponentially along the trajectory following the Beer-Lambert law [32]. Sometimes, the optical properties, i.e. absorption coefficient (µ a ), scattering coefficient (µ s ), and anisotropy factor (g), can be perturbed, ( µ a , µ s , g), from their background values (µ a , µ s , g) in a small region, such as in a tumor. In such cases, instead of running a new MC simulation with perturbed optical properties, we can predict the measurements by processing the recorded photon "biographies" from the background simulation, known as the perturbation Monte Carlo method [22,23]. For simplicity, here we assume g is not perturbed. In such case, the detected photon weight can be expressed as: where w and w are the photon weights before and after the perturbation and µ t = µ a + µ s is the transport coefficient; L and p denote the total trajectory lengths and the number of scattering events of the photon inside the perturbed region, respectively. Note that we have implemented a continuous absorption weighting scheme [19,21], referred to as the "microscopic Beer-Lambert law" approach (mBLL) in [33]. In this method, a photon's trajectory is solely determined by µ s and the photon weight loss is solely determined by µ a . This is different from the "albedo-weight" approach [33] commonly used in pMC literature [23,24], but was shown to be equivalent to the latter [33]. For a complex domain, we rewrite Eq. (1) into a spatially resolved form [24,34] as: where Ω j ( j = 1, 2, · · · , M) denote discretized spatial regions (such as segmentations, voxels or tetrahedra); δµ s = µ s − µ s and δµ a = µ a − µ a . Next, we can apply the rescaling relationship to all detected photons within a time gate, leading to the accumulated time-resolved form as: where k = 1, 2, · · · , N T is the index of photons detected during the selected time gate T (out of N tot photons). Note here that T is not the accumulative time-of-flight for the k th photon in region Ω j , instead, it is the total time-of-flight when the k th photon is captured by the detector.
If we perturb the absorption or scattering coefficient one region (Ω j ) at a time, we can obtain the corresponding change δW due to the perturbation as: respectively. We take the Taylor's expansion for each right-hand-side term in Eqs. (4)(5) and keep only the first order terms. Then we can compute the Jacobians in the form of limits as: Finally, if we define Jacobians as the sensitivity of the relative change of measurements, i.e.
∆W (T) = W (T) − W (T) /W , to localized optical property changes, where W = N tot k=1 w k is the accumulated photon weights from all time gates, Eqs. (6)(7) can be further simplified to whereL are the average path lengths and scattering counts of all photons traversing Ω j in time gate T, respectively.

The adjoint Monte Carlo
Derived from the diffusion equation [35,36] under the Rytov approximation, the continuous-wave (CW) Jacobians for absorption and scattering perturbations are expressed as where G r j , r s is the Green's function defined at any discretized region in the medium r j ∈ Ω j due to a source located at r s , obtained by solving the forward model; Φ r d , r j denotes the quantity measured at the detector located at r d as a result of the scattered light originated from the medium at r j . In previous literature, the measurement Φ is often defined as the fluence on the domain boundary, immediately inside the diffusive media. Note that the formulation to perform FMT is very similar to Eq. (10) when using the Born normalized approach. For simplicity, we will focus on DOT hereinafter but interested readers are referred to [34]. In many realistic DOT systems, the quantity being measured is not the fluence, but the total diffuse reflectance or transmittance (R) at a detector immediately outside the domain [37], i.e.
where all w 0 is the total simulated photon weights, detected w is the total detected photon weights, and A d is the area of the phantom surface that's covered by the detector. In some other DOT systems, the measurement quantity is defined as the flux ( ì J) along the normal direction (ì n) of the detector aperture. In these cases, we have where ì v is the direction vector of a detected photon. Note that fluence, diffuse reflectance and flux all have the same unit of mm -2 ; one must use caution and choose the proper form for Φ .
If the measurement is the fluence inside the diffusive media, i.e. Φ r d , r j = G r d , r j , it can then be simply replaced by the adjoint forward solution of the fluence G r j , r d as a result of reciprocity of the Green's function, i.e. Φ r d , r j = G r d , r j = G r j , r d . However, when the measured quantity is diffuse reflectance or normal flux on a refractive index mismatched boundary, reciprocity is not valid any more. In such cases, the relationship between R (r d , r) , or J n , and the adjoint fluence Green's function G r j , r d is more complex, and is related to 1) the boundary condition, 2) the exiting photon angular distribution [38], 3) the asymmetry between the emitting profile of the adjoint source and the receiving profile of the detector, and 4) surface specular reflection. Although deriving an analytical relationship is possible in such cases for a planar boundary within the diffusion regime [12,39], the measurement can typically be expressed as a scaled version of the adjoint fluence solution, i.e. Φ r d , r j = αG r j , r d . The scaling factor α can be numerically calibrated by two sets of MC simulations. Also note that ∇Φ r d , r j = −α∇G r j , r d because the two gradients have opposite directions. Thus, Eqs. (10-11) can be ultimately re-written as For a typical tissue-like domain (n = 1.37) on a flat surface, α is found to have a typical value of 8.47 when R is measured, and 12.21 when J n is measured.

Solving the DOT inverse problem
After computing the sensitivity matrices from either the adjoint or perturbation MC, the DOT inverse problem typically requires solving the below equation where J + is the pseudo-inverse of the sensitivity matrix J µ a and/or J µ s , ∆µ is the vector of the perturbed absorption ∆µ a and/or scattering ∆µ s coefficients in all regions. The measurement perturbation ∆Φ is formulated with both unperturbed, Φ 0 , and perturbed measurements, Φ , under either the Born The Born approximation is valid when the perturbation is small compared to the background properties, whereas the Rytov approximation assumes that the perturbed field varies slowly. The latter often provides more accurate results in modeling biological tissues [35]. Due to the highly scattering nature of near-infrared light in the tissue, Eq. (16) is known to be ill-posed [2]. As a result, one must apply regularization to stabilize the solution. The standard Tikhonov regularization is often used. When there are more measurements than unknowns, i.e. over-determined, solving the below equation is more efficient [40]: and that for the underdetermined cases (more unknowns than measurements) is where λ is the regularization parameter, L is the regularization matrix, I is an identity matrix.

Photon replay algorithm
Based on Eqs. (8-9), the main challenge for building Jacobians with pMC is to find an efficient way to calculate the weighted average scattering event countsp and photon trajectory lengths L , both in time and space. A simple approach is to allocate a memory buffer with twice the numbers of voxels or elements, B, multiplied by the number of time gates, T, for each simulated photon, to storep Ω j , T andL Ω j , T during the propagation. However, the drawbacks of this method are apparent. First, the memory for storing 2B×T values needs to be pre-allocated for each detector. When the number of detectors is large, which is typical for DOT applications, or multiple photons are simulated simultaneously, such as GPU-based MC, the memory usage becomes unrealistic, especially for GPUs where memory utility is highly restrictive. Secondly, we do not know in advance if a photon will arrive at the detector. In many cases, only a small portion of the photons are captured, resulting in significant redundant calculations. The "photon replay" method was inspired by the "reseeded" pMC method proposed by Sassaroli [31]. By taking advantage of the deterministic pseudo-random number generator (RNG) sequence followed by a seed, one only needs to store the photon's initial RNG state at launch-time when it is captured by a detector. By reinitializing the photon with the saved RNG state, the photon will repeat its trajectory as in the baseline simulation and is guaranteed to be detected again at the same position. We therefore propose a photon "replay" algorithm combining the reseeded MC with the Jacobian formulations in Eqs. (8)(9), with a schematic shown in Fig. 1.
The photon replay algorithm is performed in two steps. First, a baseline Monte Carlo simulation is performed where a large number of photons with random seeds are launched. If a photon is captured by one of the specified detectors, its arrival time t , final weight w , RNG seed value s at launch, and the detector index are stored. After the number of photons reach the target, a second Monte Carlo simulation, i.e. the photon replay, is performed for each detector. This time, we initialize each new photon with the stored seed value s k , from the k th detected photon in the baseline simulation, so that they are guaranteed to be captured again. During its re-propagation, instead of depositing photon weight losses as in the baseline MC, we utilize the stored detected photon weight w k and deposit either weighted trajectory lengths w k L j ("wl" mode) or weighted scattering numbers w k p j ("wp" mode) at any given discretized region (Ω j ) and time gate (binned by photon's total time-of-flight t k ), as described in Eqs. (6)(7). After all detected photons are replayed, the weighted average valuesL r j , t andp r j , t are generated and used to compute the temporal and spatial resolved Jacobians according to Eqs. (8-9).

Improved replay efficiency in wide-field and hyperspectral imaging
With the development of spatial light modulators (SLMs), structured light illumination such as spatial frequency domain imaging (SFDI) and camera based single-pixel detection strategies have gained increasing popularity [28]. This novel system design has shown great potential to improve data acquisition speed as well as signal-to-noise ratio (SNR) for DOT/FMT, making full-body small-animal imaging more efficient. Because the SNR of the Jacobians from replay largely depends on the number of detected photons, a single-pixel detection over a large field-of-view (FOV) can potentially benefit more from replay compared to point detectors.
Typically, dozens of detection patterns belong to the same class are used, such as 36 patterns in the quantized low-frequency base and 64 patterns in the Hadamard base [41]. Since they share the same detection area, we can run only one replay MC to generate Jacobians for all detection patterns with any given source configuration. To achieve this, the only additional data that need to be stored is a photon's exiting bilinear position (x k , y k ) in the detector aperture. Then, when depositing w k L j and w k p j according to Eqs. (6-7), we simply loop over all detection patterns and modify w k to w k · w i d , where w i d is the corresponding "detector weight" , defined as the i th detection pattern intensity (0 to 1) at (x k , y k ). This approach is well suited for emerging methodologies in DOT/FMT such as compressive-sensing based or adaptive DOT where "optimal" patterns can be obtained either theoretically [42] or experimentally [43].
Another scenario where we can leverage the benefits of photon replay is multispectral or hyperspectral imaging, which provides enriched dataset at different wavelengths for unmixing various types of chromophores. The optical properties, either µ a or µ s , are usually slightly different as the wavelength changes, so we have to perform separate MC simulations at each wavelength if adjoint MC were to be used to build Jacobians. With the proposed replay method, we only need to run one baseline simulation and one replay for all wavelength channels, with the assumption that µ s is a constant over the measured wavelength range. Both of our published MC simulators, MCX [21,44] and MMC [19,45], utilize the mBLL method. Therefore, as long as µ s remains constant, we can exactly repeat the photon trajectory with the stored seeds and recalculate the detected photon weight w µ a k for a new µ a value using Eq. (1), without needing to rerun the baseline simulation [32]. Applying the rescaled w k in the replay will obtain the Jacobians of different wavelengths as proposed in [24].

Validation of replay-derived Jacobians
We first compare the Jacobians calculated by the adjoint method (aMC) and photon replay for a single source-detector pair in a homogeneous domain. All simulations are performed with our GPU-accelerated MC simulator "MCX" [21] using an NVIDIA GTX TITAN V GPU. The domain has a volume of 60×60×60 mm 3 and a voxel size of 1×1×1 mm 3 , with the optical properties µ a = 0.005 mm -1 , µ s = 1 mm -1 , g = 0 and n = 1.37. A pencil beam light source towards +z direction is placed at [29.5, 29.5, 0] mm and a detector with a radius 2.5 mm is located at [29.5, 39.5, 0] mm. To match the configurations of the two simulations, the light source for the adjoint MC simulation is set to a uniform disk source with 2.5 mm radius at the detector position, where the photons are launched uniformly within the disk towards -z direction. The constant α in Eqs. (14)(15)) is determined to be 8.47 through calibration.
For aMC, 10 8 photons were launched for both the forward and adjoint simulations, each with a runtime of ∼ 1.90 s. For the replay approach, the baseline simulation with 10 9 photons took 22.6 s, in which around 7.30×10 6 photons were detected. The replay steps to obtain the "weighted trajectory length" (wl) and "weighted scattering number" (wp) took 0.39 s and 0.30 s, respectively. All runtimes reported are the average value of 5 repeated tests. To reduce stochastic noise, we apply Anscombe transform on the raw outputs, i.e. G r j , r s , G r j , r d in aMC andL Ω j , p Ω j in replay, before filtering them with a 3-D Gaussian kernel with σ = 1, followed by an exact inverse Anscombe transform [46,47].
In Fig. 2, we show the absorption and scattering Jacobians from the adjoint approach in (a-c), those from the replay approach are in (d-f), along with contour plot comparisons between the two methods in (h-i). Distributions of weighted average pathlengthsL Ω j (same as the replay adjoint Jacobian by definition) and weighted average scattering numbersp Ω j /µ s Ω j in (d, g), respectively. All plots are in log-scale with base 10 extracted at plane x = 29.5 mm. Several remarks can be made from Fig. 2. First, the distributions ofL r j in Fig. 2(d) and p Ω j /µ s Ω j in Fig. 2(g) are very similar to each other. This can be explained by the following relationship: the average distance traveled by a single photon between two scattering events can be calculated as l = L/p , and the photon "mean free path" happens to be the inverse of scattering coefficient 1 µ s (assuming g = 0) in an mBLL MC photon simulation. Therefore, the accumulated L Ω j andp Ω j /µ s Ω j values for a collection of photons are also very close numerically at the same spatial location. Secondly, the overall magnitude of the scattering Jacobian is about 1-2 orders of magnitude lower than the absorption Jacobian. This is expected because the scattering Jacobian is calculated by subtracting images in (g) by (d), according to Eq. (9). For the same reason, the scattering Jacobian is much nosier than the absorption Jacobian, as the stochastic noise is relatively amplified with the smaller mean value.
Finally, in the scattering Jacobian, only a small region near the source-detector is negative while in majority of the space, it has a positive value. This is also expected. Mathematically,  15) the sign of J µ s Ω j is determined by the inner product of these two vectors, so we can see the points with negative scattering sensitivity values are inside the circle with the diameter of |r s − r d | . Physically, for photons launched from the source, a higher scattering coefficient along or close to the source-detector path tends to raise the possibility of deflecting photons away from the detector, thus reducing the total detected photon weight; on the other hand, a higher scattering coefficient far from the source-detector path tends to reflect photons back to the detector, thus increasing the total detected photon weight. Note here that, although g = 0 is used in this example, aMC and replay Jacobians are also expected to match when g is non-zero.
In addition, we also show the time-resolved absorption Jacobians J µ a Ω j , T , built with the replay method for a transmission measurement setup, in Fig. 3. The size of the phantom is 60×60×30 mm 3 with the same optical properties as above. The source remains at [29.5, 29.5, 0] mm and the detector locates at [29.5, 39.5, 30] mm. We recorded over 5 ns with 0.1 ns time step after launching 10 9 photons in the MC simulation, where the baseline and replay MC with 7.46×10 5 took 29.4 s and 0.11 s, respectively. We see that J µ a in an early time-gate ( Fig. 3(a)) has a significantly narrower span compared to a late time gate with the same accumulated photon weights W (T) , shown in Fig. 3(b), due to the ballistic photons with fewer scatterings [14]. Moreover, in Fig. 3(c), we compare the change of temporal point-spread functions (TPSFs) predicted by pMC and replay Jacobians, for a 20×20×10 mm 3 inclusion with δµ a = 10 −4 mm -1 at the center of the phantom (20 ≤ x, y ≤ 40 mm, 10 ≤ z ≤ 20 mm). The TPSFs derived from pMC (as Eq. (3)) and the replay Jacobian, calculated as J µ a · ∆µ a , match almost exactly. This is not surprising as both were derived from the same set of detected photons.

Sample reconstructions using the replay Jacobians
In this section, we test the reconstruction performance of both absorption and scattering Jacobians obtained from photon replay with simulated data. The tested slab domain contains 40×40×20 isotropic 1×1×1 mm 3 voxels. The optical properties for the background medium are µ a = 0.01 mm -1 , µ s = 1 mm -1 , g = 0 and n = 1.37. Although g is close to 0.9 in many biological tissues, in the diffusion regime, we used the equivalent settings, i.e. g = 0 and µ s as the reduced scattering coefficient (µ s ), to speed up MC computation. Two 8 mm diameter tubes are placed side-by-side with their axes 10 mm below the surface and 16 mm apart. We varied the optical properties of the two cylindrical inclusions and ran multiple simulations. The light source is a collimated Gaussian beam with a waist radius of 2 mm, and 8×8 raster scanning positions are applied to the phantom To generate the absorption and scattering Jacobians, we launched 10 9 photons from every source position in the baseline simulation and performed photon replays for each detector. On an NVIDIA TITAN V GPU, the average run-time for each baseline simulation was 9.63 s, where ∼ 4.1×10 7 photons on average were detected; the runtimes for replaying 36 detectors were 10.55 s and 7.20 s for the "wl" and "wp" steps, respectively, resulting in a total run-time of ∼ 29.2 minutes. The raw outputs were denoised as shown in Section 4.1 before constructing the full Jacobian of 2,304 (source-detector pairs) and 32,000 columns (voxel count). Meanwhile, we also generated 3 sets of simulated CW measurements (Φ 0 , Φ 1 and Φ 2 ), where Φ 0 : the medium is homogeneous with background optical properties; Φ 1 : the left tube has 3 times the background absorptionμ a = 0.03 mm -1 and Φ 2 : the right tube has twice the background scatteringμ s = 2 mm -1 . For each measurement set, 10 8 photons were used for simulations at each source position, taking an average of 1.05 s per simulation.
The inverse problems for absorption and scattering perturbations were solved based on Eq. (18) and optimal regularization parameters were chosen based on an L-curve approach. In Fig. 4, we show the reconstructed µ a cross-sections with Born approximation in (a-c) and those with Rytov approximation in (d-f). The absorption and scattering reconstructions (positive component) on z = 10 mm and y = 20 mm planes are displayed in the first and second columns separately, and the comparison between the ground truth and reconstructed mean values along y dimension on z = 10 mm plane are plotted in (c,f). Note that only one of µ a or µ s was perturbed in each sample reconstruction; the case where both µ a and µ s are perturbed is more complicated, normally requiring time-resolved measurements, matrix rescaling or other techniques, and is thus beyond the scope of this paper.
From Fig. 4, with either approximation, we have successfully reconstructed the true locations of the inclusions. However, the results from the Rytov approach appears to be more accurate, as evident when comparing Fig. 4(f) with (c). We also notice that the reconstructed scattering perturbation is noisier than absorption, as shown both on the surface and interior. This is expected because the scattering Jacobian is noisy as shown in Section 4.1. To suppress artefacts and obtain more accurate images, one can apply regularizations, adopt more advanced inverse solvers or utilize a priori structural information, which again are not the focus of this paper. We have demonstrated in silico that building Jacobians and solving inverse problem via photon replay are feasible for a typical DOT problem with many source-detector pairs.

Replay for hyperspectral wide-field DOT
Recently, we reported a novel hyperspectral time-resolved wide-field imaging system, capable of collecting in vivo small-animal whole-body data within 14 minutes [30]. One digital mirror device (DMD) is fiber-coupled with a pulsed laser to generate structured illumination patterns and a second DMD is coupled with a spectrophotometer-based photomultiplier tube detector (PMT) to enable single-pixel detection. In this section, we show that one can use the photon replay approach to efficiently generate Jacobians for this type of hyperspectral DOT system.
A liquid phantom of size 80×50×22.5 mm 3 was prepared as the imaging target. The background medium contains 0.8% Intralipid, 0.008‰ Epolight 2735 (Ep), and 0.024‰ India Ink (Ink). Two tubular inclusions with 5 mm radius were suspended 10 mm below the surface, as shown with the black dash lines in Figs. 5(a-b). Both were filled with the same Intralipid concentration as the background, while the left one is 2.25× higher in Ep and the right one is 1.83× higher in Ink concentration. Therefore, both tubes have the same µ s as the background but different µ a values, and their ratio with respect to the unit concentrations is 1.23.
The experiment was performed under a transmission configuration, where the field-of-view (FOV) was set to 40×30 mm 2 for both illumination (bottom) and detection (top). A total of 36 quantized low-frequency (QLF) illumination patterns and detection patterns were used in this experiment. These patterns was chosen for their high SNR at low photon counts [48].
CW data from 5 spectral channels, with the center wavelength ranging from 750 to 770 nm, were selected for unmixing two absorbers. Wavelength dependent optical properties of the background and two absorbing materials were calibrated for every channel. The µ s value remained approximately constant at 0.84 mm -1 , while µ a of background medium increased from 0.015 mm -1 to 0.016 mm -1 , µ a of 0.008‰ Ep increased from 7.2×10 -3 mm -1 to 8.6×10 -3 mm -1 , and µ a of 0.024‰ Ink decreased from 9.5×10 -3 mm -1 to 9.1×10 -3 mm -1 . We assume that all µ a values change linearly with the wavelength, as shown in Fig. 5(c).
In this example, we used MMC [45] for both the baseline and replay calculations using 36 CPU nodes in parallel on the Blue Gene/Q cluster, each consisting of a 16-core 1.6 GHz A2 processor at the Center for Computational Innovations (CCI) at Rensselaer Polytechnic Institute (RPI). The phantom mesh consisted of 17,073 nodes and 100,241 elements [49]. The baseline MC with 10 8 photons and µ a = 0.015 mm -1 took 16.8 minutes on average on the CPU cluster, where 5.3×10 6 photons were collected by the detector. With the technique introduced in Section 3.2, a single replay MC session created Jacobians for all 36 detection patterns in 15.7 minutes. Furthermore, under a constant scattering coefficient, we ran five replay simulations for different wavelength channels. Therefore, the total simulation time is 16.8 + 15.7×5 = 95.3 min.
Rytov approximation was again used to compute the measurement vector, and an optode calibration method was adopted [50] to reduce artifacts caused by fluctuations of pattern intensities in the experiment. The inverse problem was solved with a MATLAB built-in solver CGS with the default tolerance [51]. The reconstructed C ep and C ink at 50% isovolume are shown in Figs. 5(a-b). Despite distortions along the z-axis -an expected result of the fixed projection angle -the two absorbers were successfully differentiated with accurate locations and low surface artefacts. We calculate the reconstructed absorber concentrations as the mean of elements larger than 50% of the maximum value, and define the crosstalk of Ep as max(C ink( Ω ep)) max(C ep( Ω ep)) , whereC ep andC ink are normalized to their own maximum and Ω ep is the ground truth volume of Ep (vice versa for Ink). From Fig. 5(d), we found that with measurements from more wavelength channels, the crosstalk between Epolight 2735 and India Ink kept decreasing (from 100% to 50.4% and 18.0%), whereas the concentration ratio between two absorbers kept approaching the ground truth (123%). More details about further improving resolution/ratio and reducing crosstalk can be found at [48].

Discussions and conclusions
In this section, we discuss the pros and cons of the replay and adjoint approaches. First, we discuss the accuracy of the Jacobians from the two approaches. The replay Jacobians was derived based on pMC, built upon the direct connection between the individual detected photon data and the actual measurement changes. By contrast, the adjoint method relies on the reciprocity of light [27]. Thus, when sufficient number of photons are detected, we consider the replay to be more accurate than the adjoint method. We recognize that several factors may contribute to the derivations between the two Jacobians. Firstly, the adjoint method requires assuming an imaginary source that is located at the detector position. Isotropic or pencil-beam sources have been widely used as the adjoint sources, but the light emission profiles from these source forms are obviously different from the light reception profile of common detectors, such as an optical-fiber. To accurately simulate an adjoint source, a spatially and angularly distributed light source must be used, which is not commonly available. Secondly, the formulation of the adjoint Jacobian depends on the actual type of the measurement, such as fluence, flux or diffuse reflectance. Choosing different measurement quantities can result in different scaling factors, α , as we discussed in Section 2.2. In reality, the scaling factor due to the chosen measurement is effectively removed as part of the data calibration [52]. Next, we want to compare the two methods in terms of computational efficiency. If we follow the procedures described above, if launching M photons at each of the N s sources and N d non-overlapping detectors, the run-time for the adjoint method satisfies t ad ∝ [(N s + N d ) M] ; that for the replay method satisfies t re ∝ [(1 + f ) N s M] , where f is the fraction of the photons being detected in the baseline simulation. However, this comparison does not consider the differences of the two Jacobians in terms of SNRs due to their differences in the number of photons involved. Assuming a shot-noise model, we found that the SNR of the adjoint Jacobian is proportional to √ 2M while that for the replay is proportional to f M/N d . Therefore, to produce a Jacobian that matches the SNR of the adjoint computation, the replay method needs to launch 2N d f times more photons in the baseline simulation. This leads to an equivalent replay time of In the cases where the detected photons are shared among the detectors, such as in the single- with the assumption that the memory operation cost is neglectable compared to photon propagation. Similarly, in multi-/hyper-spectral imaging with only absorption perturbations, the detected photons can be replayed once for all N λ spectral channels, leading to a new running time of 2N d N s (1+ f )M f g d N λ . Lastly, in applications where iterative reconstruction is performed, the baseline simulation is only needed once at the first iteration if the replay method is used. The forward model outputs and Jacobians of the subsequent iterations can be computed rapidly using only the detected photons, yielding an overall t re ∝ 2N d N s (1+K f )M f g d N λ for K iterations. In comparison, the adjoint method requires repeated simulations at all sources and detectors in all iterations, giving t ad ∝ (N s + N d ) K M .
In Fig. 6, we show 3 sample configurations to demonstrate the pros and cons of the replay algorithm in terms of computational expense. The contour plots of the runtime ratios between the replay and the adjoint methods for building Jacobians, i.e. t re /t ad , are presented, with a range of source and detector numbers 1 ≤ N s , N d ≤ 20 . In Fig. 6(a), we show a case with single wavelength (N λ = 1), non-overlapping point detectors (g d = 1) and f = 0.1 . In Fig. 6(b), we show the runtime ratios for a multispectral single-pixel camera detection system with g d = N d , N λ = 10 and f = 0.1 . In Fig. 6(c), we show an example of iterative reconstruction (K = 10) for a hyperspectral (N λ = 10) point-detector (g d = 1) system with f = 0.3 . From Fig. 6(a), it appears that if the Jacobian SNR is the major concern, the replay method performs poorly in terms of speed compared with the adjoint method in point-source/detector configurations: when N s = N d = 10 , the replay takes 100-120 times longer than the adjoint to build a Jacobian with same SNR. In fact, the larger the source and detector numbers, the more efficient is the adjoint method. However, according to Fig. 6(b), when more detected photons are shared between detection channels, either in the form of spatial patterns or spectral channels, the replay method can be more efficient under the condition that N s N d < N λ 1+ f (2−N λ ) . Even for point-detection systems, utilizing iterative reconstruction and multispectral data can also make the replay method more effective compared to the adjoint, as suggested by Fig. 6(c).
In summary, we have proposed a novel approach named "photon replay" to directly and conveniently build time-resolved Jacobians for both absorption and scattering perturbations for diffuse optical tomography. The replay Jacobians were validated with the traditional adjoint approach and show efficiency in several scenarios, such as hyperspectral and single-pixel detection systems. We also demonstrated that the absorption Jacobians from replay can be practically useful for solving real-world DOT reconstructions; however, the scattering Jacobian suffers from stochastic noise despite large photon counts and additional denoising. We are currently working on new strategies to improve the quality of the scattering Jacobian.
The "replay" feature has been implemented and tested in both our voxel-and mesh-based MC platforms. The updated software, MCX and MMC, are freely available at http://mcx.space/. We strongly believe that these tools could contribute to a wide spectrum of explorations in diffuse optical imaging.

Disclosure
The authors declare that there are no known conflicts of interest related to this article.