Evolution strategies for optical tomographic characterization of homogeneous media

: The reconstruction problem in diffuse optical tomography can be formulated as an optimization problem, in which an objective function has to be minimized. Current model-based iterative image reconstruction schemes commonly use information about the gradient of the objective function to locate the minimum. These gradient-based search algorithms often find local minima close to an initial guess, or do not converge if the gradient is very small. If the initial guess is too far from the solution, gradient-based schemes prove inefficient for finding the global minimum. In this work we introduce evolution-strategy (ES) algorithms for diffuse optical tomography. These algorithms seek to find the global minimum and are less sensitive to initial guesses and regions with small gradients. We illustrate the fundamental concepts by comparing the performance of gradient-based schemes and ES algorithms in finding optical properties (absorption coefficient m a , scattering coefficient m s , and anisotropy factor g ) of a homogenous medium.


Introduction
In recent years considerable progress has been made towards the development of tomographic imaging systems that rely on strongly scattered near-infrared (NIR) light.[1][2][3] In this novel medical imaging technique, commonly referred to as diffuse optical tomography (DOT), one attempts to reconstruct the spatial distribution of optical properties (e.g., absorption coefficient µ a (r), scattering coefficients µ s (r), and scattering anisotropy factor g) within the body from surface measurements of transmitted light intensities.The technology for making such measurements on human subjects is readily available and has been applied in a variety of non-clinical and clinical pilot studies.[4][5][6][7] The development of algorithms that accurately and efficiently transform these measurements into cross-sectional images of the internal optical properties remains a major challenge.
Most of the currently employed reconstruction techniques in DOT are model-based iterative image reconstruction schemes (MOBIIR).[8][9][10][11][12][13][14][15][16][17] These schemes rely on a model of photon propagation in tissue to predict expected detector readings.All algorithms in References 8-17 employ as a forward model the diffusion equation, which is an approximation to the equation of radiative transfer.While the diffusion equation is easier to solve, it considers only the two parameters µ a and µ s ' = (1-g)µ s.Use of the equation of radiative transfer is required when, besides µ a and µ s ', the anisotropy factor g is to be included f as an independent variable.[18,19] Therefore, depending on what equation is used, the input parameters for the forward model are either given by the vector ξ(r) = (µ a (r), µ s '(r)) or ξ(r) = (µ a (r), µ s (r), g(r)) with r being a position inside the medium under consideration.Starting with an initial guess ξ 0 (r) the forward model generates an initial prediction of detector readings P s,d (ξ(r)) for each specific source s and detector d.These predictions are then compared to physical measurement data M s,d using an appropriately defined objective function Φ. Typically the χ 2 -error norm, which is given by  (1) is used.The parameter η s,d is a normalization constant, which for example can be set to M s,d (ξ ξ ξ ξ).In this case the error norm minimizes the sum of the squared percentage difference between measured and theoretical data.The challenge remains to find efficient ways of updating the initial guess ξ 0 (r) such that the differences between predicted and measured data become smaller and the value of the objective function decreases.
A host of different updating schemes have been devised over the past 8 years.[8][9][10][11][12][13][14][15][16][17][18] Of increasing interest are gradient-based iterative image reconstruction (GIIR) algorithms.[19][20][21][22][23] These codes make use of dΦ/dξ(r), which is the gradient of the objective function with respect to the spatial distribution of the optical properties.In the gradient approach the image reconstruction problem is interpreted as a nonlinear optimization problem, in which the distribution ξ min (r) that minimizes the objective function is sought.These schemes have somewhat alleviated the computational burden of the image reconstruction problem in DOT.However, it is well known that gradient-based minimization methods are prone to become trapped in a minimum close to the initial guess.They also have difficulties escaping from parameter regions in the search space with gradients close to zero.In general, gradient methods are not well suited to find the global minimum.
To overcome problems related to gradient-based schemes, we investigate in this work the use of so-called evolution-strategy (ES) algorithms in DOT.ES algorithms borrow from ideas in biological evolution theory to find global minima of objective functions.[24][25][26] They don't make use of gradient information during the optimization process and are therefore not sensitive to areas with gradients close to zero.In general, ES algorithms have four major components.(see also Fig. 1) Firstly, a parent population is randomly chosen (initialization).Each of the N members of the parent population is characterized by its phenotype consisting of n variables.In the next step, pairs of parents are formed, which produce a fixed number of offspring so that a total of M new individuals are available.The offspring are generated by so-called recombination and mutation rules.Finally, the fitness of each member of the offspring population is evaluated (selection).The members with the highest fitness become the new parent generation and go on to generate offspring by recombination and mutation.
To demonstrate how the concepts of ES algorithms can work in DOT we focus in this study on the simple yet significant problem of finding the optical properties of a homogenous medium based on measurements at the boundary of the medium.While this appears to be the simplest of all possible reconstruction problems, it surprisingly has attracted very little attention in the past.Indeed we are not aware of a single publication that addresses this problem.Currently there does not exist a unique and widely accepted method to determine optical properties of homogenous samples.It is noteworthy that neither the National Institute of Standards (NIST) in the US nor any other comparable organization in the world supplies standards with known optical properties for diffuse optical tomography.This poses a particular problem for researchers in optical tomography who want to validate their reconstruction codes.Scientists often use Intralipid suspensions to validate their algorithms.However, over the last 15 years several groups have measured the optical properties of Intralipid with different outcomes.[27][28][29][30] Flock et al [30] summarizes these studies by f giving the means and standard deviations of all works combined.They find that the optical properties of 10% Intralipid at 633nm are µ a = (0.027 ± 0.154) cm -1 , µ s = (144 ± 9) cm -1 , and g = (0.75 ± 0.18).Given these huge standard deviations, especially for µ a and g, it is clear that code validation is problematic, and that the accurate determination of optical properties of homogeneous media is still a challenging problem in itself.

Methods
We compare gradient-based MOBIIR schemes with evolution-strategy-based MOBIIR schemes.Both approaches use the same forward model to predict detector readings and the same definition of the objective function.The forward model used in this study is an upwindfinite-difference, discrete-ordinate code that solves the equation of radiative transfer.[19] This algorithm allows specifying the absorption coefficients µ a (r), the scattering coefficient µ s (r), and the angle-dependent scattering phase functions.We use in this work the widely applied Henyey-Greenstein phase function where g is the already mentioned anisotropy factor, which characterizes the angular distribution of scattering.[31] We apply Fresnel boundary conditions that take the refractiveindex mismatch at the air-medium boundary into account.For more details concerning the code, we refer readers to Reference 19.The forward model is used to numerically obtain predictions of the detector readings, P s,d (ξ ξ ξ ξ).These predictions depend on the source position, the detector position, and the optical properties ξ ξ ξ ξ = (µ a (r), µ s (r), g).The predicted data is compared to measured data using the objective function defined in Eq. 1.The goal is to find the minimum of this objective function.

Gradient-Based Optimization Scheme
To minimize Φ the gradient of the objective function γ γ γ γ o = dΦ(ξ ξ ξ ξ o )/dξ ξ ξ ξ at the initial guess ξ ξ ξ ξ o is calculated.To do this we use in this work a technique called adjoint differentiation.
Adjoint differentiation provides an efficient means to calculate the gradient of a numerically evaluated function in a time comparable to one forward calculation.[19,20,22,23] The gradient information is then used within a Polak-Ribieri conjugate-gradient scheme to find the minimum of Φ. [32,33] This approach consists of an inner and an outer iteration (Fig. 2).
During the inner iteration a line minimization along a search direction d k is performed according to ξ ξ α where α is a real number representing the step size.The inner iteration consists of several forward calculations in which the parameters ξ ξ ξ ξ are updated until the minimum along this direction is found.Once the minimum is found, a new gradient is calculated at this minimum (outer iteration) and another line-minimization along a new direction d k+1 is performed.In the Polak Riberie conjugate gradient scheme this new direction is calculate as follows: is smaller than a user defined value ε.A more detailed description of the GIIR algorithm used in this work can be found elsewhere.[19] f

Evolution Strategies(ES)
Evolution strategies do not require the gradient calculation and the search for a minimum is not performed along certain directions.To illustrate the concept we have focused on a simple but important case of determining the optical properties of a homogeneous medium.In this case we deal with only three unknowns, ξ ξ ξ ξ = (ξ 1 = µa, ξ 2 = µ s , ξ 3 = g).We start the search process by randomly choosing three ξ ξ ξ ξ with values of 0.1 < µ a < 1, 2.0 < µ s < 12, and 0 < g < 1.This is our parent population.The vectors (ξ 1 , ξ 2 , ξ 3 ) k = (µ a , µ s , g) k , k=1,2,3, represent the phenotypes of each member of the parent population.
Next we need to generate the offspring generation that correspond to new vectors ξ ξ ξ ξ.In our case, this is done by randomly pairing 12 couples from the parent population.The following six different pairs of parents are equally likely to be chosen: From each pair of parents one child is generated in a two-step process.
In the first step properties of the two parents are combined in a procedure generally referred to as recombination.In our case we chose a simple recombination rule by just averaging the optical properties of both parents.
In the second step, the average optical properties are further modified by a process referred to as mutation.We define the mutation rule as where y is a normally (Gaussian) distributed random number, which is obtained from with ν 1 and ν 2 being independent and uniformly distributed between 0 and 1. [34,35] The values of σ i where set to σ i = 1/3 ξ i,old .Therefore the optical properties of the offspring are Gaussian distributed around the average optical properties of the parents.The width of the Gaussian distribution is given by 1/3 of the actual value of µ a , µ s , and g respectively.Other values for σ and other distributions are also possible [24,26] but have not been explored by us at this point.
Once the offspring are generated, the fitness off all members of the offspring generation is evaluated to select the three fittest members as the new parent generation.The selection is done by performing a set of forward calculations using the finite-difference transport code.The input parameters for the forward code are source and detector positions, and the optical properties (µ a , µ s , g) i that characterize each member.The results of the forward calculation are used to determine the value of the objective function as given in Eq. 2. The lower the value of the objective function, the "fitter" is that member of the offspring population.Only the three children that result in the lowest objective-function values survive, and they become the new parents for the next generation.This procedure is iterated until the optical properties of the next generation converge within 2%.The flowchart of the ES scheme is displayed in Fig. 1.A schematic comparison of the gradient-based scheme and the ES schemes is depicted in Fig. 2. f Fig. 1.Basic scheme of the evolution strategy employed in this work.The parent population is initialized by randomly choosing three parents, p1, p2, p3, each with different optical properties µ a , µ s , and g.During the recombination process 12 parent pairs (pi,pj) 1 to (pi,pj) 12 are randomly generated.For each pair the average optical properties are calculated (Eq. 5 recombination rule) and one child (c1 to c12) is produced by mutation in which optical properties are randomly drawn from a Gauss distribution centered at the average optical properties of the parents (Eq.6).In the selection process the fittest members of the offspring population are determined by evaluating the objective function for each child (Eq.2).The three children with the lowest objective function become the three new parents for the next iteration.

Results
To illustrate the use of ES schemes in optical tomography we consider the following example.A 4cm x 4cm homogeneous medium is discretized with a spatial resolution of 0.1 cm resulting in a 40x40 finite-difference grid.The optical properties of the medium are µ a = 0.6 cm -1 , µ s = 6.0 cm -1 , and g = 0.5.Synthetic measurements M s,d were generated for 2 source positions and 32 detectors (Fig. 3), using the aforementioned optical parameters and the transport-equation-based forward model.[19] Fig. 3. Setup and measurement geometry for the numerical model.Synthetic data were generated for 2 sources (open circles) and 32 detectors (gray circles) with a time-independent transport-theorybased algorithm.
Using different sets of optical properties (µ a , µ s , g), we generated predicted data P s,d and calculated the value of the objective function Φ (Eq.2).The absorption coefficients µ a ranged from 0.1 to 1.0 cm -1 and g values ranged from 0.3 to 0.8.The µ s values were chosen so that 1 cm -1 ≤ µ s ' = (1-g) µ s ≤ 6 cm -1 in each image.The results are displayed in Fig. 4, where values of Log(Φ) are shown as a function of µ a and µ s for six different g values.For each gvalue we observe an elongated "blue" valley in which the objective function is smallest.The minimum within the valley shifts from the lower left quadrant to the upper right quadrant as g is increased.For g = 0.3 the minimum is at (µ a = 0.53 cm -1 , µ s = 4.37 cm -1 => µ s ' = 3.06 cm -1 ), and for g = 0.8 the minimum is at (µ a = 0.76 cm -1 , µ s = 23.4cm -1 => µ s ' = 4.68 cm -1 ).(The strong influence of the g value on the position of the minimum is noteworthy.Diffusionequation-based codes that consider only µ s ' cannot distinguish between pairs of g and µ s values that result in a same µ s ' value.Since the µ s ' range is the same in all 6 maps in Fig. 4, using a diffusion model to predict detector readings and calculate the objective function would yield 6 identical maps!)The global minimum, with Φ = 0, is at µ a = 0.6 cm -1 , µ s = 6.0 cm -1 , and g = 0.5 (Fig. 4c).However, as can be seen from Fig. 4 many combinations of optical properties (µ a , µ s , g) lead to similar low values.Assuming that one does not know where the true minimum is, the questions of what algorithm is best suited to find the global minimum arises?
To answer this question, we first employed the gradient-based scheme starting with three different initial guesses (µ a , µ s ), while keeping the anisotropy factor fixed at g = 0.5.The reconstruction scheme found different pairs of (µ a , µ s ) values for each initial guess.Starting from (µ a = 0.8 cm -1 , µ s = 7.0 cm -1 ) the recovered values were (µ a = 0.52 cm -1 , µ s =6.96 cm -1 ), starting from (µ a = 0.2 cm -1 , µ s = 4 cm -1 ) the algorithm found (µ a = 0.59 cm -1 , µ s = 6.04 cm -1 ), and starting from (µ a = 0.4 cm -1 , µ s ' = 10 cm -1 ) the code determined values of (µ a = 0.39 cm -1 , µ s = 9.9 cm -1 ). Figure 5 shows how the value of the objective function monotonically decreases for all three initial guesses during the first 10 outer iterations.It can also be seen that starting from (µ a = 0.8 cm -1 , µ s = 7.0 cm -1 ) the value of the objective function decreases considerably after the first iteration, but remains almost constant thereafter.Starting from µ s = 6.0 cm -1 µ a = 0.6 cm -1 g = 0.5 f (µ a = 0.2 cm -1 , µ s = 4 cm -1 ) the objective function reaches a plateau value after 7 iterations.Starting from (µ a = 0.4 cm -1 , µ s = 10 cm -1 ), which is a point inside the valley, the value of the objective function changes very little. .Slices are taken for different g values, which results in Log(Φ) versus µ a and µ s maps.The minimum of Log(Φ) is at µ a = 0.6 cm -1 , µ s = 6.0 cm -1 , g = 0.5.Note that the x-axis is chosen so that the reduced scattering coefficient µ s ' = (1-g) µ s ranges from 1 to 6 cm -1 in all figures.It can be seen that many combinations of µ a , µ s , g lead to similar low values in the log(Φ).f Fig. 5. Objective function (Eq.2) as a function of outer iterations (conjugate-gradient scheme) and number of generations (evolution-strategy scheme).For the evolution algorithm only the member with the smallest objective function for each generation is shown.The time required to complete all calculations for one generation equals the time it takes to complete one outer iteration within ±10%.
Figures 6 and 7 display the path of the iterations on the surface of the objective function in a 3-dimensional and 2-dimensional graph, respectively.Fig. 7 is identical to Fig. 4c, except that in Fig. 7 the µ a -axis has not been stretched.Therefore the units on both axes are the same, which further emphasizes the narrow character of the valley.It can be seen that the first iteration step in the gradient-based algorithm takes the parameters along the gradient direction directly into the elongated "blue" valley.Once inside the valley only points to the left of the minimum at (µ a = 0.6 cm -1 , µ s = 6.0 cm -1 ) are significantly improving and get close the minimum.Starting from (µ a = 0.8 cm -1 , µ s = 7.0 cm -1 ) the search basically stops as soon as the valley is reached.Starting from a point inside the valley at (µ a = 0.4 cm -1 , µ s = 8.0 cm -1 ) only negligible improvement towards the minimum is observed.In general we found that the gradient-based scheme can find the real minimum only for a limited range of initial guesses.In particular, initial guesses that are located in the lower left quadrant of Fig. 4c fare the best, while initial guesses located in the upper right quadrant perform the worst.This can be understood by taking a closer look at the objective function along the bottom of the valley (Fig. 8).The gradient within the valley is steeper to the left of the minimum than to the right.Furthermore, as can be seen in Figs.4c, Fig. 6, and Fig. 7, the gradient below the valley is in general larger than the gradient above the valley.Therefore starting at a point below the valley to the left (lower left quadrant) the search algorithm will encounter steeper gradients than starting from a point above the valley to the right (upper right quadrant).The steeper the gradient, the better the gradient based algorithm will perform.
For the ES-based algorithm we randomly chose 3 initial sets of (µ a , µ s ) representing the initial parents generation.Each subsequent offspring generation had 12 members, which is equivalent to 12 sets of (µ a , µ s ).We found that for almost any pair of three initial parents we were able to recover the global minimum within 5% after less than 10 generations.Only in the pathological case when the randomly chosen 3 initial sets of (µ a , µ s ), where within 10% of each other would the algorithm not converge to the true minimum.Choosing the same  and 7, the values of all sample points in the ES scheme (blue diamonds).
The time it takes to complete 10 generations of the ES scheme is on the same order as finding a minimum using 1 initial guess and the gradient-based scheme.In the ES scheme the objective function is evaluated 120 times during 10 generations.The gradient-based scheme requires about the same number of objective function evaluations, since one outer iteration requires about 10-15 inner iterations.Each inner iteration requires one forward To assure that a global minimum has been found using a gradient-based optimization scheme one could perform several minimizations starting from different initial guesses and compare the final values of the objective function.In the examples studied so far, we have found that this always takes longer than using the ES algorithm.Evolution-strategy algorithms do not require more memory than gradient-based schemes.Once the forward calculations for different sets of optical properties are done, only the values of the objective functions for these sets of data need to be stored to perform the recombination, mutation, and selection.In gradient based schemes that use adjoint differentiation for the calculation of the gradient all intensities as calculated by the forward model need to be stored.[19,22] This typically requires more memory than storage of only the value of the objective function.
Fig. 8: Values of the objective function at the bottom of the elongated, curved (blue) valley in Fig. 6.See also blue valleys in Fig. 4c and 7. Note that in Fig. 4c, 6, and 7 Log(Φ) is displayed while here Φ is plotted.It can be seen that the gradient of the objective function is much steeper to the left of the minimum at µ s = 6 cm -2 than to the right of the minimum

Summary and Outlook
We have introduced the concept of evolution-strategy for diffuse optical tomography.Unlike currently widely employed gradient-based reconstruction algorithms, evolution-strategy codes don't make use of gradient information during the reconstruction process.Interpreting the reconstruction process as a minimization problem, gradient schemes often get trapped in local shallow valleys, in which the gradient is very small.Therefore these schemes have to rely on a good initial guesses close to the true minimum to be successful.Evolution-strategy algorithms promise to overcome this problem, since they are less dependent on good initial guesses and sample the solution space more effectively.Using the example of finding the correct optical properties of a homogenous medium, we have shown that an evolutionstrategy algorithm finds the global minimum more reliably than a conjugate gradient-based reconstruction scheme.
Applying evolution-strategy-based schemes to heterogeneous media, which contain a larger number of unknowns, will result in computationally more intensive operations.For example the convergence speed of an evolution strategy is commonly described by the rate of progress ϕ.The rate of progress is the ratio of the number of successful mutations to the total f number of mutations.The maximum rate of progress ϕ max is inversely proportional to the number of unknown variables n with ϕ max ∝ 1/n using Rechenberg's sphere model [24, page 127-141].That means if the number of variables is increased the maximum rate of progress is decreased, which leads to fewer successful mutations and therefore to a lower convergence speed.In general that means that more generations are required, which in turn leads to a larger number of forward calculations.Furthermore, larger sets of parents are necessary, which will further slow the optimization process.The particular performance of evolutionstrategies in optical tomographic problems that involve a larger number of unknowns remains to be studied.It appears likely that some hybrid method that employs gradient-based techniques as well as evolution-strategies may be most successful.Finally, it should be mentioned that other global minimization techniques exist that are similar to the evolution-strategy (ES) algorithm presented in this work.These other approaches are commonly referred to as evolutionary programming (EP) and genetic algorithms (GAs).The main difference between these three types of approaches (ES, EP and GA) lies in the way the recombination, mutation, and selection rules are formulated and weighted in the search algorithm.Furthermore, GAs represent the object variables (e.g.optical properties) in bit strings that can be thought of as genotypic level of information.In contrast, ES algorithms work on a phenotypic level represented by real-valued vectors.A detailed discussion of the differences among these various approaches and their potential benefits for diffuse optical tomography is beyond the scope of this work, and the interested readers may be directed to References 24-26.However, we believe that the present study clearly encourages further exploration into the many different aspects of global minimization schemes and their potential use in diffuse optical tomography.

Fig. 2 .
Fig. 2. Comparison of gradient-based reconstruction algorithms (a-left) and evolution-strategy algorithms (b-right) for diffuse optical tomography.The lower index k indicates different updates and generations.The upper indices n and m indicate different parents and children.

Fig. 4 .
Fig. 4. Contour-map representations of 3-dimensional slices through the 4-dimensional objective function (Eq.2).Slices are taken for different g values, which results in Log(Φ) versus µ a and µ s maps.The minimum of Log(Φ) is at µ a = 0.6 cm -1 , µ s = 6.0 cm -1 , g = 0.5.Note that the x-axis is chosen so that the reduced scattering coefficient µ s ' = (1-g) µ s ranges from 1 to 6 cm -1 in all figures.It can be seen that many combinations of µ a , µ s , g lead to similar low values in the log(Φ).