Reverse differentiation and the inverse diffusion problem

doi:10.1016/S0965-9978(97)00005-7

Advances in Engineering Software

Volume 28, Issue 4, June 1997, Pages 217-221

Abstract

In this paper the task of solving the inverse photon diffusion problem in human tissue will be discussed. In such problems the optical properties of the tissue are to be determined from a knowledge of measured values of boundary output fluxes. A finite element model is proposed for the solution process and the problem is posed as a least squares optimisation problem. An important and costly part of optimisation processes is the calculation of gradients of the merit function. It has been shown that for such problems the reverse differentiation process can be implemented very efficiently with significant savings in both computation time and store. In this paper we show that the finite element optimisation process with reverse differentiation is ideally suited to the solution of the inverse diffusion problem. © 1997 Published by Elsevier Science Limited

Section snippets

INTRODUCTION

The finite element method can be used to solve the direct diffusion photon transport problem in human tissues.[1] In that problem the optical properties of the tissue are assumed known, and the outward flux on the boundary is calculated for a given isotropic point light source. In the reverse problem, measurements of the output flux are made and the task is to calculate the values of the optical properties that predict outputs most closely matching the measurements. The reverse problem is posed

THE DIRECT PROBLEM

To solve the direct problem by the finite element method, the region A is covered by a mesh of elements. We choose to use simple triangular elements in which all variables u, D and μ are assumed to be linear functions of their values at the nodes of the triangles. In the direct problem the values of q, D and μ are assumed known at each node and the values of u at each node are required. The special form of the boundary condition Eq. (2) then determines the boundary flux at the boundary nodes.
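The element-level computation for linear triangles can be sketched as follows. This is only an illustration under stated assumptions: the governing equation is not shown in this snippet, so the sketch assumes the standard steady-state diffusion form −∇·(D∇u) + μu = q, and the function name `element_matrices` is hypothetical rather than taken from the paper. D and μ are interpolated linearly from their nodal values, as described above.

```python
import numpy as np

def element_matrices(xy, D, mu):
    """Stiffness and mass matrices of one linear triangle (hypothetical helper).

    Assumes the weak form of  -div(D grad u) + mu u = q  with D and mu
    interpolated linearly from their three nodal values.
    """
    xy = np.asarray(xy, dtype=float)          # shape (3, 2): vertex coordinates
    D = np.asarray(D, dtype=float)
    mu = np.asarray(mu, dtype=float)
    x, y = xy[:, 0], xy[:, 1]
    # Shape-function gradients are (b_i, c_i) / (2 * signed area).
    b = np.array([y[1] - y[2], y[2] - y[0], y[0] - y[1]])
    c = np.array([x[2] - x[1], x[0] - x[2], x[1] - x[0]])
    area = 0.5 * abs(x @ b)
    # Gradients are constant, so the integral of D reduces to area * mean(D).
    stiff = D.mean() * (np.outer(b, b) + np.outer(c, c)) / (4.0 * area)
    # T[i, j, k] = (1/area) * integral of phi_i * phi_j * phi_k over the triangle.
    T = np.full((3, 3, 3), 1.0 / 60.0)
    for i in range(3):
        for j in range(3):
            if i != j:
                T[i, i, j] = T[i, j, i] = T[j, i, i] = 1.0 / 30.0
        T[i, i, i] = 1.0 / 10.0
    mass = area * np.einsum("ijk,k->ij", T, mu)
    return stiff, mass

# Reference triangle with illustrative nodal values.
stiff, mass = element_matrices([(0, 0), (1, 0), (0, 1)],
                               D=[1.0, 1.0, 1.0], mu=[0.1, 0.2, 0.3])
```

Summing such element matrices over the mesh would give the global system for the nodal values of u; the treatment of the boundary condition Eq. (2) is not reproduced here.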

The

THE INVERSE PROBLEM

The inverse problem is to be solved as an optimisation problem. The optimisation variables are the 2n unknowns D and μ at each node. To evaluate these, more than 2n items of data are required. In each experiment the flux D∂u/∂n is measured at each of the n_B boundary nodes. If the measurements are taken for n_Q different distributions of the light source q then we require $(n_{B} - 1)\,n_{Q} \geq 2n$. If A is a square with n_S nodes on each side then n = n_S² and n_B = 4(n_S − 1), so we may take $n_{Q} \geq \dfrac{2n_{S}^{2}}{4n_{S} - 5}$. For a more
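As a small worked illustration of this counting argument, the sketch below computes the smallest number of source distributions for a square mesh. It assumes the bound is read as requiring at least 2n data items from (n_B − 1) usable flux values per experiment; the function name is hypothetical.

```python
def min_source_distributions(n_s: int) -> int:
    """Smallest integer n_Q with (n_B - 1) * n_Q >= 2n on a square mesh.

    Uses n = n_S**2 nodes and n_B = 4 * (n_S - 1) boundary nodes.
    """
    if n_s < 2:
        raise ValueError("need at least 2 nodes per side")
    n = n_s ** 2
    n_b = 4 * (n_s - 1)
    # Integer ceiling of 2n / (n_B - 1) without floating point.
    return -(-2 * n // (n_b - 1))

for n_s in (5, 7, 10):
    print(n_s, min_source_distributions(n_s))
```

For example, n_S = 5 gives n = 25, n_B = 16 and 2n = 50 unknowns, so 50/15 rounds up to 4 source distributions.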

THE GRADIENT CALCULATION

The gradient vector ∇E consists of 2n terms, as it involves differentiating with respect to each of the unknowns D and μ at each of the n nodes. It is therefore essential that this be done efficiently. Examination of the steps of the function evaluation in Section 3 indicates clearly that performing the Choleski decomposition once and solving all n_Q sets of equations are roughly equal operations and dominate the operations in the other steps. Using numerical approximations by central differences $∇E$
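The contrast between central differences and a reverse (adjoint) calculation can be illustrated on a toy problem. The sketch below is not the authors' finite element model: it assumes a small symmetric positive definite system K(μ)u = q with K = S + diag(μ), a fixed one-dimensional stiffness matrix S, an observation matrix C that picks out two "boundary" nodes, and a misfit E = ½ Σ ‖Cu_k − m_k‖² over the sources. All names are hypothetical. The point it shows is that the adjoint solves reuse the single Choleski factor, whereas central differences need two extra function evaluations per variable.

```python
import numpy as np

def stiffness(n):
    # Fixed tridiagonal SPD matrix standing in for the assembled stiffness.
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def chol_solve(L, B):
    # Solve (L L^T) X = B given the lower Choleski factor L.
    return np.linalg.solve(L.T, np.linalg.solve(L, B))

def misfit_and_gradient(mu, Q, C, M):
    """Return E and dE/dmu using one factorisation plus adjoint solves."""
    K = stiffness(mu.size) + np.diag(mu)
    L = np.linalg.cholesky(K)           # factorise once
    U = chol_solve(L, Q)                # forward solves, one column per source
    R = C @ U - M                       # residuals at the observed nodes
    E = 0.5 * np.sum(R ** 2)
    Lam = chol_solve(L, C.T @ R)        # adjoint solves reuse the same factor
    # dK/dmu_i = e_i e_i^T, so dE/dmu_i = -sum_k Lam[i, k] * U[i, k].
    grad = -np.sum(Lam * U, axis=1)
    return E, grad

def central_difference_gradient(mu, Q, C, M, h=1e-6):
    """Reference gradient costing two function evaluations per variable."""
    g = np.zeros_like(mu)
    for i in range(mu.size):
        step = np.zeros_like(mu)
        step[i] = h
        e_plus = misfit_and_gradient(mu + step, Q, C, M)[0]
        e_minus = misfit_and_gradient(mu - step, Q, C, M)[0]
        g[i] = (e_plus - e_minus) / (2.0 * h)
    return g

n = 6
Q = np.eye(n)[:, :3]                    # three point sources
C = np.eye(n)[[0, n - 1]]               # observe the two end nodes
mu_true = np.full(n, 0.3)
M = C @ np.linalg.solve(stiffness(n) + np.diag(mu_true), Q)  # synthetic data
mu = np.linspace(0.1, 0.6, n)           # current estimate
E, grad = misfit_and_gradient(mu, Q, C, M)
fd = central_difference_gradient(mu, Q, C, M)
print(np.max(np.abs(grad - fd)))
```

In this toy setting the adjoint gradient costs one extra set of triangular solves, independent of the number of variables, while the central-difference version refactorises the matrix 2n times.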

OPTIMISATION ALGORITHM

The optimisation problem Eq. (6) corresponding to the relatively coarse mesh in Example 2 already has 96 variables and 380 terms in the sum of squares. Both the number of variables and the number of terms will increase as the mesh is refined. Algorithms that require the storage of the Hessian or the Jacobian matrices or their approximations are therefore not appropriate. In these circumstances two popular algorithms would be a preconditioned conjugate gradient method and a trust region method
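To make the storage argument concrete, here is a minimal sketch of an unpreconditioned nonlinear conjugate gradient iteration that keeps only a few vectors of length equal to the number of variables. It is an illustration, not the algorithm used in the paper: the Polak-Ribière update with restarts, the Armijo backtracking line search, and the function name `nonlinear_cg` are all assumptions made for this example.

```python
import numpy as np

def nonlinear_cg(fun_and_grad, x0, max_iter=500, tol=1e-8):
    """Minimise a smooth function using only function values and gradients.

    Stores a handful of vectors; no Hessian or Jacobian is formed.
    """
    x = np.asarray(x0, dtype=float).copy()
    f, g = fun_and_grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        slope = g @ d
        if slope >= 0.0:                # not a descent direction: restart
            d, slope = -g, -(g @ g)
        t = 1.0
        while True:                     # Armijo backtracking line search
            x_new = x + t * d
            f_new, g_new = fun_and_grad(x_new)
            if f_new <= f + 1e-4 * t * slope or t < 1e-16:
                break
            t *= 0.5
        # Polak-Ribiere coefficient, clipped at zero (automatic restart).
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))
        d = -g_new + beta * d
        x, f, g = x_new, f_new, g_new
    return x, f

# Illustrative convex quadratic: f(x) = 0.5 x^T A x - b^T x.
A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.ones(5)

def quadratic(x):
    return 0.5 * x @ A @ x - b @ x, A @ x - b

x_min, f_min = nonlinear_cg(quadratic, np.zeros(5))
```

In the inverse problem, `fun_and_grad` would be replaced by a routine returning the least squares merit function and its gradient, so the cost per iteration is dominated by the function and gradient evaluations rather than by matrix storage.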

CONCLUSION

In this paper a very efficient method for calculating the gradient vector of the inverse diffusion problem has been presented based on an analytic form of the reverse differentiation method. It enables the full gradient vector to be obtained in fewer than three times the number of operations needed to evaluate the function without the storage overheads associated with the automatic approach.

References (10)

Arridge, S.R. et al. A finite element approach for modelling photon transport in tissue. Medical Physics, 1993.
Griewank, A. On automatic differentiation. In Mathematical Programming: Recent Developments and Applications, ed. Iris,...
Bartholomew-Biggs, M.C., Brown, S., Christianson, B. & Dixon, L.C.W. The efficient calculation of gradients, Jacobians...
Case, K.M. & Zweifel, P.F. Linear Transport Theory. Addison Wesley, New York,...
Ishimaru, A. Wave Propagation and Scattering in Random Media. Academic Press, New York,...

There are more references available in the full text version of this article.

Cited by (35)

GPU-accelerated adjoint algorithmic differentiation
2016, Computer Physics Communications
Citation Excerpt :
This vectorization of the adjoints can lead to substantially smaller tapes because the memory overhead is reduced. Furthermore, it allows integration of iterative linear solvers, which may not be automatically differentiable otherwise [14,17]. GPUs provide massive computational power by running thousands of light-weight threads in parallel [18].
Many scientific problems such as classifier training or medical image reconstruction can be expressed as minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store the intermediate partial derivatives on a dedicated data structure, referred to as the “tape”. Parallelization is difficult because threads need to synchronize their accesses during taping and backpropagation. This situation is aggravated for many-core architectures, such as Graphics Processing Units (GPUs), because of the large number of light-weight threads and the limited memory size in general as well as per thread. We show how these limitations can be mediated if the cost function is expressed using GPU-accelerated vector and matrix operations which are recognized as intrinsic functions by our AAD software. We compare this approach with naive and vectorized implementations for CPUs. We use four increasingly complex cost functions to evaluate the performance with respect to memory consumption and gradient computation times. Using vectorization, CPU and GPU memory consumption could be substantially reduced compared to the naive reference implementation, in some cases even by an order of complexity. The vectorization allowed usage of optimized parallel libraries during forward and reverse passes which resulted in high speedups for the vectorized CPU version compared to the naive reference implementation. The GPU version achieved an additional speedup of $7.5 \pm 4.4$ , showing that the processing power of GPUs can be utilized for AAD using this concept. Furthermore, we show how this software can be systematically extended for more complex problems such as nonlinear absorption reconstruction for fluorescence-mediated tomography.
Program title: AD-GPU
Catalogue identifier: AEYX_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEYX_v1_0.html
Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 16715
No. of bytes in distributed program, including test data, etc.: 143683
Distribution format: tar.gz
Programming language: C++ and CUDA.
Computer: Any computer with a compatible C++ compiler and a GPU with CUDA capability 3.0 or higher.
Operating system: Windows 7 or Linux.
RAM: 16 Gbyte
Classification: 4.9, 4.12, 6.1, 6.5.
External routines: CUDA 6.5, Intel MKL (optional) and routines from BLAS, LAPACK and CUBLAS
Nature of problem: Gradients are required for many optimization problems, e.g. classifier training or nonlinear image reconstruction. Often, the function, of which the gradient is required, can be implemented as a computer program. Then, algorithmic differentiation methods can be used to compute the gradient. Depending on the approach this may result in excessive requirements of computational resources, i.e. memory and arithmetic computations. GPUs provide massive computational resources but require special considerations to distribute the workload onto many light-weight threads.
Solution method: Adjoint algorithmic differentiation allows efficient computation of gradients of cost functions given as computer programs. The gradient can be theoretically computed using a similar amount of arithmetic operations as one function evaluation. Optimal usage of parallel processors and limited memory is a major challenge which can be mediated by the use of vectorization.
Restrictions: To use the GPU-accelerated adjoint algorithmic differentiation method, the cost function must be implemented using the provided AD-GPU intrinsics for matrix and vector operations.
Unusual features: GPU-acceleration.
Additional comments: The code uses some features of C++11, e.g. std::shared_ptr. Alternatively, the boost library can be used.
Running time: The time to run the example program is a few minutes or up to a few hours to reproduce the performance measurements.
Algorithmic differentiation of numerical methods: Second-order adjoint solvers for parameterized systems of nonlinear equations
2016, Procedia Computer Science
Adjoint mode algorithmic (also known as automatic) differentiation (AD) transforms implementations of multivariate vector functions as computer programs into first-order adjoint code. Its reapplication or combinations with tangent mode AD yields higher-order adjoint code. Second derivatives play an important role in nonlinear programming. For example, second-order (Newton-type) nonlinear optimization methods promise faster convergence in the neighborhood of the minimum through taking into account second derivative information. The adjoint mode is of particular interest in large-scale gradient-based nonlinear optimization due to the independence of its computational cost on the number of free variables. Part of the objective function may be given implicitly as the solution of a system of n parameterized nonlinear equations. If the system parameters depend on the free variables of the objective, then second derivatives of the nonlinear system's solution with respect to those parameters are required. The local computational overhead as well as the additional memory requirement for the computation of second-order adjoints of the solution vector with respect to the parameters by AD depends on the number of iterations performed by the nonlinear solver. This dependence can be eliminated by taking a symbolic approach to the differentiation of the nonlinear system.
Second-order tangent solvers for systems of parameterized nonlinear equations
2015, Procedia Computer Science
Forward mode algorithmic differentiation transforms implementations of multivariate vector functions as computer programs into first directional derivative (also: first-order tangent) code. Its reapplication yields higher directional derivative (higher-order tangent) code. Second derivatives play an important role in nonlinear programming. For example, second-order (Newton-type) nonlinear optimization methods promise faster convergence in the neighborhood of the minimum through taking into account second derivative information. Part of the objective function may be given implicitly as the solution of a system of n parameterized nonlinear equations. If the system parameters depend on the free variables of the objective, then second derivatives of the nonlinear system's solution with respect to those parameters are required. The local computational overhead for the computation of second-order tangents of the solution vector with respect to the parameters by Algorithmic Differentiation depends on the number of iterations performed by the nonlinear solver. This dependence can be eliminated by taking a second-order symbolic approach to differentiation of the nonlinear system.
Hybrid Optical Imaging
2014, Comprehensive Biomedical Physics
In this chapter, hybrid optical imaging approaches for noninvasive imaging are discussed. Due to high scattering of optical photons in most tissues, optical in vivo imaging of deeper tissues is mostly restricted to diffuse imaging approaches, resulting in challenging and ill-posed reconstruction problems. Anatomical information from complementary modalities, such as microcomputed tomography and magnetic resonance imaging, is useful for image analysis but may also provide information for improved FMT reconstructions. Multimodal imaging with positron emission tomography or single-photon emission computed tomography can be used to validate FMT devices or reconstruction methods using dual-modality probes. Furthermore, the aspects of mouse handling and image fusion in sequential and longitudinal scanning are discussed.
Parallel programming of gradient-based iterative image reconstruction schemes for optical tomography
2004, Computer Methods and Programs in Biomedicine
Citation Excerpt :
This is an obvious necessity of any parallel scheme and was tested thoroughly during the development process. Therefore, the quality of our gradient-based image reconstruction is not subject to discussion at this point, and the reader with interest in quality of optical tomographic imaging is referred to the previously cited publications [37–56] that address this problem. Fig. 6 shows the dependency of vt(npe) on the number of processors for all three schemes as well as the idealized case of γ=1.
Optical tomography (OT) is a fast developing novel imaging modality that uses near-infrared (NIR) light to obtain cross-sectional views of optical properties inside the human body. A major challenge remains the time-consuming, computational-intensive image reconstruction problem that converts NIR transmission measurements into cross-sectional images. To increase the speed of iterative image reconstruction schemes that are commonly applied for OT, we have developed and implemented several parallel algorithms on a cluster of workstations. Static process distribution as well as dynamic load balancing schemes suitable for heterogeneous clusters and varying machine performances are introduced and tested. The resulting algorithms are shown to accelerate the reconstruction process to various degrees, substantially reducing the computation times for clinically relevant problems.
Optical tomography using the time-independent equation of radiative transfer-Part 2: Inverse model
2002, Journal of Quantitative Spectroscopy and Radiative Transfer
Optical tomography is a novel imaging modality that is employed to reconstruct cross-sectional images of the optical properties of highly scattering media given measurements performed on the surface of the medium. Recent advances in this field have mainly been driven by biomedical applications in which near-infrared light is used for transillumination and reflectance measurements of highly scattering biological tissues. Many of the reconstruction algorithms currently utilized for optical tomography make use of model-based iterative image reconstruction (MOBIIR) schemes. The imaging problem is formulated as an optimization problem, in which an objective function is minimized. In the simplest case the objective function is a normalized-squared error between measured and predicted data. The predicted data are obtained by using a forward model that describes light propagation in the scattering medium given a certain distribution of optical properties.
In part I of this two-part study, we presented a forward model that is based on the time-independent equation of radiative transfer. Using experimental data we showed that this transport-theory-based forward model can accurately predict light propagation in highly scattering media that contain void-like inclusions. In part II we focus on the details of our image reconstruction scheme (inverse model). A crucial component of this scheme involves the efficient and accurate determination of the gradient of the objective function with respect to all optical properties. This calculation is performed using an adjoint differentiation algorithm that allows for fast calculation of this gradient. Having calculated this gradient, we minimize the objective function with a gradient-based optimization method, which results in the reconstruction of the spatial distribution of scattering and absorption coefficients inside the medium. In addition to presenting the mathematical and numerical background of our code, we present reconstruction results based on experimentally obtained data from highly scattering media that contain void-like regions. These types of media play an important role in optical tomographic imaging of the human brain and joints.
