Enhanced model iteration algorithm with graph neural network for diffuse optical tomography

Diffuse optical tomography (DOT) employs near-infrared light to reveal the optical parameters of biological tissues. Because photons are strongly scattered in tissue and only limited surface measurements are available, DOT reconstruction is severely ill-posed. The Levenberg-Marquardt (LM) method is a popular iterative scheme for DOT, but it is computationally expensive and its reconstruction accuracy needs improvement. In this study, we propose a model-based iterative algorithm that combines a graph neural network with the Levenberg-Marquardt method (GNNLM) and uses a graph data structure to represent the finite element mesh. To assess the performance of the graph neural network, two GNN variants, the graph convolutional network (GCN) and the graph attention network (GAT), were employed in the experiments. The results show that GCNLM performs best in simulation experiments within the training data distribution, whereas GATLM exhibits superior performance in simulation experiments outside the training data distribution and in real experiments with breast-like phantoms. This demonstrates that a GATLM trained with simulation data can generalize well to situations outside the training data distribution without transfer training, offering the possibility of more accurate absorption coefficient distributions in clinical practice.

The inverse problem of DOT is severely ill-posed due to the strong scattering of photons in tissues and the limited boundary measurements. To obtain a stable and effective solution, regularization is required in the inversion to improve convergence [10,11]. When perturbations in optical properties exceed the Born approximation limit, non-linear iteration-based approaches are commonly used to recover the optical properties [11,12]. Newton-type methods and their variants exhibit high fidelity in the inverse problem of DOT [13]. Multi-modality strategies have also been widely studied [14]: a complementary modality can provide a priori information to improve the reconstruction.
In recent years, with the advent of artificial intelligence, particularly deep learning, deep neural network-based approaches have been proposed for DOT [15-20]. The most straightforward application of deep learning in DOT is to directly establish a functional mapping between boundary measurements and the spatial distribution of optical parameters via a fully connected or convolutional neural network (CNN), known as end-to-end deep neural networks (ETE-DNNs) [16-18]. Yoo et al. trained a neural network for the absorption coefficient in DOT by inverting the Lippmann-Schwinger equation, i.e. the pseudoinverse of the nonlinear DOT forward operator [16]. In [17], measurements and optical coefficients were mapped to a feature space by a codec and then bridged by a "singular value operator", which can be considered a learned regularization; this eliminates the need to choose a prior regularization. Another type of deep learning method is the model-based iterative learning approach (or deep unrolling method), which intertwines learned components with the model equations of DOT [19,20]. Compared to ETE-DNNs for non-linear mapping, this approach is less dependent on data quantity because it incorporates physical models and prior information. For example, Mozumder et al. employed a combination of the Gauss-Newton algorithm and a CNN to solve the reconstruction problem [19]; the CNN was utilized to correct the absorption update at each iteration and accelerate the reconstruction. Deep learning can enhance the stability and uniqueness of solutions to the DOT inverse problem through strong priors trained in a data-driven manner, allowing classical analytical reconstruction methods to be combined with neural network methods.
In many tomographic imaging studies, tetrahedral elements are commonly used to discretize the imaging domain [19,21,22], and the sizes of the tetrahedral elements are not uniform. Since CNNs cannot guarantee translation invariance on tetrahedral mesh data, it is not appropriate to apply CNNs to a tetrahedral mesh directly; interpolation or other conversion methods are needed to map the data from tetrahedral elements onto voxels [19], which inevitably introduces errors. It should be noted that a tetrahedral mesh is a natural graph structure, and a graph neural network (GNN) can represent the mesh as a graph. We therefore propose a GNN-based reconstruction framework for DOT in this study. It combines an iterative GNN with the Levenberg-Marquardt method and is named GNNLM. By representing the mesh as a graph, it extends model-based learning directly to the tetrahedral mesh. GNNLM incorporates GNNs into the traditional iterative process for the inverse problem of DOT: at each step it recombines the current iterate µ_{a,k} and its Levenberg-Marquardt (LM) update ∆µ_{a,k}. A series of experiments is designed to test the proposed method.
The outline of this paper is as follows. Section 2 presents the proposed method, including a brief introduction to the inverse problem of DOT and a detailed description of the method. A series of experiments and the results of the three methods are presented in Section 3. Finally, the discussion and conclusion are provided in Section 4.

Light propagation in diffusive medium
The propagation of near-infrared light through highly scattering media, such as biological tissues, is often modeled as a second-order partial differential equation [2]. For the continuous-wave diffuse optical tomography (CW-DOT) model, light propagation can be described by the steady-state diffusion approximation to the radiative transfer equation (RTE) [2,23]. The diffusion equation (DE) is given by

−∇ · (D(r)∇φ(r)) + µ_a(r)φ(r) = q_0(r), r ∈ Ω, (1)

where Ω denotes the domain and q_0(r) is an isotropic source term. φ(r) is the photon density at position r, µ_a(r) is the absorption coefficient, µ′_s(r) = (1 − g)µ_s(r) is the reduced scattering coefficient, D(r) = 1/[3(µ_a(r) + µ′_s(r))] is the diffusion coefficient, and g is the anisotropy factor. The air-tissue boundary is represented by an index-mismatched type III condition (also known as a Robin condition). The output light flux Γ detected on the surface of the tissue is

Γ(r_d^j; r_s^i) = −D(r) ∂φ(r)/∂n̂ |_{r = r_d^j}, r_d^j ∈ ∂Ω, (2)

where ∂Ω represents the surface of the tissue, r_s^i (i = 1, 2, ..., I) is the i-th of I source positions, and r_d^j (j = 1, 2, ..., J) is the j-th of J detector positions.
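For reference, the type III (Robin) condition is named above but not written out; its standard form, with A denoting the internal reflection parameter arising from the refractive index mismatch and n̂ the outward normal, is:

```latex
\phi(r) + 2 A \, D(r) \, \hat{n} \cdot \nabla \phi(r) = 0, \qquad r \in \partial\Omega .
```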

Inverse problem
We utilize the finite element method (FEM) based software package NIRFAST to solve the CW-DOT inverse problem [22,24]. The problem is posed as the least-squares minimization

µ̂_a = arg min_{µ_a} ‖y − F(µ_a)‖², (3)

where F is the forward model and y denotes the boundary measurements. Due to the severe ill-posedness of the inverse problem, Tikhonov regularization is used to constrain the reconstruction, and Eq. (3) becomes

µ̂_a = arg min_{µ_a} { ‖y − F(µ_a)‖² + ρ‖µ_a‖² }, (4)

where ρ is the regularization parameter. The Levenberg-Marquardt (LM) update is generally applied at each iteration as follows [10]:

∆µ_{a,k} = (JᵀJ + ρI)⁻¹ Jᵀ (y − F(µ_{a,k})), (5)

where J is the Jacobian. This approach is also known as a trust-region method [25]; P. K. Yalavarthy et al. provide a detailed discussion of it in Ref. [10]. Its advantage lies in the simple choice of a regularization parameter. Its limitations include: (i) JᵀJ must be positive definite; (ii) the initial guess should be close to the actual solution; (iii) the update equation does not solve the first-order condition unless ρ = 0; (iv) since the parameters are not involved in the minimization scheme, the inverse problem may be unstable. Even though JᵀJ is not positive definite in DOT, the Levenberg-Marquardt method (or its modified versions) has been used successfully in a number of DOT studies [10,26-29]. In our study, the Jacobian involves the derivatives of the log amplitude of the measurements with respect to the optical coefficients at the reconstructed nodes, and its structure is J = ∂ln(F)/∂µ_a. By the nature of the Jacobian in DOT, the number of rows is far fewer than the number of columns. Equation (5) is therefore solved via the matrix inversion lemma [30], which changes the update from

∆µ_{a,k} = (JᵀJ + ρI)⁻¹ Jᵀ (y − F(µ_{a,k})) to ∆µ_{a,k} = Jᵀ (JJᵀ + ρI)⁻¹ (y − F(µ_{a,k})).

This reduces the size of the matrix to be inverted and saves considerable time. In subsequent studies, the stopping criterion for all cases is that the data-model misfit does not improve by more than 6% compared with the previous iteration.
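To make the role of the matrix inversion lemma concrete, here is a minimal NumPy sketch of the LM update of Eq. (5); the function name and argument layout are ours, not NIRFAST's API.

```python
import numpy as np

def lm_update(J, misfit, rho):
    """One LM update (Eq. (5)) via the matrix inversion lemma.

    J      : (m, n) Jacobian, with m measurements << n mesh nodes
    misfit : (m,) data-model misfit y - F(mu_a_k)
    rho    : scalar regularization parameter
    """
    m = J.shape[0]
    # (J^T J + rho I)^{-1} J^T d  ==  J^T (J J^T + rho I)^{-1} d,
    # so only a small m x m system has to be solved.
    return J.T @ np.linalg.solve(J @ J.T + rho * np.eye(m), misfit)
```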

Graph neural network with the Levenberg-Marquardt algorithm
A graph G is a data structure consisting of nodes V and edges E, and can be represented as G = (V, E). Graph neural networks are deep neural network frameworks for unstructured data types such as graphs. To operate on an absorption distribution µ_a and its update ∆µ_a, we define the finite element mesh with M nodes as graph data. The feature matrix H ∈ R^{M×f} is defined over the nodes. The adjacency matrix A ∈ R^{M×M} describes how the graph nodes are connected; it is sparse and its entries are nonzero only where graph nodes i and j are connected, i.e. A_{ij} = 1 [31]. In our problem, the solution is defined on the FEM nodes, which are therefore the natural graph nodes.
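As an illustration of how a tetrahedral mesh becomes a graph, the following sketch builds a PyTorch Geometric style edge list; the helper name and the convention that two FEM nodes are adjacent whenever they share an element are our assumptions.

```python
import itertools
import numpy as np
import torch

def mesh_to_edge_index(tets: np.ndarray) -> torch.Tensor:
    """Build a (2, E) edge index from an (M_elem, 4) array of tetrahedra.

    Two graph nodes are connected (A_ij = 1) whenever the corresponding
    FEM nodes belong to the same tetrahedral element.
    """
    edges = set()
    for tet in tets:
        for i, j in itertools.combinations(tet.tolist(), 2):
            edges.add((i, j))
            edges.add((j, i))  # undirected graph: store both directions
    return torch.tensor(sorted(edges), dtype=torch.long).t()
```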
The feature vector of node v ∈ V is denoted X_v. Graph neural networks utilize the structure of the graph and the feature vectors X_v to learn the representation vector H_v of each node or the representation vector H_G of the entire graph [32,33]. It has been shown that the optimal depth for graph neural networks is generally 2-3 layers [34,35]. In this work, we develop a deep learning framework based on model iteration for the inverse problem of DOT. A graph neural network is selected as the network framework, and the Levenberg-Marquardt (LM) method serves as the analytical iterative method. The graph neural network operates directly on the FEM mesh to recombine the current value µ_{a,k} and its change ∆µ_{a,k} at each iteration, while the analytical iterative method supplies ∆µ_{a,k} via the LM algorithm. The proposed method is called the graph neural network with Levenberg-Marquardt (GNNLM) algorithm. After defining µ_{a,k} and ∆µ_{a,k} on a graph, the next iterate is computed by a trained graph neural network using the edge connectivity E between nodes:

µ_{a,k+1} = GNN_k([µ_{a,k}, ∆µ_{a,k}], E), (6)

where ∆µ_{a,k} is calculated by the LM algorithm. µ_{a,k} and ∆µ_{a,k} are concatenated to form a feature matrix, which is the input of the graph neural network block, and the output is the absorption coefficient value µ_{a,k+1}, as shown in Fig. 1. The structure of the neural network block shown in Fig. 1(b) is identical at each iteration in Fig. 1(a), but with different learned parameters.
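A minimal sketch of this test-time loop, reusing the lm_update sketch above; forward, jac, and gnn_blocks are placeholders for the FEM forward model, its Jacobian, and the trained blocks, not code from this work.

```python
import torch

def gnnlm_reconstruct(mu0, y, edge_index, gnn_blocks, forward, jac, rho=0.01):
    """Hypothetical GNNLM iteration: an LM step followed by a learned recombination.

    forward, jac : callables giving the FEM forward model F and Jacobian J
    gnn_blocks   : the k_max trained GNN modules (GCN or GAT blocks)
    """
    mu = mu0
    for gnn in gnn_blocks:
        delta = lm_update(jac(mu), y - forward(mu), rho)          # Eq. (5)
        x = torch.stack([mu, torch.as_tensor(delta, dtype=mu.dtype)], dim=1)
        mu = gnn(x, edge_index).squeeze(-1)                       # Eq. (6)
    return mu
```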


Graph convolution Levenberg-Marquardt algorithm
The spectral convolution on graphs is defined as the multiplication of a signal x ∈ Rᴺ with a filter g_θ = diag(θ), parameterized by θ ∈ Rᴺ, in the Fourier domain:

g_θ ⋆ x = U g_θ Uᵀ x, (7)

where U is the matrix of eigenvectors of the normalized graph Laplacian L. We can understand g_θ as a function of the eigenvalues of L, denoted g_θ(Λ), where Λ is the diagonal matrix of eigenvalues of L and θ is the filter parameter. Hammond et al. showed that g_θ(Λ) can be well-approximated by a truncated expansion in Chebyshev polynomials of order K [36]:

g_θ ⋆ x ≈ Σ_{k=0}^{K} θ_k T_k(L̃) x, (8)

where L̃ = (2/λ_max)L − I, λ_max is the largest eigenvalue of L, and I is the identity matrix. Kipf and Welling proposed one of the most popular graph neural networks, the graph convolutional network (GCN) [37]. They limited the layer-wise convolution to K = 1 and further approximated λ_max ≈ 2. Then Eq. (8) simplifies to

g_θ ⋆ x ≈ θ (D̃^{−1/2} Ã D̃^{−1/2}) x. (9)

Applied to DOT reconstruction, the layer is defined as

H_{k,l+1} = σ(D̃^{−1/2} Ã D̃^{−1/2} H_{k,l} W_{k,l}), (10)

where Ã = A + I, A is the adjacency matrix, and D̃^{−1/2} is the inverse square root of the diagonal degree matrix D̃ with D̃_{ii} = Σ_j Ã_{ij}. The weight matrix W_{k,l} ∈ R^{f_l × f_{l+1}} contains the trainable parameters; f_l is the dimension of the input features and f_{l+1} is the dimension of the output features. Finally, σ(·) denotes the nonlinear activation function. D̃^{−1/2} Ã D̃^{−1/2} H_{k,l} is an aggregation of features within the neighborhood of each graph node; GCN layers aggregate information with a fixed weighted average (described by D̃^{−1/2} Ã D̃^{−1/2}) that is not learned.
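A dense-matrix sketch of the layer in Eq. (10), written for readability only; a practical implementation would use sparse operations or a library layer such as PyTorch Geometric's GCNConv.

```python
import torch

def gcn_layer(H, A, W):
    """One GCN layer, Eq. (10): sigma(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_tilde = A + torch.eye(A.shape[0])          # add self-loops
    d_inv_sqrt = A_tilde.sum(dim=1).pow(-0.5)    # diagonal of D-tilde^{-1/2}
    A_hat = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
    return torch.relu(A_hat @ H @ W)             # fixed average, learned W
```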

Graph attention Levenberg-Marquardt algorithm
Graph convolutional networks aggregate the neighboring information of nodes with equal weights. To assign explicit weights to the neighboring information of each node, Veličković et al. proposed the graph attention network (GAT), built on the attention mechanism [38], which aggregates neighboring information with learned weights. The attention coefficient e_{ij} is computed by performing self-attention on the nodes with a shared attentional mechanism a ∈ R^{2f_{l+1}}, and the LeakyReLU nonlinearity is applied as follows:

e_{ij} = LeakyReLU(aᵀ [W h_i ‖ W h_j]), (11)

where ·ᵀ denotes transposition, ‖ is the concatenation operation, and W ∈ R^{f_{l+1} × f_l} is a linear transformation. To make coefficients easily comparable across different nodes, the softmax function is used to normalize Eq. (11). The coefficient α_{ij} computed by the attention mechanism is

α_{ij} = softmax_j(e_{ij}) = exp(e_{ij}) / Σ_{m∈N_i} exp(e_{im}). (12)

The attention mechanism is a single-layer feedforward neural network. The learning process is stabilized through multi-head attention, which averages the heads in the final layer of the network and concatenates them in the other layers.
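A dense sketch of Eqs. (11)-(12) for a single attention head, using the standard split aᵀ[Wh_i ‖ Wh_j] = a₁ᵀWh_i + a₂ᵀWh_j; the function and its argument layout are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def gat_attention(H, A, W, a):
    """Attention coefficients alpha_ij (Eqs. (11)-(12)) for a single head.

    H: (N, f_l) node features; A: (N, N) adjacency with self-loops;
    W: (f_next, f_l) linear map; a: (2 * f_next,) shared attention vector.
    """
    Wh = H @ W.T                                  # (N, f_next)
    f = Wh.shape[1]
    # a^T [Wh_i || Wh_j] splits into a1 . Wh_i + a2 . Wh_j
    e = F.leaky_relu((Wh @ a[:f]).unsqueeze(1) + (Wh @ a[f:]).unsqueeze(0))
    e = e.masked_fill(A == 0, float('-inf'))      # attend only to neighbors
    return torch.softmax(e, dim=1)                # row-normalized alpha_ij
```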

Networks structure and training setup
As with other model-based deep learning methods, there are two ways to train the networks: train the entire architecture of k_max modules end-to-end, or train each neural network module sequentially [31]. In this work, we adopt sequential training for two reasons. First, updating the network parameters end-to-end through back-propagation would require differentiating through a finite element solver, which is not feasible in our experiments. Second, integrating a finite element solver into the neural network would result in excessively long training times. We therefore train each module independently and sequentially. In the GNNLM framework, the maximum number of iterations for the inverse problem of DOT is k_max. Given a training set and a validation set, each sample is a pair consisting of the absorption coefficient µ_{a,0} (the initial solution by LM) and the ground truth µ_{a,true}. The input of each subsequent network is the output of the previous network. A total of k_max modules are trained, and the loss function is the mean square error between the current output and the ground truth:

L_k = ‖µ_{a,k+1} − µ_{a,true}‖² / N. (13)

In this work, GCN and GAT are used for the GNN blocks, k_max = 5, and the input feature of every network is [µ_{a,k}, ∆µ_{a,k}]. The output is a graph with the same adjacency matrix as the input, but with only one feature, µ_{a,k+1}, defined over the nodes.
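A sketch of the sequential training of one block, assuming a loader of PyTorch Geometric Data objects; the optimizer and schedule follow the setup reported in the simulation section (AdamW, cosine annealing warm restarts, learning rate 0.001, batch size one, 200 epochs), while the restart period T_0 = 10 is an illustrative guess.

```python
import torch

def train_block(gnn_k, train_loader, epochs=200):
    """Train the k-th GNN block by minimizing the MSE loss of Eq. (13)."""
    opt = torch.optim.AdamW(gnn_k.parameters(), lr=1e-3)
    sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=10)
    for epoch in range(epochs):
        for data in train_loader:                  # one sample per batch
            opt.zero_grad()
            mu_next = gnn_k(data.x, data.edge_index).squeeze(-1)
            loss = torch.nn.functional.mse_loss(mu_next, data.y)  # Eq. (13)
            loss.backward()
            opt.step()
        sched.step()
```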
GCN has three graph convolutional layers. Except for the final output layer, each layer uses a ReLU activation function and expands the feature dimension to 200. GAT consists of two graph attention layers: the first layer contains 8 attention heads, expands the feature dimension to 16 features, and uses a ReLU activation; the last layer takes the mean of all heads and outputs a one-dimensional result. This process recombines the current iterate µ_{a,k} and its variation ∆µ_{a,k}, and these features are encoded in the learned parameters GNN_k. The training procedure is summarized in Algorithm 1 of Table 1 (without the 'TEST' part); a sketch of the two blocks is given below. After training the parameter sets GNN_k, the learned iterative reconstruction scheme is evaluated by applying the networks at each iteration. This procedure is equivalent to Algorithm 1, with the train and validation sets replaced by the test set and the 'TRAIN' step skipped.
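Under one reading of the layer sizes above, the two blocks could be written in PyTorch Geometric roughly as follows; the exact head/width bookkeeping (16 features per head in the first GAT layer) is our assumption where the text is ambiguous.

```python
import torch
from torch_geometric.nn import GCNConv, GATConv

class GCNBlock(torch.nn.Module):
    """Three GCN layers; hidden width 200, ReLU except on the output layer."""
    def __init__(self, in_dim=2, hidden=200):
        super().__init__()
        self.c1 = GCNConv(in_dim, hidden)
        self.c2 = GCNConv(hidden, hidden)
        self.c3 = GCNConv(hidden, 1)

    def forward(self, x, edge_index):
        x = torch.relu(self.c1(x, edge_index))
        x = torch.relu(self.c2(x, edge_index))
        return self.c3(x, edge_index)

class GATBlock(torch.nn.Module):
    """Two GAT layers; 8 heads of 16 features, averaged in the last layer."""
    def __init__(self, in_dim=2, per_head=16, heads=8):
        super().__init__()
        self.a1 = GATConv(in_dim, per_head, heads=heads)                    # concatenated heads
        self.a2 = GATConv(per_head * heads, 1, heads=heads, concat=False)   # mean over heads

    def forward(self, x, edge_index):
        return self.a2(torch.relu(self.a1(x, edge_index)), edge_index)
```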

Simulation experiments
A simulation slab geometry with dimensions of 140 × 80 × 40 mm³ is used to generate the training data, with µ_a = 0.004 mm⁻¹ and µ_s = 1 mm⁻¹. The 28 sources are placed in a 7 × 4 grid on one plane, with a spacing of 14 mm (columns) in the x-direction and 13 mm (rows) in the y-direction. The detector array is arranged identically to the source array on the opposite plane, with each detector positioned opposite its corresponding source in a mirror arrangement. The positions of the sources and detectors (SD) are shown in Fig. 2(a). Two different levels of mesh discretization are employed to avoid the inverse crime [39]. The forward mesh contains 281,400 linear tetrahedral elements joined at 49,506 nodes, while the inverse mesh contains 137,113 linear tetrahedral elements joined at 24,505 nodes; they are depicted in Fig. 2(b) and (c).
A total of one hundred samples are simulated for the training set and an additional forty samples for the validation set. These samples are generated using the NIRFAST software package with 1% random noise. The number of anomalies in each sample ranges from one to two, with diameters randomly set between 8 and 16 mm and absorption coefficients randomly distributed between 0.008 mm⁻¹ and 0.035 mm⁻¹; the positions of the anomalies are also randomized. The network is trained with an initial learning rate of 0.001 and a batch size of one sample. The AdamW optimizer is employed, and the learning rate is updated using cosine annealing warm restarts [40]. Training is terminated after 200 epochs, and the parameters that minimize the loss on the validation set are saved as the final model.

In our experiments, the regularization parameter for all three reconstruction methods (LM, GCNLM, GATLM) is set to 0.01. The stopping criterion is a data-model misfit difference that does not improve by more than 6% compared to the previous iteration. GCN and GAT are implemented using the PyTorch Geometric (1.12.0) framework [41].

Phantom experiment
Real experimental phantom data are acquired from a compact fiber-free parallel-plane multiwavelength diffuse optical tomography system proposed by Wang et al. [42]. It is a continuous-wave system with a parallel-plane structure; more detailed descriptions can be found in [42].

The phantom is semi-cylindrical, similar to the morphology of a breast, and is composed of two homogeneous polyformaldehyde (POM) blocks of the same size. The SD setting is shown in Fig. 3(a). The diameter of the POM block is 128.5 mm and its thickness is 22.0 mm. The diameter of the two cavities is 7.5 mm, their depth is 10.0 mm, and their center-to-center distance is 40.0 mm; the corresponding position schematic is shown in Fig. 3(b). The two cavities are filled with a mixture of Intralipid and Indian ink whose optical parameters differ from those of the POM phantom. In the experiment, the wavelength of the light is 660 nm. At this wavelength, the absorption coefficient of the POM phantom is µ_a = 0.004 mm⁻¹, and the absorption coefficients of the mixtures in the two cavities are µ_{a,1} = 0.03 mm⁻¹ and µ_{a,2} = 0.01 mm⁻¹, respectively; we call them Anomaly 1 and Anomaly 2. They have the same scattering coefficient, µ_s = 1 mm⁻¹. Here, the finite element mesh contains 114,603 linear tetrahedral elements and 22,134 nodes.

Quantitative evaluation metrics
Quantitative evaluation is performed using six metrics: mean square error (MSE), relative error (RE), range of absolute error (RAE), mean absolute error (MAE), number of iterations (NI), and time consumption (T). If the reconstructed values equal the ground truth, the MSE, RE, MAE, and RAE are 0.
MSE is the mean squared error between the recovered result and the ground truth over all nodes:

MSE = Σ_{i=1}^{N} (µ_i − µ_i^t)² / N, (14)

where µ_i represents the reconstructed optical parameter at the i-th node of the domain, µ_i^t denotes the true optical parameter at the i-th node, and N is the number of finite element nodes in the domain.
RAE, the range of absolute error, is the sum of the absolute errors between the maximum and minimum optical parameter values of the reconstructed result and the ground truth:

RAE = |max(µ) − max(µ^t)| + |min(µ) − min(µ^t)|. (15)

MAE is the mean absolute error between the reconstructed optical parameters and the ground truth:

MAE = Σ_{i=1}^{N} |µ_i − µ_i^t| / N. (16)
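These three formulas translate directly into NumPy:

```python
import numpy as np

def evaluation_metrics(mu, mu_true):
    """MSE, RAE and MAE (Eqs. (14)-(16)) over the N finite element nodes."""
    mse = np.mean((mu - mu_true) ** 2)
    rae = abs(mu.max() - mu_true.max()) + abs(mu.min() - mu_true.min())
    mae = np.mean(np.abs(mu - mu_true))
    return mse, rae, mae
```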

Results
The diffusion equation is solved by the finite element method using MATLAB (R2022a) and the NIRFAST software package [23]. The networks are implemented in Python (version 3.8) with the PyTorch (version 1.11.0) library [43]. The simulation experiments are run on a personal computer with an Intel Core i5-3330 CPU @ 3.00 GHz, 4 GB of RAM, and a GTX 1660 GPU.

Noise tolerance study
A noise tolerance study is conducted in this section. A spherical anomaly with a radius of 7 mm is located at the center of the domain, at coordinates (70.0 mm, 40.0 mm, 20.0 mm). The optical properties of the anomaly are µ_a = 0.004 mm⁻¹ and µ_s = 1 mm⁻¹. Three levels of random noise, 0%, 1%, and 2% of the measurements, are added. The noiseless and 2% noise data are outside the distribution of the training data. The reconstructed absorption coefficient images of LM, GCNLM, and GATLM at the different noise levels are shown in Fig. 4. Visually, the reconstruction results of GCNLM and GATLM are significantly improved compared to LM. As the noise increases, the LM reconstructions exhibit more artifacts and lower reconstructed absorption coefficient values in the anomaly area. Because 1% noise lies within the range of the training data, GCNLM and GATLM achieve their best results at 1% noise compared with 0% and 2% noise. However, GCNLM produces artifacts in the reconstructions at 0% and 2% noise, especially at 2% noise, whereas GATLM effectively suppresses them. According to Table 2, the MSEs of the GATLM method are the lowest for the two experiments outside the training data distribution (no noise and 2% noise). For the experiment with 1% noise, i.e. within the training data distribution, the MSE of GCNLM is the lowest. In all three experiments, the MAE and RAE values of GCNLM are the lowest. The evaluation metrics of GCNLM are good at all noise levels, which means it recovers absorption coefficient values close to the true distribution. The results obtained by GATLM are smoother, with lower absorption coefficient values, than those of GCNLM. The numbers of iterations for GCNLM and GATLM at all noise levels are 4 and 2, respectively, fewer than for LM; both methods reduce the overall reconstruction time.

Influence of anomaly position and size
In this section, we investigate the influence of anomaly position and size on the reconstruction. Anomalies with radii of 8 mm and 10 mm are considered. In addition, we study anomalies located at (30.0 mm, 20.0 mm, 20.0 mm), (70.0 mm, 40.0 mm, 20.0 mm), and (100.0 mm, 25.0 mm, 20.0 mm). The other properties of the anomaly are the same as in Section 3.3.1, and 1% Gaussian random noise is added to the measurements. Figure 5 shows the recovered results for the three methods in these cases. When the radius of the single spherical anomaly is 8 mm, GCNLM and GATLM clearly improve the reconstruction results compared to LM at all three positions, according to Fig. 5(a). The numbers of iterations for GCNLM and GATLM are 4 and 2, respectively, fewer than for LM, which greatly reduces the overall reconstruction time. When the radius of the anomaly is 10 mm, the same conclusion holds. Regardless of which of the three positions the target occupies, GCNLM and GATLM provide improved results compared to LM, further demonstrating that the proposed approaches perform well regardless of the anomaly's location.

Multi-anomalies reconstruction
Two spherical anomalies centered at (40.0 mm, 40.0 mm, 20.0 mm) and (80.0 mm, 40.0 mm, 20.0 mm) are considered in this case. Both anomalies have a radius of 7 mm and an absorption coefficient of 0.001 mm⁻¹, and 1% noise is added to the measurements. The recovered results are shown in Fig. 6, Fig. 7, and Table 3. From Fig. 6(b), the results of GCNLM and GATLM are improved and close to the true distribution. The reconstructed values of GCNLM are closer to the true values than those of GATLM, but the reconstructed distribution of Anomaly 2 is not as good as that of Anomaly 1. Figure 7 shows the values along the black dashed line in Fig. 6(b); the reconstructed absorption coefficient distribution of GCNLM is clearly closer to the ground truth.

According to Table 3, the MSE, MAE, and RAE values of GCNLM are the lowest, consistent with the results in Fig. 6(b). The numbers of iterations for GCNLM and GATLM are 4 and 2, respectively, fewer than for LM, thereby reducing the overall reconstruction time.



Reconstruction with heterogeneous backgrounds
We consider experiments with heterogeneous backgrounds in this part. Different fat fractions (5%, 15%, 38%, 44%) are used to simulate the heterogeneous backgrounds [44]. The parameter settings of the anomaly are the same as in Section 3.1.1, except that the noise level is 0.1%. The optical parameters of fat are set to µ_a = 0.0025 mm⁻¹ and µ_s = 0.7 mm⁻¹. The recovered results for the three methods are shown in Fig. 8 and Table 4. There are clear artifacts in the results recovered by LM compared to GCNLM and GATLM. According to Fig. 8, GATLM provides better results than GCNLM and LM for all four densities, and the quantitative results in Table 4 support this conclusion. It should be noted that GNNLM was trained on phantoms with homogeneous, not heterogeneous, backgrounds; the evaluation here is therefore carried out on an "out of distribution" target. Figure 8(d) is superior to Fig. 8(c) in terms of image contrast, which indicates that GATLM has better generalization ability than GCNLM.


Phantom results
Figure 9(a) shows the recovered results through two slices, namely the yz plane at x = 17.0 mm and the xy plane at z = 110.0 mm. From Fig. 9(b), it is clear that the absorption coefficient reconstructed by LM is very low and the image is almost indiscernible. GCNLM and GATLM significantly improve the reconstruction results, and the results of GATLM are closer to the true distribution than those of GCNLM. In Fig. 10, the profiles along the black dashed lines in Fig. 9(b) demonstrate that the reconstructed absorption coefficient distribution of GATLM is closer to the ground truth than that of GCNLM. Here, we employ the mean square error of each anomaly region (denoted MSE₁ and MSE₂), the range of absolute error of each anomaly region (denoted RAE₁ and RAE₂), and the mean absolute error of each anomaly region (denoted MAE₁ and MAE₂). According to Table 5, GATLM provides better results than GCNLM on the quantitative indicators: almost all indicators of GATLM are better than those of GCNLM, except for RAE₂, NI, and time. GCNLM iterated twice and GATLM iterated four times; both required fewer iterations than LM, and their reconstruction times are also shorter than that of LM.

Discussion
This paper proposes a model-based deep learning method for DOT by incorporating a graph neural network into the non-linear iterative process. The graph neural network operates directly on the finite element mesh, eliminating interpolation. The time for a single iteration of LM with the added graph neural network is the same as that of LM alone. Through training on a small simulation dataset, GNNLM provides good reconstruction results in both simulation and real experiments, with fewer iterations and less total reconstruction time than LM. We implemented two types of graph neural networks in this work, GCN and GAT. Both networks exhibit strong robustness, since they are not sensitive to noise or model mismatch. The method also generalizes well: the network parameters learned from training data generated on a rectangular slab can be used directly on breast-like shapes. GCN learns over the entire graph, so it has strong representational ability for a fixed graph structure; GAT, in contrast, attends only to neighboring nodes, so it generalizes well across different graph structures. In the simulation experiments within the training data distribution, the results recovered by GCNLM have the lowest MSE, MAE, and RAE values and are closest to the true distribution. In the simulation experiments outside the training data distribution, the results recovered by GATLM have lower error indicators and are closer to the true distribution, and the same conclusion is obtained in the real phantom experiment. This means that GCNLM performs better within the training data distribution, while GATLM shows better generalization than GCNLM.

Conclusion
This work proposes a graph neural network LM algorithm that employs a graph data structure to represent the finite element mesh. Specifically, a graph neural network operating directly on the finite element mesh is added after each LM iteration. Its input is the current iteration's absorption coefficient distribution and its variation, and its output is the recombined absorption coefficient distribution. Simulation experiments and real phantom experiments verify GCNLM and GATLM. GCNLM performs best in simulation experiments within the training data distribution, but in simulation experiments outside the training data distribution GATLM exhibits superior performance, and the real phantom experiments with breast-like imaging domains yield the same conclusion. GATLM generalizes well; given the difficulty of obtaining large amounts of real data, this generalization offers a path toward clinical application.

Fig. 1. The GNNLM method framework for solving the inverse problem of DOT. (a) The first two iterations of the GNNLM method, where the GNN block can be replaced with other graph neural network variants. (b) The k-th graph neural network block. In this block, the input feature matrix H_{k,0} is the concatenation of µ_{a,k} and ∆µ_{a,k}, and the output H_{k,3} is µ_{a,k+1}.

Fig. 2. (a) The setup of the positions of sources (red crosses) and detectors (blue circles). (b) The mesh for the forward problem. (c) The mesh for the inverse problem.

Fig. 3. Geometric sketch of the phantom. (a) SD location. (b) The location of the two anomalies.

Fig. 4. The recovered results of the LM, GCNLM, and GATLM methods with varying noise levels. Note that the samples with 1% noise are used during training. (a) Illustration of the slice at z = 20 mm of the phantom (the gray plane). (b) The ground truth at the slice z = 20 mm. (c) The reconstructed images for three levels of noise by the three methods. The top row is LM, the middle row is GCNLM, and the last row is GATLM.

Fig. 5. Reconstruction results of the anomaly with different sizes and positions. 1% noise is added to the measurements. (a) The recovered results for an anomaly radius of 8 mm. From left to right: the illustration of the distribution of the slice at z = 20 mm, the ground truth, and the results with LM, GCNLM, and GATLM. (b) The recovered results for an anomaly radius of 10 mm. From left to right: the illustration of the distribution of the slice at z = 20 mm, the ground truth, and the results with LM, GCNLM, and GATLM.

Fig. 6. Reconstruction results of two anomalies with the same size and µ_a in the domain. The measurements include 1% noise. (a) Illustration of the distribution of the slice, located at the central xy-plane of the phantom. The blue sphere is Anomaly 1 at left, and the red sphere is Anomaly 2 at right. (b) From left to right: the ground truth and the reconstruction results with LM, GCNLM, and GATLM. The black dashed line in the middle of each image is the 1D cross section.

Fig. 7. The profiles along the black dotted lines in Fig. 6(b). The blue dashed line is the ground truth, the green solid line is the result by GCNLM, the red dash-dot line is the result by GATLM, and the purple dotted line is the result by LM.

Fig. 8. The reconstruction results for different densities. (a) Ground truth. (b)-(d) The recovered results by LM, GCNLM, and GATLM at the slice z = 20 mm.


Fig. 9. Reconstruction results of the real phantom experiment. (a) The schematic of the two slices. Slice 1 is the yz plane at x = 17.0 mm; Slice 2 is the xy plane at z = 110.0 mm. (b) The first row shows the absorption coefficient images for Slice 1 and the second row for Slice 2. From left to right: the ground truth and the reconstruction results by LM, GCNLM, and GATLM. The black dashed line in the middle of each image is the 1D cross section.

Fig. 10. (a) The profiles along the black dotted lines of Slice 1 in Fig. 9(b). (b) The profiles along the black dotted lines of Slice 2 in Fig. 9(b). The blue dashed line is the ground truth, the green solid line is the result by GCNLM, the red dash-dot line is the result by GATLM, and the purple dotted line is the result by LM.

Funding.
National Natural Science Foundation of China (12271434, 61906154, 62271394); Youth Innovation Team of Shaanxi Provincial Department of Education (21JP123); National Key Research and Development Program of China (2016YFC0103800); Natural Science Basic Research Program of Shaanxi Province (2023-JC-JQ-57); Research Fund for Young Star of Science and Technology in Shaanxi Province (2023KJXX-035).

Table 1. Pseudo-code of the GNNLM algorithm

Algorithm 1: Training the graph neural network with the Levenberg-Marquardt method
Generate the train and validation sets with noise using NIRFAST.
for k = 0 : k_max − 1
    Calculate the variation ∆µ_{a,k} according to Eq. (5) for all samples in the train and validation sets.
    Concatenate the current absorption coefficient µ_{a,k} and its change ∆µ_{a,k} as the input of GNN_k.
    Train GNN_k by minimizing Eq. (13). (TRAIN)
    Output the result when the given threshold is reached. (TEST)
end for

Table 2. Error metrics for three levels of noise. Note: The best results are in bold.


Table 4. Error metrics for four different densities. Note: The best results are in bold.

Table 5. Error metrics of the real phantom experiment. Note: The best results are in bold.