Attention mechanism-based locally connected network for accurate and stable reconstruction in Cerenkov luminescence tomography

Cerenkov luminescence tomography (CLT) is a novel and highly sensitive imaging technique that can recover the three-dimensional distribution of radioactive probes for accurate tumor detection. However, the simplified radiative transfer equation and the ill-conditioned inverse problem introduce reconstruction errors. In this study, a novel attention mechanism-based locally connected (AMLC) network was proposed to reduce barycenter error and improve morphological restorability. The proposed AMLC network consisted of two main parts: a fully connected sub-network that provided a coarse reconstruction result, and a locally connected sub-network based on an attention matrix for refinement. Both numerical simulations and in vivo experiments were conducted to show the superiority of the AMLC network in accuracy and stability over existing methods (MFCNN, KNN-LC network). This method improved CLT reconstruction performance and promoted the application of machine learning in optical imaging research.


Introduction
Cerenkov luminescence imaging (CLI) is a novel optical molecular imaging method based on the Cerenkov radiation of nuclides [1]. Combining the advantages of optical imaging and radionuclide imaging [2], CLI has high sensitivity and facilitates clinical translation [3,4]. Recently, CLI has been widely used in preclinical and clinical research, such as apoptosis visualization [5], tumor resection [6], and drug therapy monitoring [7]. However, CLI is a planar imaging method and cannot restore the internal distribution of radioactive probes [8]. Therefore, Cerenkov luminescence tomography (CLT) has been developed [9][10][11].
CLT matches the surface optical information with the structural information. Using an efficient photon transmission model and an image reconstruction algorithm, the three-dimensional (3D) distribution of specific molecular probes in biological tissue can be restored from the surface optical signals. CLT reconstruction consists of two main steps: forward problem solving and inverse problem solving [12][13][14]. To solve the forward problem, a mathematical model is constructed to describe the photon transmission process and establish the mapping from the 3D distribution of the source to the surface signal. The inverse problem solving is to restore

Machine learning for CLT reconstruction
CLT reconstruction methods based on machine learning use a neural network to establish an end-to-end mapping and directly reconstruct the internal source. Thus, the deviation between the simplified photon propagation process and the real photon propagation process can be avoided. Parameters of the nonlinear mapping are obtained directly from statistical learning. The objective function is described as follows:
$$\min_{\theta} \; \lVert f_s(X; \theta) - Y \rVert_2^2 \qquad (1)$$
where $f_s$ represents the reconstruction system and $\theta$ is the system weight, which is updated by minimizing the mean square error. The system input is the surface photon intensity $X$ and the output is the reconstructed source $Y$.
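As a minimal sketch of the mean-square-error objective described above, the following pure-Python example evaluates the loss of a toy reconstruction system over a few (X, Y) pairs. The names `mse`, `objective`, and `linear_system`, as well as the single linear layer used as a stand-in for the reconstruction system, are illustrative assumptions and not part of the original method.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Sample = Tuple[Vector, Vector]  # (surface photon intensity X, source label Y)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean square error between a reconstructed source and its label."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def objective(f_s: Callable, theta, samples: Sequence[Sample]) -> float:
    """Average MSE of f_s(X; theta) over all training pairs."""
    return sum(mse(f_s(x, theta), y) for x, y in samples) / len(samples)


def linear_system(x: Vector, theta: List[Vector]) -> Vector:
    """Hypothetical stand-in for the network: one linear layer, no bias."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in theta]


# Two toy samples: 2 surface values in, 1 source value out.
samples = [([1.0, 2.0], [3.0]), ([0.5, 0.5], [1.0])]
loss_good = objective(linear_system, [[1.0, 1.0]], samples)  # weights fit the data
loss_bad = objective(linear_system, [[0.0, 0.0]], samples)   # weights ignore the data
```

Training then amounts to searching for the weights that make this value as small as possible, which the paper does with a gradient-based optimizer.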

Structure of the AMLC network
The AMLC network consisted of two main parts: the FC sub-network for the acquisition of coarse results and the LC sub-network for residual optimization (Fig. 1). The attention mechanism was introduced to optimize the residual between coarse results and targets. Specifically, the surface photon density was first fed to the FC sub-network to obtain a coarse source. Then the residual was generated by the LC sub-network and combined with the coarse result to form the final reconstruction result. The whole CLT reconstruction procedure can be defined as: where $f_{FC}$ is the FC sub-network with weights $\theta_1$, while $f_{LC}$ represents the LC sub-network with weights $\theta_2$. The output of the FC sub-network is $Y_{FC}$. The system input is the surface photon intensity $X$, and the output is the reconstructed source $Y$. A parameter $\lambda$ is added to balance the two losses, which is set to 0.2 in our experiment. The FC sub-network used four fully connected layers to obtain the coarse results. The activation function was the rectified linear unit (ReLU).
The LC sub-network consisted of five locally connected layers. The number of neurons in each layer was consistent with the dimension of the coarse results. Therefore, the output nodes of each layer and the input nodes of the next layer could form a square matrix. The AMLC network used photon intensity as a restriction. Each dimension in the coarse results recorded photon intensity. Based on the attention mechanism, important dimensions were selected as the main nodes for connection.
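The coarse-plus-residual pipeline described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the residual is assumed to be added element-wise to the coarse result, and the parameter λ is assumed to weight the loss of the coarse output, since the exact form of the combined objective is not given in the text. All function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight rows, bias)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def dense(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def fc_subnetwork(x: Vector, layers: Sequence[Layer]) -> Vector:
    """Fully connected layers with ReLU, producing the coarse result Y_FC."""
    h = x
    for weights, bias in layers:
        h = relu(dense(h, weights, bias))
    return h


def amlc_forward(x: Vector, fc_layers: Sequence[Layer],
                 lc_subnetwork: Callable[[Vector], Vector]) -> Tuple[Vector, Vector]:
    """Return (Y_FC, Y). ASSUMPTION: Y = Y_FC + residual from the LC sub-network."""
    y_fc = fc_subnetwork(x, fc_layers)
    residual = lc_subnetwork(y_fc)
    return y_fc, [c + r for c, r in zip(y_fc, residual)]


def combined_loss(y_fc: Vector, y: Vector, target: Vector, lam: float = 0.2) -> float:
    """ASSUMPTION: lambda scales the coarse-result loss; the paper only says it balances two losses."""
    return mse(y, target) + lam * mse(y_fc, target)


# Toy example: identity FC layer and a fixed residual standing in for the LC sub-network.
fc_layers = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])]
y_fc, y = amlc_forward([1.0, 2.0], fc_layers, lambda v: [0.5, -0.5])
loss = combined_loss(y_fc, y, target=[1.5, 1.5])
```

In this toy case the final output matches the target exactly, so the remaining loss comes only from the coarse term scaled by λ = 0.2.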
Specifically, the photon intensity in each dimension was summed from the coarse results and labels. A SoftMax function was applied to calculate the probability. Due to the huge cardinality, plenty of vertices were close to zero. For the nonzero locations, the value was set to 1 to indicate that they were more important. After that, two one-dimensional vectors were obtained with values of only 0 and 1. The vector from the coarse results was used to constrain the input nodes, while the vector from the labels was applied to constrain the output nodes. Nodes were disconnected only if both input and output values were 0. The attention matrix $A = (a_{ij})$ is defined as:
$$a_{ij} = \begin{cases} 0, & V_{CR_i} = 0 \ \text{and} \ V_{L_j} = 0 \\ 1, & \text{otherwise} \end{cases}$$
where $a_{ij}$ shows the relationship between the input and output nodes. $V_{CR_i}$ represents the $i$-th dimension of the vector calculated from the coarse results. $V_{L_j}$ indicates the $j$-th dimension of the label vector. When $V_{CR_i}$ and $V_{L_j}$ are both 0, nodes $i$ and $j$ are disconnected; otherwise they are connected.
In this study, the whole numerical mouse head model was composed of 11494 discrete points, including 1965 coordinate points for the brain and 2723 coordinate points for the surface tissue. Based on the distribution of surface vertices, the number of neurons in the first layer of the FC sub-network was set to 2723. The mean square error was adopted as the loss function. The AMLC network was trained for 400 epochs with a batch size of 128. The optimizer was Adam with a learning rate of 0.001, $\beta_1$ of 0.9, and $\beta_2$ of 0.99.
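The construction of the attention matrix can be illustrated with a short pure-Python sketch. The threshold `eps` used to decide which SoftMax probabilities count as nonzero is a hypothetical choice, as the text does not state one, and `masked_layer` is only an assumed way of applying the matrix to a locally connected layer.

```python
import math
from typing import List, Sequence

Vector = List[float]


def column_sums(samples: Sequence[Sequence[float]]) -> Vector:
    """Sum the photon intensity of every dimension over all samples."""
    return [sum(col) for col in zip(*samples)]


def softmax(values: Sequence[float]) -> Vector:
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def binary_vector(intensity_sums: Sequence[float], eps: float = 1e-6) -> List[int]:
    """1 where the SoftMax probability exceeds eps (assumed threshold), else 0."""
    return [1 if p > eps else 0 for p in softmax(intensity_sums)]


def attention_matrix(v_cr: Sequence[int], v_l: Sequence[int]) -> List[List[int]]:
    """a_ij = 0 only when both V_CR[i] and V_L[j] are 0; otherwise 1."""
    return [[0 if (ci == 0 and lj == 0) else 1 for lj in v_l] for ci in v_cr]


def masked_layer(x: Vector, weights: List[Vector], a: List[List[int]]) -> Vector:
    """Hypothetical locally connected layer: out_j = sum_i a_ij * w_ij * x_i."""
    n_out = len(a[0])
    return [sum(a[i][j] * weights[i][j] * x[i] for i in range(len(x))) for j in range(n_out)]


coarse = [[0.0, 30.0, 0.0], [0.0, 20.0, 0.0]]   # toy coarse results, 3 dimensions
labels = [[0.0, 25.0, 15.0], [0.0, 25.0, 25.0]]  # toy labels, 3 dimensions
v_cr = binary_vector(column_sums(coarse))
v_l = binary_vector(column_sums(labels))
A = attention_matrix(v_cr, v_l)
out = masked_layer([1.0, 2.0, 3.0], [[1.0] * 3 for _ in range(3)], A)
```

Because the matrix is derived from the data rather than from fixed spatial neighbors, a different training set would yield a different connection pattern.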

Optical parameters and simulation dataset
As a data-driven method, the AMLC network needed plenty of samples for model training. Monte Carlo simulation was used as an alternative method to overcome the difficulty of obtaining a large amount of in vivo experimental data. As mentioned above, the finite element method was used to divide the numerical mouse head into a standard mesh. Three main organs were chosen from the segmented CT data (muscle, skull, and brain) to simulate photon propagation in the head. The absorption coefficient $\mu_a$ and the scattering coefficient $\mu_s$ were taken from a previous study (Table 1) [30]. With the Monte Carlo simulation method, thousands of samples were collected to train the network. Each sample was obtained from a standard mesh. In previous research, the small sample size, single sample type, and rough assembly affected the reconstruction results to some extent. In this study, we enlarged the sample size, changed the source shape, optimized the big-source generation strategy, and generated a dataset dominated by dual-source samples. 100,000 photons were assigned to each generated source sample. The photon wavelength was set to 650 nm to ensure appropriate penetration and photon intensity [19,20]. Furthermore, different types of sources were assigned to increase data diversity. 439 centers of ellipsoid sources, 418 centers of cylinder sources, and 393 centers of cuboid sources were collected in the reconstruction permissible region (Table 2). Simulations of each center were repeated 4 times to ensure the randomness of photon propagation. In total, 5000 single-source samples were obtained with three different shapes and 1250 different locations. Dual-source samples were assembled from single-source samples to improve the universality of CLT reconstruction. Different positions were selected and combined randomly to generate the composite sources. Specifically, each single-source sample had a corresponding central coordinate.
In the reconstruction feasible region, two single-source samples with different coordinates were randomly selected for combination. At the same time, a weighted sum of source intensities was performed to get the photon intensity of a dual-source sample. The assembly formulas of surface photon intensity and actual source of multi-source samples were as follows: where $X_{mul}$ represents the assembled surface photon intensity and $Y_{mul}$ represents the assembled source. $S_S$ is the set of the generated single-source samples. Two single-source samples ($n = 2$) are selected randomly to assemble the dual-source sample. The structure of a big-source sample was similar to that of a dual-source sample. A single-source sample was selected as the center, and the nearest single-source samples were assembled around it into a whole [31]. Considering the randomness of central positions and sizes, only a certain number of assemblies was achievable. Therefore, another 500 different centers were generated to avoid assembly errors. The details of sample parameters and sizes are shown in Table 3. In total, we generated 5000 single-source samples, 7800 dual-source samples (3000 spheroid sources, 2400 cylindroid sources, and 2400 cuboid sources), and 2000 big-source samples. 80% of the dataset was used for training and 20% for testing. The simulation dataset was generated in the Molecular Optical Simulation Environment (MOSE 2.3) [32].
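The dual-source assembly step can be sketched as follows. This is an illustration only: the text mentions a weighted sum but does not give the weights, so equal weights of 1 are assumed by default, and the sample layout (a dictionary with `center`, `X`, and `Y` keys) and function names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Sample = Dict[str, object]  # keys: "center" (tuple), "X" (surface), "Y" (source)


def pick_two(single_sources: Sequence[Sample], rng: random.Random) -> Tuple[Sample, Sample]:
    """Randomly pick two single-source samples whose centers differ."""
    centers = {s["center"] for s in single_sources}
    if len(centers) < 2:
        raise ValueError("need at least two distinct source centers")
    while True:
        a, b = rng.sample(list(single_sources), 2)
        if a["center"] != b["center"]:
            return a, b


def assemble(samples: Sequence[Sample],
             weights: Optional[Sequence[float]] = None) -> Tuple[List[float], List[float]]:
    """Weighted element-wise sum of surface signals (X_mul) and sources (Y_mul)."""
    if weights is None:
        weights = [1.0] * len(samples)  # ASSUMPTION: equal weights
    n_x, n_y = len(samples[0]["X"]), len(samples[0]["Y"])
    x_mul = [sum(w * s["X"][k] for w, s in zip(weights, samples)) for k in range(n_x)]
    y_mul = [sum(w * s["Y"][k] for w, s in zip(weights, samples)) for k in range(n_y)]
    return x_mul, y_mul


# Toy single-source set: 2 surface vertices, 3 interior vertices.
single_sources = [
    {"center": (0.0, 0.0, 0.0), "X": [1.0, 0.0], "Y": [1.0, 0.0, 0.0]},
    {"center": (3.0, 0.0, 0.0), "X": [0.0, 2.0], "Y": [0.0, 0.0, 1.0]},
]
rng = random.Random(0)
first, second = pick_two(single_sources, rng)
x_mul, y_mul = assemble([first, second])
```

Summing the surface signals relies on the surface intensity being linear in the source, so that the combined surface signal of two sources equals the sum of their individual signals.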

Numerical simulation experiments
Based on the standard mesh, several numerical simulation experiments were carried out to evaluate the performance of the AMLC network. MFCNN and the KNN-LC network were used for comparison. Simulation experiments of single-source, dual-source, and big-source reconstruction were designed to verify the location accuracy and morphology restorability. Furthermore, a group of sources at different depths was generated to explore the effect of depth on reconstruction. We also designed a dual-source distance experiment, in which the center distance was set to 2.5, 3.0, and 3.5 mm to evaluate the reconstruction performance for closely spaced sources. In addition, an anti-noise simulation experiment evaluated the robustness of the AMLC network. Jupyter Notebook and Python 3.6 were used to train and test the three networks. All the numerical simulation experiments were conducted on a personal computer with an Intel Core i7 CPU (4.00 GHz) and an NVIDIA GTX 1080 Ti GPU.

In vivo experiments
The cell line U87MG-Luc-GFP was used to establish the tumor model. $^{11}$C-methionine ($^{11}$C-MET) was chosen as the probe to generate Cerenkov photons. The centrifuged cells (approximately $4 \times 10^{6}$) were mixed with an equal volume of phosphate buffered saline and injected into the mouse head through a micro-injector [30]. Bioluminescence imaging (BLI) was used to observe tumor growth every three days. The glioma model was applied to CLT in vivo experiments after one week of growth.
The MRI data (M3, Aspect Imaging, Israel) were obtained first. Then, in vivo CLI was obtained from a pentamodal imaging system [33]. Meanwhile, small animal PET (GENISYS4, Sofie Biosciences, USA) was applied to verify the radionuclide accumulation. Considering the extremely short half-life of $^{11}$C-MET, the mouse was transferred to PET acquisition immediately. CT data were obtained from a micro-CT system to acquire the structural information of the mouse head. Based on the CT data, the CLI data were mapped to the standard mesh as network input. In vivo experiments were carried out under anesthesia with a 1% isoflurane-oxygen gas mixture. All the animal experiments were performed under the guidelines of the Institutional Animal Care and Use Committee of the Fifth Affiliated Hospital, Sun Yat-sen University.
After in vivo imaging, the glioma mouse model was sacrificed. First, green fluorescent protein (GFP) images were collected from the frozen sections of the mouse head. Then the frozen sections were stained with hematoxylin and eosin (H&E). Since the cell line was labeled with GFP, the GFP images and H&E stained results were considered the gold standard.

Evaluation index
The reconstruction results were quantitatively evaluated with the barycenter error (BCE) [31] and the Dice index [30]. BCE was used to measure the location error between the reconstructed result and the actual source. The equation is as follows:
$$\mathrm{BCE} = \lVert P_r - P_t \rVert_2, \qquad P = \frac{\sum_{i \in S} c_i \, l_i}{\sum_{i \in S} l_i}$$
where $S$ is the set of standard mesh vertices, $c_i$ represents the vertex coordinates, and $l_i$ is the intensity. $P_r$ and $P_t$ represent the intensity-weighted barycenters of the reconstructed source and the actual source, respectively. A smaller BCE indicates more accurate localization. Dice was applied to assess the shape recovery capability. The value of Dice lies between 0 and 1. A large Dice represents high morphological similarity and excellent reconstruction performance. The equation is defined as:
$$\mathrm{Dice} = \frac{2\,\lvert A \cap B \rvert}{\lvert A \rvert + \lvert B \rvert}$$
where $A$ represents the set of reconstructed sources and $B$ represents the set of actual sources.
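Both evaluation indices can be computed with a few lines of Python. The sketch below assumes that BCE is the Euclidean distance between intensity-weighted barycenters and that the source sets for Dice are the mesh vertices whose intensity exceeds a threshold; the `threshold` parameter and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def barycenter(coords: Sequence[Point], intensity: Sequence[float]) -> Tuple[float, ...]:
    """Intensity-weighted mean of the vertex coordinates."""
    total = sum(intensity)
    if total == 0:
        raise ValueError("source has zero total intensity")
    dim = len(coords[0])
    return tuple(sum(c[d] * l for c, l in zip(coords, intensity)) / total
                 for d in range(dim))


def bce(coords: Sequence[Point], recon: Sequence[float], truth: Sequence[float]) -> float:
    """Euclidean distance between reconstructed and actual barycenters."""
    p_r, p_t = barycenter(coords, recon), barycenter(coords, truth)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p_r, p_t)))


def dice(recon: Sequence[float], truth: Sequence[float], threshold: float = 0.0) -> float:
    """Dice overlap of the vertex sets with intensity above `threshold` (assumed)."""
    a = {i for i, v in enumerate(recon) if v > threshold}
    b = {i for i, v in enumerate(truth) if v > threshold}
    if not a and not b:
        return 1.0  # two empty sources are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


# Toy mesh: four vertices on a line, 1 mm apart.
coords: List[Point] = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
truth = [0.0, 1.0, 1.0, 0.0]   # actual source occupies vertices 1 and 2
recon = [0.0, 0.0, 1.0, 1.0]   # reconstruction is shifted by one vertex
bce_value = bce(coords, recon, truth)
dice_value = dice(recon, truth)
```

In this toy case the reconstruction is shifted by one vertex, so the barycenter moves by 1 mm and half of the vertices overlap.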

Results
In this section, numerical simulation and in vivo experiments were carried out to assess the AMLC network in terms of location capability, morphology restorability, robustness, and in vivo feasibility. This section contains five main experiments: single-source reconstruction, dual-source reconstruction, big-source reconstruction, anti-noise performance experiments, and in vivo experiments. In addition, two further experiments were conducted to explore the influence of distance and depth.

Single-source reconstruction
The quantitative analysis of the validation set is shown in Table 4. Three samples, named model 1, model 2, and model 3, were selected from three different sources to show single-source reconstruction results (Fig. 2). 3D views and 2D cross sections reflected the source shapes directly. In the 3D view, the BCE of the AMLC network and the KNN-LC network was less than that of MFCNN in model 1 and model 3. Although the KNN-LC network achieved accurate positioning, its morphology recovery was unsatisfactory compared with the AMLC network in model 3. Model 2 was chosen to demonstrate that the AMLC network also achieved an excellent result when MFCNN obtained a small BCE. Similar results were also observed in the 2D cross sections.
In Table 4, the AMLC network showed the minimum mean BCE (0.37 mm) and the maximum mean Dice (0.81) among the three methods. More specifically, the AMLC network achieved the most accurate CLT reconstruction for model 1, with the minimum BCE (0.05 mm) and the maximum Dice (0.83). These qualitative and quantitative results indicated that the AMLC network was more accurate in single-source reconstruction.
To further verify the superiority of the AMLC network, the results of the depth experiment are shown in Supplement 1 (Fig. S2). All three methods realized source reconstruction at different depths. As the depth changed, the reconstruction results showed different degrees of distortion. Quantitative and qualitative analysis showed that the AMLC network was more accurate and stable in the depth experiment.

Dual-source reconstruction
Unlike in the single-source experiment, BCE was considered the main evaluation index in dual-source reconstruction. The mean BCE of each sub-source in the validation set was calculated and is shown in Table 5. The results were consistent with the former experiments: the AMLC network obtained the minimum BCE. Another three samples were selected to show dual-source reconstruction results. Both 3D views and 2D cross sections are provided in Supplement 1 (Fig. S3). Quantitative and qualitative analysis showed that the AMLC network could achieve more accurate dual-source reconstruction. To further evaluate the performance, we narrowed the gap between the two sources. The center distance was set to 2.5 mm (Fig. 3(a)), 3.0 mm (Fig. 3(b)), and 3.5 mm (Fig. 3(c)), respectively. As shown in Fig. 3, all three networks completed the dual-source localization when the center distance was greater than 3.0 mm. However, at 2.5 mm, it was hard to distinguish the border between the two sources in the results of MFCNN. At a center distance of 3.0 mm, the S2 source of the KNN-LC network was biased. Quantitative analysis is shown in Fig. 3(d) and 3(e). The results of the AMLC network obtained the minimum mean BCE. These comparisons further verified the superiority of the AMLC network in the reconstruction of closely spaced sources.

Big-source reconstruction
The mean BCE and Dice of the validation set were calculated and are shown in Table 6 to analyze the big-source reconstruction ability. As shown in Fig. 4, a big-source sample was selected to show the reconstruction results directly. The results in Table 6 were consistent with the previous experiments. The mean BCE of the AMLC network was 0.14, which indicated a high positioning accuracy. At the same time, the AMLC network also obtained the maximum Dice, which meant that the morphology of the source was recovered best. These comparisons showed that the AMLC network achieved more accurate reconstruction.

Anti-noise performance experiment
Three different levels of Gaussian noise were added to the surface photon intensity to assess the stability of the AMLC network. Gauss1, Gauss2, and Gauss3 represented 5%, 10%, and 15% Gaussian noise, respectively. A single-source sample was chosen to show the results directly in 3D view and 2D transverse sections. Deviations could be observed in the histograms (Fig. 5). Although BCE increased and Dice decreased after adding noise, the location and morphological information could still be restored. These results demonstrated that the AMLC network was robust.
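A noise test of this kind can be sketched as follows. The text does not define what a given percentage of Gaussian noise means, so this example assumes zero-mean noise whose standard deviation is that fraction of each vertex's own intensity, with negative values clipped to zero. The function name and this noise model are assumptions.

```python
import random
from typing import List, Sequence


def add_gaussian_noise(x: Sequence[float], level: float, rng: random.Random) -> List[float]:
    """Perturb each surface value with zero-mean Gaussian noise.

    ASSUMPTION: the standard deviation equals `level` times the value itself,
    and photon intensities are clipped at zero after the perturbation.
    """
    noisy = [v + rng.gauss(0.0, level * abs(v)) for v in x]
    return [max(0.0, v) for v in noisy]


rng = random.Random(42)  # fixed seed so the experiment is repeatable
surface = [0.0, 10.0, 25.0, 5.0]
# Gauss1, Gauss2, Gauss3 correspond to 5%, 10% and 15% noise in the text.
noisy_sets = {name: add_gaussian_noise(surface, level, rng)
              for name, level in [("Gauss1", 0.05), ("Gauss2", 0.10), ("Gauss3", 0.15)]}
clean_copy = add_gaussian_noise(surface, 0.0, rng)
```

Each noisy vector would then be passed through the trained network and scored with BCE and Dice to compare against the noise-free result.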

In vivo CLT reconstruction
An orthotopic glioma model was used to assess the in vivo feasibility of the AMLC network. Multimodality in vivo imaging results are shown in Fig. 6. The reconstructed results were fused with the corresponding MRI data through image registration. As shown in Fig. 6, both BLI and CLI provided the accurate tumor location. In addition to the head, CLI signals were also observed in the neck of the mouse during signal acquisition. These results showed that in vivo CLI could localize the tumor, which provided the basis for CLT reconstruction.
Enhanced T1 MRI, GFP fluorescence image, and H&E stained image were compared to assess the reconstruction results of the three approaches. The merged CLT-MRI images showed high consistency. The Dice between frozen section and CLT reconstruction was listed in Table 7.
Compared with other methods, AMLC network obtained the maximum Dice of 0.69. All the observations demonstrated that AMLC network achieved accurate in vivo CLT reconstruction.

Discussion and conclusion
CLT is an emerging and highly sensitive imaging method which combines optical information with anatomical information. However, high time cost, a complicated solving process, and significant approximation error seriously limit the application of traditional model-based CLT reconstruction methods. Neural networks such as the KNN-LC network and MFCNN have achieved optical tomography reconstruction, which encouraged us to explore novel applications of machine learning. In this study, an AMLC network was proposed to improve CLT reconstruction performance. Compared with previous studies, we optimized the dataset and innovated the network structure to obtain better reconstruction results. The diversity of the dataset was improved by increasing the number and types of samples, and the attention mechanism was introduced to make the network flexible. Specifically, we generated 5000 single-source samples and 2000 big-source samples, and assembled 7800 dual-source samples. Compared with the MFCNN method, the number of samples in the training set was expanded. Compared with the KNN-LC network, the types of sources were enriched. A series of experiments was designed to evaluate the reconstruction performance. Experimental results verified the accuracy and stability of the AMLC network in CLT reconstruction.
During the numerical simulations, the reconstruction results were evaluated from quantitative and qualitative perspectives. AMLC network got the highest Dice and the lowest BCE. Compared with MFCNN method, the reconstruction results were obviously improved in morphology restorability and location capability. These results demonstrated that the residual learning module optimized the sparse problem and improved the morphological reconstruction. Meanwhile, the lowest BCE indicated the superiority of AMLC network in location reconstruction. The constraint of AMLC network is the data itself, while KNN-LC network is constrained by location information. Once the value of k is determined, KNN-LC network is fixed. On the contrary, the structure of the LC sub-network based on attention matrix dynamically changes with data. Therefore, AMLC network might generalize better to various types of data.
In the in vivo experiments, an orthotopic glioma model was constructed to further verify the superiority of the AMLC network. The CLI result was similar to the BLI result, which provided a basis for in vivo 3D reconstruction. PET results were shown in various views to demonstrate that $^{11}$C-MET was taken up by the glioma. The signals in the neck of the model were probably caused by brown adipose tissue. As the mouse was under gas anesthesia, its muscles shivered from the cold. Brown adipose tissue mainly exists in the neck. Due to cold stimulation and high metabolism, there were more radionuclide probes and stronger CLI signals. Although all three networks achieved location reconstruction, the result of MFCNN was obviously over-sparse. The KNN-LC network improved on this to some extent, but lost part of the morphological information. The AMLC network achieved the highest Dice of 0.69. The combination of MRI, PET, and the AMLC network may improve the accuracy of tumor detection.
Although the well-trained AMLC network performs well in CLT reconstruction, some flaws still exist and restrict its application. AMLC network is a data-driven method and the quantity and quality of the dataset largely determine the reconstruction performance. Different data sets should match different types of tumors. Also, an extra optimization procedure is required to balance the errors between the standard mesh and the actual structure. Combined with model-based CLT reconstruction methods, the limitations might be overcome, which would be explored in our future work.
In conclusion, a novel AMLC network has been proposed to achieve more accurate and stable CLT reconstruction. The attention mechanism was introduced to optimize the residual between coarse results and targets. The well-trained AMLC network has shown excellent performance in both numerical simulations and in vivo experiments. To the best of our knowledge, this is the first study that combines the attention mechanism with CLT reconstruction, which would promote the application of machine learning in optical tomography reconstruction.