Individual Prediction of Electric Field Induced by Deep-Brain Magnetic Stimulation With CNN-Transformer

Deep-brain Magnetic Stimulation (DMS) can improve the symptoms caused by Alzheimer's disease by inducing a rhythmic electric field in the deep brain, and the induced electric field is rhythm-dependent. However, calculating the induced electric field requires building a voxel model of the brain of the stimulated subject, which usually takes several hours. To obtain the rhythm-dependent electric field induced by DMS in real time, we adopt a CNN-Transformer model to predict it. A dataset with a sample size of 7350 is established for training and testing the model. 10-fold cross-validation is used to determine the optimal hyperparameters for training the CNN-Transformer. The combination of a 5-layer CNN and a 6-layer Transformer is verified as the optimal configuration of the CNN-Transformer model. The experimental results show that the CNN-Transformer model can complete the prediction in 0.731 s (CPU) or 0.042 s (GPU), and the overall prediction performance reaches MAE = 0.0269, RMSE = 0.0420, MAPE = 4.61% and $R^2$ = 0.9627. The prediction performance of the CNN-Transformer model for the hippocampal electric field is better than that for the brain grey matter electric field, and the stimulation rhythm has less influence on the model performance than the coil configuration. When separate CNN and Transformer models are trained and tested on the same dataset, the CNN-Transformer shows better prediction performance than either of them in the task of predicting the electric field induced by DMS.


I. INTRODUCTION
The hippocampus is responsible for long-term memory storage, transformation, and orientation. Animal studies have shown that magnetic stimulation of the hippocampus can improve symptoms caused by Alzheimer's disease and depression [1], [2], [3], [4], [5]. Deep-brain Magnetic Stimulation (DMS) is derived from transcranial magnetic stimulation (TMS) and adopts a pair of large coils to induce a rhythmic electric field deep in the brain to modulate the electrical activity of neurons [6], [7], [8]. The intensity of the induced electric field is an important indicator of the effect of DMS. The current loaded by the DMS stimulator generally does not exceed 20 A, and the induced electric field intensity in the brain is usually within 5 V/m, far lower than that of TMS, which generally exceeds 100 V/m [9]. As a result, no side effects have been reported so far [10].
The head tissues have different conductivity and relative permittivity in magnetic fields of different frequencies, which makes the electric field induced by DMS rhythm-dependent [11]. The anatomical structure of the brain differs between individuals, which leads to individual differences in the induced brain electric field intensity under the same magnetic stimulation [9], [12], [13], [14], [15], [16]. To obtain the individual effect of DMS, it is generally necessary to establish a brain voxel model and calculate the induced electric field by the finite element method (FEM) [17], [18]. However, this process is time-consuming. It usually takes several hours to build a head voxel model and several minutes to perform FEM [19], [20], [21]. Therefore, the effect of DMS on individuals cannot be obtained in real time, which limits its future clinical application. Previous studies have shown that convolutional neural networks (CNN) can be used to predict the electric field induced by TMS [22], [23], [24], [25], [26]. The U-net model based on CNN can predict the spatial distribution of the electric field induced by TMS [22]. The electric field induced by TMS is predicted from two factors: the subject's head MRI and the coil configuration [22], [23], [24]. However, the DMS-induced electric field is rhythm-dependent [10], so we must consider not only the subject's head MRI and the coil configuration, but also the stimulation rhythm.
CNN performs excellently in extracting data features, but it cannot capture long-range features because of the limited size of its receptive field [27]. The Transformer model was originally proposed for Natural Language Processing (NLP) [28], [29], [30] and takes advantage of the self-attention mechanism. To adapt to computer vision tasks, the Transformer has been extended to the Vision Transformer (VIT) [31]. The VIT compensates for the deficiency of CNN through its excellent long-distance feature capture and modeling ability. However, for small datasets, CNN models often perform better than Transformer models [32], [33]. Therefore, recent studies have attempted to combine CNN and Transformer [34], [35], [36], so that the model structure inherits the advantages of both and retains global and local features to the greatest extent. In this paper, a CNN-Transformer model that combines CNN and Transformer is adopted to obtain the induced electric field with a small dataset.
We train the CNN-Transformer as a predictor of the rhythm-dependent electric field induced by DMS. First, a dataset is built for training and testing the model, whose inputs are the MRI, coil configuration and stimulation rhythm, and whose outputs are the hippocampal electric field and the brain grey matter electric field. Then, the best combination of CNN and Transformer layers is obtained, and the performance of the CNN-Transformer model under this combination is tested. Finally, the prediction performance of the CNN-Transformer model, the CNN model and the Transformer model on the rhythm-dependent electric field is compared. In this way, we verify whether the CNN-Transformer model is suitable for predicting the rhythm-dependent electric field induced by DMS.

A. The Stimulation Paradigm of DMS
The standard coil configuration of the DMS stimulator is shown in Fig. 1(a), where two stimulation coils with a diameter of 360 mm are placed on the left and right sides of the head, 300 mm apart. The current waveform loaded in the coil is shown in Fig. 1(b). The fundamental wave is a bipolar trapezoidal wave with a frequency of 1000 Hz and a peak intensity of ±20 A. The rhythm is formed by periodically switching the stimulation on and off, and the stimulation rhythm is generally in the range of 10 Hz to 100 Hz.
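As an illustration of how a rhythm can be formed from the 1000 Hz carrier, the following Python sketch gates a bipolar trapezoidal wave on and off at the rhythm frequency. The edge time of the trapezoid (`ramp`), the 50% on/off ratio (`duty`), the 40 Hz default rhythm and the function names are assumptions made for this example; the text specifies only the carrier frequency, the ±20 A extremes and the 10 Hz to 100 Hz rhythm range.

```python
def trapezoid(phase: float, ramp: float = 0.125) -> float:
    """Unit-amplitude bipolar trapezoid for a phase in [0, 1).

    ASSUMPTION: every edge lasts `ramp` of one period; the source does not
    give the rise and fall times.
    """
    if phase < ramp:                      # rise from 0 to +1
        return phase / ramp
    if phase < 0.5 - ramp:                # positive plateau
        return 1.0
    if phase < 0.5 + ramp:                # fall from +1 to -1
        return 1.0 - (phase - (0.5 - ramp)) / ramp
    if phase < 1.0 - ramp:                # negative plateau
        return -1.0
    return -1.0 + (phase - (1.0 - ramp)) / ramp  # rise from -1 back to 0


def dms_current(t: float, carrier_hz: float = 1000.0, rhythm_hz: float = 40.0,
                peak_a: float = 20.0, duty: float = 0.5) -> float:
    """Coil current in amperes at time t (seconds).

    The carrier is switched on for the first `duty` fraction of each rhythm
    period and off for the rest (ASSUMED gating scheme).
    """
    gate_phase = (t * rhythm_hz) % 1.0
    if gate_phase >= duty:
        return 0.0
    carrier_phase = (t * carrier_hz) % 1.0
    return peak_a * trapezoid(carrier_phase)


print(dms_current(0.00025))  # positive plateau: 20.0
print(dms_current(0.00075))  # negative plateau: -20.0
print(dms_current(0.02))     # gated off: 0.0
```

Changing `rhythm_hz` between 10 and 100 alters only the gating, not the carrier, which is one simple way to read "periodic switching stimulation".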

B. Computation Model
The calculation of the rhythm-dependent electric field induced by DMS is based on Maxwell's equations as follows [37]:

$$\nabla \times \vec{H} = \vec{J} + \frac{\partial \vec{D}}{\partial t} \tag{1}$$
$$\nabla \times \vec{E} = -\frac{\partial \vec{B}}{\partial t} \tag{2}$$
$$\nabla \cdot \vec{B} = 0 \tag{3}$$
$$\nabla \cdot \vec{D} = \rho \tag{4}$$

where $\vec{H}$ is the magnetic field intensity, $\vec{J}$ is the conduction current density, $\partial \vec{D} / \partial t$ is the displacement current density, $\vec{E}$ is the electric field intensity, $\vec{B}$ is the magnetic flux density, and $\rho$ is the free charge volume density. Equation (1) shows that both displacement current and conduction current generate a magnetic field. Equation (2) shows that the vortex source of the electric field is the time rate of change of the magnetic flux density. Equation (3) states that the divergence of $\vec{B}$ is zero, that is, the lines of $\vec{B}$ have no beginning and no end. Equation (4) shows that under time-varying conditions, the divergence of $\vec{D}$ is still equal to the free charge density at that point. In addition, there are three constitutive relations as follows [37]:

$$\vec{D} = \varepsilon \vec{E} \tag{5}$$
$$\vec{B} = \mu \vec{H} \tag{6}$$
$$\vec{J} = \sigma \vec{E} \tag{7}$$

where equation (5) relates $\vec{D}$ and $\vec{E}$ through the permittivity $\varepsilon$, equation (6) relates $\vec{B}$ and $\vec{H}$ through the permeability $\mu$, and equation (7) relates $\vec{J}$ and $\vec{E}$ through the conductivity $\sigma$. By simplifying and deducing Maxwell's equations, we obtain the electric field induced by DMS as follows [11], [12]:

$$\vec{E} = -\frac{\partial \vec{A}}{\partial t} - \nabla \phi \tag{8}$$

where $\vec{E}$ is the induced electric field, $\phi$ is the scalar potential, $\vec{A}$ is the magnetic vector potential of the applied magnetic field, and $\nabla$ is the Hamiltonian (nabla) operator. The induced electric field $\vec{E}$ consists of two parts: $-\partial \vec{A} / \partial t$ is the primary electric field generated by the changing magnetic field, which is determined by the characteristics of the coil and the rate of change of the magnetic field; $-\nabla \phi$ is the secondary electric field, induced by the charge in the medium. As the scalar potential $\phi$ does not have an analytical solution, it needs to be approximated by FEM. According to Ampere's circuital theorem and Gauss's law, the partial differential equations of the magnetic field and of the electric field can be derived as equation (9) [37], where $\mu$ is the permeability of each tissue, $\varepsilon$ is the permittivity of each tissue, and $\nabla^2$ is the Laplace operator.
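The step from Maxwell's equations to the two-part expression for the induced field can be sketched as follows. This is the standard textbook argument rather than a derivation reproduced from the source, and it assumes tissue parameters $\sigma$ and $\varepsilon$ that do not change with time.

```latex
\begin{align*}
\nabla \cdot \vec{B} = 0
  \;&\Rightarrow\; \vec{B} = \nabla \times \vec{A}, \\
\nabla \times \vec{E} = -\frac{\partial \vec{B}}{\partial t}
  \;&\Rightarrow\; \nabla \times \left( \vec{E} + \frac{\partial \vec{A}}{\partial t} \right) = 0
  \;\Rightarrow\; \vec{E} = -\frac{\partial \vec{A}}{\partial t} - \nabla \phi, \\
\nabla \cdot \left( \vec{J} + \frac{\partial \vec{D}}{\partial t} \right) = 0
  \;&\Rightarrow\; \nabla \cdot \left[ \left( \sigma + \varepsilon \frac{\partial}{\partial t} \right)
     \left( -\frac{\partial \vec{A}}{\partial t} - \nabla \phi \right) \right] = 0 .
\end{align*}
```

The first line uses the fact that a divergence-free field can be written as a curl, the second that a curl-free field can be written as a gradient, and the third takes the divergence of Ampere's law together with $\vec{J} = \sigma \vec{E}$ and $\vec{D} = \varepsilon \vec{E}$. The last relation is an equation for $\phi$ in which both $\sigma$ and $\varepsilon$ appear, which is why the frequency-dependent tissue parameters shape the secondary field.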
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

TABLE I THE CONDUCTIVITY AND RELATIVE PERMITTIVITY OF HEAD TISSUES
It can be seen from equation (9) that the induced electric field is determined not only by the applied magnetic field but also by the conductivity and relative permittivity of the head tissues.
In this paper, FEM is used to calculate the induced electric field, and the process is shown in Fig. 2: (1) Mimics software (Materialise, Belgium) is used to segment and reconstruct the MRI data of 35 subjects. For each subject, the head structure is divided into six parts: scalp, skull, cerebrospinal fluid (CSF), grey matter, white matter and hippocampus. The conductivity and relative permittivity of the individual head tissues are shown in Table I, with reference to the online database of the Italian National Research Council (http://niremf.ifac.cnr.it/tissprop/). Hypermesh 2019 software (Altair Engineering, United States) is adopted to mesh the 3D head model, and all the head models are divided into tetrahedral meshes with edge lengths of less than 1 mm. (2) ANSYS 2020R2 software (ANSYS, United States) is adopted for the finite element calculation of the induced electric field. In the finite element calculation, the scalar potential at the interfaces between the scalp, the air and the adjacent head tissue satisfies the Neumann boundary condition [11], [12], in which $\vec{n}$ denotes the normal vector of the boundary.

C. Data Preparation
The MRI data in this study are obtained from Shanghai Ruijin Hospital. There are 35 subjects aged from 33 to 76, including 21 males and 14 females. Twenty patients with AD and 15 healthy volunteers are included. MRI data from 30 of these individuals are randomly selected for the training set, and MRI data from the remaining 5 individuals are used for the test set.
We build head models of the 35 subjects based on the MRI data and calculate the electric fields induced by 210 stimulation combinations with FEM. The information of the dataset is shown in Table II. We obtain all samples by the process shown in Fig. 3(a).

TABLE II DATASET INFORMATION
Each sample has two groups of outputs: the maximum and mean intensity of the brain grey matter electric field, and the maximum and mean intensity of the hippocampal electric field. The total number of samples is 7350. Of these, 6300 samples (the training set of 30 subjects) are used for training and 1050 samples (the test set of 5 subjects) are used for testing.
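The sample counts follow directly from the number of subjects and the number of stimulation combinations per subject. The short check below reproduces them; the function name is hypothetical, and the split of the 210 combinations into coil configurations and rhythms is not assumed here because it is not given in this section.

```python
N_SUBJECTS = 35          # subjects with a head model
N_TRAIN_SUBJECTS = 30    # subjects assigned to the training set
N_COMBINATIONS = 210     # stimulation combinations simulated per subject


def sample_counts(n_subjects, n_train_subjects, n_combinations):
    """Return (total, train, test) sample counts for a subject-level split."""
    n_test_subjects = n_subjects - n_train_subjects
    return (n_subjects * n_combinations,
            n_train_subjects * n_combinations,
            n_test_subjects * n_combinations)


total, train, test = sample_counts(N_SUBJECTS, N_TRAIN_SUBJECTS, N_COMBINATIONS)
print(total, train, test)  # 7350 6300 1050
```

Because the split is made by subject, no test subject contributes any sample to the training set.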

D. CNN-Transformer Model
The framework of the CNN-Transformer model we adopted is shown in Fig. 3(b). The Transformer part adopts the structure of VIT [31]. Since the original Transformer cannot directly process image data, the image needs to be preprocessed. Take an image $x \in \mathbb{R}^{H \times W \times C}$ as an example, where $H \times W$ is the resolution and $C$ is the number of channels. First, we split the image $x$ into $N$ two-dimensional patches:

$$x_p \in \mathbb{R}^{N \times (P^2 \cdot C)}$$

The size of each patch is $P \times P$ and the number of patches is $N = HW / P^2$. Then patch embedding and position encoding are applied. Patch embedding flattens each patch and applies a linear transformation to map $x_p$ to a $D$-dimensional vector. At the same time, in order to preserve the spatial position information of the input image patches, a position encoding $E_{position}$ is added to the patch embedding, as follows:

$$z_0 = [x_{class};\ x_p^1 E;\ x_p^2 E;\ \cdots;\ x_p^N E] + E_{position}$$

where $E \in \mathbb{R}^{(P^2 \cdot C) \times D}$ denotes the patch embedding and $E_{position} \in \mathbb{R}^{(N+1) \times D}$ denotes the position encoding. In this expression, a classification vector $x_{class}$ is prepended to the sequence of $N$ patch embeddings and is used to learn the category information during training. The Transformer encoder consists of several layers of multi-head self-attention (MSA) blocks and multi-layer perceptron (MLP) blocks. The MSA block of the Transformer adopts the Softmax function to calculate the attention weights, and the MLP adopts the GELU activation function. The output of the $l$-th layer can be expressed as follows:

$$z'_l = \mathrm{MSA}(\mathrm{LN}(z_{l-1})) + z_{l-1}$$
$$z_l = \mathrm{MLP}(\mathrm{LN}(z'_l)) + z'_l$$

where LN refers to Layer Normalization. LN is applied before each block and a residual connection is used after each block.
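The patch-splitting step can be made concrete with a small pure-Python sketch. It only illustrates how an $H \times W \times C$ image becomes $N = HW/P^2$ flattened vectors of length $P^2 \cdot C$; the function name and the toy 4 x 4 image are invented for this example, and a real implementation would operate on tensors.

```python
def split_into_patches(image, patch):
    """Split an H x W x C image (nested lists) into flattened P x P patches.

    Returns N = (H * W) / P**2 vectors, each of length P * P * C, with the
    patches taken in row-major order.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("H and W must be divisible by the patch size P")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = []
            for row in range(top, top + patch):
                for col in range(left, left + patch):
                    flat.extend(image[row][col])  # append the C channel values
            patches.append(flat)
    return patches


# Toy image with C = 1 whose pixel value encodes its position (r * W + c).
H, W, C, P = 4, 4, 1, 2
image = [[[r * W + c] for c in range(W)] for r in range(H)]
patches = split_into_patches(image, P)
print(len(patches), len(patches[0]))  # 4 4
print(patches[0])                     # [0, 1, 4, 5]
```

After the linear map to $D$ dimensions and the addition of the class vector, the sequence fed to the encoder has $N + 1$ entries, here 5.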
The CNN-Transformer model uses a multi-layer CNN as the feature extractor of the Transformer to generate the feature map of the input. The patch embedding is applied to patches extracted from the CNN feature map instead of patches extracted from the original image. Each CNN layer is composed of three parts: two-dimensional convolution, regularization and ReLU activation.
The hidden layer consists of three parts: a fully connected layer, Dropout, and ReLU activation; the Dropout helps to prevent overfitting. The output layer consists of four neurons and is used to fit the four output quantities.
For the CNN-Transformer model, we set the inputs as MRI, coil configuration, and stimulation rhythm, and the outputs as the maximum and mean intensity of the hippocampal and brain grey matter electric fields. First, the features of the MRI central slices from three different views are extracted by the CNN layers to generate feature maps. After patch embedding and position encoding, the feature maps are input into the Transformer layers for processing. The Transformer output is then concatenated with the coil configuration and stimulation rhythm to form a feature sequence, which is passed through the hidden and output layers to output the maximum and mean intensity of the hippocampal electric field and the brain grey matter electric field. Both the coil configuration and the stimulation rhythm are single numbers rather than sequences, so applying advanced feature-extraction models to them may reduce training efficiency. Referring to the CNN model for predicting the electric field induced by TMS [23], we concatenate the two separate values with the features produced by the CNN and Transformer to improve the learning efficiency of the model.
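The data flow described above can be sketched in PyTorch as follows. This is a minimal illustration under several assumptions that are not stated in the text: the three central slices are stacked as three input channels, batch normalization stands in for the "regularization" step, the class token is used as the Transformer output, and all sizes (`channels`, `dim`, `heads`, `patch`, `image_size`, the hidden width and the dropout rate) are placeholders. The class name is hypothetical.

```python
import torch
from torch import nn


class CNNTransformerSketch(nn.Module):
    """Illustrative CNN -> Transformer -> MLP regressor (all sizes assumed)."""

    def __init__(self, cnn_layers=5, transformer_layers=6, channels=32,
                 dim=64, heads=4, patch=2, image_size=64):
        super().__init__()
        blocks, in_ch = [], 3  # ASSUMPTION: three MRI slices as channels
        for _ in range(cnn_layers):
            blocks += [nn.Conv2d(in_ch, channels, kernel_size=3, padding=1),
                       nn.BatchNorm2d(channels),  # ASSUMED regularization step
                       nn.ReLU()]
            in_ch = channels
        self.cnn = nn.Sequential(*blocks)
        # A strided convolution flattens P x P patches of the feature map
        # and applies the linear patch embedding in one step.
        self.patch_embed = nn.Conv2d(channels, dim, kernel_size=patch, stride=patch)
        n_patches = (image_size // patch) ** 2
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=2 * dim,
                                           activation="gelu",
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=transformer_layers)
        # Hidden layer (fully connected, Dropout, ReLU) and 4-neuron output.
        self.head = nn.Sequential(nn.Linear(dim + 2, 64), nn.Dropout(0.1),
                                  nn.ReLU(), nn.Linear(64, 4))

    def forward(self, slices, coil, rhythm):
        feats = self.cnn(slices)                                      # (B, ch, H, W)
        tokens = self.patch_embed(feats).flatten(2).transpose(1, 2)   # (B, N, D)
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed     # (B, N+1, D)
        encoded = self.encoder(tokens)[:, 0]                          # class token
        scalars = torch.stack([coil, rhythm], dim=1)                  # (B, 2)
        return self.head(torch.cat([encoded, scalars], dim=1))        # (B, 4)


model = CNNTransformerSketch(image_size=16)
out = model(torch.randn(2, 3, 16, 16), torch.rand(2), torch.rand(2))
print(out.shape)  # torch.Size([2, 4])
```

The two scalar inputs bypass the CNN and Transformer entirely and join the image features only at the hidden layer, which mirrors the concatenation described in the text.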

E. Experimental Settings
The model in this study is implemented in PyTorch 1.12.0. The Adam optimizer is adopted for training, and the Mean Square Error (MSE) function is used as the loss function.
We adopt four indicators to evaluate the performance of the model in predicting the frequency-dependent electric field induced by DMS: mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination ($R^2$). The four metrics are calculated as follows:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}$$

where $y_i$ is the intensity of the induced electric field obtained by FEM, $\hat{y}_i$ is the intensity of the induced electric field predicted by the model, and $\bar{y}$ is the mean of the FEM intensities over the $n$ samples. MAE and RMSE intuitively reflect the error of the results predicted by the model. MAE reflects the average magnitude of the error, while RMSE squares the errors before averaging and then takes the square root, which magnifies large errors. RMSE is therefore more sensitive to the largest errors. MAPE expresses the absolute error relative to the true value, so it reflects the relative error of the prediction. $R^2$ is a statistical measure used to evaluate the goodness of fit of a regression model. It is the proportion of the total variance that is explained by the model, and its value usually lies between 0 and 1 (it becomes negative only when the model fits worse than predicting the mean). As $R^2$ gets closer to 1, the model fits the data better, meaning that the independent variables explain more of the variation in the dependent variable.
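A minimal pure-Python sketch of the four metrics is given below, using the standard definitions. The function name and the toy values are made up for illustration and are not results from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (in percent) and R^2 for paired sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE divides by the reference value, so y_true must not contain zeros.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}


# Toy FEM reference values and predictions (illustrative only).
y_fem = [1.0, 2.0, 4.0]
y_hat = [1.5, 2.0, 3.0]
metrics = regression_metrics(y_fem, y_hat)
print(metrics)
```

In this toy case the single large error (1.0 on the third sample) raises RMSE above MAE, which is the sensitivity to large errors described above.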
To keep the performance estimate of the model unbiased, we adopt 10-fold cross-validation. The dataset is shuffled and divided equally into 10 subsets. Training is performed 10 times for each combination of hyperparameters. Each time, 9 of the 10 subsets are used as the training set and the remaining one is used as the validation set, until every subset has been used as the validation set. The RMSE on the validation set is calculated after each training run. The mean RMSE of the 10 runs is taken as the RMSE of this hyperparameter combination. Finally, the hyperparameter combination with the minimum RMSE is selected and trained on the data of all 10 subsets to obtain the optimal model. The test set is independent of all the data in the training and validation sets, which avoids any overlap. The test set is used to evaluate the performance of the optimal model and to demonstrate its generalization.
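The cross-validation procedure can be sketched as follows. The callback `train_and_score` is a hypothetical stand-in for training a fresh model and returning its validation RMSE, and the example assumes that the 6300 training samples are what is split into folds, which the text does not state explicitly.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, train_and_score, k=10, seed=0):
    """Return the mean validation score over k folds.

    `train_and_score(train_idx, val_idx)` is a HYPOTHETICAL callback that
    trains on train_idx and returns a score (e.g. RMSE) on val_idx.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        # Every fold except the i-th one forms the training set.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(train_and_score(train_idx, val_idx))
    return sum(scores) / k


def fold_size_as_score(train_idx, val_idx):
    """Dummy callback: checks the split is disjoint, returns the fold size."""
    assert not set(train_idx) & set(val_idx)
    return float(len(val_idx))


print(cross_validate(6300, fold_size_as_score))  # 630.0
```

For hyperparameter selection, this mean score would be computed once per candidate setting and the setting with the smallest mean RMSE kept.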
Through 10-fold cross-validation, we determined the training scheme of the CNN-Transformer model as follows: the learning rate is 0.001, the batch size is 256, the total number of iterations is 200, and the optimization algorithm is Adam. The hardware platform used for our experiments is as follows. CPU: Intel(R) Xeon(R) Silver 4114; GPU: NVIDIA GeForce RTX 3070; Memory: 384 GB.

III. RESULTS AND DISCUSSIONS

A. Optimal Combination of CNN and Transformer Layers
In the CNN-Transformer model, the depths of both the CNN and Transformer parts have a great impact on the performance of the model. Since deeper networks can learn more abstract and complex features, increasing the depth of the CNN can improve the performance of the model. However, the improvement is not linear. As depth increases, the performance gains saturate, and overly deep networks may even degrade performance. Similar properties hold for the depth of the Transformer: increasing depth may improve performance but can also lead to overfitting or training instability. Therefore, choosing the optimal combination of CNN and Transformer layers is very important for the prediction performance of this model.
We test CNN depths from 1 to 6 layers and Transformer depths from 1 to 7 layers, resulting in 42 combinations. Each combination is trained for 200 epochs on the training set, and the $R^2$ of the model's predictions on the test set is adopted as the evaluation metric; the results are shown in Fig. 4. We find that $R^2$ tends to decrease after the number of CNN layers exceeds 5 and the number of Transformer layers exceeds 6, and the combination of 5 CNN layers and 6 Transformer layers achieves the largest $R^2$. We therefore take the 5-layer CNN and 6-layer Transformer as the optimal combination.
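The depth search amounts to a grid search over 6 x 7 = 42 pairs. The sketch below shows the bookkeeping only: `toy_r2` is a made-up placeholder that peaks at (5, 6) and is not the measured $R^2$ surface of Fig. 4; in practice it would be replaced by training and evaluating a model for each pair.

```python
from itertools import product

CNN_DEPTHS = range(1, 7)          # 1 to 6 CNN layers
TRANSFORMER_DEPTHS = range(1, 8)  # 1 to 7 Transformer layers


def best_depths(score):
    """Return the best (cnn_layers, transformer_layers) pair and the grid size."""
    grid = list(product(CNN_DEPTHS, TRANSFORMER_DEPTHS))
    best = max(grid, key=lambda pair: score(*pair))
    return best, len(grid)


def toy_r2(cnn_layers, transformer_layers):
    """PLACEHOLDER score with a single peak at (5, 6); not measured data."""
    return 1.0 - 0.01 * ((cnn_layers - 5) ** 2 + (transformer_layers - 6) ** 2)


best, n_combinations = best_depths(toy_r2)
print(best, n_combinations)  # (5, 6) 42
```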

B. Prediction Results and Error Analysis
In order to obtain the predictor, we train the CNN-Transformer model with the optimal combination of a 5-layer CNN and a 6-layer Transformer on the training set. The change of the training loss with the number of iterations is shown in Fig. 5; the MSE loss finally converges to its lowest value of 0.0012. The performance metrics of the predictor on the training set and the test set are listed in Table IV. For both the training and test sets, the MAE, RMSE and MAPE of the hippocampal electric field predictions are lower than those of the brain grey matter electric field, indicating that the prediction error for the hippocampal electric field is lower. The $R^2$ values of the predictions for the hippocampal electric field are 0.9773 and 0.9690 in the training set and test set, respectively, which indicates that the predictor explains 97.73% and 96.90% of the variation in the training and test sets. The $R^2$ values of the predictions for the brain grey matter electric field are 0.9577 and 0.9402 in the training set and test set, respectively, which indicates that the predictor explains 95.77% and 94.02% of the variation in the training and test sets. The results of MAE, RMSE, MAPE and $R^2$ show that the prediction performance for the hippocampal electric field is better than that for the brain grey matter electric field. Comparison of the results on the training and test sets indicates that the model generalizes well. Since the dielectric parameters of the hippocampus are the same as those of the brain grey matter, the difference in prediction performance between brain grey matter and hippocampus mainly depends on their difference in volume. The volume of the whole brain grey matter is much larger than that of the hippocampus, so the number of data features that need to be considered to predict the hippocampal electric field alone is much smaller than that for the brain grey matter. Therefore, the prediction performance for the hippocampus is better under the same training intensity.

TABLE IV PERFORMANCE COMPARISON ON DIFFERENT DATASETS

Table V shows the comparison of the time cost of FEM and the CNN-Transformer model for obtaining a set of induced electric fields. To calculate the induced electric field of a new individual, FEM first needs to build the head model of the individual, which takes several hours. The CNN-Transformer model takes 0.731 s to predict the induced electric field of a new individual on the CPU.

C. Effect of Input Parameters on Model Performance
For the same individual, the stimulation rhythm and coil configuration determine the output of the induced electric field.
In the test set, we use the predictor to predict the hippocampal electric field and the brain grey matter electric field for all combinations of subject and coil configuration under the same stimulation rhythm. The MAE, RMSE, MAPE and $R^2$ are then calculated as the performance metrics of the predictor for that stimulation rhythm. In this way we obtain the performance metrics for all stimulation rhythms, as shown in Fig. 7(a)-(d). Similarly, to obtain the performance metrics of the predictor for different coil configurations, the predictor is used to predict the hippocampal and brain grey matter electric fields for all combinations of subject and stimulation rhythm under the same coil configuration. The MAE, RMSE, MAPE and $R^2$ are then calculated as the performance metrics for this coil configuration. The performance metrics for all coil configurations are shown in Fig. 7(e)-(h). Comparing Fig. 7(a)-(h), we find that MAE, RMSE, MAPE and $R^2$ all fluctuate markedly with the coil configuration and fluctuate less with the stimulation rhythm. The stimulation rhythm, as an input parameter, therefore has considerably less influence on the predictor's performance than the coil configuration. Fig. 7(a)-(d) shows that, as the stimulation rhythm changes, the fluctuation of the performance metrics for the brain grey matter electric field is close to that for the hippocampal electric field. In contrast, Fig. 7(e)-(h) shows that, as the coil configuration changes, the performance metrics for the brain grey matter electric field fluctuate far more than those for the hippocampal electric field.
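The per-rhythm and per-configuration analysis is a grouped metric computation: fix one input parameter, pool the predictions over everything else, and compute the metric. A minimal sketch for MAE is shown below; the record layout, the field names and all values are invented for illustration.

```python
from collections import defaultdict


def mae_by_group(records, key):
    """Average |FEM - prediction| separately for each value of `key`."""
    sums, counts = defaultdict(float), defaultdict(int)
    for rec in records:
        sums[rec[key]] += abs(rec["fem"] - rec["pred"])
        counts[rec[key]] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Toy test-set records (illustrative values only).
records = [
    {"rhythm": 10, "coil": "A", "fem": 1.0, "pred": 1.5},
    {"rhythm": 10, "coil": "B", "fem": 2.0, "pred": 1.0},
    {"rhythm": 40, "coil": "A", "fem": 1.0, "pred": 1.25},
    {"rhythm": 40, "coil": "B", "fem": 2.0, "pred": 1.25},
]
print(mae_by_group(records, "rhythm"))  # {10: 0.75, 40: 0.5}
print(mae_by_group(records, "coil"))    # {'A': 0.375, 'B': 0.875}
```

The spread of the grouped values across rhythms versus across coil configurations is what the panels of Fig. 7 compare.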

D. Effect of Different Tissues on Model Performance
The hippocampus has the same dielectric parameters as the brain grey matter and can be considered as the same type of brain tissue in the calculation of the induced electric field. The difference in prediction performance is therefore likely to depend on the volume difference between the tissues.
To verify whether the difference in prediction performance is determined by the volume difference of the brain tissues, we use this model to predict the grey matter electric fields of the left and right hemispheres separately. Adopting the same hyperparameters for training, we obtain the prediction performance for the grey matter electric field in the left and right hemispheres and compare it with that for the hippocampus and the whole brain grey matter, as shown in Table VI (MAE, RMSE, MAPE and $R^2$). The prediction performance for the grey matter electric field in the left and right hemispheres is very similar, and both lie between the prediction performance for the whole brain grey matter electric field and that for the hippocampal electric field. The grey matter volumes of the left and right hemispheres are about the same, each roughly half the volume of the whole brain grey matter and larger than the volume of the hippocampus. This supports the view that the difference in prediction performance between brain regions mainly depends on the difference in volume.

E. Performance Comparison With Other Models
Further, we contrast the prediction performance of CNN-Transformer with CNN model and Transformer model for rhythm-dependent electric field induced by DMS. To make a contrast with CNN-Transformer, we set the CNN model to

IV. CONCLUSION
In this paper, a CNN-Transformer model is adopted to predict the rhythm-dependent electric field induced by DMS. Compared with FEM, which takes hours to complete head modeling and calculation, the CNN-Transformer model takes only 0.731 s (CPU) or 0.042 s (GPU). This greatly reduces the time cost and makes it possible to obtain the rhythm-dependent electric field induced by DMS in close to real time.
We build a dataset with a sample size of 7350 for training and testing the CNN-Transformer model. Using 10-fold cross-validation, we obtain the optimal combination of hyperparameters: the training loss function is MSE, the learning rate is 0.001, the batch size is 256, the number of iterations is 200, and the optimization algorithm is Adam. We use the test set to evaluate the different combinations of CNN and Transformer layers, and find that the combination of a 5-layer CNN and a 6-layer Transformer is optimal.
We choose the hippocampal electric field and the brain grey matter electric field as the prediction targets. By comparing MAE, RMSE, MAPE and $R^2$, we find that the CNN-Transformer model has better prediction performance for the hippocampal electric field than for the brain grey matter electric field. In addition, we find that a change of stimulation rhythm has a considerably lower impact on the model performance than a change of coil configuration. For predicting the rhythm-dependent electric field, this property of the CNN-Transformer model is a great advantage.
The CNN-Transformer model is built with the CNN model as the feature extractor of the Transformer model. To verify whether the CNN-Transformer model is better than the CNN model or the Transformer model, we train the CNN model and the Transformer model separately on the same dataset. For both the training and test sets, we find that the CNN-Transformer model performs better than both the CNN model and the Transformer model. This indicates that the CNN-Transformer is more suitable for predicting the rhythm-dependent electric field induced by DMS.
In future research, the prediction ability of the model with respect to coil configuration should be improved. We will also increase the number of prediction targets to compare more brain regions.

Fig. 1. The stimulation paradigm of DMS. (a) The standard coil configuration of the DMS stimulator. (b) The current waveform loaded in the coil.
In Fig. 3(a), the inputs of the dataset are: (1) central slices from the coronal, sagittal and horizontal views of the MRI; (2) coil configuration; (3) stimulation rhythm. The outputs of the constructed dataset are: (4) the maximum and mean intensity of the brain grey matter electric field; (5) the maximum and mean intensity of the hippocampal electric field.
As shown in Fig. 3(b), the CNN-Transformer model is a hybrid of a multi-layer CNN and a Transformer. Extracting features from individual MRI data is the key to predicting the electric field induced by DMS for any individual. Referring to the TransUNet model for MRI image segmentation [27], we put the CNN before the Transformer and use the CNN as the feature extractor of the Transformer. Treating local and global features separately makes the model design more flexible. The numbers of CNN and Transformer layers can be adjusted individually as needed to predict the electric field of different brain regions.

Fig. 4. $R^2$ with different combinations of CNN layers and Transformer layers.

Fig. 5. The change of training loss with the number of epochs.

Fig. 6. Comparison of prediction and FEM results. (a) Brain grey matter electric field in training set. (b) Hippocampal electric field in training set. (c) Brain grey matter electric field in test set. (d) Hippocampal electric field in test set.
On the GPU, the CNN-Transformer model needs only 0.042 s to predict the induced electric field of a new individual. This indicates that the CNN-Transformer model can obtain the rhythm-dependent electric field induced by DMS in close to real time.

Fig. 7. Effect of input parameters on performance metrics. (a) Effect of stimulation rhythm on MAE. (b) Effect of stimulation rhythm on RMSE. (c) Effect of stimulation rhythm on MAPE. (d) Effect of stimulation rhythm on $R^2$. (e) Effect of coil configuration on MAE. (f) Effect of coil configuration on RMSE. (g) Effect of coil configuration on MAPE. (h) Effect of coil configuration on $R^2$.

Fig. 9. Comparison of the performance metrics of the models as the number of iterations changes. (a) Comparison of MAE. (b) Comparison of RMSE. (c) Comparison of MAPE. (d) Comparison of $R^2$.
MSE is used as the loss function of the training, the learning rate is 0.001, the batch size is 256, the number of iterations is 200, and the optimization algorithm is Adam. The performance metrics (MAE, RMSE, MAPE and $R^2$) of the three models on the training set as a function of the number of iterations are shown in Fig. 9(a)-(d). As can be seen from Fig. 9, the fluctuations of MAE, RMSE, MAPE and $R^2$ with increasing iterations are all much larger for the CNN model than for the CNN-Transformer model and the Transformer model. This indicates that the CNN model converges more slowly than the CNN-Transformer model and the Transformer model in this task. After the number of iterations exceeds 150, all three models gradually converge. Fig. 9 shows that for MAE, RMSE and MAPE the ordering is CNN > Transformer > CNN-Transformer, while for $R^2$ it is CNN-Transformer > Transformer > CNN. This indicates that, on the training set, the CNN-Transformer model performs better than the Transformer model and the CNN model. We then test the three models on the test set, and the results are shown in Table VII. Comparing MAE, RMSE, MAPE and $R^2$ in Table VII, we find that the prediction performance on the test set is: CNN-Transformer > Transformer > CNN. This shows that the CNN-Transformer model also has the best generalization performance among the three models in this task.

TABLE VI PERFORMANCE COMPARISON OF DIFFERENT TISSUES

TABLE VII PERFORMANCE COMPARISON OF DIFFERENT MODELS