A Network with Composite Loss and Parameter-free Chunking Fusion Block for Super-Resolution MR Image

MRI is often influenced by many factors, and single image super-resolution (SISR) based on a neural network is an effective and cost-effective alternative technique for the high-resolution restoration of low-resolution images. However, deep neural networks easily overfit, which degrades test results, while shallow networks are difficult to fit quickly and cannot completely learn the training samples. To solve these problems, a new end-to-end super-resolution (SR) method is proposed for magnetic resonance (MR) images. Firstly, in order to better fuse features, a parameter-free chunking fusion block (PCFB) is proposed, which divides the feature map into n branches by splitting channels and applies parameter-free attention to each branch. Secondly, the proposed training strategy, which combines perceptual loss, gradient loss, and L1 loss, significantly improves the accuracy of model fitting and prediction. Finally, the proposed model and training strategy are evaluated on the super-resolution IXISR dataset (PD, T1, and T2) and compared with existing methods, obtaining advanced performance. Extensive experiments show that the proposed method performs better than advanced methods on highly reliable metrics.


Introduction
MRI is a noninvasive in vivo imaging technology that uses the phenomenon of magnetic resonance to obtain molecular structure and thus information about the internal structure of the human body. MRI not only provides more information than many other imaging techniques in medical imaging, but it can also directly produce cross-sectional, sagittal, coronal, and various oblique images of the body. It does not produce the artifacts seen in CT detection, does not require contrast injection, does not involve ionizing radiation, and has fewer adverse effects on the body. MRI is very effective in detecting intracerebral hematomas, extracerebral hematomas, brain tumors, and other diseases. Of course, MRI has its shortcomings [1]. It is relatively slow, has lower spatial resolution than CT, suffers from motion artifacts, etc. Therefore, obtaining high-resolution MRI images has become the direction of current research.
High-resolution MRI can not only clearly show the relationship between tumor and surrounding tissues but also the anatomical structure of the brain. It has high application value in the early and middle stages of diagnosis [2].
However, the generation of high-resolution MRI images is often influenced by many factors, such as hardware equipment, imaging time, the motion of the human body, and the effect of environmental noise. Therefore, in order to perform effective high-resolution restoration of the low-resolution images obtained by MRI, image super-resolution is an effective and cost-effective technique to improve the spatial resolution of MR images. This technique offers the feasibility of a high signal-to-noise ratio and high-resolution reconstruction of low-resolution MRI images [3].
The traditional SR algorithms include interpolation-based and reconstruction-based methods, which generally have difficulty reconstructing the high-frequency detail of the image, are more complicated to compute, and take longer to reconstruct [4]. In order to solve these problems, scholars have applied deep learning to SR reconstruction in recent years and made many breakthroughs, and SR algorithms based on deep learning now occupy the mainstream position in SR algorithm research. In the field of medical images, deep learning-based SR algorithms can obtain prior knowledge from medical image training data and use neural networks to reconstruct low-resolution images into high-resolution images based on this information.
In recent years, with the continuous development of deep learning [5][6][7][8], many advanced deep learning-based SR methods have emerged in the field of image SR [9,10], continuously enhancing its performance and efficiency. Super-resolution convolutional neural network [11] and fast super-resolution convolutional neural network [12] were pioneering works of deep learning in the field of super-resolution reconstruction. They used a convolutional neural network (CNN) for super-resolution image reconstruction for the first time. Subsequently, on the basis of this pioneering work, researchers proposed many new super-resolution networks to further improve model performance, such as the deeply recursive convolutional network [13] and deep recursive residual network [14], which are based on recurrent neural networks, and super-resolution using very deep convolutional networks [15]. FFTI [16] is a fine inpainting method for incomplete images based on feature fusion and two-step inpainting. However, most of these methods were aimed at natural images and are not suitable for medical images.
Recently, many studies in the field of medical images have also proposed SR methods for medical images, such as [17][18][19][20][21]. However, unlike ordinary images, high-quality medical image datasets are relatively scarce, most of the images are gray-scale, and the image content is relatively uniform. Using such a dataset to train a model with a deep network will easily lead to overfitting and make the test result worse. A model with a shallow network will be difficult to fit quickly and cannot learn the training samples completely. Therefore, medical image SR models trained with a traditional network cannot meet the requirements of SR tasks.
Considering the above problems, in order to make an SR model more suitable for medical image tasks, in this paper, we introduce residual learning and a parameter-free chunking fusion method to address the above difficulties. In the stage of feature extraction, residual learning is designed similarly to the residual network [22] to acquire features, and it borrows LayerNorm [23] from the transformer. LayerNorm is used in residual learning to make the training smoother and avoid the impact of variance differences between different batches. Subsequently, a parameter-free chunking fusion block is used to better fuse features and perform effective feature enhancement. In the module, the feature map is chunked into n branches for different information transmission, SimAM [24] is then applied on each branch to enhance the features of the different branches, and finally the semantic information of the different branches is integrated. SimAM can effectively enhance the features on different branches, which are then integrated at the end. Moreover, SimAM has no parameters to learn and can improve the model performance without parameter training. In addition, in order to further accelerate model fitting and improve prediction accuracy, this paper proposes a composite loss to optimize the training strategy by combining perceptual loss, gradient loss, and L1 loss.
In order to solve the above problems, we have proposed corresponding solutions, and this work mainly makes three contributions:
(1) A parameter-free chunking fusion block (PCFB) is proposed, which divides the feature map into n branches for parameter-free attention and then integrates the feature information of different branches, so as to better fuse features and perform effective feature enhancement. This improves the expression ability of the feature map without adding parameters, thereby improving the accuracy.
(2) A composite loss for our SR method is proposed which combines perceptual loss, gradient loss, and L1 loss. The loss makes the model pay attention to the impact of loss in different dimensions, thus enhancing the model's expressiveness.
(3) A new end-to-end SR method for MR images is proposed, which contains PCFB and composite loss and can improve SR performance more effectively and avoid overfitting.
The rest of this paper is organized as follows: Section 2 introduces related work. The proposed methods and experimental results are described in detail in Sections 3 and 4, respectively. We conclude the paper in Section 5.

Super-Resolution in Deep Learning.
With the development of deep convolutional neural networks (DCNN), research on super-resolution has made progress recently. For deep learning methods with SISR, fast response and reconstruction quality are important criteria for measuring super-resolution methods. Super-resolution convolutional neural network (SRCNN) [11] and fast super-resolution convolutional neural network (FSRCNN) [12] were pioneering works of deep learning in the field of super-resolution reconstruction. The two neural networks first used bicubic interpolation to reduce and enlarge low-resolution images to obtain comparable super-resolution images. Then a convolutional neural network was introduced for the first time to achieve image reconstruction. In addition, the traditional SR method based on sparse coding can also be regarded as a deep convolutional network from the perspective of the two networks, and compared with the traditional method, all sublayers in the two networks were optimized to give full play to the performance of each component. DRCN has a very deep recursion layer (up to 16 recursions), and recursive supervision and skip connections were further proposed to address gradient vanishing/explosion. For deep models, the residual structure exhibits excellent performance. Therefore, the residual structure was introduced into super-resolution methods to make up for the shortcomings caused by gradient vanishing and gradient explosion. The enhanced deep super-resolution network (EDSR) [25] was inspired by the residual structure. Compared with the traditional residual structure, the residual blocks of EDSR discard unnecessary modules, and on this basis a multiscale deep super-resolution system (MDSR) was constructed, which can reconstruct high-resolution images with different magnification factors in a single model. In addition, the SR robustness of images in complex scenes should also be considered. A heterogeneous group SR CNN [9] contains multiple heterogeneous group blocks. These blocks increase the internal and external relations of different channels in a parallel way to cope with SR in complex scenarios. An enhanced super-resolution group CNN (ESRGCNN) [26] can fully fuse the correlation between wide channel features and retain the long-distance context dependence in the upsampling operation to obtain more accurate low-frequency information. Further, in order to solve common problems in image super-resolution algorithms, such as image edge blurring caused by redundant network structure, inflexible selection of convolution kernel size, and slow convergence of the training process, MFFN [27] used a lightweight fusion multilevel single image super-resolution method to achieve SISR.

Super-Resolution in Medical Imaging.
The problem of super-resolution has been widely discussed in medical imaging. Due to limitations such as image acquisition time, low radiation dose, or hardware constraints, the spatial resolution of medical images is often insufficient [28]. To solve this problem, Zhu et al. [29] proposed a method for arbitrary scale super-resolution (MIASSR) of medical images, which combines meta-learning with GAN and can be used for super-resolution at any magnification.
To get as many useful image details as possible, Bing et al. [20] proposed an SR method in medical imaging based on an improved generative adversarial network. This method can not only avoid the interference of high-frequency false information but also integrate the low-level feature constraints to train the model. Zhang et al. [21] proposed a fast medical image super-resolution method, in which subpixel convolution layer addition and mini-network replacement in the hidden layer were crucial to improving the speed of image reconstruction. Inspired by the super-resolution convolutional neural network method based on three hidden layers, Deeba et al. [18] proposed a wavelet-based microgrid network super-resolution method for medical images, where image restoration was sped up by adding a subpixel layer to replace the small grid network on the hidden layer.

Attention Mechanism for Vision Tasks.
Attention has arguably become one of the most important concepts in the field of deep learning. It was inspired by human biological systems, which tend to focus on distinctive parts when processing large amounts of information [30]. Liu et al. [31] proposed a multiattention domain module to weigh and reorganize the features; the channel and spatial domain information in the super-resolution method are effectively fused, and the quality of the super-resolution image is effectively improved. Wang et al. [32] proposed two new attention mechanisms: context-weighted channel attention and persistent spatial attention. The proposed attention modulates rich features by suppressing useless features and enhancing features of interest in a channel and spatial manner. Liu and Chen [33] made the following improvements on the basis of the super-resolution generative adversarial network (SRGAN). Firstly, they added the channel attention (CA) module to the SRGAN network and increased network depth to better express high-frequency features. Secondly, the old batch normalization layer was deleted to improve network performance. Finally, the loss function was modified to reduce the influence of noise on the image.

Overview.
In the image super-resolution task, our goal is to take the low-resolution (LR) image $I_{LR} \in \mathbb{R}^{H \times W \times C}$ as the input of the super-resolution model and generate the super-resolution (SR) image $I_{SR} \in \mathbb{R}^{H \times W \times C}$. The LR image $I_{LR}$ is generally obtained by downsampling the ground-truth high-resolution (HR) image $I_{HR} \in \mathbb{R}^{H \times W \times C}$. We denote the super-resolution model as $G$ and its parameters as $\theta_G$. The super-resolution task can be expressed as the following formula:

$$I_{SR} = G(I_{LR}; \theta_G) \tag{1}$$

In order to make $I_{SR}$ as similar to $I_{HR}$ as possible, it is necessary to optimize the model $G$ with the loss function $L$, and finally the optimal parameter $\theta_G^{*}$ is obtained. The objective formula is as follows:

$$\theta_G^{*} = \arg\min_{\theta_G} L\big(G(I_{LR}; \theta_G),\, I_{HR}\big) \tag{2}$$

The proposed super-resolution architecture is shown in Figure 1. Next, details are given about the feature extraction block, the parameter-free chunking fusion block (PCFB), and the image reconstruction block. Finally, the composite loss and the training strategy are introduced to enhance the model's expressiveness.

Network
First, if the normal ReLU activation function is used, when the feature x is less than 0, x will be suppressed to 0, and the feature information will be lost. Therefore, we use PReLU [34] (parametric rectified linear unit) to replace ReLU. PReLU adds a learnable parameter on the basis of ReLU, which can adjust the activation function according to different experimental conditions. The formula is as follows:

$$f(x_i) = \max(0, x_i) + a_i \min(0, x_i) \tag{3}$$

where $x$ represents the feature map and $a_i \in [0, 1]$ is a learnable parameter. Second, if batch normalization (BN) is used, the difference in the mean and variance of data across mini-batches may produce unstable statistics [35], and instance normalization [36] can avoid this small-batch problem. However, the work reported in [37] shows that adding instance normalization does not always bring performance improvement, and manual adjustment is required. Therefore, we introduce layer normalization (LN), which was used early on in transformer-related papers [23]. Many recent SOTA methods [38][39][40] also use this normalization. LayerNorm is independent of the batch size, so it is not affected by the above problems, and it has none of the manually adjusted settings that instance normalization requires. Therefore, LN is introduced to stabilize the training and improve the performance. The normalization formula is as follows:

$$y = \frac{x - E[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta \tag{4}$$

where $x$ represents the feature map, $\epsilon$ is a small constant, $E[x]$ is the mean, $\mathrm{Var}[x]$ is the variance, and $\gamma$ and $\beta$ are the scale and shift. The normalization formula is the same as in BN, but the difference is that LN normalizes each sample individually rather than across all samples in a mini-batch as BN does.
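The two operations described above can be sketched in NumPy as follows. This is a minimal illustration rather than the paper's implementation: the slope value 0.25, the scalar scale and shift, and the choice to normalize over all features of a single H × W × C sample are assumptions made for the example.

```python
import numpy as np

def prelu(x, a):
    """PReLU: keep positive values, scale negative values by the slope a."""
    return np.maximum(0.0, x) + a * np.minimum(0.0, x)

def layer_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one sample using its own mean and variance (no batch statistics)."""
    mean = x.mean()
    var = x.var()
    return (x - mean) / np.sqrt(var + eps) * gamma + beta

rng = np.random.default_rng(0)
feature = rng.normal(loc=3.0, scale=2.0, size=(8, 8, 4))  # one H x W x C sample
activated = prelu(feature, a=0.25)   # 0.25 is an illustrative slope, not from the paper
normalized = layer_norm(feature)     # zero mean, unit variance for this sample
```

Because the statistics come from the sample itself, the output does not change when the batch size or the other samples in the batch change.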

Parameter-Free Chunking Fusion Block (PCFB).
In order to improve the propagation of feature information, Zhao et al. [41] designed the CSB module to help neural networks deal with hierarchical features with different attributes. Because CSB contains a large number of parameters that need to be learned and its fitting speed is slow, we propose PCFB, which does not need to learn a large number of parameters while maintaining image quality. In PCFB, chunking and fusing are represented as channel splitting and channel merging, respectively. The difference from CSB is that the size of the chunking is determined by the parameter n, where each input feature x is divided into n chunks, and each chunk $x_i$ has size $H \times W \times (C/n)$. Subsequently, in order to carry out targeted feature enhancement for each chunk, SimAM is used to process the features of the different chunks. SimAM has no additional parameters to learn, so the number of model parameters is not increased.
(1) Chunking and Fusing. The input feature x can be divided into n chunks along the channel direction, and the dimension of each chunk is $H \times W \times (C/n)$. It can be formally expressed as follows:

$$x_1, x_2, \ldots, x_n = S(x) \tag{5}$$

$$x = M(x_1, x_2, \ldots, x_n) \tag{6}$$

where $S(\cdot)$ is the chunking function, which splits the feature map x into n chunks $x_1, x_2, \ldots, x_n$. In contrast, $M(\cdot)$ is the fusing function, which merges $x_1, x_2, \ldots, x_n$ back to the original dimension using the concat function.
(2) Parameter-Free Attention. Normally, spatial attention is used for spatial information, while channel attention is used for channel information to focus on feature information. However, in human vision, spatial attention and channel attention coexist and jointly promote information selection in visual processing. Therefore, we need a three-dimensional attention to focus on the features in each channel and spatial position, so the parameter-free 3D attention SimAM is used to enhance the features of different chunks in this paper. The structure of the proposed method is shown in Figure 2.
SimAM evaluates the importance of each neuron by constructing an energy function $e_t^{*}$. The lower the energy, the greater the difference between the neuron and surrounding neurons, and the higher the importance of the feature. The energy function is as follows:

$$e_t^{*} = \frac{4(\sigma^2 + \lambda)}{(t - \mu)^2 + 2\sigma^2 + 2\lambda} \tag{7}$$

where $t$ is a neuron, that is, a pixel of the feature map $x$; $\mu$ and $\sigma$ represent the mean and standard deviation of the feature map, respectively; and $\lambda$ is a hyperparameter. Therefore, the importance of neurons can be obtained from $1/e_t^{*}$. In addition, the attention mechanism can be realized by weighting the feature map through the sigmoid function. The formula is as follows:

$$\tilde{X} = \mathrm{sigmoid}\left(\frac{1}{E}\right) \otimes X \tag{8}$$

where $\otimes$ means element-wise multiplication, and $E$ is the energy matrix containing all $e_t^{*}$. This module does not introduce any additional trainable parameters, so it improves performance without increasing the number of network parameters.
(3) Parameter-Free Chunking Fusion Block. In order to better learn and enhance the features, we use equation (5) to obtain n chunks and then pass each chunk separately through equation (8) for 3D weighted attention. Equation (6) is then used to fuse the results back into the original size, as in equation (9):

$$\hat{x} = M(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n) \tag{9}$$

where $\tilde{x}_i$ is the output of equation (8) for chunk $x_i$. The process is shown in Figure 1.
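The three steps above (chunk, attend, fuse) can be sketched in NumPy as follows. This is a simplified reading of the block, not the authors' code: the function names, the channels-last layout, the per-channel statistics, and the default λ = 1e-4 are assumptions, and any additional per-branch layers drawn in Figure 2 are not modelled.

```python
import numpy as np

def chunk(x, n):
    """S(.): split an H x W x C feature map into n equal chunks along channels."""
    return np.split(x, n, axis=-1)  # requires C to be divisible by n

def fuse(chunks):
    """M(.): concatenate the chunks back along the channel axis."""
    return np.concatenate(chunks, axis=-1)

def simam(x, lam=1e-4):
    """Parameter-free 3D attention: weight each neuron by sigmoid(1 / e_t*)."""
    h, w, _ = x.shape
    mu = x.mean(axis=(0, 1), keepdims=True)             # per-channel mean
    d = (x - mu) ** 2                                   # (t - mu)^2 for every neuron
    var = d.sum(axis=(0, 1), keepdims=True) / (h * w - 1)
    inv_energy = d / (4.0 * (var + lam)) + 0.5          # algebraically equal to 1 / e_t*
    weights = 1.0 / (1.0 + np.exp(-inv_energy))         # sigmoid
    return x * weights

def pcfb(x, n=2, lam=1e-4):
    """Chunk the input, apply SimAM to each chunk independently, then fuse."""
    return fuse([simam(c, lam) for c in chunk(x, n)])

x = np.ones((5, 5, 2))
x[2, 2, 0] = 5.0          # one neuron that stands out from its neighbours
out = pcfb(x, n=2)
```

The distinctive neuron receives a weight close to 1, while the uniform background receives a weight near sigmoid(0.5). With per-channel statistics as assumed here, chunking alone does not change the SimAM output, so the sketch mainly shows the data flow of the block.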

Image Reconstruction Block.
In order to change the image to the super-resolution size, an upsampling operation is required, and we build the image reconstruction part to realize it. As shown in Figure 1, image reconstruction includes 3 × 3 convolution, 1 × 1 convolution, PReLU, and PixelShuffle [42] layers. The main function of PixelShuffle is to obtain high-resolution feature maps by multichannel recombination of low-resolution feature maps. As shown in Figure 3, the feature maps of the $r^2$ channels are recombined into a single-channel upsampled result of size $(H \cdot r) \times (W \cdot r)$. PixelShuffle transforms the feature map from low-resolution space to high-resolution space.
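The channel-to-space rearrangement performed by PixelShuffle can be illustrated with a short NumPy sketch. The channels-last layout and the ordering of the r × r sub-pixel positions within the channel axis are assumptions chosen for the example; a framework's built-in layer may order channels differently.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange an H x W x (C * r^2) map into an (H*r) x (W*r) x C map."""
    h, w, c_in = x.shape
    c = c_in // (r * r)
    x = x.reshape(h, w, c, r, r)       # split channels into (C, r, r)
    x = x.transpose(0, 3, 1, 4, 2)     # reorder to (H, r, W, r, C)
    return x.reshape(h * r, w * r, c)  # interleave sub-pixels into space

# One LR pixel with r^2 = 4 channels becomes a 2 x 2 single-channel patch.
lr_features = np.arange(4, dtype=np.float32).reshape(1, 1, 4)
hr_patch = pixel_shuffle(lr_features, r=2)
```

No values are computed or interpolated here: the layer only moves existing activations, so the preceding convolutions are what learn the upsampling.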

Conventional Loss.
Most super-resolution methods use pixel loss to optimize the network. Pixel loss measures the pixel-wise difference between the SR image and the HR image, and it includes L1 loss and L2 loss. Compared with L1 loss, L2 loss penalizes large errors but has a higher tolerance for small errors. In actual training, L1 loss [25,43] shows better convergence than L2 loss and finally yields a higher peak signal-to-noise ratio (PSNR), so it is the most widely used loss function in the super-resolution field. The formula is as follows:

$$L_1 = \frac{1}{HWC} \sum_{h,w,c} \left| I_{SR}(h,w,c) - I_{HR}(h,w,c) \right|$$

However, such pixel loss does not consider image quality factors such as edges, textures, and high-frequency details, so the results may be too smooth to maintain sharp edges and good visual quality.

Perceptual Loss.
In order to incorporate high-level feature loss on the basis of pixel loss, perceptual loss [44] is introduced. The perceptual loss uses the pretrained VGG [45] network to extract the high-level features of the image and constructs the loss from the Euclidean distance between the HR image features and the SR image features to restore the perceptual quality of the image. The formula of perceptual loss is as follows:

$$L_{per} = \left\| \phi_i(I_{SR}) - \phi_i(I_{HR}) \right\|_2$$

where $\phi_i(\cdot)$ denotes the i-th layer output of the VGG model.

Edge-Aware Loss.
In order to combine the loss of image edge information on the basis of pixel loss, we further introduce edge-aware loss [46]. In edge-aware loss, edges of the SR image and HR image are extracted according to the edge extraction operator, and then the difference is calculated between the output and the label edge. In this paper, the Laplacian operator is used to extract edge features. The edge-aware loss measures the difference between $c_i(I_{SR})$ and $c_i(I_{HR})$, where $c_i(\cdot)$ denotes an edge extraction method based on the Laplacian operator.

Our Composite Loss.
Our loss function uses L1 loss as the basic loss function, adds perceptual loss to avoid the loss of high-level features, and adds edge-aware loss to further monitor the integrity of image edge information. The formula is as follows:

$$L = L_1 + \alpha L_{per} + \beta L_{edge}$$

where $L_1$, $L_{per}$, and $L_{edge}$ are the L1, perceptual, and edge-aware losses, and $\alpha$ and $\beta$ are hyper-parameters. We use our composite loss to optimize the proposed model, and the algorithm for training the model is shown in Algorithm 1.
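As a rough illustration of how the three terms combine, the following NumPy sketch computes a composite loss on single-channel images. Several parts are assumptions made for the example and are labelled as such: `avg_pool_features` is a hypothetical stand-in for pretrained VGG features, the edge term uses the mean absolute difference of Laplacian responses, and α is attached to the perceptual term and β to the edge term (with the values 0.3 and 0.1 reported in the implementation details).

```python
import numpy as np

def laplacian(img):
    """4-neighbour Laplacian evaluated on the interior pixels of a 2D image."""
    return (img[:-2, 1:-1] + img[2:, 1:-1] + img[1:-1, :-2]
            + img[1:-1, 2:] - 4.0 * img[1:-1, 1:-1])

def avg_pool_features(img):
    """Hypothetical stand-in for VGG features: 2 x 2 average pooling."""
    h, w = img.shape
    img = img[:h - h % 2, :w - w % 2]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def composite_loss(sr, hr, alpha=0.3, beta=0.1, features=avg_pool_features):
    """L1 + alpha * perceptual + beta * edge (term-to-weight mapping is assumed)."""
    l1 = np.abs(sr - hr).mean()
    perceptual = np.sqrt(((features(sr) - features(hr)) ** 2).sum())  # Euclidean distance
    edge = np.abs(laplacian(sr) - laplacian(hr)).mean()               # assumed L1 on edges
    return l1 + alpha * perceptual + beta * edge

rng = np.random.default_rng(0)
hr = rng.random((8, 8))
loss_same = composite_loss(hr, hr)           # identical images
loss_shifted = composite_loss(hr + 0.1, hr)  # constant brightness offset
```

A constant offset leaves the Laplacian unchanged, so only the L1 and perceptual terms respond to it, which shows how the terms monitor different aspects of the reconstruction.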

Dataset.
The IXISR dataset was constructed by Zhao et al. through further processing of the IXI dataset [41], which contains three types of MR images: 581 T1 volumes, 578 T2 volumes, and 578 PD volumes. In this work, we take the intersection of these three types of MR images to obtain 576 3D volumes of each type of MR image. These 3D volumes are then trimmed to 240 × 240 × 96 (H × W × D) to fit the three scaling factors. For SISR, each 3D MR volume is divided into 96 (H × W) gray-scale images. LR images are generated by bicubic downsampling and k-space truncation. For truncation degradation, HR images are first converted to k-space by discrete Fourier transform (DFT) and then truncated along the height and width directions.
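The truncation degradation described above can be sketched with NumPy's FFT routines. This is an illustrative version under stated assumptions: the function name, the choice to keep the central low-frequency block, the intensity rescaling, and even image sizes are assumptions, and the exact procedure used to build IXISR may differ.

```python
import numpy as np

def kspace_truncate(hr, scale):
    """Keep the central (H/scale) x (W/scale) block of k-space and invert it."""
    h, w = hr.shape                                # assumes even H, W and H/scale, W/scale
    lh, lw = h // scale, w // scale
    k = np.fft.fftshift(np.fft.fft2(hr))           # move low frequencies to the center
    top, left = (h - lh) // 2, (w - lw) // 2
    k_low = k[top:top + lh, left:left + lw]        # truncate along height and width
    lr = np.fft.ifft2(np.fft.ifftshift(k_low))
    return np.abs(lr) / (scale * scale)            # rescale for the smaller grid

hr_slice = np.ones((240, 240))                     # one 240 x 240 slice, as in the dataset
lr_slice = kspace_truncate(hr_slice, scale=2)
```

Unlike bicubic downsampling, this discards high-frequency k-space samples directly, which is closer to acquiring fewer samples in the scanner.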

Implementation Details.
Our method is implemented using the PaddlePaddle framework. Similar to previous work on the IXISR [41] dataset, we use 70% of the images as the training dataset, 10% as the validation dataset, and 20% as the test dataset. The mini-batch size is set to 16, the parameter α in the loss function is set to 0.3, the parameter β is set to 0.1, and the parameter n is set to 2. We use patches of size 24 × 24 randomly extracted from LR slices together with the corresponding HR areas. Data augmentation is achieved by random horizontal flipping and 90 degree rotation [25]. Training runs for millions of iterations on an NVIDIA GeForce RTX 3090 GPU. We use Xavier initialization [47] and the Adam optimizer for all model parameters, with an initial learning rate of 0.001. Through the optimization of Algorithm 1, a single iteration of the proposed model including all modules takes about one minute. The space complexity depends on the number of parameters involved in the calculation. Specifically, the number of parameters is reported in Table 1.
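The patch sampling and augmentation steps can be sketched as follows. The function name, the 50% probabilities, and the use of a single 90 degree rotation are assumptions for illustration; only the 24 × 24 LR patch size, the horizontal flip, and the 90 degree rotation come from the text.

```python
import numpy as np

def sample_pair(lr, hr, scale, patch=24, rng=None):
    """Crop an aligned LR/HR patch pair and apply the same random augmentation."""
    if rng is None:
        rng = np.random.default_rng()
    top = rng.integers(0, lr.shape[0] - patch + 1)
    left = rng.integers(0, lr.shape[1] - patch + 1)
    lr_p = lr[top:top + patch, left:left + patch]
    hr_p = hr[top * scale:(top + patch) * scale,
              left * scale:(left + patch) * scale]
    if rng.random() < 0.5:                      # random horizontal flip
        lr_p, hr_p = lr_p[:, ::-1], hr_p[:, ::-1]
    if rng.random() < 0.5:                      # random 90 degree rotation
        lr_p, hr_p = np.rot90(lr_p), np.rot90(hr_p)
    return lr_p, hr_p

rng = np.random.default_rng(0)
lr_slice = rng.random((120, 120))
hr_slice = np.kron(lr_slice, np.ones((2, 2)))   # toy HR image aligned with the LR image
lr_patch, hr_patch = sample_pair(lr_slice, hr_slice, scale=2, rng=rng)
```

The same flip and rotation are applied to both patches so that the LR input and HR target stay spatially aligned.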

Evaluation Metrics.
For quantitative comparison, highly reliable metrics are introduced, namely root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM). The metric scores are computed by comparing the result $I_{SR}$ obtained by the super-resolution method with the high-resolution image $I_{HR}$.

Root Mean Square Error (RMSE)
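$$\mathrm{RMSE} = \sqrt{\frac{1}{HW} \sum_{h=0}^{H-1} \sum_{w=0}^{W-1} \left( I_{HR}(h,w) - I_{SR}(h,w) \right)^2}$$

where $h \in [0, H-1]$ and $w \in [0, W-1]$ together represent the position of the pixel in $I_{HR}$ and $I_{SR}$.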

Peak Signal-to-Noise Ratio (PSNR)
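$$\mathrm{PSNR} = 10 \times \log_{10} \frac{(2^n - 1)^2}{\mathrm{MSE}}$$

where MSE is the mean squared error between $I_{HR}$ and $I_{SR}$ (the square of the RMSE) and n is the number of bits per pixel value, which generally takes 8.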
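RMSE and PSNR are closely related, which a short NumPy sketch makes concrete. The function names are chosen for illustration, and the conversion to 64-bit floats is an implementation detail added here to avoid integer wraparound on 8-bit images.

```python
import numpy as np

def rmse(hr, sr):
    """Root mean square error over all pixel positions."""
    diff = hr.astype(np.float64) - sr.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(hr, sr, bits=8):
    """PSNR in dB, with the peak value 2^bits - 1."""
    err = rmse(hr, sr)
    if err == 0.0:
        return float("inf")            # identical images
    peak = 2 ** bits - 1
    return float(20.0 * np.log10(peak / err))  # same as 10 * log10(peak^2 / MSE)

hr_img = np.zeros((4, 4), dtype=np.uint8)
sr_img = np.full((4, 4), 5, dtype=np.uint8)
error = rmse(hr_img, sr_img)           # every pixel is off by 5
quality = psnr(hr_img, sr_img)
```

Since PSNR is a monotonically decreasing function of RMSE for a fixed bit depth, the two metrics always rank methods in the same order on a single image.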

Structural Similarity Index (SSIM)
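$$\mathrm{SSIM} = \frac{\left(2\mu_{HR}\mu_{SR} + c_1\right)\left(2\sigma_{(HR,SR)} + c_2\right)}{\left(\mu_{HR}^2 + \mu_{SR}^2 + c_1\right)\left(\sigma_{HR}^2 + \sigma_{SR}^2 + c_2\right)}$$

where $I_{SR}$ is obtained by the super-resolution method and $I_{HR}$ is the high-resolution image; $\mu_{HR}$ and $\mu_{SR}$ are their means; $\sigma_{HR}$ and $\sigma_{SR}$ are their standard deviations; $\sigma_{(HR,SR)}$ is the covariance of HR and SR; and $c_1$ and $c_2$ are small constants.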
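The SSIM formula can be evaluated directly from image statistics, as in the sketch below. It is a simplified single-window version computed over the whole image; practical SSIM implementations usually average the index over local windows. The constants $c_1 = (0.01 \cdot \text{peak})^2$ and $c_2 = (0.03 \cdot \text{peak})^2$ are the commonly used defaults and are an assumption here, since the text only calls them small constants.

```python
import numpy as np

def ssim_global(hr, sr, bits=8, k1=0.01, k2=0.03):
    """Single-window SSIM computed from global means, variances, and covariance."""
    hr = hr.astype(np.float64)
    sr = sr.astype(np.float64)
    peak = 2 ** bits - 1
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2   # assumed default constants
    mu_h, mu_s = hr.mean(), sr.mean()
    var_h, var_s = hr.var(), sr.var()
    cov = ((hr - mu_h) * (sr - mu_s)).mean()
    numerator = (2 * mu_h * mu_s + c1) * (2 * cov + c2)
    denominator = (mu_h ** 2 + mu_s ** 2 + c1) * (var_h + var_s + c2)
    return float(numerator / denominator)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(16, 16))
same_score = ssim_global(image, image)            # identical images
inverted_score = ssim_global(image, 255 - image)  # contrast-inverted copy
```

Identical images score 1, while the inverted copy has negative covariance with the original and therefore scores much lower.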

Experimental Results.
In this paper, the expressiveness of different models is compared on the IXISR dataset (PD, T1, and T2) for ×2 super-resolution. PSNR, SSIM, and RMSE are used to evaluate the expressiveness of the model. Subdatasets are used under two different degradations (bicubic degradation and truncation degradation). Bicubic downsampling is widely used to simulate LR image generation in SR research, where HR images are downsampled bicubically to generate LR images. Truncation degradation simulates the real image acquisition process. The LR image is obtained by k-space truncation, which means that the HR image is cropped in frequency space. Tables 1 and 2, respectively, show the evaluation results of different models on the PD, T1, and T2 datasets under the bicubic downsampling and truncation degradation methods. From Figures 4 and 5, we can see that our model has higher expression ability than other models. Compared with the two residual-based networks SRResNet and EDSR, our model adds PCFB, which helps to improve the performance of the model.

Ablation Studies.
The proposed method is an improvement on SRResNet, so the ablation experiments are also carried out around SRResNet. In Tables 3 and 4, we compare the number of parameters and the performance in PSNR, SSIM, and RMSE for all methods. Note that all results are the average values of PSNR, SSIM, and RMSE calculated from MR images on the same dataset. The experimental results show that the proposed method improves the PSNR, SSIM, and RMSE of LR images obtained from BD and TD by 0.2 dB, 0.33 dB, 0.06 dB and 0.17 dB, 0.15 dB, 0.25 dB, respectively, compared with SRResNet, although the amount of parameters is only 0.01 MB lower. This shows that PCFB is more effective.
In order to evaluate the effectiveness of the composite loss we constructed, we performed ablation experiments with different loss functions on the PD data in the dataset, as shown in Table 5. Compared with L1 and L2 loss functions, the PSNR performance of our composite loss
Back propagation: update $\theta_G$ according to the gradient $\partial L / \partial \theta_G$.

Conclusion and Future Work
High-resolution MR images have smaller voxel sizes, providing clinical physicians with more accurate structural and textural details. However, generating high-resolution MR images usually incurs enormous costs. Image super-resolution is an effective and cost-efficient alternative technique for high-resolution restoration of low-resolution images. In this work, we propose a novel end-to-end MR image super-resolution method. First, we introduced a parameter-free chunking fusion block (PCFB) that splits the feature map into n branches to fuse features better without additional parameters. Second, a training strategy combining perceptual loss, gradient loss, and L1 loss played an important role in accelerating model fitting and improving prediction accuracy. Finally, the proposed method is effective in the super-resolution task of MR images, improving model accuracy. Our future work will focus on lightweight processing of the model to reduce its parameters while maintaining the accuracy reported in this paper.

Data Availability
The IXISR dataset used to support the findings of this study is included within the article [41].

Disclosure
Mingyang Hou and Hongyi Wang should be considered as co-correspondents.

Conflicts of Interest
The authors declare that they have no conflicts of interest.
Journal of Healthcare Engineering