Multi-FusNet of Cross Channel Network for Image Super-Resolution

Image super-resolution (SR) has gained considerable attention in artificial intelligence (AI) research and image-based applications. Recent deep learning-based SR models have demonstrated remarkable accuracy and perceptual quality in the resulting images. However, computational cost and the number of model parameters remain the most challenging limitations in real-world applications. Designing an efficient, lightweight SR algorithm that still improves the perceptual quality of SR images is therefore a critical issue. Motivated by these considerations, we propose the Multi-FusNet of Cross Channel Network (MFCC), which models a multipath residual network, named multi-RG, with cross-filtering fusion. Additionally, a pixel shuffling fusion technique is used to fuse low-level features into the up-sampled features of the multi-RG. The experimental results compare the proposed MFCC with state-of-the-art SR models. The proposed method significantly reduces the number of network parameters (by a factor of 8.4 compared to RCAN) while preserving the visual quality of the result and achieving the best PSNR value among the compared state-of-the-art methods.


I. INTRODUCTION
Image super-resolution (SR) aims to reconstruct a high-resolution (HR) image from a low-resolution (LR) input. SR is an ill-posed problem, because many HR images can correspond to the same LR input; therefore, many SR techniques have been developed for producing HR images. However, the perceptual quality and execution time of the SR model are two critical factors in designing an effective and robust SR model for real-world applications. One such real-world problem arises when closed-circuit television (CCTV) [1] is used for intelligent monitoring [2], such as traffic congestion and accident detection [3], home security management [4], face recognition [5], and social identification [6], because the images captured by CCTV cameras are low-resolution [7].
The associate editor coordinating the review of this manuscript and approving it for publication was Senthil Kumar.
Many research topics, such as feature preservation in video coding [8], [9] and super-resolution [10], [11], [12], have been published to solve the low-resolution problem. Deep Convolutional Neural Networks (CNN) have recently become a powerful model for solving the ill-posed SR problem [13]. Dong et al. [14], [15] proposed a super-resolution algorithm using a three-layer deep convolutional neural network (SRCNN) that was linearly stacked together. In this shallow and straightforward architecture, the first layer is designed to extract the features from the LR input, the second layer is used for non-linear mapping from low-dimensional to high-dimensional features, and the final layer is responsible for aggregating the feature maps of the earlier layer to the final HR result. This SR network is trained using an end-to-end approach to minimize the Mean Squared Error (MSE) between the ground truth (GT) image and the reconstructed (SR) image. Following the SRCNN model, other models, such as Very Deep Super-resolution [16] (VDSR) and Deeply Recursive Convolutional Network (DRCN), achieved significant improvements by increasing the depth of their CNN architectures.
Inspired by the Residual Network (ResNet) architecture [17], some SR models have adopted deeper architectures. Lim et al. proposed a deep SR architecture called the Enhanced Deep Residual Network (EDSR) [35], based on the residual concept. This SR model stacks residual blocks to build a deeper network (almost 165 convolutional layers) and achieves considerable improvement over earlier SR models. However, training a network as deep as EDSR [35] is challenging, and its large number of parameters is a significant obstacle to fast execution in real-world applications and hardware implementations.
On the other hand, some recent CNN-based models [23], [25], [28], [31], [32] utilize a multipath network architecture to overcome the limitations of deep networks. Rather than using a single path, the multipath network operates several paths in parallel. As a result, the depth of the network decreases while the performance of the SR model increases. Based on the multipath structure, Hui et al. proposed the Information Distillation Network (IDN) by designing cascaded network paths that operate in parallel. This implies that the layers of the network do not need to wait for the computations of previous layers. Additionally, the different types of features extracted from each path are mixed, which improves both the operation time of the SR model and the perceptual quality of the result. Although SR models based on the multipath approach achieved acceptable execution times, their PSNR and SSIM values remained low. Separately, a low-level feature-sharing approach is used in some SR models [20], [21], [24], [33] to enhance the low-frequency information flow in the SR network structure. This approach transfers the low-level features of the early CNN layers to the later layers and fuses the low-level and high-level features. Hence, it improves reconstruction quality by enhancing the sharpness of the SR image. SR models such as the Feedback Network (SRFBN) [20], the Adaptive Weighted Super-Resolution Network (AWSRN) [24], SelNet [33], and Attentive Auxiliary Features (A2F) [21] utilize this feature-sharing approach, which has proven particularly effective for lightweight architectures.
To address these issues, we propose an efficient lightweight SR model for image enlargement, with the following main contributions:
1) Proposing the multi-depth cross-channel network to obtain local pixel attention features from low-resolution images.
2) Investigating the doubling stage of the residual identity connection to retrieve merged features represented by low-level features.
3) Exploring low-level feature sharing, which fuses low-level information with up-sampled features, to enhance the model's reconstruction ability.
The remainder of this paper is organized as follows. Section II presents a review of related works. Section III explains the proposed method. Section IV discusses the experimental results, and Section V presents the conclusions.

II. RELATED WORKS

A. LINEAR NETWORKS
Linear networks have simple architectures with a single path that linearly stacks convolution layers to let information flow through the network. These linear models are further categorized into early and late up-sampling designs. In the early up-sampling design, exemplified by SRCNN [14], [15], the LR image is up-sampled in the first stage and the HR image is then reconstructed. VDSR [16] also uses an early up-sampling design and obtains a larger region of contextual information to improve the results; its depth is increased by stacking 20 convolutional layers. The FSRCNN model [37] uses the late up-sampling framework, which performs the up-sampling operation toward the end of the network to reduce the computational cost. The architecture of FSRCNN contains four convolution layers and one deconvolution layer at the end of the model to produce the upscaled image. The output was reconstructed by combining residual learning with bicubic interpolation. This model was designed for rapid processing in real-time applications.

B. RESIDUAL NETWORKS
Residual networks introduce skip connections into the neural network architecture, which lets a very deep network focus on high-frequency information. The residual network concept is further categorized into single-stage and multi-stage residual networks. Inspired by the Residual Network (ResNet) architecture [17], the Enhanced Deep Super-Resolution (EDSR) model [35] modified the residual block by removing the Batch Normalization (BN) layers and ReLU activation. This model decreases the number of parameters while simultaneously improving the performance of the SR model. The Cascading Residual Network (CARN) [18] proposes a cascading mechanism to improve the trade-off between model performance and model weight. CARN uses the ResNet [17] architecture, with a cascading mechanism at the local and global levels to incorporate features from all layers. The multi-scale residual network (MSRN) [19] was proposed to address feature utilization and adaptation to arbitrary scaling factors. This model can fuse image features at different scales. It was presented as the first multi-scale module based on a residual structure and is easy to train. The model shows superior performance compared with other state-of-the-art models on various benchmark datasets.
The Adaptive Weighted Super-Resolution Network (AWSRN) [24] was designed to resolve the heavy computational cost problem. This model consists of Local Fusion Blocks (LFB), each built for residual learning from stacked adaptive weighted residual units (AWRU) and a local residual fusion unit (LRFU). Apart from the LFB, it also contains an adaptive weighted multi-scale (AWMS) module to enhance the reconstruction layer. The AWMS is an important contributor to the design of lightweight network structures. Inspired by the Inception network concept, Muhammad et al. [12] proposed Multi-Scale Inception Based Super-Resolution (MSISRD), which uses an asymmetric residual architecture to reduce the number of parameters. In this SR model, short and long feature information is extracted directly using a locally residual asymmetric convolutional block and an inception-based asymmetric convolutional block. The A2F model [21] utilizes auxiliary features and a channel attention mechanism to improve the model's performance while reducing the weight of the model. That study showed that having fewer auxiliary features results in less high-frequency information and consequently decreases the accuracy of the SR model. In addition, the A2F model was reported to outperform other state-of-the-art models on all scales with a faster execution time. The FALSR method [23] (Fast and Lightweight SR with Neural Architecture Search) seeks the best balance between image restoration quality and model weight. It uses an elastic search approach at both the micro and macro levels, based on a hybrid controller.
SFFN [39] proposed an efficient feature fusion block, along with lightweight and shallow residual blocks. This model efficiently fuses the features of different blocks and improves the model's performance and execution time. Moreover, they introduced an attention mechanism for reinforcing the useful cross-layer features of each channel. This lightweight SR model outperformed other state-of-the-art methods.

C. RECURSIVE NETWORKS
This design focuses on breaking the larger SR problem into simpler, smaller sub-problems that are solved recursively. The DRCN model [25] is based on a recursive CNN containing almost 16 layers of recursion. Because the recursive layers share weights, this method improves performance without increasing the number of parameters. Its only drawback, the difficulty of training, is addressed by recursive supervision and skip connections. The model thus reduces the weight of the network while keeping training tractable.

D. PROGRESSIVE RECONSTRUCTION NETWORKS
The progressive reconstruction approach introduces a progressive network to improve SR results at larger scaling factors. Another benefit of this approach is that the predictions are made in multiple sub-steps. The Laplacian Pyramid Super-Resolution Network (LapSRN) [28] uses progressive up-sampling to reconstruct the residuals of HR images quickly and accurately. Some important limitations of earlier models, such as high computational cost, blurry images, and learning difficulty, were overcome by LapSRN because of its progressive architecture. This method uses cascaded CNNs to predict the sub-band residuals in a coarse-to-fine fashion. LapSRN has 27 layers overall, takes the LR image as input, uses residual learning, and performs progressive reconstruction with a Charbonnier loss function. It constructs high-quality HR images faster than the methods it was compared with, and it also helps to reduce blurring. The only problem with this model is that it does not hallucinate fine details at large scales.

E. MULTI-BRANCH NETS
The multi-branch architecture is a successful design for increasing the information flow between network layers. It obtains diverse, complementary information and features from multiple scales and merges them for better HR reconstruction. The Information Multi-distillation Network (IMDN) [32] is a lightweight multi-branch architecture that addresses the learning complexity caused by numerous convolutional layers. In this architecture, a distillation block is designed to extract hierarchical features, which are combined using cascaded Information Multi-distillation Blocks (IMDB). The IMDBs are formed from distillation blocks, and the fusion module extracts features at a coarse level, retaining partial information. It then aggregates them using the channel attention mechanism to improve the refined information (edges, corners, and textures).

F. ATTENTION BASED NETWORKS
To improve the performance of learning-based SR models, attention-based techniques were designed as enhancement modules that focus on specific, varying features. The deep CNN with Selection Units (SelNet) [33] is motivated by the mapping techniques used in CNNs. The Rectified Linear Unit (ReLU) used for mapping the LR images inspired the creation of a non-linear unit known as the Selection Unit (SU). Because the SU combines identity mapping with a sigmoid switching function, it has better control over the data passed through it than ReLU [40]. The results show that the network has a much lower computational complexity and outperforms both the baseline model with only ReLU and the state-of-the-art SR methods it was compared with. The Very Deep Residual Channel Attention Network (RCAN) [34] utilizes a Residual Channel Attention architecture. Its authors designed a residual-in-residual (RIR) architecture consisting of Residual Groups (RG) and Residual Channel Attention (RCA) blocks. The RG structure uses a short skip connection as a residual component, whereas the RCA utilizes a long skip connection to target the LR feature components. Additionally, channel attention (CA) was introduced to rescale channel-wise features adaptively. Although RCAN produces high-quality SR results, the complexity of its architecture increases the processing time [13], [36]. Channel Split Image Super-Resolution (CSISR) [41] improves the learning capability of the SR model with a novel channel attention mechanism, which combines global average pooling and standard deviation pooling with non-linear mapping layers. In addition, CSISR offers an efficient and lightweight architecture that reduces computational complexity and outperformed other state-of-the-art models. Based on the dynamic residual attention (DRA) approach [42], the dynamic residual self-attention network (DRSAN) is a lightweight SR model. Proper weights for each residual path, derived from a statistical analysis of the input image and the interrelation between residual paths, boost the reconstruction capability of this model. Additionally, a residual self-attention (RSA) block was proposed to generate 3-D attention maps without additional parameters. In [43], the Information-Growth Attention Network (IGAN) introduced a new type of attention mechanism called "information-growth attention." This mechanism focuses on features with the potential for large information growth by analyzing the differences between the current and previous features within the network. The Context Reasoning Attention Network (CRAN) [44] adaptively adjusts the convolution kernel based on the global context. This model first extracts global context descriptors and then introduces channel and spatial interactions to produce a context reasoning attention mask. In [45], a second-order attention network (SAN) used a trainable second-order channel attention (SOCA) module to rescale channel-wise features with second-order feature statistics. This approach results in more discriminative representations.

III. MULTI-FusNet OF CROSS CHANNEL NETWORK
The proposed SR network is based on a multipath residual architecture that provides a wider network rather than a deeper one, resulting in more efficient and faster execution. The architecture, named Multi-FusNet of Cross Channel Network (MFCC), consists of four main modules: feature extraction, multiple Residual Groups (Multi-RG), enlargement, and low-level fusing, as illustrated in Figure 1. The Multi-RG architecture is designed by integrating the RCAN [34] with multiple identity residual links. Because the multiplicity (the number of possible paths from the input to the output layer) increases in the proposed architecture, the information flow between RG blocks gradually increases, which helps reduce the computational complexity. The first convolution layer is the feature extraction module, which feeds low-level features to the RG blocks and to the low-level fusion module. The proposed cascading topology in the MFCC network is composed of three different paths that form a multipath residual configuration.
Each Residual Group (RG) block consists of N stacked Residual Channel Attention (RCA) blocks and a short residual skip connection within the block. The first path consists of two stacked RG blocks, whereas the second path has one RG block cascaded with the first path. The third path of the model bypasses the low-level details of the earlier layer and fuses them with the up-sampled features of the multi-RGs, as shown in Figure 1. The proposed SR model begins by feeding the LR input to a convolutional layer with a kernel size of k × k. The resulting features are then passed through the three different paths. To perform image enlargement, we utilize the pixel shuffle technique, which rearranges the channels of the low-resolution feature maps into a spatially larger feature map. Although the RCAN model offers high accuracy, its large number of parameters results in slow execution, which makes its implementation in real-time applications challenging. To create a lightweight architecture, we reduce the number of residual groups in our model and incorporate a multipath residual network architecture. This design improves both accuracy and processing speed, resulting in a more efficient model than those that use a non-cascading (deep) architecture. To further improve our model's capability to extract sharp details, we exploit the low-level features of the early CNN layers and share them with the up-sampled features of our multipath residual network. Many SR models suffer from over-smoothing because of the lack of high-frequency details in the last CNN layers, leading to perceptually unpleasant images at large scales. By incorporating a pixel shuffling fusion technique, we can overcome this limitation and produce high-quality results. The following sections describe the residual group block, the multipath residual configuration, and the pixel shuffle fusion method.

A. RESIDUAL GROUP
This section presents the Residual Group (RG), which performs robust feature extraction from the low-resolution input. The proposed RG is constructed by applying an identity residual connection around a sequence of N statistical Channel Attention blocks from [34], where N > 0. The output of the RG is denoted by $O_{RG}$ and is defined in Eq. (1).
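A plausible written-out form of Eq. (1), assuming the RCAN-style residual formulation in which the weighted output of the last RCA block is added to the group input, is given below. It is a reconstruction from the symbol definitions in this section and should be read as a sketch, not as a verbatim copy of the original display.

```latex
O_{RG} = W_{RG}\, F_{RCA}^{N} + I_{RG} \tag{1}
```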
Here, $I_{RG}$ represents the RG input, $W_{RG}$ denotes the weight parameters of the Residual Group block, and $F_{RCA}^{N}$ is the channel-wise feature of the Residual Channel Attention (RCA) block. The channel-wise feature from the RCA is determined by Eqs. (2) and (3).
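Under the same assumption, Eqs. (2) and (3) can be sketched as a simple recursion over the stacked RCA blocks, starting from the group input. The exact indexing used in the original display is not recoverable here, so the form below only follows the symbol definitions in the surrounding text.

```latex
\begin{align}
F_{RCA}^{N} &= H_{RCA}\!\left(F_{RCA}^{N-1}\right) \tag{2} \\
F_{RCA}^{0} &= H_{RCA}\!\left(I_{RG}\right) \tag{3}
\end{align}
```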
Here, $F_{RCA}^{N}$ and $F_{RCA}^{N-1}$ denote the outputs of the N-th and (N − 1)-th Residual Channel Attention blocks, respectively, and $H_{RCA}$ is the corresponding operation function of the RCA. $F_{RCA}^{0}$ is the output of the first Residual Channel Attention block, and $I_{RG}$ is its input. The Channel Attention (CA) mechanism exploits the interdependencies among the feature channels. It makes the SR model focus more on informative features and consequently improves the image reconstruction capability of the model. More details on the CA mechanism and the corresponding operation function in the Residual Channel Attention block can be found in RCAN [34].
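To make the channel attention idea concrete, the following is a minimal pure-Python sketch of a squeeze-and-rescale gate in the style described for RCAN [34]: global average pooling per channel, a channel reduction with ReLU, a channel expansion with a sigmoid, and a per-channel rescaling. The function name `channel_attention`, the toy weights, and the omission of bias terms are assumptions made for illustration; they are not taken from the MFCC implementation.

```python
import math


def channel_attention(features, w_down, w_up):
    """Rescale each channel of `features` by a gate in (0, 1).

    features: list of C channels, each an H x W list of lists of floats.
    w_down:   (C // r) x C matrix for channel reduction (hypothetical values).
    w_up:     C x (C // r) matrix for channel expansion (hypothetical values).
    Bias terms are omitted to keep the sketch short.
    """
    # 1) Squeeze: global average pooling yields one statistic per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in features]
    # 2) Excite: channel reduction followed by ReLU ...
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled)))
              for row in w_down]
    # ... then channel expansion followed by a sigmoid gate.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_up]
    # 3) Rescale: every pixel of channel c is multiplied by gate c.
    scaled = [[[g * v for v in row] for row in ch]
              for g, ch in zip(gates, features)]
    return scaled, gates


# Toy input: two 2 x 2 channels with means 1.0 and 2.0.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 2.0], [4.0, 2.0]],
]
w_down = [[0.5, 0.5]]      # reduces 2 channels to 1
w_up = [[1.0], [-1.0]]     # expands 1 channel back to 2
scaled, gates = channel_attention(features, w_down, w_up)
```

With these toy weights the first channel receives a gate above 0.5 and the second a gate below 0.5, which shows how the mechanism emphasizes some channels and suppresses others.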

B. MULTIPATH RESIDUAL
As shown in Figure 1, our model combines Residual Group blocks under the multipath-residual architecture [46]. According to the multipath residual evidence in [47], a wider residual architecture significantly improves the accuracy and computation speed of the model compared with a deeper residual architecture. These improvements are related to the increased multiplicity of the wider residual network, where multiplicity refers to the number of possible paths from the input layer to the output layer. A sequence of two Residual Group blocks is utilized in the first path of our model, and one Residual Group block is employed in the second path. Eq. (4) defines the multipath output of the proposed model.
Here, $O_{RG}^{2}$ and $O_{RG}^{1}$ denote the outputs of the two Residual Group blocks in the first path and of the Residual Group block in the second path, respectively, and $O_{MR}$ denotes the output of the proposed multipath residual architecture.

C. PIXEL SHUFFLE FUSION
A low-level feature-sharing approach is employed to improve the sharpness of reconstructed results. Because the low-level features of the early layer contain more high-frequency information, sharing them improves the challenging weakness of the SR model in recovering the sharp attributes of the lines and edges. Simultaneously, it preserves our model against over-smoothing degradation.
Our model utilizes the pixel shuffle fusion approach to bypass the low-level features of the early layer of the SR network to the up-sampled features. The proposed model employs pixel shuffle [48] to up-sample the image. Based on the feature-sharing concept, the features of the early layer are also up-sampled by pixel shuffle and fused with the up-sampled features of the multipath residual network, as shown in Figure 1.
VOLUME 11, 2023 56291 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Here, $PS_{1}$ and $PS_{2}$ represent the pixel shuffle up-sampling of the multipath residual network output and of the shared low-level features, respectively. $W_{MR}$ and $W_{p}$ represent the convolution operations applied to the multipath residual output and to the up-sampled low-level features, respectively. The pixel shuffle can be mathematically expressed as Eq. (6).
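For reference, the periodic shuffling operator of the sub-pixel convolution approach cited as [48] is commonly written as below. The symbol $C$ (the number of output channels) is an assumption introduced here because it is not defined in the surrounding text; the remaining symbols follow the definitions given for Eq. (6).

```latex
PS(U)_{i,j,c} = U_{\lfloor i/a \rfloor,\; \lfloor j/a \rfloor,\; C \cdot a \cdot \operatorname{mod}(j,a) + C \cdot \operatorname{mod}(i,a) + c} \tag{6}
```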
Here, $PS(U)$ is the output, $a$ is the scale factor, $i$ and $j$ are the pixel coordinates, and $c$ is the channel position. To refine the final SR image, the up-sampled high-frequency details of the early layer are fused with the up-sampled features of the multipath residual model.
Here, $PS_{Fus}$ denotes the fused pixel shuffle result, and $W_{SR}$ denotes the last convolution operation, which produces the final SR result. The proposed fusion approach improves the capability of our model to recover sharp image details and improves the perceptual quality of the results by preventing over-smoothing.
Here, SR and y are the result and the reference image, respectively, and n and m are parameters related to the training dataset.
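The pixel shuffle rearrangement and the fusion of the two up-sampled branches described in this section can be sketched in pure Python as follows. The channel ordering in `pixel_shuffle` follows one common convention, and the element-wise addition in `fuse` is an assumption; the text only states that the two up-sampled feature maps are fused. Both function names and the toy tensors are hypothetical.

```python
def pixel_shuffle(x, r):
    """Rearrange C*r*r channels of size H x W into C channels of size rH x rW.

    x: list of channels, each an H x W list of lists.
    Assumed channel layout: out[c][h*r + i][w*r + j] = x[c*r*r + i*r + j][h][w].
    """
    if len(x) % (r * r) != 0:
        raise ValueError("number of channels must be divisible by r*r")
    c_out = len(x) // (r * r)
    h, w = len(x[0]), len(x[0][0])
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


def fuse(a, b):
    """Fuse two equally shaped feature maps by element-wise addition (assumed)."""
    return [[[va + vb for va, vb in zip(ra, rb)]
             for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


# Toy example: 4 channels of size 1 x 2, scale factor 2 -> 1 channel of size 2 x 4.
deep = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]], [[7.0, 8.0]]]  # multipath branch
low = [[[0.5, 0.5]] for _ in range(4)]                            # low-level branch
up_deep = pixel_shuffle(deep, 2)
up_low = pixel_shuffle(low, 2)
fused = fuse(up_deep, up_low)
```

In this toy case each 2 × 2 block of the enlarged map is filled from the four input channels at the same spatial position, and the low-level branch simply adds 0.5 to every enlarged pixel.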

IV. EXPERIMENTAL RESULTS
In this section, several experiments were conducted to validate the performance of our model. First, the hyper-parameter settings of the proposed model are explained. The experimental results and analysis are then demonstrated.

A. PARAMETER SETTINGS
To train our SR model, we utilized the DIV2K [49] dataset, which includes 800 training images. For testing, five standard benchmark datasets were considered: Set5 [49], Set14 [50], B100 [51], Urban100, and Manga109. Bicubic Interpolation (BI) was used as the degradation model in our experiments. PSNR and SSIM computed on the luminance (Y) channel (Y-PSNR and Y-SSIM) were used to evaluate SR model accuracy. For data augmentation, the 800 training images were randomly rotated by 90, 180, or 270 degrees and flipped horizontally. Each training batch consisted of 16 LR color patches of size 48 × 48. We trained our model with the ADAM optimizer, with $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$. The learning rate was initially set to $10^{-4}$ and halved every $2 \times 10^{5}$ iterations. The models were implemented using PyTorch on a Titan Xp GPU. The evaluated model variants differ in the number of RCA blocks and Residual Groups. For a lightweight architecture, the residual technique provides a wider network and balances network parameters against accuracy. The residual networks in [52] optimize residual blocks by expanding the residual information into a broader network architecture. This architecture improves processing speed by moving parts of the sequential blocks into parallel blocks, and it was also shown to give better accuracy and convergence.
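The step-wise learning-rate schedule stated above can be written as a short function. This is a minimal sketch of the schedule only; the function name `learning_rate` and the dictionary `ADAM_SETTINGS` are hypothetical and are not taken from the authors' training code.

```python
def learning_rate(iteration, base_lr=1e-4, step=200_000, factor=0.5):
    """Return the learning rate at a given iteration.

    The rate starts at `base_lr` and is multiplied by `factor`
    once every `step` iterations (halved every 2 x 10^5 iterations).
    """
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    return base_lr * factor ** (iteration // step)


# ADAM hyper-parameters as stated in the text.
ADAM_SETTINGS = {"beta1": 0.9, "beta2": 0.999, "eps": 1e-8}

# Learning rates at a few illustrative iterations.
schedule = {it: learning_rate(it) for it in (0, 199_999, 200_000, 400_000)}
```

The rate stays at its initial value up to iteration 199,999, drops to half of it at iteration 200,000, and to a quarter at iteration 400,000.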

B. EXPERIMENTAL RESULTS AND ANALYSIS
According to the graph, the number of RG blocks has a negligible impact on quality. Therefore, only one RG block is used to reduce the number of parameters. However, if the number of RCA blocks is reduced, the PSNR value decreases considerably. Because our contribution is a lightweight SR architecture, the network parameters are constrained to fewer than 2 million.
As shown in Figure 2, the PSNR decreases gradually as we move from the baseline architecture to lighter network architectures. Considering the parameter constraint for a lightweight model, combining 20 RCA blocks with one RG gives the best trade-off between the minimum number of parameters and the maximum PSNR for our lightweight network. In addition, by checking the effectiveness of RCA and RG, we found that reducing the number of RCA layers makes the PSNR drop faster than reducing the number of RG layers. Moreover, reducing the number of RCA layers lowers the parameter count less than reducing the number of RG layers does. Figure 2 presents a comparison between the baseline model and various extensions of the RCA and Residual Group (20 RCA with one RG). The PSNR values shown were calculated on the Set5 dataset with a scale factor of two. The models were trained for 600 epochs using a fixed seed. The baseline architecture, with 20 RCA blocks and 10 Residual Groups (RG), has a total of 15 million parameters and achieved the highest PSNR. Among the lightweight extensions of RCA and RG, the configuration with 20 RCA blocks and two Residual Groups achieved a PSNR of 38.16 dB.
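Since the comparison above is expressed in PSNR on the luminance channel, a minimal pure-Python sketch of a Y-channel PSNR computation may help. It assumes 8-bit RGB inputs, the ITU-R BT.601 luma conversion commonly used in SR evaluation, and a peak value of 255; the optional `border` cropping is also an assumption, since the text does not state whether image borders are cropped before evaluation.

```python
import math


def rgb_to_y(pixel):
    """Convert an 8-bit (R, G, B) pixel to BT.601 luma in the range [16, 235]."""
    r, g, b = pixel
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0


def y_psnr(sr, hr, border=0):
    """PSNR between two H x W images of (R, G, B) tuples, on the Y channel.

    `border` pixels are ignored on every side (assumed option, default 0).
    """
    h, w = len(hr), len(hr[0])
    sq_err, count = 0.0, 0
    for row in range(border, h - border):
        for col in range(border, w - border):
            diff = rgb_to_y(sr[row][col]) - rgb_to_y(hr[row][col])
            sq_err += diff * diff
            count += 1
    if count == 0:
        raise ValueError("border leaves no pixels to compare")
    mse = sq_err / count
    return math.inf if mse == 0 else 10.0 * math.log10(255.0 ** 2 / mse)


# Toy 2 x 2 images: an all-black reference and an all-white estimate.
black = [[(0, 0, 0)] * 2 for _ in range(2)]
white = [[(255, 255, 255)] * 2 for _ in range(2)]
same = y_psnr(black, black)
worst = y_psnr(white, black)
```

Identical images give an infinite PSNR, while the black-versus-white toy pair gives a very low value because the luma difference is the full 219-level range at every pixel.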
It is worth noting that this network configuration (shown in Figure 1) reduced the network parameters by 88% while decreasing the accuracy by only 0.09 dB compared to the baseline. Another benefit of this extension is that the configuration with 20 RCA blocks and two Residual Groups with upscaling converges faster than the other extensions, reaching convergence at the 417th epoch. Note that the upscale parameter in the table refers to the up-sampled low-level feature sharing demonstrated in Figure 1. Our experiment shows that the edges of the SR images were recovered more clearly using this configuration. The proposed network architecture reduces the number of parameters by a factor of up to eight compared to RCAN, with comparable quality. Table 2 presents comparisons of our proposed model with other state-of-the-art models at scaling factors of ×2, ×3, ×4, and ×8. The evaluation was performed using five benchmark datasets: Set5, Set14, B100, Urban100, and Manga109. Moreover, the comparison includes the number of parameters and multi-adds of each model. The best performance is denoted by red numbers and the second-best performance by blue numbers. Table 2 shows that our model outperforms the others, achieving higher performance with a reasonable number of parameters and multi-adds. Specifically, in comparison to the second-best PSNR at a scale factor of 2, our model provides an improvement of 0.05 dB, 0.07 dB, 0.02 dB, 0.16 dB, and 0.16 dB for Set5, Set14, B100, Urban100, and Manga109, respectively.
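The parameter saving is quoted in two ways in this paper: as a factor (8.4 times, or roughly eight times, compared to RCAN) and as a percentage (88% relative to the 15-million-parameter baseline). The short sketch below relates the two by simple arithmetic. It assumes that the 15-million-parameter baseline configuration stands in for RCAN; the function names are hypothetical.

```python
def reduction_percent(factor):
    """Percentage of parameters removed when a model is `factor` times smaller."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return (1.0 - 1.0 / factor) * 100.0


def reduced_params(baseline_params, percent):
    """Parameter count left after removing `percent` percent of the baseline."""
    return baseline_params * (1.0 - percent / 100.0)


# A model 8.4 times smaller has lost about 88% of its parameters.
percent_from_factor = reduction_percent(8.4)

# Removing 88% of a 15-million-parameter baseline leaves about 1.8 million,
# which is below the 2-million budget set for the lightweight model.
remaining = reduced_params(15e6, 88.0)
```

Under this assumption the two figures agree with each other and with the stated parameter budget.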
Our model demonstrates a notable improvement in PSNR for upscale factors of ×3 and ×4, with gains of 0.3 dB and 0.26 dB, respectively, over the second-best model on the Manga109 dataset. For the most challenging scale of ×8, our model exhibits superior performance on all datasets, except for the SSIM on the Set14 dataset. Even there, our SSIM is the second-best and differs only slightly (0.0002) from that of the AWSRN [24] model. Figure 4 shows the statistical analysis of our model, focusing on the PSNR and SSIM performance on all datasets across different scaling factors. The charts also provide a comparison with four other lightweight SR models: MSRN [19], IMDN [32], AWSRN [24], and A2F-L [21]. According to the statistical analysis of PSNR and SSIM, our model outperforms the other lightweight state-of-the-art models for all scale factors. Figure 5, Figure 6, and Figure 7 show visual comparisons of our proposed model at scale ×4 with other state-of-the-art models, including VDSR [16], IMDN [32], LapSRN [28], AWSRN [24], CARN [18], and MSRN [19]. The displayed patches are arranged according to the number of network parameters of the models. Figure 6 shows the visual comparison of "148026" from the B100 dataset at scale ×4. The MFCC model achieved the highest PSNR and SSIM compared with the other state-of-the-art models. Additionally, the high-frequency details of the tiny lines were recovered more effectively, without any over-smoothing degradation. Figure 7 shows a visual comparison of "img037" from the Urban100 dataset at scale ×4. Our lightweight MFCC model effectively produces an SR image that closely resembles the ground-truth (GT) image. Conversely, the results obtained from the other models exhibit shortcomings when reconstructing a sharp image.
Our model achieved the highest PSNR and SSIM values, which indicates its superiority over the other models. Figure 8 and Figure 9 display visual comparisons of our proposed model at scale ×8 with other state-of-the-art models, including LapSRN [28], MSRN [19], and RCAN [34]. Figure 8 illustrates a visual comparison of "img093" from the Urban100 dataset. Compared to the other models, our lightweight model reconstructs the high-frequency details of lines in a way that is similar to the GT image. The PSNR and SSIM of our resultant image show significant improvement compared to the other state-of-the-art models. Figure 9 compares the results for the "Kyokugen-Cyclone" image from the Manga109 dataset. The PSNR and SSIM of the proposed model are the highest. The other models could not reconstruct the parallel lines located at the top of the selected patch and merged the lines. In contrast, our model shows a robust ability to produce tiny edges at this scale. Figure 3 compares the number of parameters and the performance of different SR models at a scale of ×4 on the Set5 dataset. Several state-of-the-art approaches, including VDSR [16], LapSRN [28], DRCN [25], SelNet [33], CARN [18], IMDN [32], A2F-L [21], and AWSRN [24], were chosen to analyze our model's performance. As demonstrated in the graph, our MFCC model has the highest PSNR (32.42 dB), with 2.15 million parameters.

V. CONCLUSION
This study proposed a lightweight single-image super-resolution model (MFCC) based on constructing Residual Group blocks on a multipath residual architecture. Utilizing a multipath residual network increases the efficiency of the proposed lightweight model. In addition, we addressed the lack of high-frequency details in the late layers by employing the pixel-shuffle fusion method. With this approach, the low-level features of the early layer are up-sampled and bypassed into the up-sampled features of the multipath residual network. The low-level and high-level information of these layers is fused, which improves the line and edge reconstruction capability of the proposed model. The experimental results on five benchmark datasets demonstrate that our lightweight MFCC model outperforms other state-of-the-art models, particularly at a scale of ×8.
In future work, the proposed model will be tuned to achieve the optimum parameters and then implemented on FPGAs to support real-world applications.