An Infrared and Visible Image Fusion Algorithm Method Based on a Dual Bilateral Least Squares Hybrid Filter

Abstract: Infrared and visible images of the same scene are fused to produce a fused image with richer information. However, most current image-fusion algorithms suffer from insufficient edge information retention, weak feature representation, poor contrast, halos, and artifacts, and can only be applied to a single scene. To address these issues, we propose a novel infrared and visible image fusion algorithm based on a dual bilateral-least-squares hybrid filter (DBLSF) built on the least-squares and bilateral filter hybrid model (BLF-LS). The proposed algorithm uses the residual network ResNet50 and an adaptive fusion strategy based on the structure tensor to fuse the base and detail layers of the filter decomposition, respectively. Experiments on 32 sets of images from the TNO image-fusion dataset show that, although our fusion algorithm sacrifices overall time efficiency, the Combination 1 approach better preserves image edge information and image integrity, reduces the loss of source image features, suppresses artifacts and halos, and compares favorably with other algorithms in terms of structural similarity, feature similarity, multiscale structural similarity, root mean square error, peak signal-to-noise ratio, and correlation coefficient by at least 2.71%, 1.86%, 0.09%, 0.46%, 0.24%, and 0.07%, respectively. The proposed Combination 2 effectively improves the contrast and edge features of the fused image and enriches the image detail information, with average improvements of 37.42%, 26.40%, and 26.60% in average gradient, edge intensity, and spatial frequency, respectively, compared with other algorithms.


Introduction
The aim of infrared and visible image-fusion technology is to extract and integrate features from images captured by different sensors using specific algorithms, thereby generating a complementary image that contains both rich detailed features of the visible image and target information of the infrared image [1,2]. This technology has a wide-ranging impact on economic development, involving applications in fields such as national defense, military [3], intelligent transportation [4], and power grid operation [5,6]. These applications require advanced electronic components and control systems, such as high-performance sensors, image processors, and communication modules, which jointly promote technological progress and economic development.
Currently, among the numerous infrared and visible image fusion algorithms, the widely adopted approach is the image-fusion method based on multiscale transformation [7,8]. However, this method does not consider spatial inconsistencies well, which can lead to distortions and artifacts near the edges [9]. Moreover, its transformation and inverse transformation processes are time-consuming and complex, requiring a large amount of memory space and computational resources [10,11]. To overcome these problems, image fusion is performed using an edge-preserving smoothing filter [12], which can effectively avoid artifacts and spatial inconsistencies generated by multiscale transformations by decomposing the image into base and detail layers through filtering [13].
Ch et al. [14] proposed an image-fusion algorithm based on high-order discrete wavelet components and guided filters, which can effectively smooth and enhance spatial information, but halos still exist near the edges. Singh et al. [15] proposed an image-fusion method based on a discrete wavelet transform and a bilateral filter to better preserve the edges of the image, but it reduces the contrast of the infrared image in the fused image. Li et al. [16] proposed an image-fusion algorithm based on a fast approximate bilateral filter and local energy features, in which the fast approximate bilateral filter decomposes the source image five times to obtain a base layer and several detail layers. This algorithm preserves edge features and reduces halos, but at the expense of computation time.
As deep learning has progressed in fields such as object detection and image restoration, various deep-learning-based algorithms have also been applied to image fusion. These algorithms, such as convolutional neural networks (CNNs) [17] and generative adversarial networks (GANs) [18], can extract significant features from source images. Liu et al. [19] proposed a CNN-based fusion algorithm that uses only the output of the last layer as image features; its network structure is too simple, resulting in the loss of useful information. To address the information loss caused by increasing the depth of convolutional networks, residual networks (ResNets) [20] and dense convolutional networks (DenseNets) [21] make full use of deep features to ensure that more useful information is retained during feature extraction. However, fusion algorithms built on these networks require the manual design of fusion layers and do not achieve true automatic image fusion. To alleviate the complexity of engineering design, recently proposed approaches use end-to-end network models for image fusion, such as Fusion2Fusion [22], ZMFF [23], and SwinFuse [24]. However, these methods cannot fully exploit complex image characteristics and face challenges in effectively preserving the original details.
In summary, although smoothing filters provide edge protection, they do not sufficiently extract image features. Conversely, extracting only the deep information of the image with a convolutional network during fusion [25] can effectively improve the similarity of the fused image, but it ignores important edge information, leading to insufficient extraction of edges or textures in the fused image. To better extract the feature information of source images, enhance image contrast, and effectively preserve image edges, this paper is inspired by the hybrid filtering proposed by Liu et al. [26] and proposes an infrared and visible image fusion algorithm based on dual bilateral-least-squares hybrid filtering. In this algorithm, the bilateral-least-squares hybrid filtering model is first used to decompose the source images and obtain detail layer 1. The amplified detail layer 1 is then combined with the source image, and a second decomposition yields detail layer 2, which has sharper boundaries and more edge information, as well as a base layer. Multiple feature layers are extracted using ResNet50 and a high-quality saliency map is generated for weighted fusion with the base layer. Additionally, an adaptive weighted method based on the structure tensor is used to fuse the two detail layers obtained from the two decompositions. Finally, different combinations of the fused base layer, detail layer 1, and detail layer 2 are used to obtain the desired fusion image for multiple scenes. The contributions of this work are summarized as follows.
(1) This work proposes an infrared and visible image fusion algorithm based on a dual bilateral-least-squares hybrid filter. The hybrid filter achieves spatial consistency, edge preservation, and texture smoothing, which suppresses halo artifacts around the edges and reduces noise in the fusion of infrared and visible images. In addition, this paper provides two different layer combination methods, which can better adapt to engineering application requirements.
(2) The proposed image-fusion method applies a two-stage decomposition with a bilateral-least-squares hybrid filter to obtain enhanced edge details from the source image. The adaptive weighting strategy based on the structure tensor is more effective in preserving the contour features present in the detail layer.
(3) The proposed method uses the residual network's strong feature preservation and extraction capabilities to fuse the base layer, which contains much of the fundamental information in the image. This approach effectively retains essential information in the base layer and enhances the image's feature similarity.
(4) The effectiveness and adaptability of the dual bilateral-least-squares hybrid filter (DBLSF)-based method for fusing infrared and visible images have been validated through experiments on 32 sets of images from the TNO dataset. The results indicate that Combination 1 better maintains image edge information and integrity while reducing the loss of important features from the source images, whereas Combination 2 is more proficient in enhancing contrast and edge features.
The remaining sections of the paper are presented as follows. Section 2 provides a brief overview of the fundamental principles behind the bilateral filter and least-squares filter. Section 3 outlines the proposed image-fusion method. Section 4 describes the experimental results and analysis. Lastly, Section 5 presents the conclusions.

Principle of the Bilateral-Least-Squares Filtering Algorithm
The bilateral-least-squares filtering algorithm is a hybrid filtering method that combines the ideas of bilateral filtering and the least-squares method, considering both the spatial and pixel value domain correlation. It effectively enhances image contrast while preserving edge information. Specifically, bilateral filtering is a non-linear filtering method that preserves edge information while removing noise, while the least-squares method is a mathematical optimization technique that uses the best-fitting of the curve to minimize the error between predicted and true values.

Bilateral Filter
A key issue in image smoothing filtering is effectively preserving edge information, as it greatly impacts the quality of the resulting fused image. Bilateral filters use functions composed of spatial and color information to effectively smooth images while preserving edge information [27]. For a given input image $g$, with $s$ as the central pixel and $t$ as any pixel in the neighborhood $N(s)$ of $s$, the output of the bilateral filter at $s$ is denoted $u_s$, as shown in (1).
$$u_s = \frac{\sum_{t \in N(s)} G_{\sigma_s}(s - t)\, G_{\sigma_r}(g_s - g_t)\, g_t}{\sum_{t \in N(s)} G_{\sigma_s}(s - t)\, G_{\sigma_r}(g_s - g_t)} \tag{1}$$
where $G_{\sigma_s}$ and $G_{\sigma_r}$ are the spatial and range Gaussian kernel functions, respectively, and $\sigma_s$ and $\sigma_r$ are the spatial proximity factor and the grayscale similarity factor. $G_{\sigma_s}(s - t)$ weights the spatial distance between the central pixel $s$ and a pixel $t$ in the neighborhood $N(s)$, while $G_{\sigma_r}(g_s - g_t)$ weights the difference between their gray values.
When it comes to image smoothing and edge preservation, bilateral filtering may cause gradient reversal and halo effects, and it is difficult to effectively remove strong speckle noise. In contrast, weighted least-squares filtering only performs smoothing filtering in flat areas, achieving the effect of noise removal and edge preservation.
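To make the weighting in (1) concrete, the following is a minimal brute-force sketch in plain Python, not the implementation used in this paper. The function name `bilateral_filter`, the window `radius`, the default parameter values, and the choice to skip neighbors outside the image are all assumptions made for illustration.

```python
import math


def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Brute-force bilateral filter for a grayscale image stored as a 2D list."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            g_s = img[y][x]
            num = 0.0  # weighted sum of neighbor intensities
            den = 0.0  # sum of weights (normalization term)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ty, tx = y + dy, x + dx
                    if not (0 <= ty < h and 0 <= tx < w):
                        continue  # assumption: out-of-image neighbors are ignored
                    g_t = img[ty][tx]
                    # Spatial Gaussian on pixel distance, range Gaussian on gray difference
                    w_space = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
                    w_range = math.exp(-((g_s - g_t) ** 2) / (2.0 * sigma_r ** 2))
                    num += w_space * w_range * g_t
                    den += w_space * w_range
            out[y][x] = num / den
    return out


# Step edge (0 -> 1) with one small perturbation on the dark side.
step = [[0.0] * 4 + [1.0] * 4 for _ in range(6)]
step[2][1] = 0.05
smoothed = bilateral_filter(step)
```

In this toy input the small perturbation is averaged away, while pixels on either side of the step keep their values because the range kernel gives almost no weight to neighbors across the edge.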

Least-Squares Filter
The least-squares method is a mathematical optimization technique that seeks the best-fitting curve to minimize the error between predicted and true values. The weighted least-squares filter is a filtering method based on the least-squares method, which can efficiently extract background information and texture details from different spatial scales of the source image [28]. In the weighted least-squares filter, the weight values in the weight matrix depend on the local characteristics of the signal, such as the slope, curvature, and second-order derivative of the signal. By adjusting the weight matrix, a balance between noise suppression and edge protection can be achieved during the filtering process, and the quality of image filtering is further improved. For an input image $g$, the output $u$ is obtained by minimizing the following objective function:
$$\min_{u} \sum_{s} \left[ (u_s - g_s)^2 + \lambda \left( \omega_{x,s} \left( \frac{\partial u}{\partial x} \right)_s^2 + \omega_{y,s} \left( \frac{\partial u}{\partial y} \right)_s^2 \right) \right] \tag{2}$$
where the first term minimizes the difference between $u_s$ and $g_s$; the second term enforces smoothness by minimizing the partial derivatives of $u_s$; $\lambda$ is a regularization factor; and $\omega_{x,s}$ and $\omega_{y,s}$ are smoothing weights.
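The values of $\omega_{x,s}$ and $\omega_{y,s}$ can be derived by the following formulas:
$$\omega_{x,s} = \left( \left| \frac{\partial \ell}{\partial x}(s) \right|^{\alpha} + \varepsilon \right)^{-1}, \qquad \omega_{y,s} = \left( \left| \frac{\partial \ell}{\partial y}(s) \right|^{\alpha} + \varepsilon \right)^{-1} \tag{3}$$
where $\ell$ is the logarithmic luminance channel of the input image $g$; that is, $\ell = \log(g)$. The parameter $\alpha$ determines the sensitivity to the gradient of $\ell$, and $\varepsilon$ (usually 0.0001) is a small constant to prevent division by zero in constant regions of $g$. Taking the derivative of the objective function (2) and setting it to zero yields the large sparse linear system represented by (4):
$$(I + \lambda L_g)\, u = g \tag{4}$$
In (4), $L_g = D_x^T A_x D_x + D_y^T A_y D_y$, where $A_x$ and $A_y$ are diagonal matrices containing $\omega_{x,s}$ and $\omega_{y,s}$, respectively. $D_x$ and $D_y$ are forward difference operators, while $D_x^T$ and $D_y^T$ are backward difference operators. Therefore, $L_g$ is a five-point spatially inhomogeneous Laplacian matrix.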
The linear system of equations in (4) can be used to derive the vector $u$ that minimizes (2). The solution to this quadratic optimization problem can be expressed as follows:
$$u = (I + \lambda L_g)^{-1} g \tag{5}$$
However, computing the inhomogeneous Laplacian matrix in (4) incurs a high computational cost, and because $L_g$ varies spatially, (5) must be solved in the image domain, which requires inverting a very large matrix. Setting $\omega_{x,s} = \omega_{y,s} = 1$ in (2) turns it into an unweighted problem, as shown in (6).
$$\min_{u} \sum_{s} \left[ (u_s - g_s)^2 + \lambda \sum_{* \in \{x, y\}} (\nabla u_{*,s})^2 \right] \tag{6}$$
Since (6) is the unweighted least-squares problem and $L_g$ in (5) then reduces to the ordinary Laplacian matrix, the unique minimizer can be obtained in the Fourier domain. The output image $u$ is given by the following formula:
$$u = \mathcal{F}^{-1}\left( \frac{\mathcal{F}(g)}{\mathcal{F}(1) + \lambda \sum_{* \in \{x, y\}} \overline{\mathcal{F}(\nabla_*)} \cdot \mathcal{F}(\nabla_*)} \right) \tag{7}$$
where $\mathcal{F}(\cdot)$ and $\mathcal{F}^{-1}(\cdot)$ are the fast Fourier transform (FFT) and inverse fast Fourier transform (IFFT) operators, respectively, and $\overline{\mathcal{F}(\cdot)}$ is the complex conjugate of $\mathcal{F}(\cdot)$. $\mathcal{F}(1)$ is the Fourier transform of the $\delta$ function, which is always 1. Addition, multiplication, and division are all pointwise operations. The advantage of unweighted least-squares filtering is the fast and efficient computation of the solution using FFT and IFFT operators. However, it lacks an edge-preserving smoothing operator, leading to halo artifacts in the filtered image.
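The Fourier-domain solution of the unweighted least-squares problem can be illustrated on a one-dimensional periodic signal. The sketch below is an assumption-laden toy: it uses a naive DFT in place of an FFT library, a circular forward difference as the gradient operator, and the hypothetical function names `dft`, `idft`, and `ls_smooth`.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (stand-in for an FFT routine)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse discrete Fourier transform."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def ls_smooth(g, lam):
    """Minimize sum (u - g)^2 + lam * sum (u[t+1] - u[t])^2 for a periodic signal."""
    n = len(g)
    g_hat = dft(g)
    # Transfer function of the circular forward difference u[t+1] - u[t]
    d_hat = [cmath.exp(2j * math.pi * k / n) - 1.0 for k in range(n)]
    # Pointwise division: F(g) / (1 + lam * conj(F(grad)) * F(grad))
    u_hat = [g_hat[k] / (1.0 + lam * abs(d_hat[k]) ** 2) for k in range(n)]
    return [v.real for v in idft(u_hat)]


g = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
u = ls_smooth(g, lam=1.0)
```

The result satisfies the normal equations (I + λL)u = g with a homogeneous Laplacian L, and the step in `g` is blurred on both sides, which is the loss of edge preservation noted above.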

Bilateral-Least-Squares Filtering Algorithm
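This paper proposes a new method based on the approach presented in [24], called bilateral least-squares filtering (BLF-LS), for smoothing images. This method first uses bilateral filtering to smooth the image gradient and then embeds the smoothed image gradient into a least-squares framework, effectively smoothing the image while better preserving its edges. Specifically, the smoothing framework is implemented as follows:
$$\min_{u} \sum_{s} \left[ (u_s - g_s)^2 + \lambda \sum_{* \in \{x, y\}} \left( \nabla u_{*,s} - \left( f_{BLF}(\nabla g_*) \right)_s \right)^2 \right] \tag{8}$$
where, when $\lambda$ is sufficiently large, the gradient of the output $u_s$, i.e., $\nabla u_*$, will approach $f_{BLF}(\nabla g_*)$, $* \in \{x, y\}$. The minimizer of (8) is
$$u = F_{BLF\text{-}LS}(g) = \mathcal{F}^{-1}\left( \frac{\mathcal{F}(g) + \lambda \sum_{* \in \{x, y\}} \overline{\mathcal{F}(\nabla_*)} \cdot \mathcal{F}\left( f_{BLF}(\nabla g_*) \right)}{\mathcal{F}(1) + \lambda \sum_{* \in \{x, y\}} \overline{\mathcal{F}(\nabla_*)} \cdot \mathcal{F}(\nabla_*)} \right) \tag{9}$$
where $f_{BLF}(\nabla g_*)$, $* \in \{x, y\}$, represents smoothing of the gradient of the input image $g$ in the x and y axis directions using bilateral filtering, and $F_{BLF\text{-}LS}(\cdot)$ represents smoothing of the input image using BLF-LS filtering.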
Combining bilateral filtering and least-squares filtering can achieve complementary effects, because bilateral filtering can preserve the structure of image information well but may lose a lot of shadow distribution, while least-squares filtering can preserve the shadow distribution of reference information well but may lose the edge structure and detail information of image information. Therefore, combining the two filtering methods can better preserve the edge structure and detail information of the image while preserving the image structure information and shadow distribution, thus achieving a better image filtering effect.
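The idea of embedding a bilateral-filtered gradient in a least-squares framework can be sketched on a one-dimensional periodic signal. This is an illustrative toy, not the authors' code: the naive DFT, the circular forward difference, the choice to compute range weights from the gradient values themselves, and all function names (`bilateral_1d`, `solve_blf_ls`, `blf_ls`) are assumptions.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def forward_diff(g):
    """Circular forward difference: (grad g)[t] = g[t+1] - g[t]."""
    n = len(g)
    return [g[(t + 1) % n] - g[t] for t in range(n)]


def bilateral_1d(v, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Bilateral filter on a periodic 1D signal (range weights taken from v itself)."""
    n = len(v)
    out = []
    for s in range(n):
        num = den = 0.0
        for d in range(-radius, radius + 1):
            t = (s + d) % n
            w = (math.exp(-d * d / (2.0 * sigma_s ** 2))
                 * math.exp(-((v[s] - v[t]) ** 2) / (2.0 * sigma_r ** 2)))
            num += w * v[t]
            den += w
        out.append(num / den)
    return out


def solve_blf_ls(g, target_grad, lam):
    """Minimize sum (u - g)^2 + lam * sum (grad u - target_grad)^2 via the DFT."""
    n = len(g)
    g_hat, b_hat = dft(g), dft(target_grad)
    d_hat = [cmath.exp(2j * math.pi * k / n) - 1.0 for k in range(n)]
    u_hat = [(g_hat[k] + lam * d_hat[k].conjugate() * b_hat[k])
             / (1.0 + lam * abs(d_hat[k]) ** 2) for k in range(n)]
    return [v.real for v in idft(u_hat)]


def blf_ls(g, lam=1.0, sigma_s=2.0, sigma_r=0.1):
    """Smooth the gradient with a bilateral filter, then solve the least-squares fit."""
    smoothed_grad = bilateral_1d(forward_diff(g), sigma_s=sigma_s, sigma_r=sigma_r)
    return solve_blf_ls(g, smoothed_grad, lam)


# Noisy step: small fluctuations on both plateaus, one sharp rise at index 4.
g = [0.0, 0.02, -0.01, 0.01, 1.0, 0.98, 1.01, 0.99]
u = blf_ls(g, lam=1.0)
```

Because the large gradient at the step survives the bilateral filter almost unchanged, the least-squares fit keeps the edge, whereas the small gradients on the plateaus are averaged. If the target gradient equals the unfiltered gradient, `solve_blf_ls` returns the input unchanged.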

Dual Bilateral-Least-Squares Hybrid Filtering Model
The bilateral-least-squares hybrid filtering model is a filtering algorithm used for image denoising. This algorithm combines the advantages of bilateral filtering and least-squares filtering, and can effectively remove noise from images while preserving the details and edge information. In this way, the dual bilateral-least-squares hybrid filtering (DBLSF) model can simultaneously meet the requirements of image smoothing and denoising and can be applied to many image processing tasks, such as digital image processing, computer vision, and robot vision.

Image-Fusion Model
Given the input visible image $I_V$ and infrared image $I_I$, the filtering models in this paper are employed to decompose the visible and infrared input images into the base layer $B_*$ and detail layer $D_*$, $* \in \{V, I\}$. The fusion weighting strategy in the proposed model is chosen based on the different contrast and detail features of the decomposed base layer and detail layer. Deep features of the base layer are extracted with a residual network (ResNet50), and the corresponding weight map is calculated by the local L1-norm and an averaging operation. The detail layer is fused with an adaptive weighting strategy based on the structure tensor. The flowchart of the fusion algorithm is depicted in Figure 1; the fused base layer and detail layers are combined to obtain the fused image F. Two combinations are defined based on the combination of the detail layer and the base layer, namely Combination 1 (C1), which includes only detail layer 1 and the base layer, and Combination 2 (C2), which includes detail layer 1, detail layer 2, and the base layer.
The base layer $B_*$ is obtained by filtering the input image with BLF-LS. The detail layer $D_*$ is obtained by subtracting the base layer $B_*$ from the input image $I_*$:
$$B_* = F_{BLF\text{-}LS}(I_*) \tag{10}$$
$$D_* = I_* - B_* \tag{11}$$

Fusion Rules
The base layer contains low-frequency content, including a large amount of basic information about the image, and represents the overall appearance of the image in smooth areas. Therefore, effectively extracting the features of the base layer and preserving a large amount of information from the source image can improve the similarity between the fused image and the source image. In this paper, DBLSF is used to decompose the source image into a base layer and a detail layer, as shown in Figure 2. To obtain more detailed information, this paper enhances the source image and obtains the base layer through (12).
where α is an adjustable parameter used to amplify the detail layer, which is then added to the source image to obtain the enhanced image. Finally, the base layer of the source image is enhanced using (13). In (12), when α is 0, the result is the original base layer. Different values of α were tested to enhance the contrast of the base layer while maintaining its correlation and visual similarity with the original base layer. The enhanced base layer was compared with the original base layer through objective evaluation metrics such as average gradient (AG), structural similarity (SSIM), visual information fidelity (VIF), and correlation coefficient (CC). The decomposed infrared and visible images are shown in Figures 3 and 4, respectively, and the corresponding objective evaluation results for varying α are shown in Table 1.
Based on the results in Table 1, it was found that, when α > 1, the smoothed base layer showed a significant increase in AG, but noise severely affected the visual perception, and the correlation coefficient between the original and enhanced base layers was reduced. Therefore, α = 1 was chosen, which provided a visually pleasing and highly correlated base layer and effectively improved the overall brightness and contrast of the image.
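One possible reading of the two-stage decomposition with the amplification parameter α can be sketched as follows. This is a hedged illustration on a one-dimensional signal: the exact forms of (12) and (13) are not reproduced here, the box filter `box_smooth` is only a stand-in for BLF-LS, and the function name `two_stage_decompose` and its return order are hypothetical.

```python
def box_smooth(x, radius=1):
    """Stand-in smoother (moving average); the paper uses BLF-LS instead."""
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def two_stage_decompose(img, alpha=1.0, smooth=box_smooth):
    """Return (base, detail1, detail2, enhanced) for a 1D signal.

    Stage 1: detail1 = img - smooth(img).
    Enhancement (assumed form): enhanced = img + alpha * detail1.
    Stage 2: base = smooth(enhanced), detail2 = enhanced - base.
    """
    base1 = smooth(img)
    detail1 = [a - b for a, b in zip(img, base1)]
    enhanced = [a + alpha * d for a, d in zip(img, detail1)]
    base = smooth(enhanced)
    detail2 = [a - b for a, b in zip(enhanced, base)]
    return base, detail1, detail2, enhanced


signal = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
base, detail1, detail2, enhanced = two_stage_decompose(signal, alpha=1.0)
```

With α = 0 the enhanced signal equals the input and the second decomposition reproduces the first, which matches the remark that α = 0 gives the original base layer. With α = 1 the step in `signal` is overshot on both sides, so the contrast of the enhanced signal exceeds that of the input.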

ResNet50-Based Fusion
The enhanced base layer B contains important image details, as well as the primary contrast and brightness information. To better preserve the background features of the source image and improve the similarity between the fused image and the source image, ResNet50 trained on ImageNet was used to obtain deep features. Then, a multi-layer fusion strategy was employed to obtain a weight map. Finally, the weight map and the base layer were fused. The process of base layer fusion is illustrated in Figure 5.
The fusion strategy using ResNet50 for extracting deep features is described in detail below. ResNet50 consists of 5 convolutional blocks and the output feature maps can be represented as $\varphi_*^i$, $i \in \{1, 2, 3, 4, 5\}$:
$$\varphi_*^i = \Phi_i(B_*), \quad * \in \{V, I\} \tag{14}$$
where $\Phi_i(\cdot)$ represents the process of extracting the feature map by the i-th convolutional block in ResNet50. The more convolutional blocks the input image passes through, the more abstract the extracted image features, generating deeper feature maps with higher-level semantics. Therefore, in this study, we chose the deep feature maps $\varphi_*^i$ output by convolutional blocks $i = 3, 4, 5$. Among them, the feature maps $\varphi_*^3$ and $\varphi_*^4$ are saved during the extraction of the deep feature map $\varphi_*^5$, rather than using different networks for extraction (as in Li et al. [19]), thus saving time.
The workflow diagram for generating deep feature maps using ResNet50 is shown in Figure 6 and the corresponding time consumption is listed in Table 2, with the optimal values highlighted in bold. The initial weights $\hat{\varphi}_*^i$ of the feature maps $\varphi_*^i$ are computed using the local L1-norm and averaging according to the following formula: where $\lVert \cdot \rVert_1$ represents the L1-norm, (p, q) represents the coordinates within the region, and t = 2 is used in this paper.
Due to the use of residual networks in extracting deep features, the number of channels M in the feature map varies with the number of convolutional blocks i, with the following relationship: $M = 64 \times 2^{i-1}$. Therefore, it is necessary to perform bicubic interpolation on the initial weight to adjust it to the size of the input base layer and then calculate the final weight map. The calculation formula is as follows: The fusion of the input base layer with the weight map obtained for each $i \in \{3, 4, 5\}$ and $* \in \{V, I\}$ is calculated as shown in (18): Finally, the maximum modulus method is used to fuse the base layer. The maximum value of the base layer is calculated by (19).
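The weight-map idea described above (channel-wise L1-norm, local averaging, normalization, weighted fusion, and a final element-wise maximum) can be sketched in plain Python. This is only an illustration under stated assumptions: feature maps are assumed to be already resized to the base-layer size (the bicubic interpolation step is omitted), border pixels are averaged over in-bounds neighbors only, the normalization by the sum of the two activity maps is an assumed form, and all function names are hypothetical.

```python
def l1_activity(features):
    """features: list of M channels (each an H x W list) -> H x W map of L1-norms."""
    h, w = len(features[0]), len(features[0][0])
    return [[sum(abs(ch[y][x]) for ch in features) for x in range(w)]
            for y in range(h)]


def block_average(act, t=2):
    """Average each pixel over its (2t+1) x (2t+1) neighborhood (in-bounds part only)."""
    h, w = len(act), len(act[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [act[y + p][x + q]
                    for p in range(-t, t + 1) for q in range(-t, t + 1)
                    if 0 <= y + p < h and 0 <= x + q < w]
            out[y][x] = sum(vals) / len(vals)
    return out


def fuse_base(base_v, base_i, feats_v, feats_i, t=2, eps=1e-12):
    """Weighted fusion of two base layers using normalized activity maps."""
    a_v = block_average(l1_activity(feats_v), t)
    a_i = block_average(l1_activity(feats_i), t)
    h, w = len(base_v), len(base_v[0])
    fused = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = a_v[y][x] + a_i[y][x]
            w_v = 0.5 if total < eps else a_v[y][x] / total
            fused[y][x] = w_v * base_v[y][x] + (1.0 - w_v) * base_i[y][x]
    return fused


def max_fuse(candidates):
    """Element-wise maximum over the fused base layers from several blocks."""
    h, w = len(candidates[0]), len(candidates[0][0])
    return [[max(c[y][x] for c in candidates) for x in range(w)] for y in range(h)]


def const(value, h=4, w=4):
    return [[value] * w for _ in range(h)]


base_v, base_i = const(0.2), const(0.8)
feats_v = [const(1.0), const(-1.0)]  # two channels, L1 activity 2 per pixel
feats_i = [const(3.0), const(3.0)]   # two channels, L1 activity 6 per pixel
fused = fuse_base(base_v, base_i, feats_v, feats_i)
final = max_fuse([const(0.1), fused])
```

In this toy case the infrared features are three times as active as the visible ones, so the fused base layer is 0.25 × 0.2 + 0.75 × 0.8 = 0.65 at every pixel.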

Structure Tensor-Based Fusion
The gradient values of an image are closely related to its visual effects: the larger the gradient values, the more obvious the fine texture and edge features [29]. A structure tensor-based adaptive weighting strategy [30,31] can better preserve the spatial information and detail layer features of the image. As shown in Figure 2, the decomposed detail layer contains high-frequency content, mainly edge, contour, and sharp detail information. To construct the structure tensor $S_*$, $* \in \{V, I\}$, of the input detail layer $D_*$, the partial derivatives $D_{*,x}$ and $D_{*,y}$ along the x and y axes are computed. The specific calculation formulas are as follows: The matrices $S_V$ and $S_I$ are both symmetric positive semi-definite, so they can be decomposed as: where $\lambda_{*1}$ and $\lambda_{*2}$ are the two non-negative eigenvalues of the matrix $S_*$, and their corresponding eigenvectors are $\eta_{*1}$ and $\eta_{*2}$, respectively. Assuming that the larger eigenvalue is $\lambda_{*1}$ and the smaller one is $\lambda_{*2}$, a larger $\lambda_{*1}$ indicates stronger edge intensity at that pixel in the image. Meanwhile, the fused image should preserve the properties of the original images, so its structure tensor $\hat{S}_*$ should be closest to $S_*$ in terms of the Frobenius norm, that is, $\hat{S}_* = Q \hat{\Lambda} Q^T$, where $\hat{\Lambda}$ is the following matrix: The eigenvector $\eta_{*1}$ corresponding to the larger eigenvalue $\lambda_{*1}$ represents the gradient direction at the pixel, along which the gray level changes are most obvious and the edge intensity is strongest, indicating richer spatial information. Therefore, the weight matrices $W_V$ and $W_I$ for the detail layers of the visible and infrared images can be calculated from the proportions of the larger eigenvalues $\lambda_{V1}$ and $\lambda_{I1}$, respectively.
Finally, the fused detail layer can be calculated using the weighted method, which is expressed as
$$D_F = W_V \mathbin{.*} D_V + W_I \mathbin{.*} D_I \tag{26}$$
where $D_F$ is the fused detail layer, $D_V$ and $D_I$ are the detail layers of the visible and infrared images, respectively, and $.*$ denotes the element-wise product, that is, multiplying the values at the same coordinate positions of two image matrices of the same size.
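The eigenvalue-based weighting can be sketched per pixel in plain Python. The following is an illustration rather than the paper's implementation: it uses an unsmoothed per-pixel structure tensor, central differences with clamped borders, weights of the assumed form W_V = λ_V1 / (λ_V1 + λ_I1) with W_I = 1 − W_V, and equal weights where both detail layers are flat. All function names are hypothetical.

```python
import math


def gradients(img):
    """Central differences along x and y; indices are clamped at the borders."""
    h, w = len(img), len(img[0])
    gx = [[(img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
           for x in range(w)] for y in range(h)]
    gy = [[(img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
           for x in range(w)] for y in range(h)]
    return gx, gy


def largest_eigenvalue(dx, dy):
    """Larger eigenvalue of the 2x2 tensor [[dx*dx, dx*dy], [dx*dy, dy*dy]]."""
    a, b, c = dx * dx, dx * dy, dy * dy
    return 0.5 * (a + c) + math.sqrt((0.5 * (a - c)) ** 2 + b * b)


def fuse_detail(d_v, d_i, eps=1e-12):
    """Element-wise weighted sum of two detail layers, weights from eigenvalues."""
    gxv, gyv = gradients(d_v)
    gxi, gyi = gradients(d_i)
    h, w = len(d_v), len(d_v[0])
    fused = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            lam_v = largest_eigenvalue(gxv[y][x], gyv[y][x])
            lam_i = largest_eigenvalue(gxi[y][x], gyi[y][x])
            total = lam_v + lam_i
            # Assumption: equal weights where neither layer has any gradient
            w_v = 0.5 if total < eps else lam_v / total
            fused[y][x] = w_v * d_v[y][x] + (1.0 - w_v) * d_i[y][x]
    return fused


d_v = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]  # vertical edge
d_i = [[0.5] * 4 for _ in range(3)]             # flat layer
d_f = fuse_detail(d_v, d_i)
```

Near the edge only the visible detail layer has a non-zero eigenvalue, so the fused layer copies it there; in flat regions the two layers are averaged.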

Experiment Results and Analysis
The experimental simulation platform utilized in this study consists of a notebook computer with an Intel(R) Core(TM) i7-8750H processor operating at 2.20 GHz and with 8.00 GB of RAM. The programming environment employed is MATLAB R2021b and the operating system of the computer is 64-bit Windows 10.

Algorithm Comparison and Parameter Settings
The effectiveness of the proposed fusion algorithm was evaluated through experiments conducted on infrared and visible images from the TNO image-fusion dataset [32]. The proposed method, with Combination 1 (C1) and Combination 2 (C2), was compared with 13 existing infrared and visible image fusion algorithms, including MDLatLRR [33], MGF [34], and multiscale transform enhancement (MST-TE) [35]. The experimental parameters for these 14 algorithms were set based on their original papers. In the proposed algorithm, small values of σ s and σ r may lead to gradient reversal, while larger values of σ s and σ r enhance the smoothing ability but may generate halos. In this article, we set σ s1 = 2, σ r1 = 0.04, σ s2 = 2, and σ r2 = 3σ r1 = 0.12 to preserve the regions with large intensity differences and to better retain the edge information of the images, thus improving the smoothing ability of the algorithm.

Fusion Results
As there are too many fusion results for the infrared and visible images to display in full, this paper selects three sets of images from the TNO image-fusion dataset (the 14th image of Nato_camp_sequence, soldier-behind_smoke_3, and Kaptein_1123) to show the output results. Figure 7 illustrates the three sets of infrared and visible images, and Figures 8-10 show the resulting output for each set. Red rectangular boxes are used to help distinguish the quality of the fused images.

Experiment Results and Analysis
The experimental simulation platform utilized in this study consists of a notebook computer with an Intel(R) Core(TM) i7-8750H processor operating at 2.20 GHz and with 8.00 GB of RAM. The programming environment employed is MATLAB R2021b and the operating system of the computer is 64-bit Windows 10.

Algorithm Comparison and Parameter Settings
The effectiveness of the proposed fusion algorithm was evaluated through experiments conducted on infrared and visible images from the TNO image-fusion dataset [32]. To compare the performance of the proposed method, Combination 1 (C1) and Combination 2 (C2) with 13 existing infrared and visible image fusion algorithms, including MDLatLRR [33], MGF [34], and multiscale transform enhancement (MST-TE) [35], were tested. The experimental parameters for these 14 algorithms were set based on their original papers. In the proposed algorithm of this article, small values of s  and r  may lead to gradient reversal, while larger values of s  and r  enhance the smoothing ability, but may generate halos. In this article, we set

Fusion Results
Because the fusion results are too numerous to display in full, this paper shows three image sets from the TNO image-fusion dataset: the 14th image of Nato_camp_sequence, soldier-behind_smoke_3, and Kaptein_1123. Figure 7 shows the three sets of infrared and visible source images, and Figures 8-10 show the fusion results for each set. Red rectangular boxes mark the regions used to compare the quality of the fused images.

Subjective Evaluation
For the first set of experiments, shown in Figure 8, all 14 methods effectively combine the details of the infrared image with the context information of the visible image. However, images (a) and (e) contain more noise, which makes them look blurry and gives a relatively poor visual effect. Image (b) is too bright overall, loses important information from the visible image, and also contains more noise. In images (c), (f), and (i), the contrast between the objects and the background is relatively low; in particular, image (f) lacks texture near the figure, which hinders observation of the fence. Image (d) has an obvious block effect. Image (g) performs well in its details, but the wavelet transform produces a residual shadow, and image (l) is partially distorted by the convolution process. In images (h) and (j), the salient features of the grass, trees, and fence are not obvious, and the background information is severely lost. Compared with images (k) and (m), image (n) effectively retains the grass information from both the infrared and visible images and accurately represents the characteristic features of the source images. Meanwhile, image (o) enhances the details of the fence and grass, resulting in higher overall contrast and sharper edges.

In the second set of experimental results, shown in Figure 9, images (a), (g), and (j) contain a large number of artifacts, which make the target unclear and hinder human recognition. Although images (b) and (l) retain relatively complex detail, they fail to highlight distinctive traits of the image; for example, detail is lost in the tree above the person on the right side of the image. Images (c), (d), and (k) do not produce halos but lack some details. Images (e), (f), and (m) contain more noise and severely lose texture features in the dense smoke areas. Image (h) lacks visible-image texture information because it incorporates few visible-image features. In contrast, image (i) is better at identifying the target but shows image discontinuity. The result obtained using Combination 1 of the proposed fusion algorithm, image (n), highlights the target clearly, contains fewer artifacts, and does not exhibit double images or visual blur. Meanwhile, image (o) retains the overall information of the original images, and the salient features of the objects are distinctly visible.
The results of the third set of experiments, shown in Figure 10, indicate that images (a), (g), and (j) contain substantial artifacts and noise in the background, resulting in poor visual perception. Images (d) and (e) have poor overall brightness and low contrast. In image (l), the target person stands out prominently, but the grass details are lost. Images (c) and (i) have artifacts around the target person. Image (f) has low contrast and unclear edge information. Images

Objective Evaluation
Subjective evaluation of image quality depends on the perception of the observer, which varies with individual visual sensitivity and may lead to biased conclusions. This paper therefore also uses quantitative indicators to evaluate the quality of the fused images. Because a single evaluation index cannot adequately reflect fused-image quality, several indicators are combined.
To evaluate the fusion performance of the various algorithms more objectively, this paper selects nine commonly used fusion evaluation indicators: peak signal-to-noise ratio (PSNR), structural similarity (SSIM), feature similarity (FSIM), multiscale structural similarity (MS_SSIM), correlation coefficient (CC), root mean square error (RMSE), average gradient (AG), edge intensity (EI), and spatial frequency (SF). The first six indicators are used to evaluate the fusion quality of Combination 1, and the last three, which mainly reflect the contrast, clarity, and edge preservation of the fused image, are used to evaluate Combination 2. For every indicator except RMSE, a higher value indicates better fusion performance; for RMSE, a lower value is better. Figure 11 shows the stacked bar charts used to evaluate the fusion results for the three sets of images, and Table 3 presents the objective evaluation results for the 32 sets of infrared and visible images. The best value of each metric is shown in bold in the table.
Figure 11 shows that Combination 1 of the proposed algorithm outperforms the other algorithms in terms of RMSE and SSIM, indicating less image noise and better preservation of structural similarity. However, the low brightness of the third set of visible images makes FSIM and MS_SSIM difficult to evaluate, and the feature similarity between Combination 1's low-brightness fused image and the source image is limited. The AG, EI, and SF results in the last row suggest that Combination 2 achieves the best edge detail and contrast across the different environments.
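Three of the reference-based indicators listed above (RMSE, PSNR, and CC) have simple closed forms and can be sketched as follows. This is a minimal NumPy illustration, not the paper's evaluation code. The function names, the peak value of 255 for 8-bit images, and the helper `averaged_over_sources`, which averages a metric over the infrared and visible source images, are assumptions; the source does not state how the two source images enter each metric.

```python
import numpy as np


def rmse(ref, fused):
    """Root mean square error between a reference and the fused image."""
    ref = np.asarray(ref, dtype=np.float64)
    fused = np.asarray(fused, dtype=np.float64)
    return float(np.sqrt(np.mean((ref - fused) ** 2)))


def psnr(ref, fused, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit images."""
    ref = np.asarray(ref, dtype=np.float64)
    fused = np.asarray(fused, dtype=np.float64)
    mse = np.mean((ref - fused) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))


def cc(ref, fused):
    """Pearson correlation coefficient between two images."""
    r = np.asarray(ref, dtype=np.float64)
    f = np.asarray(fused, dtype=np.float64)
    r = r - r.mean()
    f = f - f.mean()
    denom = np.sqrt(np.sum(r ** 2) * np.sum(f ** 2))
    if denom == 0:
        return float("nan")  # undefined when either image is constant
    return float(np.sum(r * f) / denom)


def averaged_over_sources(metric, ir, vis, fused):
    """Assumed convention: average the metric over both source images."""
    return 0.5 * (metric(ir, fused) + metric(vis, fused))
```

A call such as `averaged_over_sources(rmse, ir, vis, fused)` then gives one score per fused image. SSIM, FSIM, and MS_SSIM need windowed statistics and are omitted from this sketch.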
Table 3 shows that the proposed algorithm preserves a large amount of information from the source images and improves similarity to them by using ResNet50 to fuse the base layers. In addition, by applying an adaptive weighting strategy based on the structure tensor to the multiple detail layers obtained from the secondary decomposition of the enhanced image, the algorithm effectively fuses the texture features of the image details, which gives Combination 2 its advantage in overall brightness and edge information. The two combinations are analyzed in detail as follows: (1) Combination 1 does not perform well in the AG, EI, and SF metrics, but it is the best in six metrics, namely SSIM, FSIM, MS_SSIM, RMSE, PSNR, and CC, improving on the other algorithms by at least 2.71%, 1.86%, 0.09%, 0.46%, 0.24%, and 0.07%, respectively. This indicates that the fused image of Combination 1 contains more effective information, is more similar to the source images, and contains less artificial noise. (2) The fused image of Combination 2 improves on the other algorithms by an average of 37.42%, 26.40%, and 26.60% in the AG, EI, and SF metrics, respectively, which shows a significant gain in contrast and in the preservation of image edges and textures. In summary, the proposed algorithm provides two combinations that can meet different needs in engineering applications of infrared and visible image fusion.
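The three no-reference indicators used for Combination 2 (AG, EI, and SF) all measure local intensity variation in the fused image alone. The following NumPy sketch uses definitions that are common in the fusion literature; it is not the paper's code, and the exact normalization and boundary handling used by the authors are not given in the source. The function names, the forward-difference stencil, the Sobel kernels for EI, and the edge-replicating padding are assumptions.

```python
import numpy as np


def average_gradient(img):
    """AG: mean magnitude of forward-difference gradients."""
    img = np.asarray(img, dtype=np.float64)
    gx = img[:-1, 1:] - img[:-1, :-1]   # horizontal difference
    gy = img[1:, :-1] - img[:-1, :-1]   # vertical difference
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))


def spatial_frequency(img):
    """SF: sqrt(RF^2 + CF^2) from row-wise and column-wise differences."""
    img = np.asarray(img, dtype=np.float64)
    rf2 = np.mean((img[:, 1:] - img[:, :-1]) ** 2)  # row frequency squared
    cf2 = np.mean((img[1:, :] - img[:-1, :]) ** 2)  # column frequency squared
    return float(np.sqrt(rf2 + cf2))


def edge_intensity(img):
    """EI: mean Sobel gradient magnitude (edge-replicated border, assumed)."""
    p = np.pad(np.asarray(img, dtype=np.float64), 1, mode="edge")
    # Sobel responses built from shifted views of the padded image
    sx = (p[:-2, 2:] + 2.0 * p[1:-1, 2:] + p[2:, 2:]) \
        - (p[:-2, :-2] + 2.0 * p[1:-1, :-2] + p[2:, :-2])
    sy = (p[2:, :-2] + 2.0 * p[2:, 1:-1] + p[2:, 2:]) \
        - (p[:-2, :-2] + 2.0 * p[:-2, 1:-1] + p[:-2, 2:])
    return float(np.mean(np.hypot(sx, sy)))
```

All three functions return zero for a constant image and scale linearly with image contrast. This is why a fusion strategy that amplifies detail layers tends to score higher on AG, EI, and SF.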
Finally, the two combinations of the proposed algorithm were compared with the 13 other algorithms in terms of time efficiency; the results are shown in Table 4. Because the proposed algorithm combines four modules (bilateral filtering, the least-squares model, the ResNet50 network, and the structure tensor model), it produces high-quality fused images and adapts to the requirements of both scenes. The proposed algorithm is more time-efficient than MDLatLRR, NSCT_SR, and CNN. Compared with the other algorithms based on multiscale transforms or simple filter decomposition, it takes more time but produces higher-quality fused images.

Conclusions
To improve the quality of fused images and meet the application requirements of different scenes, this article proposes a novel fusion algorithm for infrared and visible images based on DBLSF. This article achieves the following objectives: (1) The hybrid filter addresses halo artifacts around edges and noise in the fusion results of infrared and visible images by achieving spatial consistency, edge preservation, and texture smoothing. On the other hand, C2 has a more pronounced effect in preserving edge information and improving contrast: the AG, EI, and SF indexes increase by 37.42%, 26.40%, and 26.60%, respectively.
In addition, the algorithm described in this paper can also be applied to other image-fusion tasks, such as medical images, multi-focus images, remote sensing images, and multiple exposure images. In future research, we will consider combining more efficient edge-smoothing filters with lightweight deep-learning algorithms to further improve the quality of the fused images and the speed of fusion.