Enhanced CycleGAN Network with Adaptive Dark Channel Prior for Unpaired Single-Image Dehazing

Unpaired single-image dehazing has become a challenging research hotspot due to its wide application in modern transportation, remote sensing, and intelligent surveillance, among other fields. Recently, CycleGAN-based approaches have been popularly adopted in single-image dehazing as the foundations of unpaired unsupervised training. However, there are still deficiencies with these approaches, such as obvious artificial recovery traces and the distortion of image processing results. This paper proposes a novel enhanced CycleGAN network with an adaptive dark channel prior for unpaired single-image dehazing. First, a Wave-ViT semantic segmentation model is utilized to achieve the adaptation of the dark channel prior (DCP) to accurately recover the transmittance and atmospheric light. Then, the scattering coefficient derived from both physical calculations and random sampling means is utilized to optimize the rehazing process. Bridged by the atmospheric scattering model, the dehazing/rehazing cycle branches are successfully combined to form an enhanced CycleGAN framework. Finally, experiments are conducted on reference/no-reference datasets. The proposed model achieved an SSIM of 94.9% and a PSNR of 26.95 on the SOTS-outdoor dataset and obtained an SSIM of 84.71% and a PSNR of 22.72 on the O-HAZE dataset. The proposed model significantly outperforms typical existing algorithms in both objective quantitative evaluation and subjective visual effect.


Introduction
With the rapid development of digital society, computer vision technology is increasingly being applied in the fields of autonomous driving, remote sensing imaging, and intelligent monitoring. However, the quality of the images acquired by photographic equipment in hazy weather is severely affected, with the target object being obscured and the image losing a lot of detailed information. Furthermore, degraded images are not conducive to subsequent high-level vision tasks. Therefore, a method for processing and clarifying hazy degraded images is highly desired.
At present, single-image dehazing has become a mainstream method for image clarification because it is cost-effective and requires no additional constraint information. Single-image dehazing methods can be classified, according to their mechanism, into image enhancement, image restoration, and learning-based approaches.
Traditional image enhancement methods for dehazing include Retinex theory [1], histogram equalization [2], and wavelet transforms. These methods adjust the contrast and saturation of the image to achieve dehazing without considering the physical nature of haze formation. However, these global enhancements often cause the loss of some local information and perform poorly when facing hazy images in complex scenes. In recent years, many methods combining image fusion have been widely proposed. Zheng et al. [3] used gamma correction to obtain a sequence of multiple exposed images from a single hazy image, then integrated the best region of saturation using adaptive decomposition to produce a clear image. Similarly, Galdran [4] utilized a multiscale Laplacian transform to fuse artificially exposed images to achieve dehazing. Zhu et al. [5] implemented feature extraction of a single image based on the idea of image space domain transformation; these authors then used a multiscale fusion algorithm based on fast filtering and saturation curve analysis to fuse the transformed image. These image fusion-based methods have improved traditional image enhancement, but the complexity of the algorithms used in these methods is high, and there are certain limitations to the stability of the dehazing they can provide.
Image restoration [6][7][8][9] uses prior knowledge or assumptions to establish a physical model of image degradation to achieve clarity. He et al. [7] discovered the dark channel prior (DCP) and mapped it to the atmospheric scattering model [10], designing an effective haze removal method. Zhu et al. [8] revealed the connection between haze concentration, image brightness, and saturation and developed the color attenuation prior. Berman et al. [9] proposed that haze alters the original tight color clusters in RGB space and forms haze lines through atmospheric light coordinates. Wang et al. [11] estimated the transmittance based on a prior for which there exists a linear relationship between the minimum channel of a hazy image and a clear image; these authors also introduced a weakening strategy combined with a quad-tree method of subdividing additional channels to restore the atmospheric light. Physical-model-based methods have been widely adopted since they are simple and efficient. Unfortunately, this type of algorithm requires high accuracy in parameter estimation and fails in regions that do not satisfy the prior; thus, the defogging results are often accompanied by negative effects such as color distortion and halos.
Recently, learning-based methods have significantly pushed the state of the art of unpaired image dehazing. Cai et al. [12] devised DehazeNet by integrating four different traditional defogging algorithms with deep learning. Zhang et al. [13] set two sub-networks in the pyramid network to obtain the transmittance and the atmospheric light, respectively. Li et al. [14] proposed a light-weight CNN network that they combined with the atmospheric scattering model to achieve dehazing. Chen et al. [15] developed GCANet on the basis of generative adversarial networks and adopted smoothed dilated convolution instead of the original dilated convolution to solve the problem of grid artifacts. Similarly, a series of networks [16][17][18][19] were designed to derive clear images directly from the input hazy images without considering the degradation mechanism. Compared to conventional image enhancement and prior-based haze removal models, learning-based methods have achieved great progress. However, most of these methods are trained on paired data and rely on clear images for positive supervision. This supervised training process leads to excessive sensitivity to samples and poor generalization to real-world haze removal. To address this issue, various unsupervised learning methods have been proposed. Li et al. [20] presented an unsupervised, unpaired defogging algorithm based on layer disentanglement, breaking away from training on large-scale datasets. However, this algorithm often produces images with serious color distortion and poor stability during defogging. Zhao et al. [21] proposed a weakly supervised RefineDNet, which combines the dark channel prior with a learning-based method using unpaired data for adversarial learning to improve the quality of the defogged images. Li et al. [22] integrated multi-scale feature representation with an attention mechanism and designed an enhanced decoder to improve the extraction of haze information. Ding et al. [23] unified the haze removal and noise suppression tasks and introduced a region similarity fusion module to obtain the final results. The development of unsupervised defogging algorithms has significantly alleviated the overfitting problem associated with supervised methods, but the defogging results lack realism, and the network structures tend to be complex, requiring higher computational resources.
Known as a powerful tool for unpaired image processing, CycleGAN (cycle generative adversarial network) [24] is characterized by its structure, which enables images to be converted between two domains. Recently, many unpaired CycleGAN-based dehazing approaches have been proposed to solve the problem that paired samples are nearly unavailable in the real world. Engin et al. [25] designed a CycleDehaze system that combines a pyramid network for high-resolution images and introduces a cyclic perception loss to improve the dehazing quality. Zheng et al. [26] introduced an enhanced attention mechanism in the CycleGAN framework and applied it to the task of defogging remote sensing images. Most CycleGAN-based dehazing methods ignore the physical properties of the hazy environment; thus, the results lack realism and variability. In order to make progress on this issue, Yang et al. [27] combined CycleGAN with the atmospheric scattering model to recover the scene depth and haze density of images to improve dehazing quality and achieved better results on synthetic datasets; however, their network, with its high-complexity structure, is still limited regarding the accuracy of its transmittance estimation.
In this paper, we specifically propose a novel unpaired dehazing network termed ADCP-CycleGAN (adaptive DCP combined with CycleGAN). The network consists of two branches that implement the reconstruction of hazy and clear images, respectively. In the dehazing process, we use the scale-adaptive DCP to accurately recover the transmittance and atmospheric light and combine the variable scattering coefficient with depth to achieve a more realistic rehazing process.
The contributions of this paper can be summarized as follows:
• A novel unpaired single-image dehazing model is proposed to fuse the dark channel prior and the enhanced CycleGAN.
• An adaptive DCP is designed to rely on the Wave-ViT semantic segmentation model, and it can accurately recover the transmittance and atmospheric light.
• In the enhanced CycleGAN method, the scattering coefficient $\beta$ is obtained from two different approaches in order to generate haze of various thicknesses and uneven distributions. $\beta_1$ is derived from the atmospheric scattering model, while $\beta_2$ is randomly sampled.
The article is organized as follows. Section 2 explains the preliminary knowledge of the atmospheric scattering model and the dark channel prior, as well as the basic structure of the cycle generative adversarial network. Section 3 elaborates the proposed image dehazing method based on CycleGAN with the adaptive dark channel. The experimental results, along with relevant discussions, are illustrated in Section 4. Conclusions and future work are summarized in Section 5.

Atmospheric Scattering Model
To describe the mechanism of haze generation, McCartney et al. [10] proposed an atmospheric scattering model in 1977:

$$I(x) = J(x)\,t(x) + A\,(1 - t(x)), \tag{1}$$

where $I(x)$ and $J(x)$ indicate a hazy degraded image and a clear image, respectively. $A$ is the value of the global atmospheric light, and the transmission map, $t(x)$, can be derived from the following relationship:

$$t(x) = e^{-\beta d(x)}, \tag{2}$$

where $\beta$ is called the scattering coefficient, which reflects the haze density, and $d(x)$ is the depth of field. Based on the atmospheric scattering model, a series of dehazing algorithms using prior knowledge [6][7][8][9] have been proposed. Among them, the most representative one is the dark channel prior [7] discovered by He et al.
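As an illustration of the model above, the following minimal sketch synthesizes a hazy image from a clear one using the standard form $I(x) = J(x)t(x) + A(1 - t(x))$ with $t(x) = e^{-\beta d(x)}$. The function names, the grayscale list-of-lists image representation, and the sample values are assumptions made for this example only and are not taken from the implementation used in this paper.

```python
import math

def transmission(depth, beta):
    """t(x) = exp(-beta * d(x)) for every pixel of a 2-D depth map."""
    return [[math.exp(-beta * d) for d in row] for row in depth]

def add_haze(clear, depth, beta, airlight):
    """I(x) = J(x) t(x) + A (1 - t(x)), applied per pixel.

    clear is a 2-D list of grayscale intensities in [0, 1] (an assumption of
    this sketch); a colour image would apply the same formula per channel.
    """
    t = transmission(depth, beta)
    return [
        [j * tx + airlight * (1.0 - tx) for j, tx in zip(j_row, t_row)]
        for j_row, t_row in zip(clear, t)
    ]

# Illustrative values only: nearby pixels keep their colour,
# distant pixels converge to the atmospheric light.
clear = [[0.2, 0.8], [0.5, 0.1]]
depth = [[0.0, 1.0], [2.0, 10.0]]
hazy = add_haze(clear, depth, beta=1.0, airlight=0.9)
```

A pixel at zero depth is returned unchanged, while a pixel at large depth approaches the atmospheric light, which is the behaviour the model is meant to capture.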

Dark Channel Prior
The prior refers to certain pixels with low intensities in at least one RGB channel as the dark channel, which can be represented as

$$J^{dark}(x) = \min_{y \in \Omega(x)} \left( \min_{c \in \{r,g,b\}} J^c(y) \right) \rightarrow 0, \tag{3}$$

where $J^c(y)$ denotes one of the RGB channels of a clear image, and $\Omega(x)$ is a patch centered on pixel $x$. The transmittance within $\Omega(x)$ can be approximated as a constant provided that the patch scale is sufficiently small. Substituting this into Equation (1), a mathematical derivation gives an estimate of the transmittance:

$$t(x) = 1 - \min_{y \in \Omega(x)} \left( \min_{c \in \{r,g,b\}} \frac{I^c(y)}{A^c} \right), \tag{4}$$

where $I^c(y)$ and $A^c$ represent the original hazy image and the atmospheric light in one of the RGB components, respectively. The subtraction term in Equation (4) is actually the dark channel of the hazy image normalized by $A^c$. Combined with Equation (1), a clear result can be obtained as follows:

$$J(x) = \frac{I(x) - A}{\max(t(x), t_0)} + A, \tag{5}$$

in which $t_0$ is a tiny constant set to prevent the value of the denominator from being zero.

The patch size of the crucial parameter $\Omega(x)$ has a decisive impact on the defogging result. As shown in Figure 1b-d, an oversized patch ($\Omega(x) = 30$) would invalidate the assumption that "the transmittance in the patch is constant", and the patch tends to cross the edge of the depth of field, leading to the halo effect. Conversely, as shown in Figure 1e-g, if the patch scale is too small ($\Omega(x) = 3$), the intensity of the dark pixels increases; thus, the transmittance obtained from Equation (4) is less than the real value, which may result in oversaturation, distortion, and an overall darkening of the image. Therefore, a single-scale $\Omega(x)$ will cause many unexpected negative effects and reduce the image quality. Based on this, a number of algorithms were subsequently proposed to optimize DCP performance. Chen et al. [28] proposed the concept of a "bright channel", as opposed to the dark channel, in order to solve the problem of the misalignment of brightness in dehazing results. Zhu et al. [29] and Jackson et al. [30] introduced the energy minimization theory and Rayleigh scattering theory, respectively, to remove artifacts and halos.
To some extent, these approaches that introduce external theories act as a correction and complement to the original DCP, while they also undermine the advantages of DCP, i.e., its efficiency and simplicity. From the perspective of parameter adaption, Song et al. [31] compared the defogging effect at different scales in detail and adaptively adjusted the scale range of the dark channel according to the color and edge characteristics of the hazy image. Hu et al. [32] and Guo et al. [33] focused on segmenting the sky region, which does not satisfy the prior, to improve the accuracy of transmittance recovery. Inspired by previous research, we attempt to further subdivide the feature regions of the images and apply more accurate segmentation techniques to improve the quality of parameter adaptation. In Section 3.2, we will elaborate on the detailed optimization method.
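The single-scale DCP pipeline discussed in this section can be sketched as follows: compute the dark channel over a square patch, estimate the transmittance as one minus the dark channel of the airlight-normalized hazy image, and invert the atmospheric scattering model. This is a minimal illustration under stated assumptions rather than the implementation used in this paper: the function names, the list-of-tuples image layout, the border clipping, and the default `t0 = 0.1` are all choices made for the example.

```python
def dark_channel(image, radius):
    """Minimum over RGB and over a (2*radius+1)^2 patch clipped at the border.

    image is a 2-D list of (r, g, b) tuples with values in [0, 1].
    """
    h, w = len(image), len(image[0])
    min_rgb = [[min(px) for px in row] for row in image]
    dark = []
    for y in range(h):
        row = []
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            row.append(min(min_rgb[j][i] for j in ys for i in xs))
        dark.append(row)
    return dark

def estimate_transmission(hazy, airlight, radius):
    """t(x) = 1 - dark channel of the hazy image divided by the airlight."""
    normalised = [
        [tuple(c / a for c, a in zip(px, airlight)) for px in row]
        for row in hazy
    ]
    return [[1.0 - d for d in row] for row in dark_channel(normalised, radius)]

def recover(hazy, t, airlight, t0=0.1):
    """J(x) = (I(x) - A) / max(t(x), t0) + A, per channel (t0 is assumed)."""
    return [
        [
            tuple((c - a) / max(tx, t0) + a for c, a in zip(px, airlight))
            for px, tx in zip(row, t_row)
        ]
        for row, t_row in zip(hazy, t)
    ]
```

The `radius` argument is where a fixed global patch size enters; making it depend on the image region is the kind of adaptation this paper pursues.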

CycleGAN
The cycle generative adversarial network (CycleGAN) was first designed by Zhu et al. [24]. By mirror-symmetrizing the traditional GAN, it obtains a network structure with two generators and two discriminators. Based on this special network structure, CycleGAN can convert images between the original and target domains without the supervision of paired datasets, a property that makes it widely preferred for unpaired dehazing tasks [25,26,34,35].
As shown in Figure 2, the previous CycleGAN-based dehazing networks contain a rehazing cycle and a dehazing cycle. In essence, most of them simply treat "hazy" and "clear" as two style domains for image transformation, with poor network interpretability and severe traces of artificial recovery. Specifically, the rehazing operation ignores real hazy environments that occur with various thicknesses and uneven distributions in the natural world, resulting in a large gap between the generated hazy images and the actual photographed hazy dataset. This means that the rehazing cycle has little significance for the enhancement of dehazing processing and can even negatively affect the quality of outputs, resulting in issues such as obvious artificial recovery traces and distortion. In order to improve the above issues, we introduce critical physical information to realize the enhancement of the dehazing and rehazing cycle. More details will be illustrated in Section 3.1.

Proposed Method
In this section, we elaborate on an unsupervised unpaired dehazing network termed ADCP-CycleGAN. We adopt adaptive DCP to accurately recover transmittance and atmospheric light for dehazing and achieve rehazing based on the depth and scattering coefficients. The two cycle branches of hazy/clear reconstruction are connected by the atmospheric scattering model to form the enhanced CycleGAN. The algorithm and network structure are detailed as follows.

Network Structure
The network consists of a hazy image reconstruction H-H branch and a clear image reconstruction C-C branch, as shown in Figure 3.
H-H Branch. Given a hazy image $H_{real1}$, we first perform Wave-ViT segmentation of the image to obtain the regional feature map. After the DCP operation, a dark channel map is obtained to deduce the transmittance $T$ and atmospheric light $A$ according to Equation (4). Then, the clear image $C_{fake1}$ can be acquired as follows:

$$C_{fake1} = \frac{H_{real1} - A}{\max(T, t_0)} + A.$$

Based on the clear image, we can restore the depth $D$, at which time the scattering coefficient $\beta_1$ can be recovered to reflect the density of the haze distribution. With the depth and scattering coefficients, we ultimately obtain the reconstructed hazy image $H_{fake1}$. In this branch, the generator $G_C$ is the dehazing processor, and $D_C$ is the discriminator that identifies whether $C_{fake1}$ pertains to the clean domain.
C-C Branch. We initially derive the depth information from the input clear image $C_{real2}$. The scattering coefficient $\beta_2$ is randomly sampled in the range of [0.5, 2]. The corresponding hazy image $H_{fake2}$ is subsequently acquired, and the same dehazing process as in the H-H branch is then adopted to acquire the final reconstructed clear image $C_{fake2}$; that is,

$$C_{fake2} = \frac{H_{fake2} - A}{\max(T, t_0)} + A,$$

where $T$ and $A$ are here estimated from $H_{fake2}$. In this branch, the generator $G_H$ produces the haze, and the discriminator $D_H$ is used for recognizing whether $H_{fake2}$ belongs to the hazy domain.

Adaptive DCP
In Section 2.2, we discussed in detail the drawbacks of the global fixedness of Ω in DCP. In this section, we continue the idea of parameter adaption to make further improvements.
To achieve a more refined segmentation of the feature regions, we here adopt the Wave-ViT model proposed by Yao et al. [36]. This model unites the wavelet transform with the Transformer network. With reversible downsampling for the lossless recovery of object texture details, it shows good performance in semantic segmentation tasks. The image division effect is shown in Figure 4. We determine distinct patch sizes based on the essential properties of different areas in the image to achieve parameter self-adaptation. The image can be divided into 3 regions:

(a) The Foreground region, which consists of complex objects with rich colors and high saturation. An undersized patch may further aggravate the oversaturation phenomenon, whereas an oversized patch will violate the change of transmittance distribution in this region, causing an obviously distorted visual effect. Therefore, we set the patch scale of the foreground area in a normal interval that varies uniformly with the saturation. Specifically, the patch size of the foreground area $\Omega_{fore}$ can be determined based on the saturation $S$ and luminance $L$ as follows:

$$L(x) = \frac{1}{3} \sum_{c \in \{r,g,b\}} I^c(x), \qquad S(x) = 1 - \frac{\min_{c \in \{r,g,b\}} I^c(x)}{L(x)},$$

$$\Omega_{fore} = \max\left\{5,\ \mathrm{round}\left(k \cdot \frac{\min_{c \in \{r,g,b\}} I^c(x)}{L(x)}\right)\right\},$$

where the max and round operators are used to set the patch scale as a positive integer. Based on previous research [7,28,31-33] on the defogging quality at different scales, we further conducted a validation experiment on the RESIDE dataset [37]. The results demonstrate that [5, 15] is a scale range that enables dark channel defogging to achieve optimal results, and other patch scales below or above this range experience significant negative effects, such as halos, luminance distortion, and oversaturation. Therefore, we take the value of $k$ as 15 to ensure that $\Omega$ in the foreground region is adaptive in this range. As we calculate the brightness and saturation of the image in the HSI color space, the saturation value is in the range of [0, 1]. In order to change the patch scale uniformly with the saturation, we construct a linear mapping relationship between [5, 15] and [0, 1] so that pixel blocks with different levels of saturation in the foreground area can correspond to the suitable patches.

(b) The Sky region, which has high brightness and low saturation. We choose a larger patch scale in the [25, 30] range in this area to intensify the defogging effect. At the same time, partitioning out the region helps us find the atmospheric light values using the method in [7]. Notably, though we set the patch scale in the sky region to be much larger than in the foreground area, the sky area usually does not contain too much detail, the color saturation is more homogeneous, and the composition of the scene is simpler. In this case, the negative effects of large patches are reduced.

(c) The Edge mutation region. We set a smaller patch value in the range of [0, 3] in the depth of field border area to prevent the halo effect and to preserve richer detail information.
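The region-dependent patch selection described above can be sketched as follows. The sketch assumes that the foreground patch size is `k` times the ratio of the minimum RGB channel to the HSI intensity (that is, `k * (1 - S)`), clamped below at 5, which is one reading of the formula in this section. The region labels, the function names, and the fixed sky and edge values (25 and 3, picked from the stated ranges) are hypothetical choices for illustration; in the proposed method the regions come from the Wave-ViT segmentation.

```python
def foreground_patch_size(pixel, k=15, minimum=5):
    """Patch size from HSI-style quantities: S = 1 - min(r, g, b) / L, L = mean."""
    luminance = sum(pixel) / 3.0
    if luminance == 0:
        return minimum  # black pixel: ratio undefined, fallback assumed here
    ratio = min(pixel) / luminance  # equals 1 - S, lies in [0, 1]
    return max(minimum, round(k * ratio))

def patch_size(pixel, region):
    """Pick a patch size per region; sky and edge values are assumed constants."""
    if region == "sky":
        return 25  # any value in the stated [25, 30] range
    if region == "edge":
        return 3   # upper end of the stated [0, 3] range
    if region == "foreground":
        return foreground_patch_size(pixel)
    raise ValueError(f"unknown region: {region}")

# Illustrative pixels: a grey pixel (zero saturation) and a pure red pixel.
sizes = [
    patch_size((0.5, 0.5, 0.5), "foreground"),
    patch_size((1.0, 0.0, 0.0), "foreground"),
    patch_size((0.9, 0.9, 1.0), "sky"),
]
```

With `k = 15`, every foreground patch size produced by this sketch falls in the [5, 15] interval identified by the validation experiment.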

Acquisition of Scattering Coefficient
In order to simulate the generation of real haze environments, which occur with various thicknesses and uneven distributions in the natural world, we optimize the rehazing process based on the atmospheric scattering model by combining the depth and density.
In the H-H branch, the scattering coefficient $\beta_1$ can be recovered according to Equation (2), as shown below:

$$\beta_1 = -\frac{\ln T}{D}.$$

Based on this, the reconstructed hazy image $H_{fake1}$ can be described as:

$$H_{fake1} = C_{fake1}\,e^{-\beta_1 D} + A\,(1 - e^{-\beta_1 D}).$$

Different from the H-H branch, the scattering coefficient $\beta_2$ in the C-C branch is randomly sampled in the range of [0.5, 2]. By altering the scattering coefficients, the generator $G_H$ can produce hazy environments with arbitrary density distributions, as shown in Figure 5. Correspondingly, the hazy image $H_{fake2}$ is subsequently acquired as follows:

$$H_{fake2} = C_{real2}\,e^{-\beta_2 D} + A\,(1 - e^{-\beta_2 D}).$$

It is noteworthy that, based on the atmospheric scattering model, the transmittance $T$ and atmospheric light $A$ derived from $G_C$ can be applied in $G_H$ to generate haze. Furthermore, these variable foggy images can also be used to augment the training of $G_C$. This mutually reinforcing haze removal/generation process constitutes the enhanced CycleGAN.
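The two ways of obtaining the scattering coefficient can be illustrated with a short sketch. The first function inverts $t = e^{-\beta d}$ per pixel and averages the result; the averaging, the `eps` guard, and the function names are assumptions of this example, since the text does not state how a single value is aggregated. The second function draws a coefficient uniformly from [0.5, 2], as in the C-C branch.

```python
import math
import random

def recover_beta(transmission, depth, eps=1e-6):
    """Invert t = exp(-beta * d) per pixel and average (averaging is assumed)."""
    values = [
        -math.log(max(t, eps)) / d
        for t_row, d_row in zip(transmission, depth)
        for t, d in zip(t_row, d_row)
        if d > eps  # skip pixels with (near-)zero depth to avoid division by zero
    ]
    if not values:
        raise ValueError("no pixel with positive depth")
    return sum(values) / len(values)

def sample_beta(rng, low=0.5, high=2.0):
    """Draw a scattering coefficient uniformly from [low, high]."""
    return rng.uniform(low, high)

def rehaze(clear, depth, beta, airlight):
    """H(x) = C(x) exp(-beta d(x)) + A (1 - exp(-beta d(x))) on a grayscale image."""
    return [
        [c * math.exp(-beta * d) + airlight * (1.0 - math.exp(-beta * d))
         for c, d in zip(c_row, d_row)]
        for c_row, d_row in zip(clear, depth)
    ]

# Illustrative use: one recovered and one randomly sampled coefficient.
depth = [[1.0, 2.0], [0.5, 4.0]]
t_map = [[math.exp(-1.2 * d) for d in row] for row in depth]
beta_1 = recover_beta(t_map, depth)
beta_2 = sample_beta(random.Random(0))
```

Sampling a different coefficient for each training image is what lets the rehazing step cover a range of haze densities instead of a single fixed one.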

Calculation of Losses
GAN losses are incurred during the adversarial game between the generator and the discriminator. In our network, they are used to ensure the quality of the dehazing and rehazing process. In the H-H branch, the losses of the generator $G_C$ and the discriminator $D_C$ are computed from $C_{fake1}$, a clear image constructed by the generator $G_C$, and $C_{real1}$, which is sampled from the clear image set Set{C}. In the C-C branch, correspondingly, $H_{fake2}$, which is derived from the rehazing generator $G_H$, and $H_{real2}$, which is sampled from the hazy image set Set{H}, are adopted to calculate the losses of $G_H$ and $D_H$.
Cycle-consistency losses calculate the consistency between the original and the target domain at both ends of the loop branch. In the H-H branch, the input $H_{real1}$ and the reconstructed hazy image $H_{fake1}$ must display sufficient levels of consistency. Likewise, $C_{real2}$ should agree with $C_{fake2}$. Thus, the cycle-consistency losses can be written as

$$L_{cyc} = \left\| H_{fake1} - H_{real1} \right\|_1 + \left\| C_{fake2} - C_{real2} \right\|_1, \tag{18}$$

where $\| \cdot \|_1$ denotes the L1 norm.
Cycle-perceptual losses. Although the cycle-consistency losses can be used to remove part of the noise, we also add cycle-perceptual losses to extract richer details and advanced features based on the VGG16 network to further enhance the structural similarity and ensure more realistic visual effects. The perceptual loss can be written as

$$L_{per} = \left\| \phi(H_{fake1}) - \phi(H_{real1}) \right\|_2 + \left\| \phi(C_{fake2}) - \phi(C_{real2}) \right\|_2, \tag{19}$$

where $\phi$ is the feature extractor and $\| \cdot \|_2$ denotes the L2 norm.
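Thus, the total loss function of ADCP-CycleGAN can be derived as:

$$L_{total} = \lambda_1 L_{GAN} + \lambda_2 L_{cyc} + \lambda_3 L_{per},$$

where $L_{GAN}$, $L_{cyc}$, and $L_{per}$ denote the GAN, cycle-consistency, and cycle-perceptual losses, and $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the weight-balancing factors of the three loss functions.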
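To make the combination of the loss terms concrete, the sketch below computes an L1 cycle-consistency term on small grayscale images and forms the weighted total. Several points are assumptions of this example rather than details confirmed by the text: the L1 term is averaged over pixels, the GAN and perceptual terms are passed in as precomputed numbers, and the weights are assigned to the GAN, cycle-consistency, and cycle-perceptual terms in that order, using the values 0.2, 1, and 0.0001 reported in the training settings.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized 2-D images."""
    count = sum(len(row) for row in a)
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / count

def cycle_consistency_loss(h_real, h_rec, c_real, c_rec):
    """L1 gap of the hazy reconstruction plus L1 gap of the clear reconstruction."""
    return l1_distance(h_rec, h_real) + l1_distance(c_rec, c_real)

def total_loss(l_gan, l_cyc, l_per, weights=(0.2, 1.0, 0.0001)):
    """Weighted sum of the three terms; the weight-to-term order is assumed."""
    w_gan, w_cyc, w_per = weights
    return w_gan * l_gan + w_cyc * l_cyc + w_per * l_per

# Illustrative values only.
h_real, h_rec = [[0.0, 1.0]], [[0.5, 1.0]]
c_real, c_rec = [[0.2, 0.4]], [[0.2, 0.4]]
l_cyc = cycle_consistency_loss(h_real, h_rec, c_real, c_rec)
loss = total_loss(l_gan=1.0, l_cyc=l_cyc, l_per=100.0)
```

The very small weight on the perceptual term reflects that feature-space distances are typically much larger in magnitude than pixel-space L1 distances.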

Experimental Configuration
Datasets. In the experiment, four diverse datasets were adopted. (a) The RESIDE datasets [37] contain large amounts of hazy images synthesized artificially. SOTS-indoor and SOTS-outdoor contain 500 hazy/clear images indoors and outdoors, respectively, while ITS and OTS include 13,990 and 72,135 indoor and outdoor hazy and clear images, respectively. (b) The O-HAZE [38] dataset from the 2018 NTIRE Single Image Defogging Challenge contains 45 pairs of outdoor fogged/clear images with 10 pairs for testing. The images within this dataset are of high resolution and originate from real shots. (c) The BeDDE [39] dataset contains 208 real-world paired fogged/clear images of high quality captured in 23 different Chinese cities. We conducted qualitative comparison experiments on this dataset to evaluate its generalization ability and assess the subjective visual quality of real-world defogging effects. (d) In addition to the validation on the reference dataset, to compare the visual effects, we additionally introduced 30 hazy images captured in real life, as well as Fattal's dataset [40], which contains 31 real hazy images as non-reference samples.
Competitors & Metrics. We compared the proposed method with several state-of-the-art algorithms, including the most representative prior-based defogging algorithm, DCP [7]; supervised methods, including DehazeNet [12], GCANet [15], and FFANet [19]; and unsupervised methods, including ZID [20], RefineDNet [21], D4 [27], and USID [22]. For persuasive and reliable comparisons, the parameter settings were implemented according to Refs. [7,12,15,19-22,27]. We chose SSIM, PSNR [41], and LPIPS [42] as objective evaluation metrics for the dehazing performance on the reference dataset. For the test on the non-reference dataset, we focused on the evaluation of the visual effect of the dehazed image; thus, the information entropy (IE) and average gradient (AG) were employed to reflect the overall information and the local detail performance of the image, respectively. Moreover, we introduce the NIQE [43] (natural image quality evaluator) metric, which can be expressed as

$$D(v_1, v_2, \Sigma_1, \Sigma_2) = \sqrt{(v_1 - v_2)^{T} \left( \frac{\Sigma_1 + \Sigma_2}{2} \right)^{-1} (v_1 - v_2)},$$

where $v_1$, $v_2$ and $\Sigma_1$, $\Sigma_2$ represent the mean vectors and covariance matrices of the multivariate Gaussian (MVG) models of the natural and distorted images, respectively. NIQE evaluates the test image against features extracted from natural scene images, and a smaller value means the image is more compatible with human eye perception.
Training Settings. In the training phase, we randomly select 6000 images each from ITS and OTS, 380 images each from SOTS-indoor and SOTS-outdoor, and the training set from O-HAZE as input samples. Notably, due to the small sample size and high image resolution in the O-HAZE dataset, we cropped the 35 images into 700 copies to achieve sample expansion. All training images were rescaled to 256 × 256. We set $\lambda_1$, $\lambda_2$, and $\lambda_3$, as discussed in Section 3.4, to 0.2, 1, and 0.0001, respectively, to balance the weights of the three loss functions. The learning rate of the Adam optimizer was set to 0.0001, with a batch size of 2; furthermore, the Adam momentum parameters (not to be confused with the scattering coefficients) were set to $\beta_1 = 0.5$ and $\beta_2 = 0.999$.
We trained our model with an Nvidia GeForce RTX 2080 Ti graphics card and conducted our experiments on PyTorch.
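For readers unfamiliar with the reference metrics, the following minimal sketch shows the standard definition of PSNR, $10 \log_{10}(\mathrm{peak}^2 / \mathrm{MSE})$, on small grayscale images. The function name, the list-of-lists image layout, and the `peak = 1.0` intensity range are assumptions of this example; it is not the evaluation code used for the reported results.

```python
import math

def psnr(reference, restored, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    count = sum(len(row) for row in reference)
    mse = sum(
        (x - y) ** 2
        for r_row, s_row in zip(reference, restored)
        for x, y in zip(r_row, s_row)
    ) / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Illustrative values: a uniform error of 0.1 on a [0, 1] image gives about 20 dB.
reference = [[0.0, 0.0], [0.0, 0.0]]
restored = [[0.1, 0.1], [0.1, 0.1]]
score = psnr(reference, restored)
```

A higher PSNR indicates a smaller pixel-wise error with respect to the ground-truth clear image, which is why it can only be used on the reference datasets.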

Results on Reference and No-Reference Datasets
Comparison on reference datasets. Table 1 summarizes the average values of SSIM, PSNR, and LPIPS for every dehazing method tested on the SOTS-indoor (120 remaining images that differed from the training set), SOTS-outdoor (120 remaining images that differed from the training set), and O-HAZE (10 test images cropped into 500 copies) datasets. On the SOTS-indoor test set, the supervised algorithms FFANet and GCANet demonstrate their strong capabilities and significant advantages. This is due to the fact that the supervised algorithms can sufficiently learn the image features based on paired datasets, thus performing well in simpler indoor scenes. Our proposed method achieves the best results among unsupervised algorithms. In outdoor haze removal, our algorithm performs the best among all nine algorithms on both the SOTS-outdoor and O-HAZE datasets, demonstrating that the proposed method maintains better generalization and high-quality defogging effects even in complex outdoor scenes. Meanwhile, it is worth noting that the supervised methods lose their dominant positions. To some extent, the results reflect the overfitting issues of supervised algorithms and their poor generalization abilities in handling complex scene defogging tasks.
Furthermore, we display visual comparisons in Figure 6. As can be observed, DCP results in an overall low brightness with obvious color distortion in the sky area. This is due to the fact that the prior is not met in the sky region. While ZID can remove haze, it suffers from significant color distortion in the defogged image. In the case of indoor defogging, RefineDNet produces some unpredictable noise in certain localized areas, such as the color block in the upper left corner of (g) and (h). The indoor defogging results of D4 suffer from serious over-brightening in the deep field due to its inaccurate estimation of atmospheric light, which is determined by taking the brightest pixel as the atmospheric light. This estimation method may lead to over-brightening of the image, especially in indoor images with artificial interference, such as light sources and mirrors. On the other hand, FFANet, GCANet, and our unsupervised algorithm perform better in indoor defogging, with the former two being better at preserving the details of distant indoor objects. The outdoor defogging results, as shown in Figure 6c-f, reveal the overfitting problem of FFANet, as evidenced by the noticeable color halos on the gable roof in rows d and f. The proposed method exhibits better removal of residual haze in distant parts of the image, such as the distant buildings in row f. Overall, our algorithm shows good generalization ability for various types of defogging tasks, achieving thorough defogging and satisfactory subjective visual perception. Additionally, we compared the number of trainable parameters and the running time of our proposed ADCP-CycleGAN with other methods under the same experimental environment and summarized the results in Table 2.
Of these methods, the prior-based DCP [7] does not require trainable parameters, and USID [22] outperforms the other algorithms in terms of the number of parameters and running time since it does not rely on calculating physical parameters in the atmospheric scattering model. The proposed method demonstrates acceptable network complexity and defogging efficiency, with fewer parameters and faster running speed than most of the other state-of-the-art algorithms.

Table 2. Number of trainable parameters and running time of the compared methods.

Type            Methods            Parameters         Running time
Supervised      DehazeNet [12]     0.008 × 10^6       1.6200
Supervised      GCANet [15]        0.660 × 10^6       0.9275
Supervised      FFANet [19]        4.964 × 10^6       1.3418
Unsupervised    RefineDNet [21]    63.378 × 10^6      0.7053
Unsupervised    ZID [20]           48.232 × 10^6      57.3681
Unsupervised    D4 [27]            11.707 × 10^6      0.0579
Unsupervised    USID [22]          4.022 × 10^6       0.0432
Unsupervised    Ours               4.275 × 10^6       0.0656

Comparison on no-reference real datasets. To verify the generalization ability of the network and the realism of the defogging results, we additionally introduced no-reference datasets. The quantitative evaluation results are reported in Table 3. Our method obtains the best scores in all three metrics, which means that the defogged images achieve acceptable results in terms of information content, detail representation, and visual effect.

Table 3. Quantitative evaluation of nine algorithms on no-reference datasets. The best score is indicated in bold.

Type    Methods    IE↑    AG↑    NIQE↓
In order to demonstrate the defogging effect more clearly, we framed some local details of the image and zoomed in for comparison, as shown in Figure 7. In rows a and c, we framed the text area and zoomed in. Our method successfully preserved more edge details and restored the text information well. For the natural landscape image in row b, our method effectively removed the residual haze, resulting in a natural color perception of the defogged image. Though GCANet also produces results with less residual haze, there are noticeable distortions in the sky region in rows b and d. The FFANet results in real-world scenes are unsatisfactory, as there are noticeable haze residues in the results and a large number of artifacts in the sky area of row d. In addition, the hues of the USID dehazing results in rows c and d lack naturalness and realism, resulting in poor visual effects. To summarize, our algorithm consistently shows good defogging performance on real-world no-reference datasets, providing appealing subjective and visually realistic effects.

Figure 7. Comparative test of nine algorithms on no-reference datasets. The proposed method shows strong generalization ability and robustness in various real-world defogging tasks, with satisfactory overall picture quality and detailed information performance (a-e).
In addition, we conducted extensive additional experiments on the BeDDE dataset, and the visual comparison of the defogging results is shown in Figures 8 and 9. The satisfactory defogging effects further reveal the strong generalization ability and defogging stability of the proposed algorithm.

Ablation Study
To verify the effectiveness of the different components in ADCP-CycleGAN, we conducted an ablation study on the network. Three additional models were trained and compared with our proposed model on the SOTS dataset as follows: (a) Model A removes the Wave-ViT semantic segmentation and parameter adaptation module, and thus, the transmittance and the atmospheric light are recovered by the original DCP method; (b) the value of the scattering coefficient in Model B is set to a fixed constant; and (c) Model C deletes the cycle perceptual loss.
The dehazing results of the four models are reported in Figure 10. After removing the semantic segmentation module for the parameter adaptation of DCP, the defogging results of Model A show an obvious distortion. For the areas where the prior fails (such as the white floor tiles in row b), the distortion phenomenon appears, and the brightness of the picture in row c is also significantly darker. Model B lacks realism in the subjective visual effect of the defogging result since the scattering coefficient is set to a fixed value. Model C has a degraded performance regarding detail recovery after removing the cycle perceptual loss. The flowers in the far field in row c show an oversaturation of color, indicating that the deletion of the cycle perceptual loss has an impact on defogging stability. It is worth noting that in the quantitative analysis (as shown in Table 4), the degradation of Model B compared to ADCP-CycleGAN differs between the indoor and outdoor dehazing tasks. This may be due to the fact that the haze distribution in indoor scenes is more uniform than outdoors; thus, the scattering coefficient may have a more significant impact on outdoor haze removal. This also confirms that the scattering coefficient is not negligible in outdoor dehazing.

Table 4. Quantitative evaluation of ablation study. The best score is indicated in bold.

Conclusions and Future Work
In this paper, we propose ADCP-CycleGAN, a novel enhanced CycleGAN network with adaptive DCP for unpaired single-image dehazing. In the network, we achieve the parameter adaptation of DCP through a Wave-ViT semantic segmentation model to recover the transmittance and atmospheric light accurately. We optimize the rehazing process by deriving the scattering coefficient from both physical calculation and random sampling means to simulate the real haze distribution. The atmospheric scattering model is applied to realize the connection between the dehazing and rehazing branches in order to build the enhanced CycleGAN. The extended experiments on both reference/no-reference datasets with diverse evaluation metrics confirm the effectiveness of our method. Specifically, our approach can generate haze that is more consistent with real-world scenarios based on depth and density. This could be particularly meaningful for tasks that require clear vision but lack paired datasets, such as remote sensing imaging, autonomous driving, and intelligent monitoring. Furthermore, we hope that our innovative combination of physical prior models with CycleGAN for dehazing can contribute to future developments in unsupervised learning for low-level vision tasks. However, there are also some aspects of our algorithm that deserve improvement. The accuracy of the depth estimation of the proposed method is affected when there is interference such as strong light and occlusion in the image. Meanwhile, due to the incorporation of a physical model in the proposed method, its inherent limitations may result in local over-enhancement in a few defogged results. In our future work, we will also investigate the post-processing of the defogged images to further improve the image quality.