Unsupervised Color Retention Network and New Quantization Metric for Blind Motion Deblurring

Unsupervised blind motion deblurring is still a challenging topic due to its inherent ill-posedness and the lack of paired data and accurate quality assessment methods. Besides, virtually all current studies suffer from large chromatic aberration between the latent and original images, which directly causes the loss of image details. However, how to model and quantify this chromatic aberration appropriately remains a difficult and urgent issue. In this paper, we propose a general unsupervised color retention network termed CRNet for blind motion deblurring, which can be easily extended to other tasks suffering from chromatic aberration. New concepts of blur offset estimation and adaptive blur correction are introduced, so that more detailed information can be retained to improve the deblurring results. Specifically, CRNet first learns a mapping from the blurry image to a motion offset, rather than directly from the blurry image to the latent image as in previous work. With the obtained motion offset, an adaptive blur correction operation is then performed on the original blurry image to obtain the latent image, thereby retaining the color information of the image to the greatest extent. A new pyramid global blur feature perception module is also designed to further retain the color information and extract more blur information. To assess the color retention ability for image deblurring, we present a new chromatic aberration quantization metric termed Color-Sensitive Error (CSE), which is in line with human perception and can be applied to cases both with and without paired data. Extensive experiments demonstrate the effectiveness of our CRNet for color retention in unsupervised deblurring.


INTRODUCTION
IMAGE deblurring is the process of removing blurring artifacts and recovering the latent image from a blurred one, and it has been attracting increasing attention in emerging applications. Motion blur is the most common kind of blur in reality. For example, when taking pictures with handheld devices, it is common to obtain motion-blurred pictures due to hand shaking or rapid movement of the targets in the lens. For motion blur, the relationship between the blurred and sharp images can be formulated as follows:

B = K • S + N,

where B, S and N denote the blurred image, sharp image, and noise, respectively, K is a blur kernel, and • and + denote the convolution and element-wise addition operations, respectively. When K is known, the related task is called non-blind deblurring, which is also a classical deconvolution problem [1], [2], [3], [4], [5], [6], [7], [8]. However, K is often unknown in reality, and the related task is then called blind deblurring. Traditional blind deblurring methods usually require a time-consuming iterative process [9], [10], [11]. With the fast development of deep neural networks (DNNs), some end-to-end, blur-kernel-free DNN-based deblurring models have been proposed for blind deblurring. Such deep deblurring models avoid the complicated and time-consuming iterative optimization, and have better generalization ability to other images. By training on massive labeled data and exploiting the strong fitting ability of DNNs, supervised deep deblurring models have obtained better quantitative and visual performance [12], [13], [14], [15], [16], [17]. However, obtaining paired data (i.e., a blurry image and its sharp counterpart) from specific domains is usually expensive, and data-driven deep models also cannot be generalized to blurry images in different domains. As such, unsupervised deblurring models offer advantages over fully-supervised ones, since no paired data are required in the unsupervised case [18], [19]. However, current unsupervised deblurring models still cannot
handle natural images well, due to their complex contents and details. Besides, current unsupervised deblurring models usually obtain the sharp image in an end-to-end manner, focusing only on judging whether the image is sharp. As a result, this may lead to serious color information loss in training and cause an unpleasant visual experience. In fact, most supervised deblurring models also suffer from the chromatic aberration issue to different degrees, although paired data are used for training. Some latent images generated by current unsupervised methods are visualized in Figure 1 as examples, where the chromatic aberration clearly exists. Specifically, when the chromatic aberration is obvious, human eyes can easily identify the difference, but cannot figure out how large the numerical color difference is. When the chromatic aberration is not obvious, human eyes are inept at observing the color difference at all. Although color retention ability is an important factor in evaluating performance, how to quantitatively evaluate the color retention ability of unsupervised models is still a challenging issue. It is urgent to explore effective metrics to assess and quantify the chromatic aberration produced in unsupervised tasks.
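To make the blur formation model concrete, the following NumPy sketch synthesizes motion blur with a simple horizontal line kernel. The kernel shape, size, and noise level here are illustrative choices of ours, not settings used in the paper's datasets.

```python
import numpy as np

def motion_blur_kernel(length=9):
    """Horizontal motion blur kernel K: a normalized line of ones."""
    k = np.zeros((length, length))
    k[length // 2, :] = 1.0
    return k / k.sum()

def blur(sharp, kernel, noise_sigma=0.0, seed=0):
    """B = K * S + N: 2-D convolution of the sharp image with the blur
    kernel, plus additive Gaussian noise (edge padding at the border)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(sharp, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros_like(sharp, dtype=float)
    for i in range(sharp.shape[0]):
        for j in range(sharp.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    noise = np.random.default_rng(seed).normal(0, noise_sigma, sharp.shape)
    return out + noise
```

Applying `blur` to a single bright pixel smears it into a horizontal streak, which is the impulse-response view of the motion-blur model.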
In this paper, we therefore propose effective strategies to solve the chromatic aberration issue and to quantify the chromatic aberration in the case without paired data. The major contributions of this paper are summarized as follows:
• Technically, we propose a novel unsupervised color retention network called CRNet for deep blind motion deblurring, which presents novel strategies of blur offset estimation and adaptive blur correction for preserving the color information. CRNet has a strong generation ability and can handle both natural and domain-specific images well. To the best of our knowledge, this is the first study on designing a specific unsupervised color retention network for image deblurring. The proposed unsupervised color retention strategy is also general and can be extended to other tasks involving chromatic aberration, e.g., image deraining/dehazing, image generation, and image transformation-based ones.

• To retain the color information of the original images to the greatest extent, we propose a new solution: first learning a mapping from the blurry image to a motion offset, and then performing adaptive blur correction based on the motion offset to obtain the latent image, thereby preserving both the color and the detailed information. A dedicated pyramid global blur feature perception (PGBFP) module is also designed to perceive non-local information, which can further improve the deblurring performance.

• To quantify the chromatic aberration and assess the color retention ability of our CRNet, we introduce a new metric called Color-Sensitive Error (CSE) based on color histograms. The objective evaluation of CSE is more consistent with human perception and can reflect detailed information, making it more accurate than the widely-used PSNR and SSIM metrics. To the best of our knowledge, this is one of very few studies on designing a chromatic aberration assessment method. Note that the CSE metric can be used to evaluate the color retention abilities of both unsupervised and supervised methods.

• Extensive experiments verified the effectiveness of our CRNet for deblurring in terms of overall restoration performance and color retention; it also has great advantages in restoring image details.
This paper is outlined as follows. First, a brief review of related deblurring models is presented in Section 2. In Section 3, we introduce the unsupervised CRNet and the quantization metric CSE. Extensive experiments on both natural and domain-specific image datasets are shown in Section 4. Finally, Section 5 offers the conclusion.

RELATED WORK
We briefly introduce the image deblurring methods related to our proposed CRNet.

Traditional Deblur Methods
Blind motion deblurring aims at restoring a blurry image to a sharp image with the blur kernel unknown. Traditional methods usually assume that the sharp image or blur kernel satisfies a certain prior to constrain the solution space of the latent image, e.g., the heavy-tailed gradient prior [20], patch prior [21], dark channel prior [22], local maximum gradient prior [23] and multi-scale latent structure prior [24]. But these methods have obvious drawbacks. On the one hand, processing blurry images requires a complex and time-consuming iterative process. On the other hand, the restored image is either under-deblurred (the restored image is still blurry) or over-deblurred (obvious artifacts exist in the restored image), due to various prior inaccuracies.

GAN-Based Deep Deblur Models
With the rapid development and strong learning ability of deep networks, deep deblurring models have obtained better performance than traditional ones [16], [25], such as generative adversarial network (GAN) [26]-based methods.

GAN-Based Supervised Methods.
As a powerful type of neural network with great success in image representation, GANs underlie most existing deep deblurring models. For example, multi-scale images are used as input in [12], where the up-sampled prediction of the smaller image is concatenated with the larger-scale image as the next-scale input, which can well retain the details of restored images. Strictly speaking, DeblurGAN is the first end-to-end GAN-based supervised deep deblurring model, and it uses WGAN-GP [27] for training [13]. DeblurGAN-v2 [14] further introduces the feature pyramid network (FPN) into the deblurring task and adds different backbones (e.g., Inception-ResNet and MobileNet), hence making the model lightweight and obtaining more promising deblurring performance.

GAN-Based Unsupervised Methods.
Training a supervised deep deblurring network requires massive labeled data, which suffers from the tricky and time-consuming labeling issue. As such, some unsupervised deep deblurring methods were proposed recently, such as [18] and DisentDeblur [19]. The method proposed in [18] is the first dedicated GAN-based unsupervised deblurring model; it uses a reblur loss and a multi-scale gradient loss to constrain the solution space and obtains better results on synthetic datasets. DisentDeblur [19] is a disentangled unsupervised deblurring model that separates the content attribute and blur attribute in a blurry image, and adds a KL divergence loss to the blur encoder to prevent the blur attribute from entering the encoding of the content attribute. DualGAN [28] and DiscoGAN [29] have similar ideas to the classic CycleGAN [30], but the structures of their generators and discriminators are different. Although DiscoGAN and DualGAN are not specifically designed for deblurring tasks, if the sharp domain (S-domain) and blurry domain (B-domain) are considered as the two domains, deblurring can be performed accordingly.

CRNET: UNSUPERVISED COLOR RETENTION NETWORK FOR BLIND MOTION DEBLURRING
We introduce the idea and framework of our CRNet in detail, including the chromatic aberration problem and the corresponding solutions, the network architecture, and the specific constraints for color retention.

Problem Statement and Motivation
Chromatic aberration is rather common in real-world applications. In recent decades, researchers have noticed this issue and started to study this topic [31]. For example, [32], [33] argue that human eyes also exhibit chromatic aberration, i.e., when we observe an object, the observed color may differ from the object's actual color. In fact, the results of image processing in vision tasks more or less show color differences in comparison to the original image, which is more obvious in unsupervised cases without paired data. As for the deblurring task, even if unsupervised deblurring models can recover the structural information, they still cannot effectively retain the color information of the original image, due to the lack of the constraint provided by paired data. This directly leads to chromatic aberration with poor visual effects, which limits the applicability in reality.
Generally, two solution schemes can be applied to handle the chromatic aberration. The first one is adding an additional constraint on the color information, such as the ℓ1 or ℓ2 norm on the color histogram vectors of the original blurry and latent images. But such a constraint on the color information of the entire image is still too weak, as it cannot ensure that the color information of each pixel is approximately accurate. In other words, it may still be a highly ill-posed constraint and cannot be directly used for training the model.
The other one is retaining the color information of the original image as much as possible, which has two benefits: 1) there is no need to design complex and time-consuming constraints on color, which simplifies the model training and improves the training speed; 2) directly correcting the original image in a specific way to obtain the latent image avoids the degradation of the color information during the training process. As such, in this paper we design an unsupervised color retention strategy along this line, and propose new strategies of blur offset estimation and adaptive blur correction. Next, we elaborate the details.

CRNet Architecture
The architecture of our CRNet is shown in Figure 2. Similar to current unsupervised models [28], [29], [30], CRNet contains two branches, i.e., a blur branch and a deblur branch. The blur branch is used to improve the deblurring performance, although the deblur branch can complete the deblurring task independently. We use A and B to denote unpaired images, with subscripts indicating the domain in which the image is located. For example, A_S and B_S are unpaired images in the sharp domain (S-domain), and A_B and B_B are unpaired images in the blurry domain (B-domain). We use I to describe an image when there is no need to distinguish which domain it belongs to. In general, our CRNet includes three major parts: 1) two blur offset estimation (BOE) sub-networks G_BS and G_SB for the B-domain and S-domain; 2) an adaptive blur correction (ABC) scheme denoted by F; 3) two multi-scale cropping discriminators D_B and D_S for the two domains. Specifically, given a blurry image A_B and a sharp image B_S, G_BS and G_SB respectively generate the corresponding blur offsets and inverse blur offsets. Then, the adaptive blur correction scheme F corrects the blurry image and the sharp image into images in the opposite domains. Finally, D_B and D_S are used to distinguish the real and generated samples of the corresponding domains. Note that we use a multi-scale approach for the structural design of the discriminators. Traditional "multi-scale" refers to resizing the input image to generate images of different sizes; this set of images is then used to compute the adversarial loss. In this case, the discriminator only needs to judge the authenticity of the entire image, without focusing on local information. This is unreasonable for the deblurring task, since humans only need partial information to determine the authenticity of the entire image. Hence, inspired by the attention mechanism, we propose a variant of the original multi-scale discriminator, called the multi-scale cropping discriminator (MCD). Specifically, we reduce the image size by cropping, making the discriminator pay more attention to the local details of the target image.
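The cropping idea behind the MCD can be illustrated with a small sketch that draws one random crop per scale, so smaller crops force a judgment on increasingly local detail. The function name, the scale set, and the use of random square crops are our own illustrative assumptions, not the exact cropping policy of CRNet.

```python
import numpy as np

def multi_scale_crops(image, scales=(1.0, 0.5, 0.25), seed=0):
    """Return one random crop per scale; each crop side covers the
    given fraction of the image side (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    crops = []
    for s in scales:
        ch, cw = max(1, int(h * s)), max(1, int(w * s))
        top = rng.integers(0, h - ch + 1)
        left = rng.integers(0, w - cw + 1)
        crops.append(image[top:top + ch, left:left + cw])
    return crops
```

Each crop would then be fed to the discriminator, so the adversarial signal mixes whole-image and patch-level judgments.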
The blur offset estimation and adaptive blur correction operations are the two key stages for color retention in our model. Different from current unsupervised deblurring models that obtain the restored latent image directly from the blurry image, our CRNet adopts a new "blurry image → motion offset → latent image" manner. That is, a mapping is first learned from the blurry image to the motion offset, and then, according to the obtained motion offset, the adaptive blur correction operation is performed to obtain the latent image. The adaptive blur correction operation starts from the original image and performs a physical correction at the original resolution without learning, so the restored image has little chance to produce large chromatic aberration. At the same time, the blur offset estimation network introduces the self-attention mechanism and builds a new module, PGBFP, to learn global blur information, thereby further retaining the color information of the image.

Blur Offset Estimation (BOE) Network
The BOE network uses a DNN to estimate the blur offset from the original image. Since CRNet has two branches, this stage produces different blur offsets according to different inputs. The structure of the BOE network is shown in Figure 3. Clearly, the entire BOE network contains a dedicated PGBFP module and a blur offset estimation sub-network. The sub-network plays the core role in the BOE stage, and the structural design of PGBFP needs to cooperate with that of the sub-network: PGBFP jointly performs the blur offset estimation by cascading with the sub-network at different scales. For convenience of description, we introduce the concept of cascade level. The cascade level ℵ denotes the number of information exchanges between PGBFP and the sub-network; that is, a higher cascade level means a deeper sub-network structure.

Pyramid Global Blur Feature Perception (PGBFP)
The convolutional layer is one of the most important components of a DNN. However, due to its limited receptive field, a convolutional layer cannot perceive non-local information, which leads to poor performance on tasks requiring non-local context. To perceive non-local information, SAGAN [34] introduces the attention mechanism into the DNN. In this paper, we also introduce PGBFP to learn global blur information. Unlike the self-attention manner of SAGAN, we further refine the self-attention block and embed it into our model. Specifically, after getting the self-attention feature map, we directly feed it into a special down-sampling block containing a 1-by-1 convolutional layer and an average pooling layer to change the dimension and size of the feature maps, so that multiple self-attention blocks can be well connected. Figure 3 shows the specific details of PGBFP. Considering the structure of the blur offset estimation sub-network and graphics memory limitations, we set the cascade level ℵ to 3 in this study. For the i-th cascade, let r_i be the feature map passed from the sub-network into PGBFP, let m_i denote the feature map passed from PGBFP into the sub-network, and let s_i be the internal feature maps to be fused. The cascaded operation then requires size consistency between the feature maps generated by PGBFP and the sub-network, i.e., size(r_i) = size(m_i). In addition, size(r_i) = size(s_{i−1}) is needed to ensure that r_i and s_{i−1} can be fused appropriately within PGBFP.
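The down-sampling block inside PGBFP (a 1-by-1 convolution followed by average pooling) can be sketched with a NumPy stand-in: a 1-by-1 convolution is just a per-pixel channel mixing, and a 2x2 average pooling halves the spatial size. The weight values and the pooling stride here are illustrative assumptions.

```python
import numpy as np

def downsample_block(x, weight):
    """Sketch of the PGBFP down-sampling block.
    x: (C_in, H, W) feature map; weight: (C_out, C_in) 1x1-conv weights."""
    c_in, h, w = x.shape
    # 1x1 convolution == matrix multiply over the channel axis
    y = np.tensordot(weight, x, axes=([1], [0]))      # (C_out, H, W)
    # 2x2 average pooling halves H and W
    c_out = y.shape[0]
    y = y.reshape(c_out, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
    return y
```

This shape bookkeeping is what lets consecutive self-attention blocks connect, since each block receives feature maps of the reduced dimension and size.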

Blur Offset Estimation Sub-network
For the macrostructure of the sub-network, we follow the classical "encoding, transforming and decoding" design. Specifically, the sub-network has a pre-processing layer, three down-sampling layers, nine residual blocks [35], three up-sampling layers and a post-processing layer. The pre-processing and post-processing layers change the channel dimension with a 7-by-7 convolution kernel. Meanwhile, the pre-processing layer uses a larger receptive field to perturb the information at the original size, which can enhance the learning ability of the network. The post-processing layer sets the affine parameters of instance normalization to a learnable state and uses the Tanh function in the final layer to map the values to [-1, 1]. The up-sampling and down-sampling layers use a 3-by-3 convolution kernel to change the channel dimension and size of the feature maps simultaneously. The number of up-sampling and down-sampling layers corresponds to the cascade level. For the residual blocks, we use the standard settings, except that instance normalization is used to replace batch normalization.
Next, we introduce how the blur offset estimation sub-network is cascaded with the PGBFP. For the i-th cascade, let d_i be the feature map generated after the down-sampling layer and u_i the feature map generated after the up-sampling layer; then d_i and r_i are equal, i.e., d_i = r_i, and the sizes of u_i and m_i are also equal, i.e., size(u_i) = size(m_i). Each cascade in the sub-network (i = 1, 2, ..., ℵ) is thus built from Res(•), the module composed of nine residual blocks, and UpSampling_i(•), the up-sampling layer in the i-th cascade. Finally, given the unpaired blurry image A_B and sharp image B_S, the BOE network produces the normal blur offset f = G_BS(A_B) and the inverse blur offset f* = G_SB(B_S). To understand how the blur offset works in the subsequent adaptive blur correction, we visualize the blur offsets of some images at different numbers of iterations in Figure 4. We see that the BOE network first obtains a rough image contour in the early stage. As the number of iterations increases, the details of the blur offsets between blurry and sharp images are gradually learned and become clearer.

Adaptive Blur Correction (ABC) Stage
Adaptive blur correction makes the greatest contribution to color retention. At this stage, the model selects a correction scheme for the blur offset and does not need to learn any parameters except for the adaptive factor. Precisely because the model learns essentially no parameters at this stage and only directly corrects the original image at the physical level, the network is prevented from losing useful color information during training.

ABC Calculation
This stage aims at designing a scheme F to correct the original image based on the blur offset generated in the BOE stage. By unifying the normal blur offset f and the inverse blur offset f* into f, the entire blur correction process can be performed as follows:

Î = I ⊙ (ϕ ∗ f),    (6)

where ⊙ denotes the correction scheme, ∗ denotes the multiplication operation, and ϕ denotes an adaptive factor.
The adaptive factor ϕ makes the blur correction an adaptive operation, which can further enhance the model's color retention ability. The adaptive factor is initialized to 0, and a stable value is learned after training. As can be seen from Eq. (6), ϕ is equivalent to the weight of the blur offset: the larger the weight, the higher the correction degree of the original image, and the more color information is lost. Consider two cases: 1) no adaptive factor is introduced (equivalent to directly setting the adaptive factor to 1), i.e., the weight of the blur offset equals 1; in this case the weight of the blur offset is high, and a lot of color information will be lost during the blur correction process; 2) an adaptive factor is introduced. The involvement of ϕ provides a gradual process: ϕ is initially set to 0, so the restored image equals the original image, i.e., there is no color information loss. During training, ϕ gradually increases, which increases the weight of the blur offset. From our experimental observation, ϕ eventually increases to about 0.55, which is 45% lower than the weight used without introducing ϕ, i.e., the adaptive factor is helpful.
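Assuming the element-wise additive (residual) correction scheme, the adaptive correction of Eq. (6) reduces to a one-line operation. This sketch is ours and omits all learned components except the scalar ϕ.

```python
import numpy as np

def adaptive_blur_correction(image, offset, phi):
    """Latent image = original image corrected by the blur offset,
    weighted by the learnable adaptive factor phi. With phi = 0 the
    original image (and all of its color information) is returned
    unchanged; larger phi applies a stronger correction."""
    return image + phi * offset
```

The gradual-ϕ behavior described above follows directly: early in training (ϕ ≈ 0) the output is the color-preserving identity, and the correction strength grows only as far as the learned ϕ allows.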

Blur Correction Scheme Selection
The blur correction scheme is in fact not unique, since it corresponds to the different kinds of blur offsets learned in the BOE stage. In this paper, we tested three kinds of blur offsets.
(1) Residual. This blur correction scheme is element-wise addition: the BOE stage obtains a residual of the same size as the original image, and an element-wise addition is then performed between the original image and the residual to correct the blur. (2) Element-wise weights. This blur correction scheme is element-wise multiplication: the BOE stage obtains a weight for each pixel of the original image, and an element-wise multiplication is then performed to correct the blur. (3) Non-local attention. This blur correction scheme is matrix multiplication, inspired by the attention mechanism. Specifically, a rectangular attention map is obtained on each channel, and a matrix multiplication is then performed to correct the blur.
Figure 5 shows the three kinds of blur offsets with their corresponding correction schemes. The element-wise weight and residual blur offsets are designed at the original image scale. This has two advantages: 1) the obtained blur offset is easy to understand, and the specific content information can still be seen from the blur offset; 2) the design of the blur correction scheme is simple, since there is no complicated calculation. Unlike the blur offsets at the original scale, non-local attention is more difficult to understand, and its calculation is more complicated and needs additional constraints. In the quantitative comparison in our study, the original-scale blur offsets obtained better performance.
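The three correction schemes can be contrasted in a few lines. This is our sketch, with the adaptive factor omitted for brevity and the scheme names chosen by us.

```python
import numpy as np

def correct(image, offset, scheme):
    """Apply one of the three tested blur-correction schemes to a
    single-channel image (illustrative sketch)."""
    if scheme == "residual":        # (1) element-wise addition
        return image + offset
    if scheme == "weights":         # (2) element-wise multiplication
        return image * offset
    if scheme == "attention":       # (3) per-channel matrix multiplication
        return offset @ image       # offset acts as a rectangular attention map
    raise ValueError(scheme)
```

The first two schemes keep a one-to-one pixel correspondence with the original image, which is why their offsets remain visually interpretable, whereas the matrix product mixes rows of the image and is harder to read.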

Loss Function
We first describe the objective function of our CRNet as the weighted sum of several loss functions:

L = λ_adv L_adv + λ_cc L_cc + λ_p L_p,

where L_adv is the adversarial loss used to convert images between the two domains, L_cc is the cycle-consistency loss used to maintain the content information of the original image, and L_p is the perceptual loss used to alleviate the artifacts generated in the deblurring process; λ_adv, λ_cc and λ_p are the weighting factors of the corresponding losses. These losses are commonly used in deblurring tasks. Note that for the perceptual loss, VGG19 [36] pre-trained on ImageNet [37] is applied in the experiments.

TABLE 1: Symbols used in the CSE metric. V — the entire color histogram in HSV space, containing three color histogram component vectors (a subscript can be added to indicate the image); p — the primary peak point ((p)_x, (p)_y); q — an auxiliary peak point ((q)_x, (q)_y); Q — the set of auxiliary peak points {((q_1)_x, (q_1)_y), ..., ((q_{N−1})_x, (q_{N−1})_y)}, where N denotes the total number of primary and auxiliary peaks; t — the uniform representation of peak points; γ — the weight of a peak among all peaks, with superscripts and subscripts indicating the component and the type of peak, respectively.

COLOR-SENSITIVE ERROR (CSE): NEW CHROMATIC ABERRATION QUANTIZATION METRIC
The proposed unsupervised color retention strategy cannot be directly used as a training loss, but from another perspective, the idea is suitable for designing a new metric to assess and quantify chromatic aberration. Specifically, we propose a new image quality assessment method, termed Color-Sensitive Error (CSE), to quantify the chromatic aberration between the restored image and the target image. Note that the target image can be the ground truth for paired data in the supervised case, or the original blurry image for unpaired data in the unsupervised case. That is, the CSE metric can be employed in both supervised and unsupervised cases. Technically, we design the CSE metric based on the color histograms of the two images. For an image, the values that occur more often converge into peaks in the histogram, and it is easier for human eyes to perceive faint changes in the peaks. As such, CSE is designed around the peaks to conform to the human eye's intuition. Specifically, the entire calculation of CSE is a weighted average of the errors on the three components h, s and v. On each component, the error includes three parts: 1) the peak-area error (PAE), which measures the overall difference between each pair of peaks in the color histograms of the two images; this error is designed from the perspective of peak height and area; 2) the peak-offset error (POE), which measures the difference between the primary peaks of the two compared color histograms, as well as the difference between the auxiliary peaks; this error is designed from the perspective of peak position; 3) the component weight, which is used to further refine the error. Next, we elaborate the details of the above three parts. Since the calculation and derivation of the CSE metric involve some symbols, we first describe the symbols used in Table 1 for clear presentation.

Peak-Area Error (PAE)
We first describe the PAE in detail. As we know, the HSV color space C = {h, s, v} is closer to human perception than RGB space, so we quantify the chromatic aberration in HSV space. Given an RGB color image I ∈ R^{H×W×C}, we first convert it into Ĩ ∈ R^{H×W×C} in HSV space. For any HSV image, the value ranges of the three components h, s, v are different. For each component c ∈ C, the value width R_c can be defined as

R_c = sup[Z_c] − inf[Z_c],

where sup[•] and inf[•] denote the supremum and infimum values respectively, and Z_c denotes all possible distributions of the color histogram of component c. For an integer HSV image, the value width of the h component is 180, while the value width of the s and v components is 256. Prior to computing the color histograms of images, a factor histsize can be set, which determines the length of a single partition in the histogram vector.
In Figure 6, we show examples of the color histograms of an image under different histsizes. Clearly, a larger histsize corresponds to a smoother histogram, but this is not necessarily conducive to the quantitative metric. Specifically, when histsize is large enough, the difference between the two histograms becomes small. More importantly, a large histsize may merge some important peaks in the histograms, directly making the difference between the two histograms insignificant. On the contrary, a small histsize leads to a sharper histogram that is numerically more accurate and detailed. However, histsize should not be too small either, since in that case it becomes difficult to determine which peaks to calculate. Therefore, we make a trade-off in this paper. According to our experiments, we set histsize to the greatest common divisor of 180 and 256, i.e., 4, to quantify the metric and visualize the color histograms.
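With histsize = 4, the histogram of one HSV component can be computed as follows. The binning-by-integer-division detail is our assumption about how the partitions are formed.

```python
import numpy as np

def color_histogram(channel, value_width, histsize=4):
    """Histogram of one HSV component with partitions of width
    `histsize`; value_width is 180 for h and 256 for s and v,
    so histsize=4 gives 45 and 64 partitions respectively."""
    n_bins = value_width // histsize        # L_c = R_c / histsize
    return np.bincount(channel.ravel() // histsize, minlength=n_bins)
```

For the h component this yields a length-45 vector, and for s and v a length-64 vector, matching the value widths of 180 and 256.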
Then, we can obtain a set of color histogram vectors V = {V^c | c ∈ C} over each component of image Ĩ. Let L_c be the number of partitions on each component, i.e., the length of vector V^c, which can be calculated as

L_c = R_c / histsize.

In the color histogram, the peaks play a decisive role in human intuition and in perceiving the difference. To highlight the importance of the peaks and prevent the non-peak elements from participating in the calculation as much as possible, we exploit the filtering effect of differencing on those non-peak elements and apply a first-order difference to V^c to obtain the difference vector ∆V^c:

∆V^c_i = V^c_{i+1} − V^c_i, i = 1, 2, ..., L_c − 1,

where ∆^k denotes the k-th order difference and V^c_i denotes the i-th element of V^c. After filtering out the non-peak elements and obtaining the difference vector ∆V^c, a discrete integral can be computed to indirectly reflect the height and area of all the peaks in the histogram vector. Specifically, we compute the discrete integral σ(∆V^c) of the difference vector ∆V^c as

σ(∆V^c) = Σ_i |∆V^c_i|.

For ease of understanding, Figure 7 shows two comparison examples of the calculation from V^c to σ(∆V^c).
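The difference-then-integrate computation can be sketched as follows. Note that the exact form of σ(∆V^c) is not fully specified here, so treating it as the sum of absolute first-order differences (the total variation, which for peaks that rise from and return to the baseline equals twice the summed peak heights) is our assumption.

```python
import numpy as np

def difference_vector(V):
    """First-order difference dV_i = V_{i+1} - V_i, which filters out
    the flat, non-peak parts of the histogram."""
    return np.diff(V)

def discrete_integral(dV):
    """One plausible reading of sigma(dV): the discrete integral of
    the absolute differences (total variation), reflecting the
    height and area of the peaks. This exact form is an assumption."""
    return int(np.abs(dV).sum())
```

For a histogram with two isolated peaks of heights 4 and 2, the total variation is 12, i.e., the measure grows with both the number and the height of the peaks.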
Similarly, given two HSV images Ĩ_a ∈ R^{H×W×C} and Ĩ_b ∈ R^{H×W×C}, we can obtain the color histogram vectors V^c_a and V^c_b, the difference vectors ∆V^c_a and ∆V^c_b, and the discrete integrals σ(∆V^c_a) and σ(∆V^c_b) on each component.
Finally, the peak-area error (PAE) on each component is calculated from the difference between the two discrete integrals, i.e., PAE^c = |σ(∆V^c_a) − σ(∆V^c_b)|.

Peak-Offset Error (POE)
The calculation of POE involves numerical computation on the peaks, so we first give a unified expression for the peaks. Given a color histogram vector V^c, we define t^c = ((t^c)_x, (t^c)_y) as a peak point, where x and y indicate the peak position and peak value. Specifically, (t^c)_y is a value in V^c, while (t^c)_x is the corresponding index in V^c. In general, for a peak point t^c in V^c, the following two conditions need to be satisfied: (1) (t^c)_y > V^c_{(t^c)_x − 1}; (2) (t^c)_y > V^c_{(t^c)_x + 1}. In particular, the peak point can also be the starting point satisfying only condition (2), or the ending point satisfying only condition (1). In fact, we do not need to find all peak points in the distribution, since the values of some peak points are so small that they can be considered non-peaks. Therefore, in the design of POE, we only select the peak point with the highest value (i.e., the primary peak) and some peak points with the next highest values (i.e., auxiliary peak(s)). To facilitate the statement, we use p to denote the primary peak, q to denote an auxiliary peak, and Q to denote the set of auxiliary peaks, i.e., Q = {q_1, q_2, ..., q_{N−1}}. Note that the primary peak is necessary while the auxiliary peaks are not; in other words, Q can be an empty set in some cases. To define the POE metric, the following two steps are performed. Peak positioning. Given a histogram vector V^c, finding and positioning the peaks p and q is the core process. In this paper, we propose a peak positioning algorithm to obtain the needed peaks on the histogram vector V^c, as summarized in Algorithm 1.
Specifically, we perform the following three operations in order: 1) setting an empty peak set T to store the peaks; 2) sorting the vector V^c in descending order to obtain the index vector idx; and 3) zero-padding the vector V^c, after which idx is traversed to collect the peaks and the peak set T is returned. Note that the primary peak is necessary, i.e., N should be at least 1. In this paper, we set N to 2 to obtain the primary peak p and one auxiliary peak q. Peak aligning. After obtaining the primary and auxiliary peaks, it is still possible to encounter a tricky problem, i.e., peak misalignment, which directly affects the accuracy of the metric. For example, given two vectors V^c_a and V^c_b, the peaks p^c_a, q^c_a for V^c_a and the peaks p^c_b, q^c_b for V^c_b can be obtained by peak positioning. If the difference between the values (p^c_a)_y and (p^c_b)_y is very small, and similar arguments hold for the auxiliary peaks, then the peaks may be misaligned, as shown in Figure 8. We can see that misalignment exists between p_a and p_b, and between p_b and p_s, which directly leads to inaccurate calculation results. Thus, we design a peak aligning algorithm to overcome this issue, as described in Algorithm 2, where the parameter ϑ is set to 0.3 in this paper.
Note that the peak offset is defined as the distance between the x coordinates of two peaks. Since we consider different kinds of peaks (i.e., primary and auxiliary peaks), both the primary and the auxiliary offsets are calculated. However, the primary peaks are usually more important than the auxiliary peaks, so we propose to weight the two kinds of offsets according to their y coordinates. That is, given two images Ĩ_a, Ĩ_b ∈ R^{H×W×C}, we can obtain the aligned peaks t^c_a and t^c_b, t ∈ ({p} ∪ Q), based on the color histogram vectors Ṽ^c_a and Ṽ^c_b on each component of the HSV color space. Technically, the weights γ^c_t for the primary and auxiliary peaks are defined such that the superscript and the subscript of γ^c_t indicate the component and the type of peak, respectively. After that, the peak offsets can be obtained by a weighted sum of the primary and auxiliary offsets.

Algorithm 1 Peak positioning.
Input: V^c – an arbitrary vector; N – the number of peak points to be found on this vector.
Output: T – a set of peak points.
1: if N = 0 then return ∅ end if
2: T ← ∅; V^c ← [0, V^c_1, ..., V^c_L, 0]
3: idx ← the indices of V^c sorted in descending order of value
4: for i ← 1 to L do
5:   if V^c[idx[i]] satisfies the peak conditions 1 and 2 then
6:     T ← T ∪ {(idx[i], V^c[idx[i]])}
7:   end if
8:   if |T| = N then break end if
9: end for
10: return T

Finally, the POE metric is formally defined as follows:
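The peak positioning step of Algorithm 1 can be sketched in NumPy as below. Interpreting conditions 1 and 2 as strict local-maximum inequalities against both neighbours (with the zero padding handling the endpoints) is our assumption.

```python
import numpy as np

def peak_positioning(V, N):
    # Sketch of Algorithm 1: pad V with zeros, visit indices in
    # descending order of value, keep local maxima until N peaks found.
    if N == 0:
        return []
    Vp = np.concatenate(([0.0], np.asarray(V, dtype=float), [0.0]))
    order = np.argsort(-Vp[1:-1], kind="stable") + 1  # indices into padded vector
    T = []
    for i in order:
        # assumed conditions 1/2: strictly higher than both neighbours
        if Vp[i] > Vp[i - 1] and Vp[i] > Vp[i + 1]:
            T.append((int(i - 1), float(Vp[i])))  # (position in V, value)
        if len(T) == N:
            break
    return T
```

With N = 2, as used in the paper, the first returned entry is the primary peak p and the second the auxiliary peak q.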

Component Weight and CSE Metric
Considering that the contributions of different components to human perception are different, we also weight the obtained errors PAE and POE along the component dimension. Similar to the weights for the peaks, we utilize the values of the peaks to design the weights on the components, where max(•) denotes the maximum value operation. The purpose of weighting the components is to highlight the contribution of each component. However, when the distribution of a component of the two histograms is flat enough, or even without obvious peaks (such as the two component vectors in Figure 9), it is difficult to weight the components appropriately. Specifically, in this case, the expected error value of the two histogram vectors should not be very large, since the two vectors look similar. In fact, although the peaks of such a histogram vector are very inconspicuous, the peak-positioning algorithm can still locate the required number of peaks; however, such peaks are usually inaccurate and lead to erroneous results, as shown in Figure 9. As a result, when the histogram vector of the target image is flat enough, we compute the root mean square error between the restored and target images, instead of using the peaks to accumulate the error, due to the invalidity of the peaks in those cases. Note that this brings one extra benefit: only when the error of the remaining components is small enough will the error of the flat component play a leading role; otherwise, the flat component only occupies a small proportion, which does not conflict with our original intention of using the peaks to assess the color difference. In this paper, when the maximum value of a certain component of the histogram is less than 1/4 of the overall maximum value, we consider this component to be flat.

Algorithm 2 Peak aligning.
Input: T_a = {t_{a1}, t_{a2}, ..., t_{a(N−1)}} – the set of peaks for vector a, containing a primary peak t_{a1} and one or more auxiliary peaks; T_b = {t_{b1}, t_{b2}, ..., t_{b(N−1)}} – the set of peaks for vector b, defined analogously; ϑ – the threshold determining whether an alignment operation should be performed.
Output: T_a, T_b – the aligned peak sets.
1: if either set contains no auxiliary peaks then
2:   return T_a, T_b
3: end if
4: for i ← 1 to N_a − 1 do
5:   if the primary peak of one vector matches an auxiliary peak of the other within the threshold ϑ then
6:     swap the primary peak and that auxiliary peak
7:   end if
8: end for
9: return T_a, T_b
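The flat-component fallback described above (a component is flat when its histogram maximum is below 1/4 of the overall maximum, in which case RMSE replaces the peak-based error) can be sketched as:

```python
import numpy as np

def is_flat(V_c, overall_max):
    # A component is treated as flat when the maximum of its histogram
    # vector is below 1/4 of the maximum value over all components.
    return float(np.max(V_c)) < overall_max / 4.0

def rmse(restored, target):
    # Root-mean-square-error fallback used for flat components,
    # since peak positioning is unreliable on near-uniform histograms.
    d = np.asarray(restored, dtype=float) - np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))
```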
Then, we utilize R^c to perform the normalization, and the CSE on a certain component is calculated as follows: Finally, the entire CSE error can be calculated by accumulating the errors over all the components as follows: Remarks. PSNR and SSIM are two widely-used quantitative metrics for assessing image deblurring quality; however, they have two clear shortcomings: 1) the objective evaluation result is not consistent with human perception, which may be because the detailed

Fig. 10. Comparison of the visual appearance and color histograms of the original blurry image (left) and the sharp image (right). Clearly, the blurry image contains most of the color information of the image, which can also be concluded from the similar distributions of the color histograms.
and color information are not appropriately considered; 2) paired data are needed to calculate their values. In contrast, our CSE metric is based on statistical principles over the HSV color histogram vectors, so it is closer to human perception and more sensitive to color changes. It is also noteworthy that although the CSE metric is originally proposed to assess color retention ability, its values can also be used to evaluate, to some extent, the ability to restore image details. In addition, CSE is a general metric that can be used to evaluate different kinds of deblurring models with/without paired data, including unsupervised, fully-supervised, and semi-supervised ones. Specifically, for paired data with a sharp image, we calculate the CSE based on the restored and sharp images. As such, directly using the blurry image for evaluating unsupervised methods is reasonable, which has also been verified by experiments.

EXPERIMENTS
We evaluate CRNet qualitatively and quantitatively, and present comparison results against several closely related models. A natural image dataset, GOPRO [12], and a domain-specific synthetic dataset, CelebA [38], are employed for the evaluations. We use an Nvidia RTX 2080Ti graphics card with 10 GB of memory for our experiments.

Evaluation and Implementation
Evaluation methods. Note that the models that can be included for comparison are limited, since only a few unsupervised deblur models have been proposed in this field. In this study, three unsupervised methods are compared, i.e., CycleGAN [30], DualGAN [28], and DisentDeblur [19].
Evaluation metrics. For the numerical evaluations, the commonly-used PSNR and SSIM metrics, together with our CSE metric, are employed. Although we mainly discuss the unsupervised deblurring task without ground truth, we can still compute the CSE using the original blurry image or the sharp image. Specifically, we use CSE-1 to quantify the chromatic aberration between the restored and blurry images, and CSE-2 to quantify the chromatic aberration between the restored image and the sharp image. Implementation details of CRNet. The proposed deblur model is implemented in PyTorch [39]. In the training phase, the weighting factors λ_adv, λ_cc and λ_p are set to 1, 20, and 0.1, respectively, for CRNet in all experiments. We use the Adam optimizer [40] with β_1 = 0.5 and β_2 = 0.999; the initial learning rate is set to 0.0002, kept unchanged for the first 40 epochs, and decayed linearly to 0 over the last 40 epochs. In the feedforward of the discriminator, we follow [30] in using a sample buffer pool to allow the model to learn more general inter-domain relationships. In the testing phase, the blur offset estimation network G_SB and the discriminators D_S and D_B are removed. Given a blurry image I_B, the testing process is performed as follows:
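The learning-rate schedule above (constant for 40 epochs, then linear decay to 0 over the last 40) can be written as a per-epoch function; the exact indexing at the decay boundary is our assumption.

```python
def lr_at_epoch(epoch, base_lr=2e-4, n_keep=40, n_decay=40):
    # Constant for the first n_keep epochs, then linear decay to 0
    # over the following n_decay epochs (boundary indexing assumed:
    # epoch n_keep already receives the first decayed value).
    if epoch < n_keep:
        return base_lr
    return base_lr * max(0.0, 1.0 - (epoch - n_keep + 1) / n_decay)
```

In a PyTorch training loop, the same multiplier could be passed to a `LambdaLR` scheduler on top of the Adam optimizer.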

Experiments on GOPRO
We first evaluate each model for image deblurring on the most commonly used natural image dataset, GOPRO. Since GOPRO is a natural image dataset, it is less visually sensitive than a domain-specific dataset, and we therefore did not add any additional detail comparisons. Due to graphics card memory limitations, it is difficult to evaluate the performance at the original resolution, so we resize the original image to an intermediate size and then crop 128*128-pixel patches for training. In the test phase, we resize the original image to 224*128 resolution to maintain the same length-width ratio as the original image.
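The resize-then-crop preprocessing can be sketched as a random 128x128 crop of the already-resized image (the intermediate resize size itself is not specified in the text, so it is omitted here):

```python
import numpy as np

def random_crop(img, size=128):
    # Training-time preprocessing: crop a size x size patch at a
    # random location from the (already resized) image array.
    h, w = img.shape[:2]
    top = np.random.randint(0, h - size + 1)
    left = np.random.randint(0, w - size + 1)
    return img[top:top + size, left:left + size]
```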
The numerical evaluation results are described in Table 2, and some visual results are shown in Figure 11, where the settings described above are used. We see that: 1) our CRNet surpasses the other state-of-the-art methods in terms of the PSNR, SSIM, and our CSE metrics in Table 2. However, it should be noted that the performance improvement in terms of PSNR and SSIM is usually limited; that is, PSNR and SSIM cannot distinguish the performance differences significantly, due to their inability to fully consider the details in images; 2) compared with CRNet in Figure 11, the other unsupervised methods cannot effectively preserve the color information of images, which is also demonstrated by the results of CSE-1 and CSE-2. That is, our CRNet obtains better deblurring performance while effectively retaining the color information. Note that the orderings of model performance are always consistent between CSE-1 and CSE-2; that is, calculating the assessment results from the original blurry images or from the ground-truth sharp images are both feasible, which is important for the cases without paired data.

Experiments on CelebA
We then evaluate each model for deblurring on the domain-specific synthetic dataset CelebA, a face attribute dataset. CelebA contains 202,599 sharp face images, and the size of each image is 178*218 pixels. In our experiments, we use the aligned and cropped version of the CelebA dataset [38]. Since CelebA contains only sharp images, we need to generate motion blur for them manually. In this study, we follow the motion blur generation algorithm of [13]. For the parameter setting, we set the impulsive shake probability to 0.005 and the max movement length to 20.
In the training phase, we similarly resize the original image to an intermediate size and then crop 128*128-pixel patches. In the testing phase, we perform the deblurring at the original resolution.
In this study, the method proposed in [29] is not included for comparison, since it performs poorly on the domain-specific dataset. Due to the different settings, we cannot directly use the pre-trained models provided by the authors. As such, we retrained the compared models on CelebA with the same settings for a fair comparison, using the code provided by the authors. The quantitative deblurring results on CelebA are described in Table 3, and some visual results are shown in Figure 12. We find from Figure 12 that the other compared unsupervised methods produce obvious chromatic aberration in the latent images, which can also be seen from the quantitative results of CSE-1 and CSE-2. In contrast, our CRNet retains the color information well.

TABLE 4
Results of ablation studies. The upper part shows the model performance after gradually adding components into the baseline, the middle part shows the model performance after deleting each component, and the lower part shows the comparison results for different blur offset types.

Ablation Study
We conduct ablation experiments to prove the effectiveness of different components and modules in our CRNet, and verify their effects on the performance.Specifically, we verify the effectiveness of four components on dataset GOPRO: 1) dual blur branch; 2) multi-scale cropping discriminator; 3) perceptual loss; 4) PGBFP.
Therefore, a complete ablation study should contain 16 settings, since each of the four components can either be added or removed. In fact, we have carried out all of these experiments, but considering the space limitation, we only report the ablation results obtained by gradually adding the components, or by removing a single component for comparison. The results of the ablation study are described in Table 4. We see that adding both the dual blur branch and PGBFP significantly improves the deblurring performance. We also notice that deleting the dual blur branch greatly degrades the performance.
Note that we have also tried three different kinds of blur offset, i.e., non-local attention, element-wise weight, and residual; the results are described in the lower part of Table 4. We see that both the element-wise weight and the residual blur offset are highly competitive, and both are more effective than the non-local attention.

CONCLUSION
We have discussed the color retention and quality assessment problems for unsupervised blind motion deblurring. Technically, a general unsupervised color retention network, CRNet, and a new chromatic aberration quantization metric, CSE, were proposed. The deblurring architecture of CRNet solves the chromatic aberration problem with a new strategy consisting of a blur offset estimation network and an adaptive blur correction stage. The advantages of the new CSE metric are threefold. First, it can easily be used in supervised/unsupervised settings with/without paired data. Second, its quantization result is more consistent with human perception than the PSNR and SSIM metrics. Third, the performance differences evaluated by the CSE metric are more significant than those under PSNR and SSIM. Extensive experiments on both natural-image and domain-specific datasets show that our CRNet obtains better performance for deblurring and color retention. In the future, we will try to figure out the underlying cause of blurring at the pixel level, as well as the difference between the distributions of sharp and blurry images, so that the task can be performed better in an unsupervised manner.

Fig. 1 .
Fig. 1. Comparison of the chromatic aberration suffered by a current unsupervised deblurring model (left) and the color retention ability of our CRNet (right). The chromatic aberration in the top example is obvious, while that in the bottom example is not. The values of the PSNR, SSIM, and our CSE metrics are also given.

Fig. 2 .
Fig. 2. CRNet architecture. The entire architecture consists of two branches that jointly constrain the model, where the upper one is the deblur branch and the lower one is the blur branch. A_S and B_S represent unpaired images in the sharp domain (S-domain), and A_B and B_B represent unpaired images in the blurry domain (B-domain). For the deblur branch, given a blurry image A_B, the blur offset estimation operation generates the blur offset, and the adaptive blur correction phase corrects the original image to obtain the latent image A_S. Similarly, inverse blur offset estimation and adaptive blur correction operations are performed to obtain the reconstructed image A*_B in the B-domain, i.e., A_B → A_S → A*_B ≈ A_B. For the blur branch, the entire pipeline can be represented as B_S → B_B → B*_S ≈ B_S in the S-domain. In addition, two multi-scale cropping discriminators are used to distinguish real and fake samples in both the S-domain and the B-domain.

Fig. 3 .
Fig. 3. Structure of the blur offset estimation (BOE) network. Given an original image, the BOE network generates a blur offset. The entire structure consists of an encoding process, a transformation process, and a decoding process. PGBFP participates in the whole process.

Fig. 4 .
Fig. 4. The gradient maps of the blur offsets generated during the training process, where the kind of the selected blur offset is residual.
Fig. 5. Structure of PGBFP, where ...(•) and S(•) denote 1-by-1 convolutional operations, A(•) is average pooling, O is the feature map with all zero values, · denotes matrix multiplication, and + denotes element-wise addition. For a PGBFP with a certain cascade level, the function is to obtain the output feature maps m_1, m_2, ..., m_ℵ according to the input feature maps r_1, r_2, ..., r_ℵ. The process can be simply expressed by the following formula:

Fig. 6 .
Fig. 6. An image and its color histograms with different histsizes in the HSV color space. In this paper, we consistently set histsize = 4 to quantify the chromatic aberration and visualize the color histograms in all experiments.

Fig. 7 .
Fig. 7. Examples of computing the discrete integral according to two different color histogram vectors of the same component, where the discrete integral represents the area enclosed by the histogram vector and the 0 axis, as shown by the green area.
1, ..., L; 3) adding zero elements before the start point and after the end point of the vector V^c, i.e., V^c ← [0, V^c_1, V^c_2, ..., V^c_L, 0]. Then, the vector idx is traversed to determine whether the current value satisfies the conditions for being a peak; if yes, (idx[i], V^c[idx[i]]) is added to T. Finally, if the number of peaks in T reaches the user-specified maximum value N or the vector idx has been completely traversed, the peak set T is returned.

Fig. 8 .
Fig. 8. Aligned peaks between V_a and V_s, and non-aligned peaks between V_a and V_b, and between V_b and V_s. Peak alignment eliminates the misalignment by swapping the primary and auxiliary peaks in the vectors.

Fig. 9 .
Fig. 9. Comparison of the color histogram vectors of the same component of two images. Intuitively, the error between them seems small, but in fact the computed error is large. This error is mainly caused by the distributions of the histogram vectors, i.e., they are too flat to locate the peaks correctly. Without properly considering the weights, these two vectors produce a large peak-offset error, with the order of magnitude being e^56.
While for the unpaired data without a sharp image, we can directly calculate the CSE based on the restored image and the original blurry image, because we have visually and experimentally observed that the original blurry image still contains most of the color information of the image, as shown in Figure 10.

Fig. 11 .
Fig. 11. Visualization results of image deblurring on GOPRO. We show four groups of visualizations. In each group, the top row contains the blurry image, the sharp image (i.e., ground truth), and the latent images recovered by different methods. The second row shows the corresponding color histograms. The bottom row contains two parts: the left part shows local details of the color histograms, and the right part lists the chromatic aberration metrics (CSE-1 and CSE-2). The lower the CSE-1 and CSE-2, the better the deblurring result.

Fig. 12 .
Fig. 12. Visualization results of image deblurring on CelebA. In each group, the first row contains the deblurred image comparison, the histogram detail comparison, and the CSE comparison. The second row compares the color histograms of the deblurred images of each method. The lower the CSE-1 and CSE-2, the better the deblurring result.

GOPRO contains 2,103 pairs of training data and 1,111 pairs of test data. Each pair of data includes a blurry image and a sharp image, and the size of each image is 1280*720 pixels.

TABLE 1
Symbol table, which stores the symbols and the corresponding descriptions used in Section 4.
The superscripts and subscripts indicate a certain component and image, respectively.

TABLE 2
Comparison of quantitative deblurring results between our CRNet and other state-of-the-art methods on GOPRO.

TABLE 3
Comparison of quantitative deblurring results between our CRNet and other state-of-the-art methods on CelebA.
Zooming in on partial details of the deblurred images on CelebA. For each pair of results, the upper row shows the blurry image, the latent images restored by different methods, and the sharp image (ground truth). The lower part zooms in on the local details in the red boxes.