Physical-based optimization for non-physical image dehazing methods

Abstract: Images captured under hazy conditions (e.g. fog, air pollution) usually present faded colors and loss of contrast. To improve their visibility, a process called image dehazing can be applied. Some of the most successful image dehazing algorithms are based on image processing methods but do not follow any physical image formation model, which limits their performance. In this paper, we propose a post-processing technique to alleviate this handicap by forcing the output of the original method to be consistent with a popular physical model for image formation under haze. Our results improve upon those of the original methods qualitatively and according to several metrics, and they have also been validated via psychophysical experiments. These results are particularly striking in terms of avoiding over-saturation and reducing color artifacts, which are the most common shortcomings faced by image dehazing methods.


Introduction
Images captured under adverse weather conditions, such as fog or smog, present distorted colors and a loss of contrast, which reduces the quality of the captured image. Different physical models describing this phenomenon have been proposed, the most widespread being the one by Koschmieder [1].
Koschmieder's model states that the hazy image $I$ depends on the clear image $J$ (i.e., how the image would look without atmospheric scatter), a transmission map $t$ that depends only on the scene depth and is therefore equal for the three color channels, and the airlight color $A$. Mathematically, the model is written as

$$I_{x,\cdot} = t_x\, J_{x,\cdot} + (1 - t_x)\, A, \tag{1}$$

where $x$ is a particular image pixel, and $J_{x,\cdot}$, $I_{x,\cdot}$ are respectively the 1-by-3 vectors of the R, G, B values at pixel $x$ of the clear and the hazy image. Note that Koschmieder's model is a relatively simple model of the atmosphere. It is by no means a complete model, as optical scattering is extremely complex due to the wide variability of particle distributions within the atmosphere. That said, even if it relies on physical assumptions that will not always hold, it provides a mathematically tractable setting. For this reason, this model is used in almost all the image dehazing literature, and it is also the one considered in this paper.
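As a minimal illustration of Eq. (1), the sketch below applies the haze model to a clear image and then inverts it. The function names `apply_haze` and `remove_haze`, the list-of-RGB-pixels image representation, and the lower bound `t_min` on the transmission (added only to avoid dividing by values near zero) are assumptions made for this example and are not part of the original formulation.

```python
def apply_haze(clear, transmission, airlight):
    """Koschmieder model: I_x = t_x * J_x + (1 - t_x) * A for every pixel x."""
    return [
        [t * j + (1.0 - t) * a for j, a in zip(pixel, airlight)]
        for pixel, t in zip(clear, transmission)
    ]


def remove_haze(hazy, transmission, airlight, t_min=1e-3):
    """Invert the model: J_x = (I_x - (1 - t_x) * A) / t_x.

    t_min is an assumed safeguard against division by a near-zero transmission.
    """
    restored = []
    for pixel, t in zip(hazy, transmission):
        t = max(t, t_min)
        restored.append([(i - (1.0 - t) * a) / t for i, a in zip(pixel, airlight)])
    return restored


# Two RGB pixels, one transmission value per pixel, and a white airlight.
clear = [[0.2, 0.4, 0.6], [0.9, 0.1, 0.3]]
t = [0.5, 0.8]
A = [1.0, 1.0, 1.0]

hazy = apply_haze(clear, t, A)
restored = remove_haze(hazy, t, A)
```

When the transmission and the airlight are known exactly, `restored` matches `clear` up to floating-point error. The difficulty in practice is that both quantities have to be estimated from the hazy image.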
There exists a large number of image dehazing methods based on imposing Eq. (1) as a constraint on the solution. However, there is also a second type of method that applies image enhancement or image fusion techniques to the original hazy image. This second type of method has proven effective at removing haze from images, but it does not include a reliable physical model. In this paper we propose a post-processing procedure for this second type of method. Our goal is to obtain a final result that is as close as possible to the solution of the original algorithm while satisfying the constraint given by Eq. (1). Our proposed solution can therefore be understood as a bridge linking the two types of methods.
This paper is an extension of our conference work presented in [2]. In particular, we have modified our original formulation to constrain the transmission by a DCT basis (effectively, the transmission is modelled as varying smoothly across the scene/image), and we have performed a much larger number of experiments, which show numerically that this new approach outperforms both the original non-physics dehazing methods and our previous work presented in [2].

Related work
Image dehazing has become a prolific topic of research in recent years. This increased interest is mostly due to its importance as a pre-processing tool for computer vision methods that need to work in the wild. Some particular examples are surveillance and tracking through CCTV cameras, or the self-driving of vehicles and drones.
In this section we divide the proposed methods into physically-based methods and image processing methods.
Physically-based methods: These methods search for a single transmission $t$ and an airlight vector $A$. Once these two quantities are found, they obtain the haze-free image $J_{x,\cdot}$ by inverting Eq. (1). Solving for $t$ and $A$ is an underconstrained problem, but it can be solved if assumptions are placed on the form of the final solution. Some examples of this type of method are [3], [4], [5], or [6]. A special mention should be given to the Dark Channel prior [7] (probably the most used image dehazing method), where the authors assume that the minimum of an image region over the three color channels should be zero. The Dark Channel prior has been largely extended and improved, for example in [8][9][10][11][12][13]. Learning-based techniques have also been studied for this case; some examples are [14], [15]. Recently, some deep learning techniques have also been used [16], [17].
Image processing approaches: These methods aim to modify the original image to compensate for the visual effect of haze. In particular, they usually focus on the amount of contrast, saturation or other possible indicators of the presence of haze, and try to compensate for them. For example, [18] proposed to remove contrast loss in hazy images through a linear model of the presence of excessive brightness, based on the ratio between local mean and standard deviation. In [19,20] the authors use a multiscale image fusion approach in which they blend several images derived from the input, such as a white-balanced and a contrast-enhanced version of it. Different approaches based on models of the Human Visual System (HVS), such as Retinex, have also been proposed in [21][22][23][24][25][26]. [27] proposed a combination of the last two approaches: a variational formulation based on the HVS is combined with a fusion-based approach. Very recently a dual relation between image dehazing and Retinex has been proven [28]. This relation shows that any threshold-free Retinex method applied on inverse intensities performs image dehazing. Finally, machine-learning techniques have also been used for this type of method. For example, a haze density predictor based on natural scene statistics was presented in [29].
There are very few methods focusing on the removal of artifacts in image dehazing. Matlin and Milanfar [30] proposed an iterative regression method that simultaneously performs denoising and dehazing. Li et al. [31] proposed to decompose the original image into high and low frequencies, performing image dehazing only in the low frequencies, thus avoiding blocking artifacts. Chen [32] applied both a smoothing filter for the refinement of the transmission and an energy minimization to avoid the appearance of gradients that were not present in the original image.

Coupled iterative minimization for image dehazing
In this paper we focus on the post-processing of dehazing methods that do not enforce a physical model, i.e. mostly those listed as image processing approaches in the previous section. Our goal is that, given an original hazy image $I$ and the solution $J^{np}$ of a dehazing method that does not fulfil a physical model, we obtain a new dehazing result $J^{our}$ that:
• Satisfies the constraint given by Eq. (1).
• Is as close as possible to the initial solution $J^{np}$.
The most straightforward approach to meet both requirements is to minimize the error in Eq. (1) when the result of the image processing method $J^{np}$ is used in place of the clear image. As an aid to our derivations below, we represent color and scalar images as N-by-3 and N-by-1 matrices respectively (where $N$ denotes the total number of pixels in the image). Mathematically, we can write this minimization in matrix form as

$$\{t^{our}, A^{our}\} = \arg\min_{t^{*},\, A^{*}} \left\lVert T^{*} J^{np} + (\mathbf{1} - t^{*})\, A^{*} - I \right\rVert^{2}, \tag{2}$$

where $\mathbf{1}$ is an N-by-1 vector that has a value of 1 in every entry, $t^{*}$ is an N-by-1 vector that represents the transmission, $A^{*}$ is a 1-by-3 vector that provides the airlight, $I$ and $J^{np}$ are N-by-3 matrices representing the input image and the non-physical dehazing solution, and $T^{*}$ is an N-by-N matrix that has zeros everywhere except on the diagonal, where it has the values of $t^{*}$. Intuitively, the minimization of Eq. (2) needs to be performed iteratively along two different dimensions. In particular, when looking for $t^{our}$ we minimize for each pixel $x$ of the image over the three color channels, while when looking for $A^{our}$ we minimize for each color channel $c$ over all the pixels.
In the next paragraphs we explain how we perform each of these two minimizations.

Minimizing for $t^{our}$: Let us start by supposing that we have an initial value for $A^{our}$. This is a standard case in many image dehazing works, where a fixed value is usually assumed for the airlight. Let us denote as $\Lambda$ the N-by-3 matrix obtained by replicating $A^{our}$ for the $N$ image pixels. Then, our minimization for the transmission can be rewritten as

$$T^{our} = \arg\min_{T^{*}} \left\lVert T^{*} (J^{np} - \Lambda) - (I - \Lambda) \right\rVert^{2}. \tag{3}$$

Note that $(I - \Lambda)$ is an N-by-3 matrix, as is $(J^{np} - \Lambda)$, so the solution of Eq. (3) involves scaling the rows of $(J^{np} - \Lambda)$ to match those of $(I - \Lambda)$. This minimization has the same structure as the one considered in the Alternating Least Squares method [33], and can therefore be constrained by the use of basis functions. We thus impose a further constraint on $t^{our}$, specifically that the per-pixel multiplication implied by $T$ should be smooth. We implement smoothness by requiring $T$ to be represented as a linear combination of the first few terms in a DCT expansion. The new smooth adjustment, which we call $T^{DCT}$, is calculated in 3 steps. First, we map $(I - \Lambda)$ and $(J^{np} - \Lambda)$ to images with $P \times Q$ pixels, $\underline{(I - \Lambda)}(x, y)$ and $\underline{(J^{np} - \Lambda)}(x, y)$, where the underline indicates that these are RGB images (each pixel has 3 numbers) and $(x, y)$ indexes the pixel location. Second, we find the image $T(x, y) = \sum_{k=1}^{K} \alpha_k G_k(x, y)$ that minimizes

$$\sum_{x, y} \left\lVert \left( \sum_{k=1}^{K} \alpha_k G_k(x, y) \right) \underline{(J^{np} - \Lambda)}(x, y) - \underline{(I - \Lambda)}(x, y) \right\rVert^{2}, \tag{4}$$

where $G_k(\cdot)$ represents the $k$th DCT basis image. Finally, we map the recovered image back to the diagonal matrix representation: $T(x, y) \rightarrow T^{DCT}$.
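Before the DCT constraint is imposed, the minimization of Eq. (3) decouples over pixels: each row of $(J^{np} - \Lambda)$ is scaled by a single number to match the corresponding row of $(I - \Lambda)$, which is a one-dimensional least-squares problem with the closed-form solution $t_x = \langle J^{np}_x - A,\, I_x - A \rangle / \lVert J^{np}_x - A \rVert^2$. The sketch below implements this unconstrained step only. The function name `fit_transmission`, the list-of-RGB-pixels representation, and the fallback value of 1 for pixels whose dehazed color equals the airlight (where the transmission is undetermined) are assumptions made for this example.

```python
def fit_transmission(hazy, dehazed, airlight, eps=1e-12):
    """Per-pixel least-squares scale t_x such that t_x * (J_x - A) ~ (I_x - A)."""
    transmission = []
    for i_px, j_px in zip(hazy, dehazed):
        d_i = [i - a for i, a in zip(i_px, airlight)]  # row of (I - Lambda)
        d_j = [j - a for j, a in zip(j_px, airlight)]  # row of (J_np - Lambda)
        num = sum(p * q for p, q in zip(d_j, d_i))
        den = sum(q * q for q in d_j)
        # Assumed fallback: if J_x equals the airlight, t_x is undetermined; use 1.
        transmission.append(num / den if den > eps else 1.0)
    return transmission


# Pixels generated with Eq. (1) from t = [0.5, 0.8] and a white airlight.
A = [1.0, 1.0, 1.0]
J_np = [[0.2, 0.4, 0.6], [0.9, 0.1, 0.3]]
I = [[0.6, 0.7, 0.8], [0.92, 0.28, 0.44]]

t_fit = fit_transmission(I, J_np, A)
```

Because the example pixels satisfy Eq. (1) exactly, the fitted values coincide with the transmission used to generate them. For a real non-physical result the fit is only approximate, and the residual is what the coupled minimization reduces.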
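The weight vector $\alpha = \{\alpha_1, \ldots, \alpha_K\}$ in Eq. (4) is computed as follows. Let $(J^{np} - \Lambda)_j$ denote the $j$th color channel of the image stretched out as a vector, and let $G_k$ denote the $k$th basis image stretched out as a vector. Then, for each of the three color channels we calculate $K$ vectors as the following pixel-wise products:

$$v^{j}_{k} = G_k \circ (J^{np} - \Lambda)_j, \qquad k = 1, \ldots, K, \quad j = 1, 2, 3. \tag{5}$$

With those vectors, we form a $3N \times K$ matrix $H$ (where $N$ is the number of pixels) as

$$H = \begin{bmatrix} v^{1}_{1} & \cdots & v^{1}_{K} \\ v^{2}_{1} & \cdots & v^{2}_{K} \\ v^{3}_{1} & \cdots & v^{3}_{K} \end{bmatrix}. \tag{6}$$

Similarly, we create a $3N \times 1$ vector $u$ as

$$u = \begin{bmatrix} (I - \Lambda)_1 \\ (I - \Lambda)_2 \\ (I - \Lambda)_3 \end{bmatrix}. \tag{7}$$

Finally, the weight vector $\alpha$ is obtained as

$$\alpha = H^{+} u, \tag{8}$$

where $^{+}$ denotes the pseudo-inverse.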
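The construction of $H$ and $u$ and the least-squares solve for $\alpha$ can be sketched as follows, using only the Python standard library. Several choices here are assumptions made for the example rather than details given in the text: the unnormalized cosine form of the DCT basis images, the selection of basis functions by `u + v < order`, the row-major flattening of the image, and the use of the normal equations $H^{\top} H \alpha = H^{\top} u$ (solved by Gaussian elimination) in place of the pseudo-inverse, which gives the same result only when $H$ has full column rank. All function names are hypothetical.

```python
import math


def dct_basis(P, Q, order):
    """Cosine basis images with u + v < order, each flattened row-major to length P*Q."""
    basis = []
    for s in range(order):
        for u in range(s + 1):
            v = s - u
            basis.append([
                math.cos(math.pi * (2 * x + 1) * u / (2 * P))
                * math.cos(math.pi * (2 * y + 1) * v / (2 * Q))
                for x in range(P) for y in range(Q)
            ])
    return basis


def solve(M, b):
    """Gaussian elimination with partial pivoting for a small square system M x = b."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def fit_dct_transmission(hazy, dehazed, airlight, P, Q, order):
    """Return (alpha, t) where t = sum_k alpha_k * G_k is the smooth transmission."""
    basis = dct_basis(P, Q, order)
    K, N = len(basis), P * Q
    # One row of H per (channel, pixel); column k holds G_k times (J_np - Lambda)_j.
    H = [[basis[k][n] * (dehazed[n][c] - airlight[c]) for k in range(K)]
         for c in range(3) for n in range(N)]
    u = [hazy[n][c] - airlight[c] for c in range(3) for n in range(N)]
    HtH = [[sum(row[a] * row[b] for row in H) for b in range(K)] for a in range(K)]
    Htu = [sum(row[a] * val for row, val in zip(H, u)) for a in range(K)]
    alpha = solve(HtH, Htu)
    t = [sum(alpha[k] * basis[k][n] for k in range(K)) for n in range(N)]
    return alpha, t


# Synthetic 4x4 check: build a smooth transmission from known weights and refit it.
P, Q, order = 4, 4, 2
A = [1.0, 1.0, 1.0]
true_alpha = [0.6, 0.1, -0.05]
basis = dct_basis(P, Q, order)
true_t = [sum(a * g[n] for a, g in zip(true_alpha, basis)) for n in range(P * Q)]
J_np = [[0.1 + 0.05 * (n % 7), 0.2 + 0.03 * (n % 5), 0.3 + 0.02 * (n % 3)]
        for n in range(P * Q)]
I = [[t * j + (1.0 - t) * a for j, a in zip(px, A)] for px, t in zip(J_np, true_t)]

alpha, t = fit_dct_transmission(I, J_np, A, P, Q, order)
```

For real images a numerical library with a proper pseudo-inverse or least-squares routine would be the more robust choice. The hand-written solver is used here only to keep the example self-contained.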
Minimizing for $A^{our}$: Let us now focus on the minimization for $A^{our}$ given a value for $t^{our}$. In this case, let us denote as $T^{our}$ the N-by-N matrix that has zeros everywhere except on the diagonal, where it has the values of $t^{our}$. In this way, the minimization can be rewritten as

$$A^{our} = \arg\min_{A^{*}} \left\lVert T^{our} J^{np} + (\mathbf{1} - t^{our})\, A^{*} - I \right\rVert^{2}. \tag{9}$$

To perform this last minimization, we minimize the error individually for each color channel.
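For a fixed transmission, the airlight minimization separates into three scalar least-squares problems, one per color channel, each with the closed-form solution $A_c = \sum_x (1 - t_x)(I_{x,c} - t_x J^{np}_{x,c}) / \sum_x (1 - t_x)^2$. The sketch below implements this per-channel solve. The function name `fit_airlight` and the choice to raise an error when the transmission equals 1 everywhere (so that the airlight is unconstrained) are assumptions made for the example.

```python
def fit_airlight(hazy, dehazed, transmission, eps=1e-12):
    """Per-channel least-squares airlight for a fixed transmission."""
    den = sum((1.0 - t) ** 2 for t in transmission)
    if den <= eps:
        # Assumed behavior: with t = 1 everywhere the airlight has no effect on the fit.
        raise ValueError("transmission is 1 everywhere; the airlight is unconstrained")
    return [
        sum((1.0 - t) * (i_px[c] - t * j_px[c])
            for i_px, j_px, t in zip(hazy, dehazed, transmission)) / den
        for c in range(3)
    ]


# Pixels generated with Eq. (1) from a known airlight, which the fit should recover.
true_A = [0.9, 0.8, 0.7]
t = [0.5, 0.8]
J_np = [[0.2, 0.4, 0.6], [0.9, 0.1, 0.3]]
I = [[tx * j + (1.0 - tx) * a for j, a in zip(px, true_A)] for px, tx in zip(J_np, t)]

A_fit = fit_airlight(I, J_np, t)
```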
Performing the iterative minimization: The previous minimizations are finally combined in an iterative manner. This means that the value found for $t^{our}$ in an iteration ($it$) is used to obtain $A^{our}$ at the same iteration, and this latter value is used in the following iteration ($it + 1$) to obtain the new value of $t^{our}$.
Once the method has run for the desired number of iterations or the stopping criterion is met, our final result is computed as

$$J^{our}_{x,\cdot} = \frac{I^{or}_{x,\cdot} - (1 - t^{our}_x)\, A^{our}}{t^{our}_x}, \tag{10}$$

where $x$ is a particular image pixel, and $J^{our}_{x,\cdot}$, $I^{or}_{x,\cdot}$ are the 1-by-3 vectors of the R, G, B values at pixel $x$ of the final result and of the original hazy image.
Pseudocode for our method is given in Algorithm 1.

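Algorithm 1 Our algorithm
• Input: The original hazy image $I^{or}$ and the output of a non-physical dehazing method $J^{np}$.
• Initialization: Set the starting airlight $A_0$ and $it = 1$.
• Repeat:
1. Compute $t^{our}$ at iteration $it$ from the current airlight estimate.
2. Compute $A^{our}$ at iteration $it$ from the new $t^{our}$.
3. Update $it$: $it = it + 1$.
• Until a number of iterations is performed or the difference between two consecutive iterations is smaller than a predefined tolerance value.
• Output: The final image $J^{our}$, obtained by inverting Eq. (1) with the final $t^{our}$ and $A^{our}$.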
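The alternation described above can be sketched end to end as follows. This is a simplified version that uses the unconstrained per-pixel transmission rather than the DCT-constrained one. The helper names, the clipping of the transmission to `[t_min, 1]`, the fallback values for degenerate cases, and the choice of measuring convergence as the mean squared change of the output image are all assumptions made for this example.

```python
def fit_transmission(hazy, dehazed, airlight, t_min=1e-3):
    """Per-pixel least-squares transmission, clipped to [t_min, 1] (assumed safeguard)."""
    out = []
    for i_px, j_px in zip(hazy, dehazed):
        d_i = [i - a for i, a in zip(i_px, airlight)]
        d_j = [j - a for j, a in zip(j_px, airlight)]
        den = sum(q * q for q in d_j)
        t = sum(p * q for p, q in zip(d_j, d_i)) / den if den > 1e-12 else 1.0
        out.append(min(max(t, t_min), 1.0))
    return out


def fit_airlight(hazy, dehazed, t, previous):
    """Per-channel least-squares airlight; keeps the previous value if unconstrained."""
    den = sum((1.0 - tx) ** 2 for tx in t)
    if den < 1e-12:
        return list(previous)
    return [sum((1.0 - tx) * (i[c] - tx * j[c]) for i, j, tx in zip(hazy, dehazed, t)) / den
            for c in range(3)]


def recover(hazy, t, airlight):
    """Invert Eq. (1) with the estimated transmission and airlight."""
    return [[(i - (1.0 - tx) * a) / tx for i, a in zip(px, airlight)]
            for px, tx in zip(hazy, t)]


def mse(a, b):
    """Mean squared difference over all pixels and channels."""
    return sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb)) / (3 * len(a))


def coupled_dehaze(hazy, dehazed_np, airlight0=(1.0, 1.0, 1.0), iterations=10, tol=1e-12):
    """Alternate the two least-squares fits, then return (J_our, t_our, A_our)."""
    airlight, previous = list(airlight0), None
    for _ in range(iterations):
        t = fit_transmission(hazy, dehazed_np, airlight)
        airlight = fit_airlight(hazy, dehazed_np, t, airlight)
        result = recover(hazy, t, airlight)
        if previous is not None and mse(result, previous) < tol:
            break
        previous = result
    return result, t, airlight


# Sanity check: an input that already obeys Eq. (1) should be returned unchanged.
true_t = [0.5, 0.8, 0.3]
J_np = [[0.2, 0.4, 0.6], [0.9, 0.1, 0.3], [0.5, 0.5, 0.2]]
I = [[tx * j + (1.0 - tx) * 1.0 for j in px] for px, tx in zip(J_np, true_t)]

J_our, t_our, A_our = coupled_dehaze(I, J_np)
```

When the non-physical result is already consistent with the model, the post-processing leaves it untouched. When it is not, the output is the physically consistent image determined by the fitted transmission and airlight.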

Experiments and results
We have performed different experiments to assess the performance of our approach. First, we study how our iterative minimization for $t^{our}$ and $A^{our}$ affects the output image $J^{our}$. Then, we show some qualitative results where our method visibly improves on the original dehazing method. Later, we show how our method improves on the original dehazing methods quantitatively, considering both reference-based and non-reference image metrics. At the end of the section we also validate our method through a psychophysical experiment where observers were asked to select their preferred image. Throughout this section, we compare our method against the following original dehazing algorithms: the EVID method [21], the FVID method [27], the Choi et al. method [29], the Wang et al. method [26], and two Retinex algorithms, SRIE [34] and MSCR [35], used as dual solutions for the dehazing problem as suggested in [28].
For our method we run 10 iterations. The number of DCT basis functions used for our coupled-DCT method is 10 (i.e. we compute the DCT basis up to order 4) unless otherwise stated. Also, we set $A_0 = [1, 1, 1]$ for all the quantitative and psychophysical evaluations.
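The relation between the order and the number of basis functions can be checked with a short sketch. It assumes that "up to order $n$" means keeping the two-dimensional DCT terms whose horizontal and vertical frequency indices satisfy `u + v < n`. This interpretation is not stated explicitly in the text; it is adopted here because it reproduces both counts quoted in this section (10 functions for order 4, and 55 for order 10). The function name `dct_indices` is hypothetical.

```python
def dct_indices(order):
    """Frequency index pairs (u, v) with u + v < order (assumed triangular selection)."""
    return [(u, s - u) for s in range(order) for u in range(s + 1)]


count_order_4 = len(dct_indices(4))    # order used for the outdoor images
count_order_10 = len(dct_indices(10))  # order used for the indoor images
```

Under this assumption the count is the triangular number $n(n+1)/2$.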

On reaching steady state for image J our
Our minimization looks for $t^{our}$ and $A^{our}$, but we are interested in the image $J^{our}$ as our final result. Therefore, it is natural to ask what effect the iterative minimization of $t^{our}$ and $A^{our}$ has on the image $J^{our}$. In particular, it is interesting to study how the image $J^{our}$ reaches steady state. To this end, Fig. 1 shows the difference between two consecutive iterations of the output image $J^{our}_{x,c}$ (where $c$ denotes the R, G, B channels) for the set of 500 hazy images of the FADE dataset by Choi et al. [29]. We compute this difference as the Mean Square Error (MSE), which for iteration $k$ is defined as

$$\mathrm{MSE}_k = \frac{1}{3N} \sum_{x} \sum_{c} \left( J^{our,k}_{x,c} - J^{our,k-1}_{x,c} \right)^{2}, \tag{11}$$

where $N$ is the total number of pixels. For visualization purposes, we show the cube root of the MSE in the figure. For all the methods the difference ends up being negligible, which indicates that in practice the image $J^{our}$ reaches steady state without any significant problem.
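The convergence measure can be sketched as follows. The normalization by all $3N$ values (pixels times channels) is an assumption of this example, as are the function names and the list-of-RGB-pixels representation. The sample values are arbitrary and are not taken from the experiments.

```python
def consecutive_mse(current, previous):
    """Mean squared difference between two iterates, averaged over pixels and channels."""
    n = len(current)
    total = sum((p - q) ** 2 for cp, pp in zip(current, previous) for p, q in zip(cp, pp))
    return total / (3 * n)  # assumed normalization by 3N


def cube_root(value):
    """Cube root used to compress the dynamic range for plotting."""
    return value ** (1.0 / 3.0)


# Two iterates of a two-pixel image that differ in a single channel value.
previous = [[0.5, 0.5, 0.2], [0.2, 0.2, 0.2]]
current = [[0.5, 0.5, 0.5], [0.2, 0.2, 0.2]]

m = consecutive_mse(current, previous)
plotted = cube_root(m)
```

The cube root makes small late-iteration differences visible on the same axis as the larger early ones.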

Qualitative results
Fig. 2 presents some visual results of our approach for the 6 non-physics methods selected and for two different starting airlights. In terms of the starting airlight $A_0$ (last two columns of the figure), our approach gives very similar results for both of them, showing that it is robust in this respect.
Looking now at the different algorithms (each algorithm is a different row in the figure), in the case of the Choi et al. algorithm our method corrects the excessive saturation present in the field, outputting more natural colors. In the case of the EVID algorithm, our approach corrects the over-contrast introduced by the non-physics method in the cow, grass and rocks. Similarly, the over-contrast is also corrected for the Wang et al. method, where it is especially noticeable in the tree and the nearby vegetation, and for the Ret-MSCR method in the grass and the nearby ducks.
In the case of the FVID algorithm, our approach corrects the artifacts appearing in the sky in the original method. Similarly, the Ret-SRIE method presents a halo artifact around the main building in the image that is clearly alleviated by our approach.
In summary, this figure shows the two main advantages of applying our post-processing approach. First, it corrects over-saturation and over-contrast problems, and second, it alleviates the artifacts that can appear when dehazing an image.
Fig. 2. Qualitative results for our approach, for 6 different non-physical dehazing methods and 2 different starting airlights. Our method improves all the original methods. Furthermore, our results for both airlights are very similar, showing the robustness of our approach.

Non-reference metrics
In this subsection we study the performance of our method when considering non-reference metrics. To this end, we consider the set of 500 hazy images proposed by Choi et al. in [29]. We evaluate our results with respect to two very well-known non-reference image metrics: NIQE [36] and BRISQUE [37]. For both metrics, a smaller number means a better method. Table 1 shows the results for the 6 methods considered in this paper. The simple Coupled method already outperforms the original method for almost all of those tested. Our Coupled-DCT approach lowers the error metrics even further, and outperforms the original method and the Coupled approach in 10 and 9 out of 12 cases, respectively.

Reference-based metrics
In this subsection we focus on reference-based metrics. In this case, we need a dataset with pairs of hazy and clean (ground-truth) images. We use the Middlebury set of the D-Hazy dataset [38]. These images are indoor scenes, and for this reason we run our method with a larger number of DCT basis functions: 55 (i.e. we compute the DCT basis up to order 10).
In this subsection we look at 3 different metrics: the CID [39], which is a color extension of SSIM, the perceptual color difference $\Delta E_{00}$, and the Visual Information Fidelity (VIF) metric [40]. For the CID metric and $\Delta E_{00}$, lower values mean better methods. For the VIF metric, the closer the value is to 1, the better the method, as this means that the result and the ground truth are equal in terms of the visual information present in the images. A VIF value larger than 1 means that the result is over-enhanced, while a VIF value smaller than 1 means that the result is under-enhanced.
Results are shown in Table 2. Our Coupled-DCT approach outperforms all the others in 16 out of 18 cases. Also, the simple Coupled method outperforms the original dehazing method in 10 cases and ties with it in another 3 cases (see the results for RET-SRIE).

Preference ranking
We also performed a psychophysical experiment, for which details are given below. Twelve subjects completed the experiment. None of them is an author of the paper. All observers were tested for normal color vision using the Ishihara color blindness test. Ethical approval was granted by the Comité Ético de Investigación Clínica, Parc de Salut MAR, Barcelona, Spain, and all procedures complied with the Declaration of Helsinki.

Apparatus
The experiment was conducted on an AOC I2781FH LCD monitor set to "sRGB" mode with a luminance range from 0.1 cd/m² to 175 cd/m², with spatial and temporal resolutions of 1920 by 1080 pixels and 60 Hz. The display was viewed at a distance of approximately 70 cm so that 40 pixels subtended 1 degree of visual angle. The full display subtended 49 by 27.5 degrees. The decoding nonlinearity of the monitor was recorded using a Konica Minolta LS 100 photometer and was found to be closely approximated by a gamma function with an exponent of 2.2. Stimuli were generated under Ubuntu 15.04 LTS running MATLAB (MathWorks) with functions from the Psychtoolbox [41,42]. The experiment was conducted in a dark room.

Stimuli
25 randomly selected images were taken from the FADE dataset [29]. They are shown in Fig. 3.
For each image, the six original dehazing methods listed at the beginning of the section were computed. Then, the Coupled-DCT approach proposed in this paper, with 10 DCT basis functions (i.e. the same parameters used for this dataset before), was also computed for each of the original methods.

Procedure
The experiment was run independently for each of the 6 original dehazing methods. The two dehazed images (the result of the original method and the result of our coupled-DCT approach) were shown on either side of the original hazy image. Subjects were asked to select the image that they preferred out of the two dehazed images. The total number of comparisons was 150, i.e. 25 comparisons for each of the 6 original dehazing methods. On average, the experiment took around 25 minutes.

Analysis of the results
We have analyzed the results of our experiment in terms of the Thurstone Case V Law of Comparative Judgment. Figure 4 presents the results for the whole set of 150 comparisons. Our approach is preferred over the original non-physical dehazing methods, with statistical significance. Results for each individual original algorithm are presented in Fig. 5. Our DCT-coupled approach is statistically preferred over the original method in all cases, showing that it generalizes very well to different non-physical dehazing methods. These results also support the effectiveness shown by our coupled-DCT method in most of the image metric cases tested.
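For readers unfamiliar with Thurstone Case V scaling, the following sketch shows how pairwise preference counts are converted into scale values: each preference proportion is mapped through the inverse of the standard normal cumulative distribution, and the resulting z-scores are averaged per item. The counts used below are hypothetical and are not the data of this experiment. The clamping of proportions away from 0 and 1, the averaging over opponents only, and the function name `thurstone_case_v` are also assumptions of this example, and the confidence intervals needed for significance testing are not computed.

```python
from statistics import NormalDist


def thurstone_case_v(wins, clamp=0.01):
    """Scale values from a matrix where wins[i][j] counts preferences of item i over item j.

    Assumes every pair of items was compared at least once.
    """
    n = len(wins)
    inverse_cdf = NormalDist().inv_cdf
    scores = []
    for i in range(n):
        z_values = []
        for j in range(n):
            if i == j:
                continue
            proportion = wins[i][j] / (wins[i][j] + wins[j][i])
            # Assumed safeguard: keep proportions away from 0 and 1, where z is infinite.
            proportion = min(max(proportion, clamp), 1.0 - clamp)
            z_values.append(inverse_cdf(proportion))
        scores.append(sum(z_values) / len(z_values))
    return scores


# Hypothetical counts (NOT from the paper): item 0 = original method, item 1 = post-processed.
wins = [[0, 90], [210, 0]]
scores = thurstone_case_v(wins)
```

With two items the scale values are symmetric around zero, and the item chosen more often receives the positive score.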

Conclusions
We have presented an approach that induces physical behaviour in non-physical dehazing methods. Its main idea is an iterative coupling of the color channels, inspired by the Alternating Least Squares (ALS) method. We have shown that our method outperforms the original non-physical dehazing methods both qualitatively and quantitatively, in terms of reference and non-reference metrics. Finally, our method was also validated using psychophysical tests.

Fig. 1. Effect of the iterations on the steady state of $J^{our}$ for the different original algorithms on the 500 images of the dataset of Choi et al. For every original algorithm, $J^{our}$ reaches steady state.

Fig. 4. Results of the psychophysical experiment using the Thurstone Case V test for the whole set of 150 comparisons.

Fig. 5. Results of the psychophysical experiment using the Thurstone Case V test for each of the non-physical dehazing methods considered in this work.