Skyless polarimetric calibration and visibility enhancement

Outdoor imaging in haze is plagued by poor visibility. A major problem is the spatially varying reduction of contrast by airlight, which is scattered by the haze particles towards the camera. However, images can be compensated for haze, and can even yield a depth map of the scene. A key step in such scene recovery is subtraction of the airlight. In particular, this can be achieved by analyzing polarization-filtered images. This analysis requires parameters of the airlight, particularly its degree of polarization (DOP). In past studies, these parameters were estimated by measuring pixels in sky areas. However, the sky is often unseen in the field of view. This paper derives several methods for estimating these parameters when the sky is not in view. The methods are based on minor prior knowledge about a couple of scene points. Moreover, we propose blind estimation of the DOP, based on the image data. This estimation is based on independent component analysis (ICA). The methods were demonstrated in field experiments. © 2008 Optical Society of America

OCIS codes: (110.5405) Polarimetric imaging; (290.5855) Scattering, polarization; (100.3020) Image reconstruction-restoration; (150.1488) Calibration; (000.5490) Probability theory, stochastic processes, and statistics.

(C) 2009 OSA, 19 January 2009 / Vol. 17, No. 2 / OPTICS EXPRESS 472, #93293, $15.00 USD. Received 3 Mar 2008; revised 3 Aug 2008; accepted 7 Aug 2008; published 7 Jan 2009.

References and links
1. N. S. Kopeika, A System Engineering Approach to Imaging, SPIE Press, Bellingham (1998).
2. R. T. Tan, N. Pettersson, and L. Petersson, "Visibility enhancement for roads with foggy or hazy scenes," in Proc. IEEE Intelligent Vehicles Symposium, 19–24 (2007).
3. R. C. Henry, S. Mahadev, S. Urquijo, and D. Chitwood, "Color perception through atmospheric haze," J. Opt. Soc. Am. A 17, 831–835 (2000).
4. J. S. Jaffe, "Computer modelling and the design of optimal underwater imaging systems," IEEE J. Oceanic Eng. 15, 101–111 (1990).
5. D. M. Kocak, F. R. Dalgleish, F. M. Caimi, and Y. Y. Schechner, "A focus on recent developments and trends in underwater imaging," MTS Journal 42, 52–67 (2008).
6. Y. Y. Schechner and N. Karpel, "Recovery of underwater visibility and structure by polarization analysis," IEEE J. Oceanic Eng. 30, 570–587 (2005).
7. P. C. Y. Chang, J. C. Flitton, K. I. Hopcraft, E. Jakeman, D. L. Jordan, and J. G. Walker, "Improving visibility depth in passive underwater imaging by use of polarization," Appl. Opt. 42, 2794–2803 (2003).
8. D. B. Chenault and J. L. Pezzaniti, "Polarization imaging through scattering media," in Proc. SPIE 4133, 124–133 (2000).
9. S. G. Demos and R. R. Alfano, "Optical polarization imaging," Appl. Opt. 36, 150–155 (1997).
10. X. Gan, S. P. Schilders, and M. Gu, "Image enhancement through turbid media under a microscope by use of polarization gating method," J. Opt. Soc. Am. A 16, 2177–2184 (1999).
11. S. Harsdorf, R. Reuter, and S. Tönebön, "Contrast-enhanced optical imaging of submersible targets," in Proc. SPIE 3821, 378–383 (1999).
12. M. J. Raković, G. W. Kattawar, M. Mehrübeoğlu, B. D. Cameron, L. V. Wang, S. Rastegar, and G. L. Coté, "Light backscattering polarization patterns from turbid media: theory and experiment," Appl. Opt. 38, 3399–3408 (1999).
13. S. P. Schilders, X. S. Gan, and M. Gu, "Resolution improvement in microscopic imaging through turbid media based on differential polarization gating," Appl. Opt. 37, 4300–4302 (1998).
14. J. S. Tyo, M. P. Rowe, E. N. Pugh, Jr., and N. Engheta, "Target detection in optically scattering media by polarization-difference imaging," Appl. Opt. 35, 1855–1870 (1996).
15. J. S. Tyo, "Enhancement of the point-spread function for imaging in scattering media by use of polarization-difference imaging," J. Opt. Soc. Am. A 17, 1–10 (2000).
16. K. M. Yemelyanov, S. S. Lin, E. N. Pugh, Jr., and N. Engheta, "Adaptive algorithms for two-channel polarization sensing under various polarization statistics with nonuniform distributions," Appl. Opt. 45, 5504–5520 (2006).
17. G. Horváth and D. Varjú, Polarized Light in Animal Vision, Springer-Verlag, Berlin (2004).
18. N. Shashar, S. Sabbah, and T. W. Cronin, "Transmission of linearly polarized light in seawater: implications for polarization signaling," J. Exp. Biol. 207, 3619–3628 (2004).
19. R. Wehner, "Polarization vision - a uniform sensory capacity?," J. Exp. Biol. 204, 2589–2596 (2001).
20. F. Cozman and E. Krotkov, "Depth from scattering," in Proc. IEEE CVPR, 801–806 (1997).
21. K. Tan and J. P. Oakley, "Physics-based approach to color image enhancement in poor visibility conditions," J. Opt. Soc. Am. A 18, 2460–2467 (2001).
22. E. Namer and Y. Y. Schechner, "Advanced visibility improvement based on polarization filtered images," in Proc. SPIE 5888, 36–45 (2005).
23. Y. Y. Schechner, S. G. Narasimhan, and S. K. Nayar, "Polarization-based vision through haze," Appl. Opt. 42, 511–525 (2003).
24. D. Miyazaki, M. Saito, Y. Sato, and K. Ikeuchi, "Determining surface orientations of transparent objects based on polarization degrees in visible and infrared wavelengths," J. Opt. Soc. Am. A 19, 687–694 (2002).
25. V. Gruev, A. Ortu, N. Lazarus, J. Van der Spiegel, and N. Engheta, "Fabrication of a dual-tier thin film micropolarization array," Opt. Express 15, 4994–5007 (2007).
26. N. Gupta, L. J. Denes, M. Gottlieb, D. R. Suhre, B. Kaminsky, and P. Metes, "Object detection with a field-portable spectropolarimetric imager," Appl. Opt. 40, 6626–6632 (2001).
27. C. K. Harnett and H. G. Craighead, "Liquid-crystal micropolarizer array for polarization-difference imaging," Appl. Opt. 41, 1291–1296 (2002).
28. J. S. Tyo, D. L. Goldstein, D. B. Chenault, and J. A. Shaw, "Review of passive imaging polarimetry for remote sensing applications," Appl. Opt. 45, 5453–5469 (2006).
29. J. S. Tyo and H. Wei, "Optimizing imaging polarimeters constructed with imperfect optics," Appl. Opt. 45, 5497–5503 (2006).
30. J. Wolfe and R. Chipman, "High speed imaging polarimeter," in Proc. SPIE 5158, 24–32 (2003).
31. L. B. Wolff, "Polarization camera for computer vision with a beam splitter," J. Opt. Soc. Am. A 11, 2935–2945 (1994).
32. C. F. Bohren and A. B. Fraser, "At what altitude does the horizon cease to be visible?," Am. J. Phys. 54, 222–227 (1986).
33. D. K. Lynch, "Step brightness changes of distant mountain ridges and their perception," Appl. Opt. 30, 3508–3513 (1991).
34. E. J. McCartney, Optics of the Atmosphere: Scattering by Molecules and Particles, John Wiley & Sons (1975).
35. S. K. Nayar and S. G. Narasimhan, "Vision in bad weather," in Proc. IEEE ICCV, 820–827 (1999).
36. For clarity of display, the images shown in this paper have undergone the same standard contrast stretch. This operation was applied only for display; the algorithms described in the paper were run on raw, unstretched data. The data had been acquired using a Nikon D-100 camera, which has a linear radiometric response. The zoom lens mounted on the camera was set to a focal length of ≈ 200 mm, except for Fig. 9, in which it was ≈ 85 mm. The camera was pointed at or slightly below the horizon.
37. H. Farid and E. H. Adelson, "Separating reflections from images by use of independent component analysis," J. Opt. Soc. Am. A 16, 2136–2145 (1999).
38. S. Shwartz, M. Zibulevsky, and Y. Y. Schechner, "Fast kernel entropy estimation and optimization," Signal Processing 85, 1045–1058 (2005).
39. S. Umeyama and G. Godin, "Separation of diffuse and specular components of surface reflection by use of polarization and statistical analysis of images," IEEE Trans. PAMI 26, 639–647 (2004).
40. D. Nuzilland, S. Curila, and M. Curila, "Blind separation in low frequencies using wavelet analysis, application to artificial vision," in Proc. ICA, 77–82 (2003).
41. S. Shwartz, E. Namer, and Y. Y. Schechner, "Blind haze separation," in Proc. IEEE CVPR 2, 1984–1991 (2006).
42. E. H. Adelson, "Lightness perception and lightness illusions," in The New Cognitive Neurosciences, 2nd ed., ch. 24, 339–351, MIT Press, Cambridge (2000).
43. R. A. Chipman, "Depolarization index and the average degree of polarization," Appl. Opt. 44, 2490–2495 (2005).
44. A. J. Bell and T. J. Sejnowski, "An information-maximization approach to blind separation and blind deconvolution," Neural Computation 7, 1129–1159 (1995).
45. J.-F. Cardoso, "Blind signal separation: statistical principles," Proc. IEEE 86, 2009–2025 (1998).
46. A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, John Wiley and Sons, New York (2001).
47. D. T. Pham and P. Garrat, "Blind separation of a mixture of independent sources through a quasi-maximum likelihood approach," IEEE Trans. Signal Processing 45, 1712–1725 (1997).
48. P. Kisilev, M. Zibulevsky, and Y. Y. Zeevi, "Multiscale framework for blind source separation," J. Machine Learning Research 4, 1339–1364 (2004).
49. E. P. Simoncelli, "Statistical models for images: compression, restoration and synthesis," in Proc. Conf. Signals, Systems and Computers, 673–678 (1997).
50. An additional ICA ambiguity is permutation, which refers to the mutual ordering of the sources. This ambiguity does not concern us at all. The reason is that our physics-based formulation dictates a special form for the matrix W, and thus its rows are not mutually interchangeable.
51. T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley and Sons, New York (1991).
52. T. Treibitz and Y. Y. Schechner, "Instant 3Descatter," in Proc. IEEE CVPR, 1861–1868 (2006).
53. P. Bofill and M. Zibulevsky, "Underdetermined blind source separation using sparse representations," Signal Processing 81, 2353–2362 (2001).
54. M. Zibulevsky and B. A. Pearlmutter, "Blind source separation by sparse decomposition in a signal dictionary," Neural Computation 13, 863–882 (2001).


1. Introduction
Imaging in poor atmospheric conditions [1,2] affects human activities [3], as well as remote sensing and surveillance. Hence, analysis of images taken in haze is important. Moreover, such research promotes other domains of vision through scattering media, such as water [4-6] and tissue. It is widely acknowledged that exploiting polarization in imaging can enhance visibility in such conditions [7-16]. Various studies suggest acquiring frames at distinct polarization states, and then mathematically manipulating the pair of frames to see better. This path is motivated by evidence of such a process in biological systems [17-19].
An additional approach to the problem of imaging in haze is to model and invert the image-formation process, based on a-priori knowledge about the scene [1,20,21]. The polarization and inversion paths were combined in the approach described in [22,23]. There, the inversion of the image formation process capitalizes on the fact that one of the sources of image degradation in haze, the airlight, is partially polarized. Such analysis yields an estimate of the distance map of the scene, in addition to a dehazed image. Interestingly, recovery of a three-dimensional range map based on polarization is also used for studying solid objects [24]. The feasibility of inverting the haze effect based on polarization analysis is enhanced by advances in polarimetric cameras [25-31].
To achieve such inversion, environmental parameters are required. In particular, it is important to know the parameters of the stray light (called airlight [3,14,22,32-35]) created by haze, which greatly decreases image contrast. These parameters can be determined from the image data itself. Originally [23], the required parameters were derived from measurements of pixels that correspond to the sky near the horizon (even automatically [22]). In this paper we refer to that method as sky-based. For example, a hazy scene is shown [36] in Fig. 1(a), and the sky-based dehazing result is shown in Fig. 1(b).
The prior parameter estimator [22,23] thus relies on the existence of sky in the field of view (FOV). This reliance has been a limiting factor, which has inhibited the usefulness of the approach. Often, the FOV does not include a sky area. This occurs, for example, when viewing from a high vantage point, when the FOV is narrow, or when there is partial cloud cover near the horizon. In this work we address this problem. We manage to calibrate the environmental parameters, and consequently enable successful dehazing, despite the absence of sky in the FOV. Moreover, we propose a method that blindly separates the airlight radiance (the main cause of contrast degradation) from the object's signal. The parameter that determines this separation is the degree of airlight polarization. It is estimated without any user interaction. The method exploits mathematical tools developed in the field of blind source separation (BSS), also known as independent component analysis (ICA).
ICA has already contributed to solving image separation problems [37-39], particularly with regard to reflections. The problem of haze is more complex than that of reflections, since object recovery involves a nonlinear interaction of the raw images. Nevertheless, we show that the radiance of haze (airlight) can be separated by ICA, by use of a simple pre-process. Dehazing had previously been attempted by ICA based on color cues [40]. However, an implicit underlying assumption behind Ref. [40] is that the radiance is identical in all the color channels, i.e., the scene is gray. This is untypical in nature.
We successfully performed calibration of the required atmospheric environmental parameters (including polarization) in multiple real experiments conducted in haze. Skyless dehazing then followed. We obtained blind parameter estimates which were consistent with direct sky measurements. Partial results were presented in Ref. [41].

2. Theoretical background
To make the paper self-contained, this section briefly reviews the known formation model of hazy images. It also describes a known inversion process of this model, which recovers visibility. This description is based on Ref. [23]. As shown in Fig. 2, an acquired frame is a combination of two main components. The first originates from the object radiance. Let us denote by L_object the object radiance as if it were taken in a clear atmosphere, without scattering in the line of sight (LOS). Due to atmospheric attenuation [23], the camera senses a fraction of this radiance. This attenuated signal is the direct transmission

D = L_object t, (1)

where

t = e^(-βz) (2)

is the transmittance of the atmosphere. The transmittance depends on the distance z between the object and the camera, and on the atmospheric attenuation coefficient β, where ∞ > β > 0. The second component is the path radiance (airlight). It originates from the scene illumination (e.g., sunlight), a portion of which is scattered into the LOS by the haze. Let a(z) be the contribution to airlight from scattering at z, accounting for the attenuation this component undergoes due to propagation in the medium. The aggregate of a(z) over the LOS yields the airlight

A = A_∞ (1 - t). (3)

Here A_∞ is the value of airlight at a non-occluded horizon. It depends on the haze and illumination conditions. Contrary to the direct transmission, airlight increases with the distance, and dominates the acquired image irradiance

I_total = D + A (4)

at long range. The addition of airlight [42] is a major cause for reduction of signal contrast.
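The formation model above can be made concrete in a few lines of code. The following is a minimal sketch of the single-scattering model of Eqs. (1)-(4); all numeric values are illustrative, not measured data.

```python
import math

# Minimal sketch of the single-scattering haze model (Eqs. 1-4).
# All numeric values below are illustrative.

def hazy_intensity(L_object, z, beta, A_inf):
    """Total irradiance I_total = D + A for an object at distance z."""
    t = math.exp(-beta * z)        # transmittance, Eq. (2)
    D = L_object * t               # direct transmission, Eq. (1)
    A = A_inf * (1.0 - t)          # airlight, Eq. (3)
    return D + A                   # I_total, Eq. (4)

# Airlight dominates at long range: I_total approaches A_inf as z grows.
near = hazy_intensity(L_object=0.8, z=1.0, beta=0.1, A_inf=0.5)
far = hazy_intensity(L_object=0.8, z=50.0, beta=0.1, A_inf=0.5)
```

Note that `far` is close to A_∞: a distant object loses contrast against the airlight regardless of its own radiance.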
In haze, the airlight is often partially polarized. Hence, the airlight image component can be modulated by a polarizer mounted on the camera (analyzer). At one polarizer orientation, the airlight contribution is least intense. Since the airlight disturbance is minimal there, this is the best state of the polarizer. Denote this airlight component as A_min. At the perpendicular polarizer orientation, the airlight contribution is the strongest, and is denoted as A_max. The overall airlight given in Eq. (3) is then

A = A_min + A_max. (5)

Assuming that the direct transmission is not polarized, the energy of D is equally split between the two polarizer states. Hence, the overall measured intensities at the two polarizer orientations are

I_max = D/2 + A_max, I_min = D/2 + A_min. (6)

The degree of polarization (DOP) of the airlight is defined as

p = (A_max - A_min)/A, (7)

where A is given in Eq. (3). For narrow FOVs, this parameter does not vary much. In this work we indeed use a narrow FOV, hence assume that p is laterally invariant. Eq. (7) refers to the aggregate airlight, integrated over the LOS. Is p invariant to distance? Implicitly, this would mean that the DOP of a(z) is unaffected by distance. This is not strictly true. Underwater, it has recently been shown [18] that the DOP of light emanating from z may decay with z. Such depolarization [43] may also be the case in haze, due to multiple scattering. Multiple scattering also causes blur of D. For simplicity, we neglect the consequences of multiple scattering (including blur), as an effective first-order approximation, similarly to Refs. [22,23]. Note that

A_max = A(1 + p)/2, A_min = A(1 - p)/2. (8)

It follows that

I_max = D/2 + A(1 + p)/2, I_min = D/2 + A(1 - p)/2. (9)

It is easy to see from Eqs. (4,9) that an estimate for I_total can be obtained by

Î_total = I_max + I_min. (10)

Dehazing is performed by inverting the image formation process. The first step separates the haze radiance (airlight) A from the object's direct transmission D. The airlight is estimated as

Â = (I_max - I_min)/p. (11)

Then, Eq. (4) is inverted to estimate D. Subsequently, Eq. (1) is inverted based on an estimate of the transmittance (following Eq. 3),

t̂ = 1 - Â/A_∞. (12)

These operations are compounded to

L̂_object = (Î_total - Â)/(1 - Â/A_∞). (13)

Two problems exist in this process. First, the estimation (i.e., separation) of the airlight requires the parameter p. Second, compensation for attenuation requires the parameter A_∞. Both of these parameters are generally unknown, and thus provide the incentive for this paper. In past studies that used polarization for dehazing, these parameters were estimated based on pixels which correspond to the sky near the horizon.
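Given the two calibration parameters, the inversion chain is a few lines long. Below is a minimal sketch assuming p and A_∞ are already known; it forward-simulates a single pixel with the model above and then inverts it (equation numbers and all values are illustrative).

```python
def dehaze(I_max, I_min, p, A_inf):
    """Invert the haze model for one pixel, given p and A_inf."""
    I_total = I_max + I_min              # total intensity
    A_hat = (I_max - I_min) / p          # airlight estimate, Eq. (11)
    t_hat = 1.0 - A_hat / A_inf          # transmittance estimate
    L_hat = (I_total - A_hat) / t_hat    # dehazed radiance
    return L_hat, A_hat

# Synthetic check: forward-simulate one pixel, then invert it.
p, A_inf, L_obj, t = 0.4, 1.0, 0.7, 0.6
A = A_inf * (1 - t)
I_max = L_obj * t / 2 + A * (1 + p) / 2
I_min = L_obj * t / 2 + A * (1 - p) / 2
L_hat, A_hat = dehaze(I_max, I_min, p, A_inf)
```

The round trip recovers both the airlight and the clear-day radiance exactly, which is what motivates the calibration methods that follow.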
A method to calibrate the parameters when the sky is unseen was theoretically proposed in Ref. [23]. However, it has not demonstrated practical feasibility. It requires the presence in the FOV of at least two different classes of multiple objects, each class having a similar (unknown) radiance in the absence of haze. For example, the classes can be buildings and trees. In practice, it can be difficult to identify two such sets, each comprising a distinct class of similar objects; some scenes simply do not contain two such classes. Moreover, to ensure a significant difference between the classes, one should be darker than the other, yet estimating parameters based on dark objects is prone to errors caused by noise. Therefore, in practical tests, we found the theoretical skyless possibility mentioned in Ref. [23] to be impractical.
We now recap the assumptions underlying our model, as expressed by the above expressions. First, single scattering is dominant. Hence, blur and depolarization due to multiple scattering do not have a strong influence. Second, the distribution of scattering particles in the scene is spatially homogeneous. In reality, these assumptions do not hold accurately, particularly since this paper deals with methods intended for field experiments. Therefore, they are used as an approximate model, which yields algorithms having practical effectiveness in the field. The errors of this approximation may cause small but noticeable deviations, as discussed in Sec. 6.

3. Skyless parameter calibration and dehazing
This section introduces several methods to recover the environmental parameters p and A_∞ when the sky is not in view. The method presented in Sec. 3.1 requires the use of just a single class of objects residing at different known distances. The subsequent methods assume that the parameter p is known. This parameter can be blindly derived by a method described in Sec. 4. Consequently, there is a reduction of the information needed about the objects and their distances. The method presented in Sec. 3.2 only requires the relative distance of two areas in the FOV, regardless of their underlying objects. The method described in Sec. 3.3 requires two similar objects situated at different, but not necessarily known, distances. Table 1 summarizes the requirements of each of these novel methods.

3.1. Distance-based dehazing
In this section, we develop a method for estimating p and A_∞ based on known distances to similar objects in the FOV. An idea to estimate atmospheric parameters by marking selected scene points was suggested in [20]. Motivated by this idea, suppose we can mark two scene points (x_k, y_k), k = 1, 2, which, in the absence of scattering, would have a similar (unknown) radiance. For example, these can be two similar buildings which have an unknown radiance L_build. The points, however, should be at different distances from the camera, z_2 > z_1. For example, the two circles in Fig. 1(a) correspond to two buildings, situated at known distances of 11 km and 23 km. Using Eqs. (1,2,3,6), the image values corresponding to the object at distance z_1 are

I_1^max = (L_build/2) e^(-βz_1) + [(1 + p)/2] A_∞ (1 - e^(-βz_1)), (14)
I_1^min = (L_build/2) e^(-βz_1) + [(1 - p)/2] A_∞ (1 - e^(-βz_1)). (15)

Similarly, for the object at distance z_2,

I_2^max = (L_build/2) e^(-βz_2) + [(1 + p)/2] A_∞ (1 - e^(-βz_2)), (16)
I_2^min = (L_build/2) e^(-βz_2) + [(1 - p)/2] A_∞ (1 - e^(-βz_2)). (17)

Let us denote

C_k = I_k^max - I_k^min, k = 1, 2. (18)

It can be shown that

C_1 = p A_∞ (1 - V^(z_1)), (19)

while Eqs. (16,17) yield

C_2 = p A_∞ (1 - V^(z_2)), (20)

where

V = e^(-β). (21)

Dividing Eq. (19) by Eq. (20) yields the following constraint:

G(V) ≡ C_2 (1 - V^(z_1)) - C_1 (1 - V^(z_2)) = 0. (22)

Let a zero crossing of G(V) be at V_0. We now show that based on this V_0, it is possible to estimate p and A_∞. Then, we prove the existence and uniqueness of V_0. Estimation of the parameters is done in the following way. It can be shown that

Â_max,∞ = (I_2^max V_0^(z_1) - I_1^max V_0^(z_2)) / (V_0^(z_1) - V_0^(z_2)) (23)

and

Â_min,∞ = (I_2^min V_0^(z_1) - I_1^min V_0^(z_2)) / (V_0^(z_1) - V_0^(z_2)). (24)

Following Eqs. (5,23,24), an estimate for A_∞ is obtained by

Â_∞ = Â_min,∞ + Â_max,∞. (25)

Based on Eqs. (14,15), define

ΔÂ_∞ = C_1 / (1 - V_0^(z_1)). (26)

Using Eqs. (7,25,26),

p̂ = ΔÂ_∞ / Â_∞. (27)

Thus, Eqs. (25,26,27) recover the required parameters p and A_∞. Two known distances of similar objects in the FOV are all that is required to extract the parameters used for polarization-based dehazing, when the sky is not available.
Let us now prove the existence and uniqueness of V 0 .
• Recall that ∞ > β > 0; therefore 0 < V < 1 (Eq. 21).
• G|_(V=0) = C_2 - C_1 > 0, since the polarization difference C increases with distance.
• G|_(V=1) = 0. This root of G is not in the domain.
• The function G(V) has only one extremum. The reason is that its derivative is null only when V^(z_2 - z_1) = C_2 z_1 / (C_1 z_2).
• This extremum is a minimum. It can be shown that ∂²G/∂V² > 0 there.
Due to these facts, Eq. (22) always has a unique root V_0 ∈ (0, 1). Typical plots of G(V) are shown in Fig. 3. Due to the simplicity of the function G(V), it is very easy to find V_0 using standard tools (e.g., Matlab). The solution can be found even when z_1 and z_2 are only relatively known, i.e., it is possible to estimate the parameters A_∞ and p based only on the relative distance z̃ = z_2/z_1, rather than absolute distances. For example, in Fig. 1(a), z̃ = 2.091. Denote

z̃ = z_2/z_1, (28)
Ṽ = V^(z_1) = e^(-βz_1). (29)

Then

Ṽ^(z̃) = V^(z_2), (30)

and Eq. (22) is equivalent to the constraint

G̃(Ṽ) ≡ C_2 (1 - Ṽ) - C_1 (1 - Ṽ^(z̃)) = 0. (31)

Similarly to Eq. (22), Eq. (31) has a unique root Ṽ_0 ∈ (0, 1). Hence, deriving the parameters is done similarly to Eqs. (25,26,27). Based on Ṽ_0, A_∞ is estimated as

Â_∞ = (I_2^max Ṽ_0 - I_1^max Ṽ_0^(z̃)) / (Ṽ_0 - Ṽ_0^(z̃)) + (I_2^min Ṽ_0 - I_1^min Ṽ_0^(z̃)) / (Ṽ_0 - Ṽ_0^(z̃)). (32)

Similarly to Eq. (26),

ΔÂ_∞ = C_1 / (1 - Ṽ_0). (33)

Then, Eq. (27) yields p.
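The whole procedure of this section can be sketched numerically. The snippet below forward-simulates two same-radiance objects at known distances, solves the constraint between the polarization differences C_k = I_k^max - I_k^min = p A_∞ (1 - V^(z_k)) by bisection, and recovers p and A_∞. The synthetic values and the bisection loop are illustrative stand-ins; any standard 1D root finder works.

```python
import math

def calibrate(I1_max, I1_min, I2_max, I2_min, z1, z2):
    """Recover p and A_inf from two same-radiance objects at known z1 < z2."""
    C1, C2 = I1_max - I1_min, I2_max - I2_min      # polarization differences
    S1, S2 = I1_max + I1_min, I2_max + I2_min      # total intensities
    G = lambda V: C2 * (1 - V**z1) - C1 * (1 - V**z2)
    lo, hi = 1e-9, 1.0 - 1e-9                      # the root lies in (0, 1)
    for _ in range(200):                           # plain bisection
        mid = 0.5 * (lo + hi)
        if G(lo) * G(mid) <= 0:
            hi = mid
        else:
            lo = mid
    V0 = 0.5 * (lo + hi)
    t1, t2 = V0**z1, V0**z2                        # transmittance estimates
    # Solve S_k = L*t_k + A_inf*(1 - t_k) for A_inf, then get p from C1.
    A_inf = (S2 * t1 - S1 * t2) / (t1 - t2)
    p = C1 / (A_inf * (1 - t1))
    return p, A_inf

# Forward-simulate two equal-radiance objects, then recover the parameters.
p, A_inf, L, beta, z1, z2 = 0.35, 1.0, 0.6, 0.08, 11.0, 23.0
def frames(z):
    t = math.exp(-beta * z)
    A = A_inf * (1 - t)
    return L * t / 2 + A * (1 + p) / 2, L * t / 2 + A * (1 - p) / 2
p_hat, A_hat = calibrate(*frames(z1), *frames(z2), z1, z2)
```

On this noiseless simulation the recovery is exact up to the root-finder tolerance; with real data the accuracy is limited by the model deviations discussed later.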

The dehazing result obtained using only the relative distance is shown in Fig. 1(e).
There is a minor difference between Figs. 1(b) and 1(e). This is discussed in Sec. 7. The absolute distances z_1, z_2 or their ratio z̃ can be determined in various ways. One option is to use a map (this can be done automatically using a digital map), assuming the camera location is known. The relative distance can be estimated using the apparent size ratio of two similar features that are situated at different distances. Furthermore, the absolute distance can be estimated based on the typical size of known objects.

3.2. Distance-based dehazing, with known p
In some cases, similar objects at known distances may not exist in the FOV, or may be hard to find. Then, we cannot use the method presented in Sec. 3.1. In this section we overcome this problem, by considering two regions at different distances z_1 < z_2, regardless of the underlying objects. Therefore, knowledge of the distances of two arbitrary areas is sufficient. The approach assumes that the parameter p is known. This knowledge may be obtained by a method we describe in Sec. 4, which is based on ICA. Based on a known p and on Eq. (11), Â is derived for every coordinate in the FOV. As an example, the estimated airlight map Â corresponding to Scene 1 is shown in Fig. 4. The two rectangles represent two regions, situated at distances z_1 and z_2. Note that unlike Sec. 3.1, there is no demand for the regions to correspond to similar objects. Averaging Â within each region yields Â_1 and Â_2. Following Eqs. (3,11,21), Â_k = A_∞ (1 - V^(z_k)) for k = 1, 2. Eliminating A_∞ between these two relations yields the constraint

G_p(V) ≡ Â_1 (1 - V^(z_2)) - Â_2 (1 - V^(z_1)) = 0,

whose root determines the remaining unknowns. Its uniqueness follows from properties analogous to those of Sec. 3.1:
• The function G_p(V) has only one extremum: its derivative is null only once.
• This extremum is a maximum. It can be shown that ∂²G_p/∂V² < 0 there.

Hence G_p has a unique root V_0 ∈ (0, 1). Based on it, A_∞ is estimated as Â_∞ = Â_1 / (1 - V_0^(z_1)).

3.3. Feature-based dehazing, with known p
In this section we describe a method to estimate A_∞, based on the identification of two similar objects in the scene. As in Sec. 3.1, these can be two similar buildings which have an unknown radiance L_build. Contrary to Sec. 3.1, the distances to these objects are not necessarily known. Nevertheless, these distances should be different. As in Sec. 3.2, this method is based on a given estimate of p, obtained, say, by the BSS method of Sec. 4. Thus, an estimate of Â(x, y) is at hand.
It is easy to show [23] from Eqs. (11,12,13) that

Î_total = L_build + S_build Â, (43)

where

S_build = 1 - L_build/A_∞

is a constant for these objects. Buildings at different distances have different intensity readouts, due to the effects of scattering. Therefore, they have different values of I_total and A. According to Eq. (43), Î_total as a function of Â forms a straight line. Such a line can be determined using two data points. Extrapolating the line, its intercept yields the estimated radiance value L̂_build.
Let the slope of the fitted line be Ŝ_build. We can now estimate A_∞ as

Â_∞ = L̂_build / (1 - Ŝ_build). (44)

Based on Â_∞ and p, we can recover L̂_object(x, y) for all pixels, as explained in Sec. 3.1. As an example, the two circles in Fig. 1(a) mark two buildings residing at different distances. The values of these distances are ignored, as if they were unknown. The corresponding dehazing result is shown in Fig. 1(c).
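A minimal numeric sketch of this line-fit estimate (all values are illustrative): with two objects of equal clear-day radiance, the slope and intercept of I_total versus Â directly give A_∞.

```python
def feature_based_A_inf(I_tot_1, A_1, I_tot_2, A_2):
    """Fit the straight line through two (A, I_total) points; return A_inf."""
    S_build = (I_tot_2 - I_tot_1) / (A_2 - A_1)   # slope of the line
    L_build = I_tot_1 - S_build * A_1             # intercept at A = 0
    return L_build / (1.0 - S_build)              # invert S = 1 - L/A_inf

# Forward-simulate two buildings of equal radiance at different distances.
A_inf, L_build = 1.0, 0.4
A_vals = [0.3, 0.7]                               # airlight at the two objects
I_tot = [L_build + (1 - L_build / A_inf) * A for A in A_vals]
A_hat = feature_based_A_inf(I_tot[0], A_vals[0], I_tot[1], A_vals[1])
```

With more than two similar objects, a least-squares line fit over all of them is the natural extension, and is more robust to noise.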

4. Blind estimation of p
Both Secs. 3.2 and 3.3 assume that the parameter p is known. In this section, we develop a method for blindly estimating p. First, note that Eq. (13) can be rewritten as

L̂_object = [I_max + I_min - (I_max - I_min)/p] / [1 - (I_max - I_min)/(p A_∞)]. (45)

This is a nonlinear function of the raw images I_max and I_min, since they appear in the denominator, rather than just superimposing in the numerator. However, the image model illustrated in Fig. 2 has a linear aspect: in Eqs. (4,10), the sum of the two acquired images I_min, I_max is equivalent to a linear mixture of the two components A and D. This linear interaction makes it possible to use tools that have been developed in the field of ICA for linear separation problems. This section describes our BSS method for hazy images. The result of this BSS yields p.

4.1. Facilitating linear ICA
To facilitate linear ICA, we attempt to separate A(x, y) from D(x, y). ICA relies on the independence of A and D. Thus, we describe a transformation that enhances the reliability of this assumption. From Eq. (9), the two acquired images constitute the following equation system:

(I_max, I_min)^T = M (A, D)^T, (47)

where

M = [ (1+p)/2  1/2 ; (1-p)/2  1/2 ]. (48)

Thus, the estimated components are

(Â, D̂)^T = W (I_max, I_min)^T, (49)

where

W = M^(-1) = (1/p) [ 1  -1 ; -(1-p)  1+p ]. (50)

Eqs. (47,49) are in the form used by linear ICA. Since p is unknown, the mixing matrix M and the separation matrix W are unknown. The goal of ICA in this context is: given only the acquired images I_max and I_min, find the separation matrix W that yields "good" Â and D̂. A quality criterion must be defined and optimized. Typically, ICA seeks Â and D̂ that are statistically independent (see [44-47]). Thus, ICA assumes independence of A and D. However, the airlight A always increases with the distance z, while D tends, in general, to fall with z. Thus, there is a negative correlation between A and D. To observe this, consider the hazy Scene 2, shown in Fig. 6. The negative correlation between A and D corresponding to this scene is seen in Fig. 7. There are local effects that counter this observation, in places where the inherent object radiance L_object increases with z. Thus, the significant negative correlation mentioned above occurs mainly in the lowest spatial frequency components: D decays with the distance only roughly. On the other hand, in some frequency components we can expect significant independence (Fig. 7).
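For a single pixel, the mixing and separation above are just 2x2 linear algebra. A small sketch (the value of p is illustrative):

```python
import numpy as np

p = 0.3                                      # illustrative DOP value
M = np.array([[(1 + p) / 2, 0.5],
              [(1 - p) / 2, 0.5]])           # mixing matrix
W = np.linalg.inv(M)                         # separation matrix

A, D = 0.5, 0.8                              # airlight, direct transmission
I_max, I_min = M @ np.array([A, D])          # "acquire" the two frames
A_hat, D_hat = W @ np.array([I_max, I_min])  # separate them back
```

Note that the first row of W is (1/p, -1/p), i.e., Â = (I_max - I_min)/p, in agreement with Eq. (11).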
Hence, the assumption underlying ICA is enhanced by transforming the images to a representation that is more appropriate than raw pixels. We work only with linear transformations, as in [48], since we wish to maintain the linear relations expressed in Eqs. (47)-(50). There are several common possibilities for linear band-pass operations. We opted for a wavelet transformation (see for example [49]), but the derivation is not limited to that domain. For each sub-band (channel) c of the transform, define

I_c^max = T_c{I_max}, I_c^min = T_c{I_min}, A_c = T_c{A}, D_c = T_c{D}, (51)

where T_c denotes the linear operator of channel c. Then, by linearity,

(Â_c, D̂_c)^T = W (I_c^max, I_c^min)^T, (52)

where W is the same as defined in Eq. (50). We now perform ICA over Eq. (52). As we shall see in the experiments, this approach is very effective. The assumption of statistical independence in sub-band images is powerful enough to blindly deliver the solution. In our case, the solution of interest is the matrix W, from which we derive p. Based on p, the airlight is estimated, and can then be separated from D(x, y), as described in Sec. 2.
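Because both the mixture and a sub-band decomposition are linear, the separation matrix acts on wavelet coefficients exactly as on raw pixels. The sketch below demonstrates this commutation with a crude two-tap high-pass filter standing in for a wavelet channel c, on toy 1D signals:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3
W = np.array([[1 / p, -1 / p],
              [-(1 - p) / p, (1 + p) / p]])   # separation matrix

A = rng.random(64)                            # toy 1D "images"
D = rng.random(64)
I_max, I_min = np.linalg.inv(W) @ np.vstack([A, D])

highpass = lambda x: x[1:] - x[:-1]           # crude stand-in for channel c
# Filtering then separating equals separating then filtering:
sep_of_filtered = W @ np.vstack([highpass(I_max), highpass(I_min)])
filtered_of_sep = np.vstack([highpass(A), highpass(D)])
```

Any linear band-pass operator has this property, which is why the choice of the wavelet family is not critical here.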

4.2. Scale insensitivity
When attempting ICA, we should consider its fundamental ambiguities [46]. One of them is scale: if two signals are independent, then they remain independent even if we change the scale of either of them (or both). Thus, ICA does not reveal the true scale of the independent components. A special case of the scale ambiguity is the sign ambiguity, for which the scale is -1. This scale (and sign) ambiguity can be considered both as a problem and as a helpful feature. The problem is that the estimated signals may be ambiguous. However, in our case, we have a physical model behind the mixture formulation. As we shall see, this model eventually disambiguates the derived estimation. Moreover, we benefit from this scale insensitivity. As we show in Sec. 4.3, the fact that ICA is insensitive to scale simplifies the intermediate mathematical steps we take [50].

4.3. Optimization criterion
Minimization of statistical dependency is achieved by minimizing the mutual information (MI) of the sources. The MI of Â_c and D̂_c can be expressed as (see for example [51])

I(Â_c, D̂_c) = H(Â_c) + H(D̂_c) - H(Â_c, D̂_c). (53)

Here H(Â_c) and H(D̂_c) are the marginal entropies of Â_c and D̂_c, respectively, while H(Â_c, D̂_c) is their joint entropy. However, estimating the joint entropy from samples is an unreliable calculation. Therefore, it is desirable to avoid joint entropy estimation. In the following, we bypass direct estimation of the joint entropy, and in addition we describe other steps that enhance the efficiency of the optimization.
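For concreteness, the MI criterion can be estimated from histograms, which is exactly the unreliable joint-entropy route that the text subsequently avoids; it is shown here only to make the criterion tangible (bin count and sample sizes are arbitrary):

```python
import numpy as np

def entropy(hist):
    """Entropy (nats) of a discrete histogram."""
    pdf = hist / hist.sum()
    pdf = pdf[pdf > 0]
    return -np.sum(pdf * np.log(pdf))

def mutual_information(x, y, bins=32):
    """Histogram estimate of I = H(x) + H(y) - H(x, y)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint)

rng = np.random.default_rng(0)
x = rng.laplace(size=50000)
indep = mutual_information(x, rng.laplace(size=50000))     # near zero
dep = mutual_information(x, x + 0.1 * rng.laplace(size=50000))
```

Even on this toy data the independent pair does not reach exactly zero MI: the histogram estimator is biased, which illustrates why the paper reformulates the criterion instead.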
Let us look at the separation matrix W (Eq. 50). Its structure implies that, up to the scale p, the estimated airlight Â is a simple difference of the two acquired images. Denote by Ã_c the estimation of the airlight component Â_c, up to this scale:

Ã_c = I_c^max - I_c^min. (54)

Similarly, denote

D̃_c = w_1 I_c^max + w_2 I_c^min (55)

as the estimation of D̂_c up to the scale p, where here

w_1 = p - 1, w_2 = p + 1. (56)

Hence, the separation matrix of Ã_c and D̃_c is

W̃ = [ 1  -1 ; w_1  w_2 ]. (57)

Minimizing the statistical dependency of Â_c and D̂_c means that Ã_c and D̃_c should minimize their dependency too. We thus minimize the MI of Ã_c and D̃_c as a function of w_1 and w_2:

(ŵ_1, ŵ_2) = arg min_(w_1, w_2) I(Ã_c, D̃_c). (58)

This way, the number of degrees of freedom of W̃ is two. Minimizing Eq. (58) yields estimates ŵ_1 and ŵ_2, which are related to w_1 and w_2 by an unknown global scale factor. This aspect is treated in Sec. 4.4.
As mentioned, estimating the joint entropy is unreliable and complex. Yet, Eqs. (47,49) express pointwise processes of mixture and separation: the airlight in a point is mixed only with the direct transmission of the same point in the raw frames. Following [46], for pointwise mixtures Eq. (58) is equivalent to

(ŵ_1, ŵ_2) = arg min_(w_1, w_2) [ H(Ã_c) + H(D̃_c) - log|det(W̃)| - H(I_c^max, I_c^min) ]. (59)

Here H(I_c^max, I_c^min) is the joint entropy of the measured sub-band images. It does not depend on w_1, w_2, and can therefore be ignored in the optimization process. Moreover, note from Eq. (54) that Ã_c does not depend on w_1, w_2. Therefore, H(Ã_c) is constant and can also be ignored in the optimization process. Thus, the optimization problem we solve is simplified to

(ŵ_1, ŵ_2) = arg min_(w_1, w_2) [ H(D̃_c) - log|w_2 + w_1| ], (60)

where the term log|w_2 + w_1| expresses log|det(W̃)| for the matrix given in Eq. (57). At this point, the following argument can come up. The separation matrix W̃ (Eq. 57) has essentially only one degree of freedom, p, since p dictates w_1 and w_2. Would it be simpler to optimize directly over p? The answer is no. Such a move implies that p = (ŵ_1 + ŵ_2)/2. This means that the scale of ŵ_1 and ŵ_2 is fixed to the true unknown value, and so is the scale of the estimated sources D̂ and Â. Hence, scale becomes important, depriving us of the ability to divide W by p. Thus, if we wish to optimize the MI over p, we need to explicitly minimize Eq. (53). This is more complex than Eq. (60). Moreover, this requires estimation of H(Â_c), which is unreliable, since the airlight A has very low energy in high-frequency channels c. Thus, minimizing Eq. (60), while enjoying the scale insensitivity, is preferable to minimizing Eq. (53) over p.

Back to polarization calibration
The optimization described in Sec. 4.3 (which will later be simplified in Sec. 4.5) yields estimates ŵ1 and ŵ2. We now use these values to derive an estimate for p. Apparently, from Eq. (56), p is simply the average of ŵ1 and ŵ2. However, ICA yields ŵ1 and ŵ2 only up to a global scale factor, which is unknown. Fortunately, the following estimator is invariant to that scale:

p̂ = (ŵ1 + ŵ2) / (ŵ2 − ŵ1) .    (61)

This process is repeated in each color channel. Once p̂ is derived, it is used for constructing W in Eq. (50). Then, Eq. (49) separates the airlight Â and the direct transmission D̂. This recovery is not performed on the sub-band images. Rather, it is performed in the raw image representation, as in prior sky-based dehazing methods.
We stress that in this scheme we bypass all the inherent ICA ambiguities: permutation, sign and scale. These ambiguities do not affect us, because we essentially recover the scene using a physics-based method, not pure signal-processing ICA. We use ICA only to find p, and this is done in a way (Eq. 61) that is scale invariant.
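A minimal sketch of this recovery step. The function names are ours; the scale invariance rests on w1 and w2 being proportional to (p − 1, p + 1), and the separation uses the relations A = (I max − I min)/p and I max + I min = D + A from the polarization model:

```python
import numpy as np

def dop_estimate(w1_hat, w2_hat):
    """Scale-invariant DOP estimate: if (w1_hat, w2_hat) = s*(p - 1, p + 1)
    for an unknown ICA scale s, this ratio equals p for any s != 0."""
    return (w1_hat + w2_hat) / (w2_hat - w1_hat)

def separate(imax, imin, p):
    """Physics-based separation applied to the RAW images (not sub-bands):
    A = (I_max - I_min) / p, then D = I_max + I_min - A."""
    A = (imax - imin) / p
    D = imax + imin - A
    return A, D
```

Note how `dop_estimate` never needs the unknown ICA scale: it cancels in the ratio, which is the whole point of the estimator.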

Efficient optimization using a probability model
As written above, Eq. (60) yields ŵ1 and ŵ2, from which p is subsequently derived. In this section, we take steps that further simplify the estimation of the cost function (60). This allows for more efficient optimization.
Recall that we use sub-band images at various spatial frequencies. In natural scenes, sub-band images are known to be sparse. In other words, almost all the pixels in a sub-band image have values that are very close to zero. Hence, the probability density function (PDF) of a sub-band pixel value is sharply peaked at the origin. A PDF model which is widely used for such images is the generalized Laplacian (see for example [49])

P(D̃c) = μ(ρ, σ) exp( −|D̃c / σ|^ρ ) ,    (62)

where ρ ∈ (0, 2) and σ are parameters of the distribution. Here μ(ρ, σ) is a normalization constant. The scale parameter σ is associated with the standard deviation (STD) of the distribution.
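The generalized Laplacian model above can be written down directly. A small sketch; the closed-form normalization μ(ρ, σ) = ρ / [2σΓ(1/ρ)] is a standard result for this density family, not quoted from the paper:

```python
import numpy as np
from math import gamma

def gen_laplacian_logpdf(x, rho, sigma):
    """log p(x) for p(x) = mu * exp(-|x/sigma|^rho), with the standard
    normalization mu = rho / (2 * sigma * Gamma(1/rho))."""
    mu = rho / (2.0 * sigma * gamma(1.0 / rho))
    return np.log(mu) - np.abs(x / sigma) ** rho
```

For ρ = 1 and σ = 1 this reduces to the ordinary Laplacian 0.5·exp(−|x|), the sparse prior adopted later for convexity.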
(C) 2009 OSA 19 January 2009 / Vol. 17, No. 2 / OPTICS EXPRESS 487

However, we do not need this scale parameter. The reason is that ICA recovers each signal up to an arbitrary intensity scale, as mentioned. Thus, optimizing a scale parameter during ICA is meaningless. We can therefore set a fixed unit scale (σ = 1) in the PDF of Eq. (62). This means that whatever D̃c(x, y) is, its values are implicitly re-scaled by the optimization process to fit this unit-scale model. Therefore, the generalized Laplacian in our case is

P(D̃c) = μ(ρ) exp( −|D̃c|^ρ ) .    (63)

This prior of image statistics can be exploited for the entropy estimation needed in the optimization [53, 54]. Entropy is defined (see for example [51]) as

H D̃c = E{ −log P(D̃c) } ,    (64)

where E denotes expectation. Substituting Eq. (63) into Eq. (64) and replacing the expectation with empirical averaging, the entropy estimate is

Ĥ D̃c = (1/N) Σ_{x,y} |D̃c(x, y)|^ρ + ν(ρ) .    (65)

Here N is the number of pixels in the image, while ν(ρ) = −log[μ(ρ)]. Note that ν(ρ) does not depend on D̃c, and thus is independent of w1 and w2. Hence, ν(ρ) can be ignored in the optimization process. The generalized Laplacian model thus simplifies the optimization problem to

min over w1, w2 of { (1/N) Σ_{x,y} |D̃c(x, y)|^ρ − log |w2 + w1| } .    (66)

The cost function (66) is a simple expression of the variables. Recall that expressions such as Eq. (53) or (60) use entropies, which may imply complex estimation of probability density functions based on image histograms. In contrast, Eq. (66) bypasses the need for entropies, densities and histograms: it relies directly on the pixel values D̃c(x, y) of the sub-band images. Eq. (66) appears to be relatively simple to optimize. However, we prefer a convex formulation of the cost function, as it guarantees a unique solution, which can be reached efficiently using gradient-based methods. Consider the following problem:

min over w1, w2 of { (1/N) Σ_{x,y} |D̃c(x, y)| − log |w2 + w1| } .    (67)

This is an approximation of Eq. (66), in which ρ = 1. We explain in the appendix that this formulation is unimodal and convex. Furthermore, the appendix discusses why this approximation is reasonable. Thus, Eq. (67) is the core of our ICA optimization. For convex problems such as this, convergence speed is enhanced by the use of local gradients. See [48] for the differentiation of the absolute-value function.
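To make the convex formulation concrete, here is a hedged subgradient-descent sketch of the ρ = 1 cost. The step size, iteration count and initialization are our own choices for illustration, not taken from the paper:

```python
import numpy as np

def cost(w, imax_c, imin_c):
    """(1/N) * sum |w1*Imax_c + w2*Imin_c| - log|w1 + w2|   (rho = 1)."""
    w1, w2 = w
    return np.mean(np.abs(w1 * imax_c + w2 * imin_c)) - np.log(abs(w1 + w2))

def subgrad(w, imax_c, imin_c):
    """Subgradient of the cost; sign() serves as d|x|/dx."""
    w1, w2 = w
    s = np.sign(w1 * imax_c + w2 * imin_c)
    g = 1.0 / (w1 + w2)  # derivative of log|w1 + w2| w.r.t. each weight
    return np.array([np.mean(s * imax_c) - g, np.mean(s * imin_c) - g])

def optimize(imax_c, imin_c, w0=(0.0, 1.0), lr=0.02, iters=4000):
    """Plain fixed-step subgradient descent on the convex cost."""
    w = np.array(w0, dtype=float)
    for _ in range(iters):
        w -= lr * subgrad(w, imax_c, imin_c)
    return w
```

After convergence, the scale-invariant ratio (w1 + w2)/(w2 − w1) yields the DOP estimate, so the arbitrary overall scale of the minimizer is harmless.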

A note about channel voting
In principle, the airlight DOP p should be independent of the wavelet channel c. However, in practice, the optimization described above yields a different estimated value of p for each wavelet channel. The reason is that some channels comply with the independence assumption of Sec. 4.1 better than other channels. Nevertheless, there is a way to overcome poor channels. Channels that do not obey the assumptions yield a random value of p. On the other hand, channels that are "good" yield a consistent estimate. Hence the optimal p is determined by voting.

Table 2. The [red, green, blue] values of the parameter p, given in percent, estimated using the methods described in Secs. 3.1 and 4. The results are compared to p sky, which is based on sky pixels. In Scene 4, there were no similar features residing at known distances to be used in Sec.

... effects are effectively uniform to infinity. However, along an infinite LOS, atmospheric parameters do change. Eventually, this LOS passes beyond the atmosphere, into outer space, due to the Earth's curvature. The direct sky measurement method thus assumes that most effects accumulate within a finite effective distance. Hence, the sky-based measurement, which was used in the ...

The desired situation of having |D̃c|^ρ convex occurs only if ρ ≥ 1. Apparently, we should estimate ρ at each iteration of the optimization, by fitting the PDF model (Eq. 62) to the values of D̃c(x, y). Note that this requires estimation of σ as well. Such parameter estimation is computationally complex, however. Therefore, we preferred using an approximation, and set the value of ρ such that convexity is obtained. Note that ρ < 1 for sparse signals, such as typical sub-band images. The PDF representing the sparsest signal that yields a convex function in Eq. (66) corresponds to ρ = 1. Thus we decided to use ρ = 1 (see also [53-55]). By this decision, we may have sacrificed some accuracy, but enabled convexity. In contrast to the other steps described in Secs. 4.1, 4.3 and 4.5, the use of ρ = 1 is an approximation. How good is this approximation? To study this, we sampled 5364 different images D̃c(x, y). These images were based on various values of p, c and on different raw frames. Then, the PDF model (Eq. 62) was fitted to the values of each image. The PDF fit yielded an estimate of ρ per image. A histogram of the estimates of ρ over this ensemble is plotted in Fig. 12. Here, ρ = 0.9 ± 0.3. It thus appears that the approximation is reasonable.
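The text does not spell out the channel-voting rule in detail. One simple possibility, consistent with picking the value on which the "good" channels agree, is a histogram vote over the per-channel estimates; the bin width here is our own choice:

```python
import numpy as np

def vote_dop(p_channels, bin_width=0.05):
    """Return the center of the most populated histogram bin of the
    per-channel DOP estimates; outlier channels fall in sparse bins."""
    p = np.asarray(p_channels, dtype=float)
    edges = np.arange(p.min(), p.max() + 2 * bin_width, bin_width)
    counts, edges = np.histogram(p, bins=edges)
    k = int(np.argmax(counts))
    return 0.5 * (edges[k] + edges[k + 1])
```

Channels violating the independence assumption scatter across the histogram, while consistent channels pile into one bin, whose center is returned.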

Fig. 1. Dehazing of Scene 1 (distant up to 34 km). (a) The best polarized image. The two circles mark buildings. The rectangles mark arbitrary points at different distances. (b) The environmental parameters are estimated using the sky (sky-based dehazing). (c) Result of a feature-based method assisted by ICA. (d) Result of a distance-based method assisted by ICA. (e) Distance-based result.

Fig. 2. [Dashed rays] Ambient light is scattered towards the camera by atmospheric particles, creating airlight A, which increases with the object distance. [Solid ray] The light emanating from the object is attenuated as the distance increases, yielding the direct transmission D. Without scattering, the object radiance would have been L object.

C2 > C1. Note that C1 and C2 are known, since I max k and I min k constitute the acquired data at these coordinates in the FOV. Now, from Eqs. (14,15)

Fig. 3. The function G(V) at each color channel, corresponding to distances z1 = 11 km and z2 = 23 km in Scene 1. Note that G|V=0 > 0 and G|V=1 = 0. Since G(V) has a single minimum, it has only a single root in the domain V ∈ (0, 1).

Fig. 5. The function Gp(V) at each color channel, corresponding to distances z1 = 11 km and z2 = 23 km in Scene 1. Note that Gp|V=0 < 0 and Gp|V=1 = 0. Since Gp(V) has a single maximum, it has only a single root in the domain V ∈ (0, 1).

#93293 - $15.00 USD Received 3 Mar 2008; revised 3 Aug 2008; accepted 7 Aug 2008; published 7 Jan 2009

Fig. 7. The direct transmission D has a strong negative correlation with the airlight A. These images correspond to Scene 2. In a wavelet channel of these images, Ac and Dc have much less mutual dependency.
Dc(x, y) = W{D(x, y)}    (51)

as the wavelet (or sub-band) image representation of D. Here c denotes the sub-band channel, while W denotes the linear transform operator. Similarly, define the transformed versions of A, Â, D̂, I max and I min as Ac, Âc, D̂c, I max c and I min c, respectively (see example in Fig. 7). Due to the commutativity of linear operations,
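Since W is linear, it commutes with the pointwise mixing: filtering the raw frames and then mixing gives the same result as mixing the filtered sources. A toy check of this commutativity, where the high-pass kernel below is an arbitrary stand-in for a real wavelet filter:

```python
import numpy as np

def subband(img, kernel=(1.0, -1.0)):
    """A toy linear sub-band operator: horizontal high-pass convolution."""
    k = np.asarray(kernel)
    return np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, img)
```

Because `subband` is linear, a raw frame such as I max = [D + A(1 + p)]/2 maps to I max c = [Dc + Ac(1 + p)]/2, so the same per-pixel mixture holds in every channel c.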

Fig. 9. Dehazing of Scene 3, whose distances are up to 13.5 km. The scene contains smoke. (a) The best polarized raw image. (b) Sky-based dehazing. (c) Result of a feature-based method assisted by ICA. (d) Result of a distance-based method assisted by ICA. (e) Distance-based result.

Fig. 12. A histogram of ρ, based on PDFs fitted to data of 5364 different images D̃c(x, y), which were derived from various values of p, wavelet channels c and different raw images. In this histogram, ρ = 0.9 ± 0.3.

Table 1. The requirements of prior knowledge in the different methods.