Image Restoration by Learning Morphological Opening-Closing Network

Abstract: Mathematical morphology is a powerful tool for image processing tasks. The main difficulty in designing a mathematical morphological algorithm lies in deciding the order of operators/filters and the corresponding structuring elements (SEs). In this work, we develop a morphological network composed of alternate sequences of dilation and erosion layers which, depending on the learned SEs, may form opening or closing layers. These layers in the right order, along with a linear combination of their outputs, are useful in extracting and processing image features. The structuring elements in the network are learned by back-propagation guided by minimization of the loss function. The efficacy of the proposed network is established by applying it to two interesting image restoration problems, namely de-raining and de-hazing. Results are comparable to those of many state-of-the-art algorithms for most of the images. It is also worth mentioning that the number of network parameters to handle is much smaller than that of popular convolutional neural networks for similar tasks. The source code can be found here: https://github.com/ranjanZ/Mophological-Opening-Closing-Net


Introduction
Mathematical morphology was developed initially for the analysis of geometrical structures, but its strong theoretical foundation has facilitated its use in various domains, like digital image processing and graphs, to name a few. Although this is quite an old field, there has been renewed interest in using morphological operators for various image processing tasks in recent years. This is mostly due to their effectiveness in solving various challenging problems [36,41] involving shapes. Depending on the task at hand, the choice and size of the structuring element (SE) greatly affect the performance of the methods, apart from the choice of the morphological operator [9]. Also, a single operator seldom suffices to obtain the desired results; usually, different operators need to be applied in a particular sequence. Choosing the correct sequence of correct operators along with the associated SEs can be extremely hard in practice. Recently Mondal et al. [30,32] have shown that a linear combination of the elementary morphological operations, i.e., dilations and erosions, can approximate any smooth function, and the function to be approximated can be learned using back-propagation, similar to neural networks. Inspired by these findings, in this paper we propose a technique to automatically learn the sequence of elementary operations along with the associated SEs required to solve some real-life problems. This is achieved given access to pairs of input and output images. Morphological operations are based on min-max algebra, resulting in the inherent non-linearity of the system, which eliminates the need for an activation function. Dilation and erosion operations increase the possible number of decision boundaries, due to which the network can learn effective representations using fewer parameters compared to a CNN.
Most computer vision models for tasks such as object detection, segmentation and classification are trained using data captured in well-controlled environments. Images captured in adverse environments suffer degraded quality, and it becomes necessary to restore such data before using them in computer vision tasks.
Here we propose and implement a morphological network replicating the effect of opening and closing (combinations of dilation and erosion) through automatic learning of SEs. To show the effectiveness of our approach, we have applied the proposed network to two image restoration problems: image de-raining and image de-hazing. We do not use any highly optimized or highly complex architecture for these tasks, so as to demonstrate the versatility of the proposed network. The rest of the paper is organized as follows. Section 2 describes some previous works related to our objective. The proposed method is presented and explained in Section 3. After that, we present experimental results with discussion on the de-raining and de-hazing problems in Section 4. Finally, concluding remarks and future scope of work are given in Section 5.

Related Works
Mathematical morphology is based on min-max algebra, which results in inherent non-linearity within the operation, and it has a rich theoretical framework. Mathematical morphology is used in many image processing tasks. Morphology is very efficient in removing noise [42] and in contrast enhancement [33]. Bovik [7] showed that it can be used for template matching. Morphological operations are also able to extract features from images, like edges [12] and peak/valley blobs. It is highly suitable for many shape-oriented problems, including character recognition [27] and shape analysis [4]. Based on combinations of the basic dilation and erosion operations, many complex image filters can be built. Convolutional Neural Networks (CNNs) are very popular in extracting image features and solving different problems in image processing, like image denoising [49,50], semantic segmentation [5,34], and object detection [16,20,37]. Image de-raining and de-hazing are ill-posed problems, considering that one needs to accurately determine the depth of the (partially) obscured surface as well as restore color. There have been several approaches to dealing with hazy images. He et al. [18] proposed a simple method, called the 'dark channel' prior, where statistics of outdoor haze-free images were used to find the minimum value in any one of the colour channels. This information was used to determine the haze depth and transmittance and produce the de-hazed image. Ancuti et al. [3] used a fast method to estimate transmittance and airlight by identifying the hazy region based on the difference in hue between the image and its inverse. Transmittance was also estimated by Meng et al. [28] from boundary constraints on a radiance cube. Fattal [10] used the concept of the color-line prior, based on the fact that colors of pixels over a small image patch have a linear structure in RGB space. This line is displaced from the origin for hazy images due to airlight, which can be used for estimating transmittance. Tang et al. [44] learned the mapping between hand-crafted features and transmittance. Learning features from patches, and the mapping between features and transmittance, can also be achieved by a CNN using regression [8]. Ren et al. [38] proposed a CNN at multiple scales to extract features and to establish their mapping to transmittance. Li et al. [23] used a modified haze equation where the parameters were unified into a single variable, which was estimated by a CNN.
For image de-raining, Luo et al. [26] assumed sparse rain streaks having similar orientations. They proposed a discriminative approach that approximated the clean background and removed the rain streak components. Chen et al. used a low patch-rank prior to capture rain patterns. Patch-based GMM priors to model rain streaks and background by decomposing the input image were proposed by Li et al. [25]. Fu et al. [14] used a CNN to learn the mapping between the rainy image and the clean image directly. Fu et al. [15] decomposed the input image into background and rain layers, from which a CNN is used to remove rain streaks. Recently Mondal et al. [31] proposed a network with dilation and erosion operations which learns the structuring elements to remove raindrops in a grayscale image.

Proposed Method
The convolution operation and the elementary morphological operations, dilation and erosion, are all neighbourhood operations defined in terms of a kernel. However, morphological operators are non-linear, while the convolution operator is linear. Though both types of operators can be defined in n dimensions, in this work we consider morphological operations only on two-dimensional (2D) images. In this section, we build layers using the dilation and erosion operators. The networks built using these dilation and erosion layers are subsequently applied to de-raining and de-hazing of images.
In mathematical morphology, the objects or their parts in an image are considered to be sets, and the operators are defined in terms of set-theoretic translation, union, intersection and complementation to examine/extract various properties of these objects, such as shape and texture, using a small geometrical probe known as a structuring element (SE). In binary images, the pixels belong to either the foreground or the background (i.e., the complement of the foreground) set. These sets are viewed as sets in 2D space. Similarly, a grayscale image X defined over the domain D = {(x, y)} may be represented as a 3D set U_X = {(x, y, v) | v ≤ X(x, y)}, called the umbra of X. A structuring element W defined over S = {(x, y)} may also be represented by a 3D set U_W. Dilation of U_X by U_W is defined as the union of U_X translated to every pixel of U_W. To get back the dilated grayscale image, the top surface of the dilation result is extracted. We may get the eroded grayscale image in a similar manner. However, a concise and more straightforward formulation of grayscale dilation and erosion was proposed by Sternberg [43] as follows.

Morphological dilation and erosion layers
Let a grayscale image X be of size M × N. The dilation (⊕) and erosion (⊖) operations on X are defined, respectively, as [43]

(X ⊕ W_d)(x, y) = max_{(l,m)∈S} { X(x − l, y − m) + W_d(l, m) },        (1)
(X ⊖ W_e)(x, y) = min_{(l,m)∈S} { X(x + l, y + m) − W_e(l, m) },        (2)

where W_d(x, y) ∈ IR and W_e(x, y) ∈ IR are the structuring elements of the dilation and erosion operators, respectively, defined on the domain S = {(l, m) | l ∈ {1, 2, 3, ..., a}; m ∈ {1, 2, 3, ..., b}}. Note that the actual shape of the geometric probe (SE) may not always cover the entire rectangular domain S; in that case the SE value is set to −∞ at the points of S lying outside the probe, so that those points never attain the maximum (or minimum). During implementation a large number is used in place of ∞, and that number is at least 1 (one) plus the maximum possible pixel value in the grayscale image. Both dilation and erosion are many-to-one mappings. That means X ⊕ W_1 = X ⊕ W_2 may not imply W_1(x, y) = W_2(x, y). For example, let C denote a curve whose length is greater than the diameter of a disk SE D. Then it can be shown that

C ⊕ D = C ⊕ δD,        (3)

where δD denotes the boundary of D. In the discrete domain, the connectivity of C and D should be chosen appropriately (e.g., if C is 8-connected, then D should be 4-connected and vice versa). Similarly, X ⊖ W_1 = X ⊖ W_2 may not imply W_1(x, y) = W_2(x, y). For example, let A be a simple blob (i.e., a connected component without any hole) and D a disk; then it can be shown that

A ⊖ D = A ⊖ δD.        (4)

The examples (as depicted in equations (3) and (4)) can also be verified by taking the SE as δD ∪ D′, where D′ ⊂ D. This notion is useful in learning SEs for compound operators, like opening and closing. Now, similar to a convolution layer, morphological layers can be formed using dilation and erosion operators. We call a layer with the dilation (resp. erosion) operation a dilation layer (resp. erosion layer). We also define the output of a dilation layer as a dilation feature map and that of an erosion layer as an erosion feature map. Note that at each layer multiple dilations or erosions may be applied using different SEs. Applying k dilation operations will generate k dilation feature maps in the next layer. We denote dilation applied on the input with k SEs of size a × b by D^k_{a×b}.
Similarly, the erosion operation with k structuring elements is denoted by E^k_{a×b}. Note that both dilation and erosion are increasing operations and are dual to each other.
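As a concrete sketch, the two elementary layers can be written directly from the definitions above; the nested loops run over the SE domain, and the SE values act as the trainable weights. This is our own illustrative NumPy code, not the authors' implementation, and it uses a large negative constant in place of −∞, as discussed above.

```python
import numpy as np

NEG_INF = -1e9   # large magnitude standing in for infinity, as discussed

def dilate(X, W):
    """(X (+) W)(x, y) = max over (l, m) of X(x - l, y - m) + W(l, m)."""
    a, b = W.shape
    Xp = np.pad(X, ((a // 2, a // 2), (b // 2, b // 2)),
                constant_values=NEG_INF)
    out = np.full(X.shape, NEG_INF, dtype=float)
    for l in range(a):
        for m in range(b):
            patch = Xp[l:l + X.shape[0], m:m + X.shape[1]]
            # reflected SE index implements the (x - l, y - m) convention
            out = np.maximum(out, patch + W[a - 1 - l, b - 1 - m])
    return out

def erode(X, W):
    """(X (-) W)(x, y) = min over (l, m) of X(x + l, y + m) - W(l, m)."""
    a, b = W.shape
    Xp = np.pad(X, ((a // 2, a // 2), (b // 2, b // 2)),
                constant_values=-NEG_INF)
    out = np.full(X.shape, -NEG_INF, dtype=float)
    for l in range(a):
        for m in range(b):
            patch = Xp[l:l + X.shape[0], m:m + X.shape[1]]
            out = np.minimum(out, patch - W[l, m])
    return out
```

With a flat (all-zero) SE these reduce to the familiar max- and min-filters; a learnable, non-flat W additively weights each neighbourhood position before the max/min is taken.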

Morphological opening and closing layers
Elementary morphological operations, i.e., dilation and erosion, are applied in many image processing tasks, such as edge detection. However, opening and closing operations are far more useful and important, as these are filters. Being filters, these operations satisfy essential properties, namely being increasing and idempotent. Moreover, opening is anti-extensive, while closing is extensive. Like dilation and erosion, opening and closing are also dual to each other. In fact, many operations that are increasing, idempotent and anti-extensive are termed 'opening'; some examples are area opening [45] and path opening [21], which follow these properties. Traditionally, opening and closing are defined as compound operators by concatenating the dilation and erosion operators. Thus, these operations are defined in terms of SEs as

Opening : X ∘ W_o = (X ⊖ W_o) ⊕ W_o,        (5)
Closing : X • W_c = (X ⊕ W_c) ⊖ W_c.        (6)

So, a morphological opening or closing network can be constructed by cascading morphological dilation and erosion layers as defined in equations (5) and (6). Equations (3) and (4) suggest that X ∘ W_o may also result from dilation and erosion with SEs different from W_o. In other words, for a class of problems X ∘ W_o = (X ⊖ W′_o) ⊕ W″_o, where W_o, W′_o and W″_o may not be equal to each other. Thus, we may call ((X ⊖ W_1) ⊕ W_2)(x, y) (for some W_1 and W_2) an opening operation if it is increasing, idempotent and anti-extensive. A similar argument holds for the closing operation. Note that to develop an opening (or closing) network we could have used and trained the same SE for the dilation and erosion layers following equation (5) (resp. equation (6)). In that case we would need to specify the use of opening or closing networks and their order beforehand to solve the given problem. In this work we have trained the SEs in a sequence of dilation and erosion layers independently, so that the opening/closing networks (consequently, alternating sequential filters) evolve along with appropriate SEs to solve the problem at hand.
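The claim that cascading an erosion layer and a dilation layer with a shared SE realizes an opening, as in equation (5), can be sanity-checked with off-the-shelf morphology. The snippet below (our check, using SciPy's flat grayscale operators) verifies that the cascade coincides with a one-shot opening.

```python
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion, grey_opening

rng = np.random.default_rng(42)
X = rng.random((32, 32))
fp = np.ones((5, 5), dtype=bool)          # flat 5x5 structuring element

# erosion layer followed by a dilation layer with the same SE ...
cascade = grey_dilation(grey_erosion(X, footprint=fp), footprint=fp)
# ... coincides with a one-shot morphological opening
direct = grey_opening(X, footprint=fp)
print(np.allclose(cascade, direct))        # True
```

Training the two SEs independently, as done in this work, generalizes this cascade beyond the shared-SE case.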
Moreover, multiple dilation/erosion operations with different SEs W k (k = 1, 2, 3, · · · ) may be applied on an image or on already computed feature map to produce multiple dilation or erosion feature maps. These multiple feature maps are expected to highlight different types of features in the image based on the profile of the SEs.
All the SEs are initialized randomly while building the network and are trained using back-propagation based on the training samples (images). It may be noted that the erosion and dilation operations use min and max operations, respectively. So, though not fully differentiable, these are at least piece-wise differentiable. Hence, no problem arises in back-propagating the gradient to update the SEs. Next we briefly present how back-propagation takes place in the morphological network to train the SEs.

Back-propagation in Morphological Network
The back-propagation algorithm is used to update the SEs in the morphological opening-closing network. We have already stated that our opening and closing networks are built using dilation and erosion layers. Further, dilation and erosion are dual operations, so describing the training of the SEs for either of them suffices to describe the other. For simplicity, we consider here a single dilation layer. The propagation of the gradient through the network is very similar to that of a neural network. To start with, we recall the expression of the gradient. Suppose an input image X having dimension M × N × c is passed through a dilation layer that produces an output feature map Y. The structuring element S in the dilation layer has size A × B × c and, with appropriate padding, the size of Y is M × N. The output at location (x, y) of Y can be expressed as

Y(x, y) = max_{(l,m,n)} { X(x − l, y − m, n) + S(l, m, n) },        (8)

where (l, m, n) denotes the index of the SE S. The SE is trained using back-propagation so that Y approaches the desired dilation feature map Ỹ. Let L be the loss or error between the output of the dilation layer and the desired dilation feature map, i.e.,

L = || Y − Ỹ ||².        (9)

When the SE S is properly trained, Y → Ỹ, which implies L → 0. It is evident from equations (8) and (9) that Y and, consequently, L depend on the SE S. Using the chain rule of partial derivatives we have

∂L/∂S(l, m, n) = Σ_{(x,y)} (∂L/∂Y(x, y)) (∂Y(x, y)/∂S(l, m, n)).        (10)

Thus the structuring element is updated as

S(l, m, n) ← S(l, m, n) − α ∂L/∂S(l, m, n),        (11)

where α is the learning rate. If multiple SEs S_k(x, y, z) (k = 1, 2, 3, · · · ) are used, multiple dilation feature maps Y_k (k = 1, 2, 3, · · · ) are obtained, but the back-propagation strategy remains the same. The idea extends straightforwardly to a multi-layer network. Suppose an intermediate feature map denoted by X_i is passed through the i-th dilation layer and produces an output feature map X_{i+1,k} after dilation with SE S_{i,k} in the i-th layer. Then, keeping all other parameters the same, equation (8) can be re-written as

X_{i+1,k}(x, y) = max_{(l,m,n)} { X_i(x − l, y − m, n) + S_{i,k}(l, m, n) }.

Let L be the final loss of the network, obtained by comparing the desired output and the predicted output.
As before, we can compute the gradient of L with respect to the k-th SE by the following equation using the chain rule:

∂L/∂S_{i,k} = Σ_{(x,y)} (∂L/∂X_{i+1,k}(x, y)) (∂X_{i+1,k}(x, y)/∂S_{i,k}).

The term ∂L/∂X_{i+1,k} can be obtained by computing the gradients recursively, starting from the final layer. Finally, the k-th SE is updated using equation (11). The gradient for the erosion layer can be derived similarly. A worked-out example of gradient calculation for the erosion layer is shown in [13].
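The routing of gradients implied by the max in a dilation layer can be made concrete: each output pixel's max selects a single SE position, so the sub-gradient of the output with respect to the SE is 1 at that position and 0 elsewhere. The following toy NumPy sketch (ours; single-channel, valid-mode, correlation-style indexing rather than the paper's exact convention) records the argmax in the forward pass and scatters the upstream gradient in the backward pass.

```python
import numpy as np

def dilate_fwd(X, S):
    """Valid-mode grayscale dilation; records the argmax for backward."""
    a, b = S.shape
    H, W = X.shape[0] - a + 1, X.shape[1] - b + 1
    Y = np.empty((H, W))
    arg = np.empty((H, W), dtype=int)      # flat SE index chosen per output
    for x in range(H):
        for y in range(W):
            vals = X[x:x + a, y:y + b] + S
            arg[x, y] = int(np.argmax(vals))
            Y[x, y] = vals.flat[arg[x, y]]
    return Y, arg

def dilate_bwd_S(dY, arg, S_shape):
    """dL/dS: each output pixel routes its gradient to its argmax entry."""
    dS = np.zeros(S_shape)
    for idx, g in zip(arg.ravel(), dY.ravel()):
        dS.flat[idx] += g
    return dS

# one (hypothetical) SGD step on the SE would then be:  S -= alpha * dS
```

Ties in the max are broken arbitrarily, which is exactly why the operation is only piece-wise differentiable; automatic-differentiation frameworks route gradients through max in the same way.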

Experiments with Morphological Networks
In order to establish the efficacy of our concept of learning morphological opening and closing networks along with the associated structuring elements (kernels), we have carried out some initial experiments before applying it to real-life problems. First, we need to show experimentally that the opening network realized by concatenating erosion and dilation layers with not necessarily the same SEs may satisfy the desired properties of the opening operation, i.e., increasing, idempotent and anti-extensive. In other words, a network made of an erosion layer followed by a dilation layer having different SEs satisfies the essential properties of opening and, thus, can work like an opening network.

Verifying properties of opening operation
First, for training and testing the network to be developed, we have taken 400 natural images from flickr [35] and converted them to grayscale images of size 416×416. Each image is opened with a disk SE of radius 7 to generate the ground truth of the opened image. Second, we construct an opening network by concatenating an erosion layer followed by a dilation layer (see figure 1(a)), where each layer has its own structuring element trained by back-propagation to minimize the loss. We take SEs of size 20 × 20 and initialize them with random numbers from a uniform distribution. Suppose image X is input to the network; at the output we obtain an image which we denote by O(X).
The experiment is carried out following a 10-fold cross-validation strategy. That is, the total image set is divided into 10 groups, and the images of 9 groups are used to train the network while the images of the remaining group are used as test images. This is repeated 10 times with each group as the test set, and the results for all 400 images are accumulated from the runs in which they serve as test images. A sample pair of SEs trained in the network is shown in figures 1(b) and 1(c).
Verifying the 'increasing' property: A noisy version of each image is generated as X′(x, y) = X(x, y) + N(x, y), where N is a noise image whose pixel values range from 0 to 1. Finally, X′ is clipped between 0 and 1 inclusive to get the final noisy image X̂ with bright noise. Thus X ≤ X̂, and the increasing property ensures that O(X) ≤ O(X̂), where O(X) denotes the network output for input X. So the opening network fails to satisfy this property if O(X)(x, y) − O(X̂)(x, y) > 0 for any (x, y). Hence, for the u-th image X_u we calculate the error E^I_u for the increasing property as

E^I_u = (1/|X_u|) Σ_{(x,y)} max(0, O(X_u)(x, y) − O(X̂_u)(x, y)),

where |X_u| denotes the size of image X_u. The histogram of this error {E^I_u | u = 1, 2, · · · , 400} is shown in figure 2(a). The mean and standard deviation of E^I_u are 0.000 and 0.000, respectively, which implies that the network satisfies the increasing property.
Verifying the 'idempotent' property: As before, an image X fed to the trained opening network produces O(X). Now if we feed O(X) again to the network, we get O(O(X)). For the idempotent property to hold, O(X) should be equal to O(O(X)). In other words, the network fails to satisfy the idempotent property if |O(X)(x, y) − O(O(X))(x, y)| ≠ 0 for any (x, y). Thus, for each u-th image X_u we calculate the error E^d_u as

E^d_u = (1/|X_u|) Σ_{(x,y)} |O(X_u)(x, y) − O(O(X_u))(x, y)|.

The histogram of this error {E^d_u | u = 1, 2, · · · , 400} is shown in figure 2(b). The mean and standard deviation of E^d_u are 0.019 and 0.004, respectively. Since these values are not zero, we can at most say that the network very closely satisfies the idempotent property, and we expect it to satisfy it fully if the network is trained better with a larger number of images. Verifying the 'anti-extensive' property: This property may be verified in exactly the same way as the increasing property, because the anti-extensive property states that O(X_u)(x, y) ≤ X_u(x, y). Hence, the corresponding error E^a_u may be computed as

E^a_u = (1/|X_u|) Σ_{(x,y)} max(0, O(X_u)(x, y) − X_u(x, y)).

The histogram of this error {E^a_u | u = 1, 2, · · · , 400} is shown in figure 2(c). The mean and standard deviation of E^a_u are 0.000 and 0.000, respectively, which implies that the network satisfies the anti-extensive property. Note that we could have trained the same SE for both the erosion and dilation layers to realize an opening (and similarly a closing) network, but in that case we would need to define the sequence of operations to solve a given problem. On the other hand, training the SEs independently for each layer allows us to develop the required network to solve the problem at hand. Next, we show how different networks evolve from the elementary morphological layers based on the trained SEs.
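For reference, the three error measures can be reproduced numerically with an exact flat opening standing in for the trained network (our sketch; the learned network would replace the SciPy call). For an exact opening all three errors vanish away from the image border, which is the baseline against which the trained network's near-zero errors are judged.

```python
import numpy as np
from scipy.ndimage import grey_opening

rng = np.random.default_rng(0)
X = rng.random((64, 64))
Xh = np.clip(X + 0.2 * rng.random((64, 64)), 0.0, 1.0)   # bright noise, X <= Xh

def O(img):                       # exact opening with a flat 7x7 SE
    return grey_opening(img, size=(7, 7))

def interior_mean(D, c=7):        # skip the border, where padding interferes
    return float(np.mean(D[c:-c, c:-c]))

E_inc = interior_mean(np.maximum(0.0, O(X) - O(Xh)))      # increasing
E_idem = interior_mean(np.abs(O(X) - O(O(X))))            # idempotent
E_anti = interior_mean(np.maximum(0.0, O(X) - X))         # anti-extensive
print(E_inc, E_idem, E_anti)      # all 0.0 for an exact opening
```

A trained opening network can be scored with exactly these three quantities, averaged per image as in the text.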

Implementing various morphological network using common framework
Here we again take the same set of grayscale images and try to simulate various morphological networks using a sequence of dilation and erosion layers. To generate the ground truth, we first apply a few morphological operations in the conventional way on the set of images. Then we build a network architecture as a sequence of dilation and erosion layers. We train the morphological network, specifically the structuring elements, using the back-propagation algorithm and the mean squared error loss between the output of the conventional method and the output produced by the network.
In Table 1, we show the SEs that are learned for each assigned task and simulated input-output pair. The first column indicates the morphological operations used to generate the output image (Y) from the input image (X). The second column shows the network architecture we have employed to achieve the operation stated in the first column. After training the network (for both the required operation and the associated SE), it is expected to generate the desired output Y given the input X. The learned SEs are shown in the third column. Note that in the case of a redundant or unnecessary layer, the corresponding SE would be an impulse, which realizes an identity transform. For example, in the first row, we try to simulate the 'dilation' operation, while the network architecture consists of an erosion layer followed by a dilation layer. After convergence, the trained SE for erosion is found to be approximately an impulse, while the SE for the dilation layer is a disk of the desired dimension. A similar network is used to simulate erosion and opening, as shown in the second and third rows, respectively. The trained SEs justify their objectives. In the fourth and fifth rows, networks with four layers are used to simulate opening and closing. In the fifth row, it can be seen that the SEs of the first layer (erosion) and the fourth layer (dilation) are approximately impulses, as these layers are redundant and unnecessary and so are expected to implement the identity transform; whereas in the second (dilation) and third (erosion) layers the learned SEs are similar to disks. In the fourth row, however, the combination of the first and second layers, or of the third and fourth layers, can simulate opening. Moreover, opening is an idempotent operation, so the trained SEs are more or less similar and of significant size in all four layers.
In this arrangement we obtain a single dilated or eroded feature map after each layer. However, by using multiple SEs at each layer, multiple feature maps can be generated for performing complex tasks like de-raining and de-hazing, as described in the next sections.
Though in this simulation experiment the input and output of the network are both gray-level images, the scheme may be extended to colour images, where both the input to our network and the corresponding output are in (R, G, B) format. We do not apply any special treatment to handle colour information; rather, we let the network learn three-dimensional SEs of size A × B × 3 so that it can produce the desired output. Using multiple SEs in each layer also helps to generate and process features from the colour image.

Image De-raining
The degradation of rainy images depends on several factors, such as raindrop size, the track of raindrops, rain density and lighting conditions. Other types of noise may also be present in the input image. Morphological filters such as opening and closing are capable of removing noise from an image while preserving the edges. Removing raindrops from an image can be considered as removing mostly bright noise of a particular shape and size from the image. This suggests that an alternating sequence of dilation and erosion layers, forming opening/closing operations, in the morphological network should be able to realize the de-raining operation by learning the appropriate SEs. However, it is challenging to know the size and shape of the raindrops and the other factors beforehand.
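As a toy illustration of this premise (our example, not part of the proposed network), a plain grayscale opening with a flat SE already suppresses thin bright streaks painted on a smooth background while largely preserving the background itself:

```python
import numpy as np
from scipy.ndimage import grey_opening

rng = np.random.default_rng(1)
# smooth background plus thin bright vertical streaks (toy "rain")
bg = np.fromfunction(lambda i, j: 0.3 + 0.2 * np.sin(i / 8), (64, 64))
rain = bg.copy()
for _ in range(40):
    i, j = rng.integers(0, 58), rng.integers(0, 64)
    rain[i:i + 5, j] = 1.0                  # 5-pixel bright streak, 1 px wide

opened = grey_opening(rain, size=(7, 7))    # flat 7x7 SE removes the streaks
print(np.abs(rain - bg).mean(), np.abs(opened - bg).mean())
```

Real rain varies in shape, orientation and density, which is exactly why the SEs are learned rather than fixed as here.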
Hence, for this purpose, we propose a morphological network architecture consisting of a sequence of multiple pairs of dilation and erosion layers, which may result in alternating sequential filters (ASFs), arranged in parallel paths. A schematic diagram of the network is shown in Fig. 3. The output feature maps from the two paths are then linearly combined to get the output maps. This step is essential for recovering features undesirably removed by the opening or closing filters. Finally, a sigmoid activation function is applied to produce the final output (image). In Table 2 we show the three architectures used in our experiments. In [31], the authors have shown that an opening network, as expected, is more effective in removing bright noise. Usually a closing network removes dark noise, which does not have much role in this application. However, to make our network more general, we take a linear combination of the outputs of the opening path and the closing path and make a single generic architecture. Since the de-rained output image should be of the same size as the input image, appropriate zero padding is used in each layer. For colour image processing, a number of feature maps are generated after each dilation layer and erosion layer using as many SEs. Multiple feature maps help in propagating the colour information to the output. During training, we initialize all the SEs and the weights of the linear combination layers with random numbers. To train the network, we define a loss using the structural similarity index measure (SSIM) between the predicted output P and the ground truth T, defined as

SSIM(P, T) = ((2 µ_P µ_T + c_1)(2 σ_PT + c_2)) / ((µ_P² + µ_T² + c_1)(σ_P² + σ_T² + c_2)).        (17)

Here µ_P and µ_T are the means of images P and T, respectively, while σ_P and σ_T are their standard deviations. The term σ_PT is the covariance between the images, and c_1 and c_2 are constants whose values are set to 0.0001 and 0.0009, respectively, to avoid division by zero.
We use the structural dissimilarity (DSSIM) as the loss function, considering all small patches of the output and those of the corresponding ground truth, for training the network. So the loss function is defined as

Loss_total = (1/M) Σ_{i=1}^{M} DSSIM(P^i_out, P^i_gt),  where  DSSIM(P, T) = (1 − SSIM(P, T)) / 2,        (18)

where I_out and I_gt are the output image and the ground truth image, and P^i_out and P^i_gt are their i-th patches, respectively. There are a total of M patches. In our experiment, we have taken the patch size as 10 × 10. We minimize the loss function Loss_total by the back-propagation algorithm and learn the parameters of the network.
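A minimal NumPy rendering of this patch-wise DSSIM loss is given below (our sketch, with the constants quoted above; the actual training would use a differentiable framework implementation of the same quantity).

```python
import numpy as np

C1, C2 = 1e-4, 9e-4               # constants quoted in the text

def ssim_patch(P, T):
    """SSIM of a single patch pair from means, variances and covariance."""
    mp, mt = P.mean(), T.mean()
    vp, vt = P.var(), T.var()
    cov = ((P - mp) * (T - mt)).mean()
    return ((2 * mp * mt + C1) * (2 * cov + C2) /
            ((mp ** 2 + mt ** 2 + C1) * (vp + vt + C2)))

def dssim_loss(I_out, I_gt, patch=10):
    """Mean DSSIM = (1 - SSIM) / 2 over non-overlapping patches."""
    H, W = I_out.shape[:2]
    vals = []
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            s = ssim_patch(I_out[i:i + patch, j:j + patch],
                           I_gt[i:i + patch, j:j + patch])
            vals.append((1.0 - s) / 2.0)
    return float(np.mean(vals))
```

Identical images yield a loss of zero, and the loss grows as structure, luminance or contrast diverge between the prediction and the ground truth.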

Image De-Hazing
When light rays travel through a turbid medium, they undergo a phenomenon known as scattering, wherein the light is scattered in different directions due to its interactions with particles, such as dust and aerosols, floating in it. Haze occurs when the concentration of these particulate matters exceeds a certain threshold. Any image taken in hazy conditions suffers from visibility degradation such as reduced contrast, saturation attenuation and color shifting. Besides, scattered environmental light appears like a veil over the scene. These days, image de-hazing has become one of the trending problems. Here we intend to restore such hazy images using the opening-closing network. The observed hazy image can be physically modelled using the following equation [22]:

I(x) = J(x) t(x) + A (1 − t(x)),        (19)

where I(x) is the observed or recorded intensity of the hazy image at location x and J(x) is the intensity of the corresponding non-hazy (ideal) image at location x. A is the airlight, which characterizes the constant environmental illumination. t(x) is the transmittance coefficient that determines the amount of light reaching the observer (camera) from the objects after travelling through the medium. Transmittance t(x) intuitively measures the amount of haze present at a particular location x and, in general, depends upon the depth of the scene. We modify equation (19) as

I(x) = J(x) t(x) + K(x),        (20)

where the bias-like term K(x) is the space-variant airlight representing (1 − t(x))A(x). Also note that 0 ≤ t(x) ≤ 1; it scales the image features, and this scaling affects the contrast. The morphological network is employed to estimate the airlight map as well as the transmittance map in order to recover the haze-free image. As shown in Fig. 4, we have taken two separate paths: one for opening and the other for closing. The outputs of the paths undergo linear combination. The network produces the transmittance map t(x) and the airlight map K(x) as output.
Since 0 ≤ t(x) ≤ 1 and 0 ≤ K(x) ≤ 1, in the last layers of the transmittance path and of the airlight path we have employed the sigmoid activation function to limit the estimated values of t(x) and K(x) to the said range.
Once t(x) and K(x) are estimated, we can determine the de-hazed image as

J(x) = (I(x) − K(x)) / t(x).        (21)

During training of the network, with given t(x) and K(x), the hazy image is generated as

I(x) = J(x) t(x) + K(x).        (22)

Now, given a pair of hazy and haze-free clear images, the network learns the SEs and also the weights of the linear combination layer. The network gives t̂(x) and K̂(x) at every iteration of the training process. We reconstruct the hazy image Î_out and the estimated de-hazed image Ĵ_out based on these t̂(x) and K̂(x) using equations (21) and (22). We define the loss function L, which is very similar to the bi-directional consistency loss [29], as

L = DSSIM(Ĵ_out, J_gt) + DSSIM(Î_out, I_in),

where DSSIM is calculated using equation (18), J_gt is the haze-free ground truth and I_in is the recorded hazy input. We minimize the loss L and learn the network parameters. In the next section, we present experimental results to justify our claim.
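The recovery and re-synthesis steps above can be sketched as follows (our illustrative code; the clamping of t away from zero is our safeguard against division blow-up, not something stated in the text):

```python
import numpy as np

def dehaze(I, t, K, t_min=0.05):
    """Recover the haze-free image: J = (I - K) / t, per equation (21)."""
    return np.clip((I - K) / np.clip(t, t_min, 1.0), 0.0, 1.0)

def rehaze(J, t, K):
    """Re-synthesize the hazy image: I = J * t + K, per equation (22)."""
    return J * t + K
```

During training both maps come from the sigmoid-limited network heads; the bi-directional loss then compares the de-hazed estimate against the ground truth and the re-synthesized hazy image against the recorded input.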

Experimental results
In this section, we evaluate performance of the proposed network both quantitatively and qualitatively on de-raining and de-hazing tasks.

Experimental Setup and Data Set
We have implemented the network in Python using Keras with the TensorFlow library at the backend. We carried out our experiments on an Intel machine with a 12GB GPU. For all the experiments, we have initialized the structuring elements randomly using the standard Glorot uniform initializer [17]. To minimize the loss, we have used the RMSProp optimizer for all the networks. Training of the image de-raining network is done with the benchmark rain dataset [14]. This rain dataset has 1,000 clean images. For each clean image, there are 14 different synthesized rainy images with different streak orientations and sizes, so a total of 14,000 sample images are available. We have used 80% of the data for training and 10% for validation, and the remaining 10% is used as test data. Images in the dataset are of different sizes. However, as the network takes fixed-size input, all the images are resized to 416 × 416 through bi-linear interpolation.
For de-hazing, we have used the O-HAZE [1] dataset for training. This dataset was first proposed in the O-HAZE [3] challenge. The dataset has 35 images for training, five images for validation and five images for testing. In this set, images are of different sizes, of the order of 5500 × 3500 pixels, so we have resized each image to 1024 × 1024. However, while testing, we have used the full resolution of the images. Since the O-HAZE dataset is very small, we have also used the NYU part of the D-Hazy dataset [2] for training. For quantitative evaluation of our opening/closing network, we have used the validation set of O-HAZE as the test set because its ground truths are available. We also test our network on the Fattal and Middlebury parts of D-Hazy [2].
We trained both the de-raining and de-hazing networks until the loss converged. Note that the loss function, involving max/min (due to dilation/erosion) operations, is only piece-wise differentiable. In practice, we can find a sub-gradient of each morphological operation and back-propagate through the network. In the next two subsections, we evaluate the morphological networks developed for de-raining and de-hazing.
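The sub-gradient routing described above can be made concrete with a small NumPy sketch of grayscale dilation and its sub-gradient with respect to the SE (erosion is analogous with min and a sign flip); this is an illustrative sketch, not the paper's implementation:

```python
import numpy as np

def dilate(f, s):
    """Grayscale dilation of image f by (possibly non-flat) SE s:
    (f ⊕ s)(x) = max_z { f(x + z) + s(z) }, with -inf padding."""
    h, w = s.shape
    ph, pw = h // 2, w // 2
    fp = np.pad(f, ((ph, ph), (pw, pw)), constant_values=-np.inf)
    out = np.full_like(f, -np.inf, dtype=float)
    for dy in range(h):
        for dx in range(w):
            out = np.maximum(out, fp[dy:dy + f.shape[0],
                                     dx:dx + f.shape[1]] + s[dy, dx])
    return out

def dilate_se_subgrad(f, s, upstream):
    """Sub-gradient of the dilation output w.r.t. the SE: each output
    pixel routes its upstream gradient to the single (argmax) SE entry
    that produced it -- exactly how autodiff back-propagates a max."""
    h, w = s.shape
    ph, pw = h // 2, w // 2
    fp = np.pad(f, ((ph, ph), (pw, pw)), constant_values=-np.inf)
    stacked = np.stack([fp[dy:dy + f.shape[0], dx:dx + f.shape[1]] + s[dy, dx]
                        for dy in range(h) for dx in range(w)])
    winner = stacked.argmax(axis=0)   # index of the maximizing term per pixel
    grad = np.zeros_like(s)
    for k in range(h * w):
        grad[k // w, k % w] = upstream[winner == k].sum()
    return grad
```

With a flat (all-zero) SE, dilation reduces to a local maximum filter, and each output pixel contributes its upstream gradient to exactly one SE entry.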

Results of Image De-Raining
Here we present the quantitative and qualitative evaluation of the morphological network for image de-raining.

Quantitative Evaluation
To quantitatively evaluate the opening-closing network applied to rainy images, we use two objective measures: SSIM [46] and peak signal-to-noise ratio (PSNR). In [31], image de-raining is done only on grayscale images. We have extended that work to de-raining of color images. The extension is not straightforward: a minor modification of the network is needed to preserve hue in the image. We trained the network on the rainy image dataset until the training loss converged. The Opening Net, Closing Net, and opening-closing net are trained separately to study their relative performances. The architectures of these networks are illustrated in Table 2. The de-rained image estimated by each network is quantitatively evaluated against the rain-free ground-truth image available in the dataset. The average SSIM and PSNR values for grayscale and color test images are reported in Table 3. In [31], we showed that the opening path (network) performs better than the closing path (network), as the opening filter removes bright noise. However, as shown in Table 3, the closing network performs as well as the opening network. We believe this is because we use multiple dilation/erosion layers, and the problem-dependent learning of SEs has effectively created an opening path emphasizing bright-noise removal. We have also compared our results with a standard convolutional neural network (CNN) of U-net architecture [40]. The table also reveals that the opening-closing network gives results comparable to those of the CNN with U-net architecture. However, we observe that while the CNN has more generalization capability than the opening-closing network, it needs to train a huge number of parameters, which incurs a very high cost. It can also be seen that we do not get much improvement in the PSNR metric for either grayscale or color images, possibly because we train the network with the SSIM loss only.
Incorporating mean squared error in the loss may result in better PSNR.
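The PSNR metric used above is the standard one; for images normalized to [0, 1] it can be computed as:

```python
import numpy as np

def psnr(ref, est, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a reference image and
    an estimate, for images scaled to [0, max_val]."""
    mse = np.mean((ref.astype(float) - est.astype(float)) ** 2)
    if mse == 0:
        return np.inf          # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

This makes the trade-off explicit: PSNR is a monotone function of MSE, so a network trained only on an SSIM-based loss has no direct incentive to reduce it.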

Qualitative Evaluation
For qualitative evaluation, we showed the results of the different networks applied on a large number of images to nine evaluators (research fellows in our lab) and asked them to rank the outputs. The ordering of average ranks is similar to the quantitative evaluation: the CNN comes out best, closely followed by the opening-closing network, while the results of the opening and closing networks are comparable. Here we display results for only three grayscale images in Fig. 3 and three color images in Fig. 6. In both figures, we compare the output of each network with the ground truth. In the grayscale images (Fig. 3), it may be noted that although the opening-closing network removed raindrops successfully, blobs of uniform intensity are present. This effect is a bit more severe in the case of the opening and closing networks (especially in the top-row images). In the case of color images, as seen in Fig. 6, the opening-closing network de-rains effectively and also reproduces vivid colors with the desired contrast. The output of the closing network has less contrast than that of the opening network, whereas the opening-closing net addresses this issue. Secondly, the blob effect seen in the grayscale images is not visible here.
The results shown above are on images from the benchmark rain dataset [14], where rainy images are synthesized from clean images, so ground truths are available. We also show a few results on real rain images, for which no ground truth is available, in Fig. 7. The opening-closing network clears the rain almost completely, whereas the output of the CNN (U-net) surprisingly retains a few small rain streaks in some images (see the bottom row of the figure). This may be due to the large number of parameters in the CNN (U-net) leading to over-fitting.

Results of Image De-Hazing
In this subsection, we evaluate the performance of the opening-closing network quantitatively and qualitatively on the de-hazing problem. De-hazing has been attempted relatively more widely, so we compare our network with several state-of-the-art methods [47]. The DCP method is based on the observation that in hazy images the intensity of the color channels is uniformly contributed by the airlight. The CAP method approximates the haze by the difference of brightness and saturation. Recent deep learning based methods like MS-CNN and DehazeNet predict the transmittance map using a network. AOD-Net reformulates the haze equation using a single parameter and estimates it with a CNN-based network. The DCPDN algorithm incorporates the concept of an image pyramid inside the network.
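The dark-channel statistic underlying DCP is simple to state: the minimum over a local patch of the per-pixel channel minimum, which is near zero for haze-free outdoor images. A minimal NumPy sketch (patch size is a free parameter; 15 is a common choice, not prescribed here):

```python
import numpy as np

def dark_channel(img, patch=15):
    """Dark channel of an RGB image (H, W, 3) in [0, 1]:
    dark(x) = min over a patch around x of min over color channels.
    Its deviation from zero in a hazy image reveals the airlight
    contribution, which DCP exploits to estimate transmittance."""
    per_pixel_min = img.min(axis=2)          # channel minimum per pixel
    r = patch // 2
    padded = np.pad(per_pixel_min, r, mode='edge')
    h, w = per_pixel_min.shape
    out = np.empty_like(per_pixel_min)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out
```

A single dark pixel lowers the dark channel over its whole neighborhood, which is why DCP estimates become unreliable on large white objects, the same failure mode noted for white airlight later in this section.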

Quantitative Evaluation
For de-hazing, we quantitatively compare the performance of our opening-closing network with other state-of-the-art methods. We trained the opening-closing network on the O-HAZE training dataset and tested it on the O-HAZE validation dataset. The results are reported in Table 4, which shows the PSNR and SSIM values obtained by the different algorithms for each of the five validation images. The opening-closing network gives the best results in terms of SSIM for most of the images; in terms of PSNR, however, it ranks second best, next to [11]. We also trained the network on the NYU portion of the D-Hazy dataset and tested it on the Fattal [11] and Middlebury portions of D-Hazy. These results are reported in Table 5 and Table 6, respectively. Table 5 reveals that the performance of the opening-closing network on the Fattal dataset is not as good as on O-HAZE (Table 4), probably because the NYU portion of D-Hazy has only white airlight, while in the Fattal dataset the airlight may not be white. However, Table 6 shows that the performance of our morphological network is comparable to that of state-of-the-art algorithms on the Middlebury portion of D-Hazy.

Qualitative Evaluation
For qualitative evaluation of the de-hazed images, we used the same strategy as for the de-rained images. However, comparing the quality of de-hazed images is so difficult that grading is often inconsistent. We show the output of the opening-closing network on only three challenging images from the O-HAZE dataset along with ground truth. The result of our network (fourth column), the input image (first column), and the ground truth (fifth column) are shown in Fig. 8; the second and third columns show the estimated transmittance map and airlight, respectively. The average subjective quality of our method's results on this dataset, as rated by the evaluators, is 'reasonably good' but tends toward 'low brightness'. It should be noted that most state-of-the-art methods assume one of the two parameters (i.e., transmittance and airlight) is either known or determined heuristically, and estimate the other (commonly transmittance). In contrast, the opening-closing network estimates both parameters simultaneously from the input image and de-hazes it by applying the appropriate transformations. Apart from the known benchmark datasets, we also applied our morphological network to some real-world hazy images popularly used by the research community. Results on three images are shown in Fig. 9. We observe that our network de-hazes the images effectively and reproduces the colors faithfully. However, in some instances it was unable to reproduce lighter shades of some colors, which can be attributed to the fact that we have not used a highly optimized network for de-hazing. In a real scene, the amount of haze increases with depth; this depth effect is clearly visible in the transmittance map (second column (b)). Although the opening-closing network is able to de-haze the image, it sometimes fails to obtain the haze-free image correctly. In Fig.
10 we report a few failure cases of the opening-closing network. In the first and last rows of Fig. 10, the output image is darker, which may be due to overestimation of the transmittance map. In the second row, the white table in the estimated haze-free image has turned green; from the estimated airlight and transmittance maps, we can see that the network treats the white table as haze.

Conclusions
In this paper, we have revisited the concept of learning morphological structuring elements (SEs) and shown that the SEs of dilation and erosion can be learned. We have built a network that creates the effect of opening and closing by placing dilation and erosion layers sequentially in two parallel paths and then taking a linear combination of their outputs. To establish the efficacy of the proposed network, we have applied it to two classes of image restoration problems, namely de-raining and de-hazing. Like CNN training methods, the proposed network learns SEs and various weights as parameters, leading to the desired morphological network in terms of both operations and SEs. Once trained, the network can estimate the degrading parameters from a test input image and subsequently restore it. Although we obtain results comparable to state-of-the-art methods, we believe an improved loss function can give better results. Second, to minimize the loss function, the structuring elements are learned by the back-propagation algorithm using the sub-gradient method, since dilation/erosion operators are defined in terms of piece-wise differentiable max/min operators. Employing a soft max/min may lead to a fully differentiable loss function and thereby simplify back-propagation.
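One standard soft max of the kind alluded to above is the log-sum-exp relaxation; the sketch below shows this particular choice (the temperature beta and the LSE form are our illustration, not a construction from the paper):

```python
import numpy as np

def soft_max(values, beta=50.0):
    """Smooth, fully differentiable surrogate for max via log-sum-exp:
    (1/beta) * log(sum(exp(beta * v))). It upper-bounds the hard max by
    at most log(n)/beta and converges to it as beta grows; a soft min
    is obtained as -soft_max(-v, beta)."""
    v = np.asarray(values, dtype=float)
    m = v.max()                      # shift for numerical stability
    return m + np.log(np.exp(beta * (v - m)).sum()) / beta
```

Replacing the hard max/min inside dilation/erosion with this surrogate would make every output a smooth function of the SE entries, so plain gradients could be used in place of sub-gradients.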