Moving shadow detection based on stationary wavelet transform

Many surveillance and forensic applications face problems in identifying shadows and removing them. The moving shadow points overlap with the moving objects in a video sequence, leading to misclassification of the exact object. This article presents a novel method for identifying and removing moving shadows using the stationary wavelet transform (SWT), based on a threshold determined from the wavelet coefficients. The multi-resolution property of the SWT decomposes the frames into four different bands without loss of spatial information. The conventional discrete wavelet transform (DWT), which has the same property, suffers from shift variance: its decimation operation causes a shift in the original signal during reconstruction. Since the SWT omits the decimation operation, it is shift invariant, which makes it suitable for change detection, pattern recognition, and feature extraction, and it recovers the original signal without loss of phase information. For detection and removal of shadow, a new threshold in the form of a variant statistical parameter, skewness, is proposed. The threshold value is determined from the wavelet coefficients without requiring any supervised learning or manual calibration. Statistical parameters such as mean, variance, and standard deviation normally do not show much variation in complex environments. Skewness shows a more distinctive variation between shadow and non-shadow pixels across various environments than the previously used thresholds, standard deviation and relative standard deviation. The experimental results show that the proposed method works better than other state-of-the-art methods.


Introduction
Visualisation applications are designed mainly to acquire moving points in a video sequence. They are given little knowledge about the moving shadows that always accompany the moving pixels. Serious flaws such as merging of multiple objects, incorrect object shape, and deformation of object contours and cues may occur if moving shadows are not removed from the target object. Proper design of moving object segmentation algorithms can help video surveillance, object tracking, and similar applications perform to users' satisfaction.
Sanin et al. [1] classified the moving shadow detection methods under various categories. Readily available cues which make a distinction between the target object and its shadow along with the place where it is cast (background) are discussed to get a clear understanding of the shadow detection concept.
To analyse a shadow detection algorithm, we can use the shadow detection rate, shadow discrimination rate, and shadow localisation [2]. The role of shadow detection and removal is to locate shadow regions, separate the shadow from the foreground, and remove it from the foreground object. Various shadow detection algorithms are available for identifying shadows in still images. They cannot be used for moving shadows, where the intention is not only to identify shadows but also to detect changes between frames. Many algorithms use only the time-variant information to analyse the moving shadow pixels. Sanin et al. [1] discussed a few shadow detection algorithms which consider the physical and geometrical properties of the shadows to identify them; these are categorised as model-based techniques. Many shadow detection algorithms use colour-based [2][3][4][5][6][7][8] and texture-based [4][5][6] methods to differentiate shadows from foreground objects; these are categorised as property-based techniques.
The red-green-blue (RGB) values, the hue and saturation values in the hue-saturation-value (HSV) colour space, and the grey-level value of a shadow are lower than the values of the corresponding background pixel [9,10]. The variation between those values, of shadow and background, shows a gradual growth between adjacent pixels [11]. There is always a slight difference in the intensity values between neighbouring pixels of the background, and these differences are lower than in the shadow pixels. It is also universally accepted that the RGB values of a shadow are always lower than those of the background at the respective pixels. This property holds even when the values are transformed to other colour spaces such as HSV or hue, saturation, and intensity (HSI). For example, the hue and saturation components of shadow pixels transformed from RGB to HSV colour space have lower values than those of the corresponding background pixels [1]. With respect to object and background, the grey level of shadows is lower.
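To illustrate this property, the snippet below darkens an RGB pixel multiplicatively (a simple, assumed shadow model) and converts both versions to HSV with Python's standard colorsys module. The value component drops while the hue of the underlying surface is preserved; note that under this purely multiplicative model the saturation is also unchanged, whereas real shadows often lower it.

```python
import colorsys

# A background pixel and a hypothetical shadowed version of it
# (same surface, uniformly darkened RGB values, normalised to [0, 1]).
background_rgb = (0.6, 0.5, 0.4)
shadow_rgb = tuple(0.5 * c for c in background_rgb)  # darkened by the shadow

bg_h, bg_s, bg_v = colorsys.rgb_to_hsv(*background_rgb)
sh_h, sh_s, sh_v = colorsys.rgb_to_hsv(*shadow_rgb)

# The value (luminance) component drops under the shadow,
# while the hue of the underlying surface is unchanged.
assert sh_v < bg_v
assert abs(sh_h - bg_h) < 1e-9
```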
The shadow and the background have similar texture; a texture-rich object has a texture-less shadow [9,11]. The entropy value or Gaussian filter derivatives are helpful in extracting the texture of image segments [12]. There is a difference in the illumination of shadow and background, which also helps to differentiate shadow pixels. Shadow pixels can be identified by their weaker boundaries compared with the background and their fewer interior edges compared with objects. However, the shadows still remain attached to their objects [13]. Even though an object and its shadow have the same movement, their positions separate them [14]. Skewness is a variant statistical feature with respect to shadow and non-shadow areas for locating shadows, provided a proper edge detection algorithm is used [12]. The image gradient is an invariant feature across a shadow boundary.
For the majority of moving shadow detection methods, chromatic analysis is the first step. The hue intensity ratio [15] is used to categorise shadow and non-shadow pixels. The c1c2c3 colour space is also used to distinguish shadow areas by introducing a variation in the colour space transformation; however, if the RGB values remain the same, the variation in the transformation formula will not show any difference. Colour space-based methods [16] such as HSI, HSV, YIQ, and YCbCr analyse the intensity and colour properties of shadows in aerial images and calculate a hue intensity ratio for each pixel. The shadow regions are extracted based on a threshold, which does not clearly classify dark blue and dark green surfaces. To improve the shadow detection results, consecutive thresholding [17] can be applied. Based on the lighting conditions, an invariant colour model [18] which identifies the shadow can be used, but this may not work for all types of images. The HSI colour space, along with the colour attenuation relationship [19], is analysed to detect shadows in colour aerial images. Pixels having similar RGB values are identified, but those pixels are simply assigned to either shadow or non-shadow by considering their neighbours alone; the features of the hue-singular pixels are not analysed.
Guan [3] explored the properties of the HSV colour model for shadow detection and removal using the dyadic multi-scale wavelet transform. The standard deviation of the wavelet coefficients of the value component helps to distinguish the shadow pixels from the foreground pixels. Since the wavelet transform decomposes the input into high- and low-frequency values, the threshold values are updated automatically. By combining the saturation component with the value component, the threshold identifies the shadow regions to be removed. Khare et al. [20] also used the relative standard deviation as a threshold to detect the shadow regions in the wavelet domain. However, the algorithms discussed so far still fail to detect and remove shadows accurately. Since wavelet transforms analyse the image in the frequency domain, an automatic threshold can be extracted from the coefficients, avoiding manual intervention.
In the proposed approach, we also use the HSV colour space, as it corresponds closely to the human perception of colour and separates chromaticity and luminosity easily. We also use the variant statistical parameter 'skewness' of the wavelet coefficients to detect and remove the shadows. In the rest of the paper, Section 2 describes feature decomposition using the stationary wavelet transform (SWT) and Zernike moments (ZM); Section 3 explains the proposed approach; Section 4 presents the experimental results and comparisons of the proposed method with other state-of-the-art methods. Finally, conclusions are given in Section 5.

Feature decomposition
In this section, we provide the details of the techniques used to decompose the features of the moving object and shadow in our proposed method. The SWT decomposes the frames into approximation, horizontal, vertical, and diagonal coefficients, and the ZM reduce the redundancy in these coefficients.

Stationary wavelet transform
In this section, we discuss the SWT and the reasons for using it to detect and remove shadows from moving objects. Guan [3] and Khare et al. [20] used the discrete wavelet transform (DWT), which lacks phase information and translation invariance, creating problems in the reconstruction of the image. We therefore propose a method based on the SWT.
The Fourier transform (FT) analyses a signal by decomposing it into constituent sinusoids of different frequencies. Its major drawback is the loss of temporal information. The short-time Fourier transform, which applies a predefined window, recovers the temporal information along with the frequencies.
The DWT has an advantage over the FT in terms of localisation in both the frequency domain and the spatial domain. The DWT can be applied to a discrete signal containing N samples. The signal is decomposed into a low-frequency band (L) using a low-pass filter and a high-frequency band (H) using a high-pass filter, and each band is subsampled by a factor of two. In the case of a two-dimensional signal (an image), each row of the image is filtered by the low-pass and high-pass filters, and the columns are then filtered in the same way. However, because of the decimation operation, the DWT is not shift invariant: a shifted input signal does not produce a correspondingly shifted set of coefficients.
The SWT solves this problem of shift variance [21]. The SWT differs from the conventional DWT in that it performs no decimation and is shift invariant, which makes it suitable for change detection, pattern recognition, and feature extraction. In the SWT, the input signal is convolved with the low-pass filter 'l' and the high-pass filter 'h' in the same manner as in the DWT, but no decimation is performed to obtain the wavelet coefficients of the different subbands. As there is no decimation involved in the SWT, the number of coefficients generated per level is twice that of the samples in the input signal, which helps in a better reconstruction of the given image.
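The no-decimation idea can be sketched with a one-level undecimated 2-D Haar transform in NumPy. This is a minimal illustration only (full multi-level SWT implementations, such as pywt.swt2, upsample the filters at each level, the à trous scheme), not the implementation used in the paper.

```python
import numpy as np

def swt2_haar_level1(img):
    """One level of a 2-D stationary (undecimated) Haar wavelet transform.

    Unlike the DWT there is no downsampling, so all four subbands
    (approximation, horizontal, vertical, diagonal) keep the full
    spatial resolution of the input frame.
    """
    lo = np.array([1.0, 1.0]) / np.sqrt(2.0)   # Haar low-pass filter
    hi = np.array([1.0, -1.0]) / np.sqrt(2.0)  # Haar high-pass filter

    def filt(x, f, axis):
        # Circular two-tap convolution along one axis, no decimation.
        return f[0] * x + f[1] * np.roll(x, -1, axis=axis)

    ll = filt(filt(img, lo, 0), lo, 1)  # approximation
    lh = filt(filt(img, lo, 0), hi, 1)  # horizontal details
    hl = filt(filt(img, hi, 0), lo, 1)  # vertical details
    hh = filt(filt(img, hi, 0), hi, 1)  # diagonal details
    return ll, lh, hl, hh

frame = np.arange(16.0).reshape(4, 4)
ll, lh, hl, hh = swt2_haar_level1(frame)
# Every subband has the same shape as the input: no spatial information
# is lost, and the total number of coefficients is 4x the number of
# input samples (2x per dimension), as noted in the text.
assert all(b.shape == frame.shape for b in (ll, lh, hl, hh))
```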

Zernike moments
ZM, introduced by Teague [22], reduce the information redundancy present in geometric moments. ZM characterise the properties of an image by removing the redundancy and overlapping information between the moments [23]. Owing to these characteristics, ZM have been utilised as a feature set in different applications such as object classification, shape analysis, content-based image retrieval etc. Their main advantages are:
i. ZM are rotation invariant.
ii. ZM are robust to noise and minor variations in shape.
iii. The orthogonal property of ZM gives them minimum information redundancy.
iv. The lower-order ZM represent the global shape pattern, while the finer details are captured in the higher-order ZM.
v. ZM represent an image better than other types of moments.
ZM are a set of complex polynomials which form a complete orthogonal set over the interior of the unit circle x^2 + y^2 ≤ 1 [27,28]. These polynomials are of the form

V_mn(r, θ) = R_mn(r) e^{jnθ} (1)

where m is a non-negative integer and n an integer subject to the constraints that m − |n| is even and |n| ≤ m, r is the length of the vector from the origin to pixel (x, y), θ is the angle between the vector r and the x-axis in the counter-clockwise direction, and R_mn(r) is the Zernike radial polynomial in (r, θ) polar coordinates, defined as

R_mn(r) = Σ_{s=0}^{(m−|n|)/2} [(−1)^s (m − s)! / (s! ((m + |n|)/2 − s)! ((m − |n|)/2 − s)!)] r^{m−2s} (2)

The polynomial in (2) is orthogonal and satisfies the orthogonality principle. ZM are the projection of the image function I(x, y) onto these orthogonal basis functions. The orthogonality condition simplifies the representation of the original image because the generated moments are independent [29].
The ZM of order m with repetition n for a continuous image function I(x, y) that vanishes outside the unit circle is

A_mn = ((m + 1)/π) ∫∫_{x^2+y^2≤1} I(x, y) V*_mn(r, θ) dx dy (3)

In the case of a digital image, the integrals are replaced by summations:

A_mn = ((m + 1)/π) Σ_x Σ_y I(x, y) V*_mn(r, θ), x^2 + y^2 ≤ 1 (4)

SWT and ZM share useful invariance properties: shift invariance [21], translation invariance [25], rotation invariance [26], etc. Combining these two features in one methodology helps to produce more accurate shadow detection results than using either of them alone.
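As a rough illustration of (2)-(4), the following NumPy sketch evaluates the Zernike radial polynomials and approximates a moment of a square image mapped onto the unit disc. The grid mapping and normalisation are simplified assumptions for demonstration, not the exact discretisation used in the paper.

```python
import numpy as np
from math import factorial

def zernike_radial(m, n, r):
    """Zernike radial polynomial R_mn(r) of eq. (2); m - |n| even, |n| <= m."""
    n = abs(n)
    out = np.zeros_like(r, dtype=float)
    for s in range((m - n) // 2 + 1):
        c = ((-1) ** s * factorial(m - s)
             / (factorial(s)
                * factorial((m + n) // 2 - s)
                * factorial((m - n) // 2 - s)))
        out = out + c * r ** (m - 2 * s)
    return out

def zernike_moment(img, m, n):
    """Approximate the ZM A_mn of eq. (4) for a square image on the unit disc."""
    h, w = img.shape
    y, x = np.mgrid[-1:1:h * 1j, -1:1:w * 1j]
    r = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    mask = r <= 1.0                            # moments are defined inside the unit circle
    basis = zernike_radial(m, n, r) * np.exp(1j * n * theta)  # V_mn(r, theta)
    dx, dy = 2.0 / (w - 1), 2.0 / (h - 1)      # pixel area under the [-1, 1] mapping
    return (m + 1) / np.pi * np.sum(img[mask] * np.conj(basis[mask])) * dx * dy

# Sanity checks against the standard closed forms R_00 = 1, R_11 = r, R_20 = 2r^2 - 1.
r = np.linspace(0.0, 1.0, 5)
assert np.allclose(zernike_radial(0, 0, r), 1.0)
assert np.allclose(zernike_radial(1, 1, r), r)
assert np.allclose(zernike_radial(2, 0, r), 2 * r ** 2 - 1)
```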

Proposed approach
This section first gives a detailed justification of why the threshold 'skewness' is used in our proposed method. The histograms of images illustrate the existing thresholds, such as mean, variance, and standard deviation, alongside the skewness threshold. The section then explains the flow of the proposed method in detail.

Threshold selection
An optimal threshold has to be determined for an accurate shadow detection process. Guan [3] used the standard deviation σ as the threshold, and Khare et al. [20] used the relative standard deviation σ/μ. In this work, the skewness of the values is used:

skew = (1/n) Σ_{i=1}^{n} ((x_i − μ)/σ)^3 (5)

where x_i is one of the pixel values, μ the sample mean, and σ the standard deviation of the n values. Skewness is a measure of the degree of asymmetry of a distribution. The skewness value can be positive or negative, or even undefined [30]. A negative skewness indicates that the tail on the left side of the probability density function is longer than the right side and the bulk of the values lie to the right of the mean. A positive skewness indicates that the tail on the right side is longer than the left side and the bulk of the values lie to the left of the mean, as shown in Fig. 1.
The motivation for selecting skewness as one of the distinguishing features to represent shadow is that it always shows a variation between the object casting the shadow and the background on which it is cast [31].
For an image with a homogeneous reflective surface, the skewness of the luminance histogram and the subband histograms is correlated with the shadow regions present in it [30]. In (5), μ is the average luminance of the object and σ is the standard deviation of the luminance of the object and its neighbours in the wavelet domain. Cubing x − μ enhances the edges around the object present in a scene; the summation and normalisation make the edges of shadows blurry, since they are soft, and speed up the shadow detection process.
So, the skewness of the wavelet coefficients carries the boundary information of the objects, which helps us to clearly segregate them from the shadows. Moreover, the existing thresholds may not show any difference between object and shadow for pixels whose saturation and hue values are undefined. The mean μ and standard deviation σ of such patterns, as shown in Fig. 2, are identical, leading to misclassification. They differ only in the sign of the skewness, because of the cubing and normalisation of the luminance values.
Therefore, skewness is selected as a stable threshold in our work for classifying moving shadow pixels against objects. Although the thresholds σ and σ/μ are able to detect and remove shadow, their performance degrades when the foreground object and the background share the same colour or are both dark.
In the proposed method, we apply the threshold to the stationary wavelet coefficients of the value component of HSV to detect the moving object together with its shadow. For shadow removal, we apply a logical AND to the thresholded stationary wavelet coefficients of the value component and those of the saturation component. Instead of manual calibration of the threshold, which needs predefined parameters, we propose a variant statistical parameter, i.e. skewness, as a new threshold for detection and removal of shadow in the SWT domain. Guan [3] and Khare et al. [20] applied their thresholds to discrete wavelet (DWT) coefficients, which lose phase information. The loss of phase information in the DWT and the difficulty faced with similar patterns in the input frames are rectified by applying the SWT and the skewness threshold, respectively.

Proposed algorithm
The proposed approach takes a reference frame from the video sequence as the background model, and the consecutive frames are processed one by one to detect the foreground and shadow pixels. The reference frame and the current frame are converted from the RGB colour space to the HSV colour space. The absolute differences of the two frames are taken with respect to the hue, saturation, and value components; they are denoted ΔH, ΔS, and ΔV, respectively. The SWT is applied to the absolute difference components of value (ΔV) and saturation (ΔS) to obtain the wavelet coefficients, denoted W_ΔV and W_ΔS. Then, to reduce redundancy, ZM are applied to the wavelet coefficients. The variant statistical parameter 'skewness' is calculated for the wavelet coefficients of the value and saturation components, denoted (Skew)_WΔV and (Skew)_WΔS, respectively. These act as automatic thresholds to classify the moving pixels as foreground or shadow pixels.
The pixels whose wavelet coefficients of the value component W_ΔV are greater than the automatic threshold, the skewness of the value component (Skew)_WΔV, are categorised as moving pixels, which include both foreground and shadow pixels:

M(x, y) = 1 if W_ΔV(x, y) > (Skew)_WΔV, and 0 otherwise (6)

To remove the shadow pixels, we additionally consider the automatic threshold of the saturation component (Skew)_WΔS and apply a logical AND of the two thresholded coefficient maps:

O(x, y) = M(x, y) AND [W_ΔS(x, y) > (Skew)_WΔS] (7)

where O(x, y) marks the shadow-free object pixels. The block diagram of the proposed method is shown in Fig. 3. Finally, the shadow-detected image and the shadow-removed image are reconstructed by the inverse wavelet transform. Binary closing morphological operations are applied to smoothen the reconstructed images.
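A minimal sketch of the thresholding step described by (6) and (7) is given below. For brevity it applies the skewness thresholds directly to small toy coefficient maps (in the paper they are applied to the SWT subband coefficients of ΔV and ΔS); the function names and the toy data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def skewness(x):
    """Normalised third central moment of the values in x (eq. (5))."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    return float(np.mean(((x - mu) / sigma) ** 3))

def classify(w_dv, w_ds):
    """Threshold coefficient maps of the value and saturation differences.

    Pixels exceeding the skewness threshold of the value map are moving
    (foreground + shadow, eq. (6)); a logical AND with the thresholded
    saturation map keeps the shadow-free object pixels (eq. (7)).
    """
    t_v = skewness(w_dv)                 # automatic threshold (Skew)_WdV
    t_s = skewness(w_ds)                 # automatic threshold (Skew)_WdS
    moving = w_dv > t_v                  # eq. (6)
    object_only = moving & (w_ds > t_s)  # eq. (7)
    shadow = moving & ~object_only
    return moving, shadow

# Toy coefficient maps: one strong object-like pixel (large dV and dS) and
# one shadow-like pixel (moderate dV, very small dS) on a flat background.
w_dv = np.array([[0.0, 0.0, 0.0],
                 [0.0, 2.0, 1.0],
                 [0.0, 0.0, 0.0]])
w_ds = np.array([[0.0, 0.0, 0.0],
                 [0.0, 2.0, 0.1],
                 [0.0, 0.0, 0.0]])
moving, shadow = classify(w_dv, w_ds)
# By construction, shadow pixels are a subset of the moving pixels.
assert not np.any(shadow & ~moving)
```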

Experimental results and discussion
To make the evaluation effective and systematic, extensive results on several well-known benchmarks for which ground-truth data was available are presented. The chosen benchmarks consist of indoor and outdoor scenes from the UCSD CVRR Laboratory, the CAVIAR Test Scenarios, the Institute for Infocomm Research (I2R), and the change detection benchmark data sets created by Wang et al. [3]. A regional-language movie song from our country, named 'Sarvam', is also used to test the proposed method against the other existing methods. Specifically, Intelligent Room, Hallway, Lobby Hall, Bus Station, and Cubic are typical indoor environments, while Highway I and IV, Campus, Bungalow, and People In Shade are examples of outdoor roadway scenarios. All the video sequences selected for evaluation have >1000 frames, and the details of the objects and shadows are given in Table 1. The visualisation of the proposed method for a few frames of some video sequences is shown in Figs. 4 and 5.
The performance of the proposed method against the previously used thresholds is discussed. Table 2 shows the proposed threshold values, i.e. the skewness of the wavelet coefficients (vertical, horizontal, and diagonal) for various video sequences. The exact threshold value of each video, calculated from the ground-truth images, is also shown alongside each sequence. The previously used thresholds, standard deviation and relative standard deviation, are also shown in the table. In order to analyse the proposed method objectively and quantitatively, the shadow detection rate η and the shadow discrimination rate ε [17] are considered:

η = TP_S / (TP_S + FN_S), ε = TP_F / (TP_F + FN_F) (8)

In (8), the subscript S stands for shadow and F for foreground; TP_S and TP_F, respectively, represent the numbers of shadow pixels and foreground pixels correctly recognised, while FN_S and FN_F, respectively, represent the numbers of shadow pixels and foreground pixels falsely recognised. In addition, the proposed method is compared with several existing methods, including Sanin et al. [9], Cucchiara et al. [8], Guan [3], and Khare et al. [20], in Tables 3 and 4 using their shadow detection rate (η) and shadow discrimination rate (ε) for different videos. The computation time per frame of the SWT coefficients, and of the SWT coefficients with redundancy reduced by ZM, along with other state-of-the-art methods, is given in Table 5.
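The two rates in (8) can be computed directly from the per-frame pixel counts; the counts in the example below are hypothetical.

```python
def shadow_metrics(tp_s, fn_s, tp_f, fn_f):
    """Shadow detection rate (eta) and discrimination rate (epsilon), eq. (8)."""
    eta = tp_s / (tp_s + fn_s)
    epsilon = tp_f / (tp_f + fn_f)
    return eta, epsilon

# Hypothetical per-frame counts: 90 of 100 shadow pixels and 190 of 200
# foreground pixels recognised correctly.
eta, epsilon = shadow_metrics(tp_s=90, fn_s=10, tp_f=190, fn_f=10)
print(f"eta = {eta:.2f}, epsilon = {epsilon:.2f}")  # eta = 0.90, epsilon = 0.95
```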
The results show that the proposed method outperforms the existing methods and can be applied to various outdoor and indoor environments with a uniformly changing background. Nowadays, in-vehicle camera surveillance plays a major role in monitoring driver behaviour, for which the proposed method can also be used. The proposed approach can be enriched by updating the reference background image through a learning approach, so that it can be applied to moving camera applications. The performance of the proposed method degrades when there is object movement in shade or under limited lighting conditions. The HSV conversion of such frames creates many hue-singular pixels, i.e. pixels with similar RGB values or values near zero. Such pixels cannot be classified by comparison with the threshold, leading to misclassification of foreground pixels as shadows or vice versa. Fig. 6 shows the application of the proposed method to one such video sequence, People In Shade. The shadow detection rate (η) and discrimination rate (ε) of this video are lower than for the other video sequences.

Conclusion
In this article, an SWT- and ZM-based shadow detection method is proposed. A new threshold based on a variant statistical feature, calculated from the wavelet coefficients, is used to classify the moving objects and shadows. Compared with the DWT, the shift-invariance and multi-resolution properties of the SWT support the reconstruction of the image from the subbands without loss of phase information. Redundant information is reduced by applying ZM to the wavelet coefficients. The threshold determined from the wavelet coefficients enables a distinctive discrimination and removal of the shadows. The promising results of the proposed method highlight the advantages of both the SWT and the variant statistical threshold. The performance of the method degrades with high object speed and a non-stationary background; extending this work to non-stationary backgrounds is therefore left for future work.