Adaptive Histogram Equalization Based Image Forensics Using Statistics of DC DCT Coefficients

Digital images are increasingly vulnerable to manipulation, which has motivated research on detecting digital image forgeries. Certifying the origin and content of digital images remains an open problem in the multimedia world. One way to assess the authenticity of an image is to detect the presence of any type of contrast enhancement. In this work, a novel and simple machine learning tool is proposed to detect the presence of histogram equalization using statistical parameters of DC Discrete Cosine Transform (DCT) coefficients. The statistical parameters of a Gaussian Mixture Model (GMM) fitted to the DC DCT coefficients are used as features for classifying original and histogram equalized images. An SVM classifier is developed that detects histogram equalized images with accuracy greater than 95 % when the false positive rate is less than 5 %.


Introduction
Digital images are generally used to establish the occurrence of man-made or natural incidents. These days, images are one of the largest means of communication across different media and often require an authenticity test. The availability of a large number of software products on multimedia devices makes the creation and manipulation of digital images very cheap and convenient. Digital images can be manipulated by tampering with the source of information and/or the content of information. The research area which covers finding the origin and authenticity of an image is known as Digital Image Forensics (DIF) [1], [2] and [3]. The forensics of source involves finding the camera or source from which an image is generated or captured, and the forensics of authentication involves finding traces of manipulation in an image. Sampling, interpolation, contrast enhancement, median filtering, addition of noise and saving in a lossy compression format are some examples of the techniques commonly involved in creating convincing tampered images. These methods, alone or in a chain, are not tampering techniques themselves, but they are required in order to hide any visual traces of the tampering. Therefore, detecting the presence of any such operation leads to a detailed investigation of an image.
In the literature on contrast enhancement-based DIF, mainly two types of classical contrast enhancement, namely power-law transformation and Global Histogram Equalization (GHE), are considered [4], [5], [6], [7], [8], [9] and [10]. In our work, we focus on the detection of the histogram equalization (HE) operation. GHE increases the global contrast of an image by equally distributing intensity levels. These days, however, Adaptive Histogram Equalization (AHE) [11] and [12] is gaining popularity because it outperforms GHE. In AHE, an image is divided into tiles, and HE is then applied to each tile, which enhances smaller details by increasing the local contrast of the image. The equalization process is followed by bilinear interpolation to smooth out the boundaries created by tiling. Although this provides better enhancement than its global counterpart, the enhancement of noise in homogeneous background regions is a major concern. Various versions of AHE are available to suppress the enhanced noise [11]. The most popular and widely accepted variation of AHE is contrast limited AHE (CLAHE) [12], [13], [14], [15] and [16]. The probability density functions (pdfs) for image histograms in HE for CLAHE are chosen based on the application. In addition to natural images [16] and [17], CLAHE has also found application in the enhancement of medical images [13] and [14] and underwater images [15].
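The per-tile equalization step can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this work: the function name `equalize_tile`, the single-pass redistribution of clipped counts, and the mapping from a normalized clip limit in (0, 1] to a per-bin cap are all assumptions, and the bilinear interpolation between tiles is omitted. With the clip limit set to 1 (no clipping) the function reduces to plain histogram equalization of the tile.

```python
def equalize_tile(pixels, levels=256, clip_limit=1.0):
    """Equalize one tile given as a flat list of ints in [0, levels).

    clip_limit is a normalized value in (0, 1]; 1.0 means no clipping,
    so the function then performs plain histogram equalization.
    """
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Assumed mapping from the normalized clip limit to a per-bin cap:
    # the cap grows linearly from the uniform bin height up to n.
    floor = -(-n // levels)  # ceil(n / levels)
    cap = floor + round(clip_limit * (n - floor))

    # Clip every bin at the cap and spread the excess evenly over all bins
    # (single pass; bins may end slightly above the cap afterwards).
    excess = sum(max(h - cap, 0) for h in hist)
    hist = [min(h, cap) for h in hist]
    share, remainder = divmod(excess, levels)
    hist = [h + share + (1 if k < remainder else 0) for k, h in enumerate(hist)]

    # Map each gray level through the cumulative histogram (uniform target).
    mapping, running = [], 0
    for h in hist:
        running += h
        mapping.append(round((levels - 1) * running / n))
    return [mapping[p] for p in pixels]


# Low-contrast tile: four neighbouring gray levels, 16 pixels each.
tile = [100, 101, 102, 103] * 16
full = equalize_tile(tile)                      # no clipping
clipped = equalize_tile(tile, clip_limit=0.01)  # strong clipping
```

In this toy example the unclipped output spans a much wider range of gray levels than the clipped one, which matches the intuition that a small clip limit yields a weaker contrast enhancement.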
Since the use of CLAHE is spreading, tools to detect the presence of AHE are also required. In this paper, the statistical characterization of the block DC coefficient of 2D 8 × 8 DCT coefficients is employed to classify original and HE images. HE images include GHE, AHE and CLAHE images. The block DC coefficient of 2D 8 × 8 DCT coefficients is defined as an array made up of the DC coefficients of all 8 × 8 blocks of DCT coefficients. In image forensics, DCT coefficient analysis is employed to detect tampering in JPEG compressed images [18] and [19]. Further, the statistical characterization of DCT coefficients has been popular for improving the encoder/decoder of the JPEG standard [20], [21], [22] and [23]. It may be noted that the statistical characterization of block variance and AC DCT coefficients has recently been employed to detect global contrast enhancement (or power-law transformation) [24].
In this work, a tool is developed using statistical parameters obtained by fitting a Gaussian Mixture Model (GMM) to the block DC coefficient. The estimated parameters, together with other statistical parameters, are used to train a Support Vector Machine (SVM) [25] with 10-fold cross-validation. The trained SVM classifier can be used to classify original and HE images. The proposed tool is independent of the number of tiles (T) and can detect the presence of HE for a large set of clip limits (CL), all considered pdfs (uniform distribution, exponential distribution, Rayleigh distribution) for image histograms, and both ranges of gray levels ("full" and "original") used for equalization. The efficacy of the proposed tool for the classification of original and HE images is shown by means of Receiver Operating Characteristic (ROC) curves. Additionally, the performance of the tool in detecting CLAHE images for different values of CL and T, image histogram equalization pdfs, and ranges of gray levels used in equalization is also discussed.
The paper is divided into four sections. Section 2 describes the statistical characterization of block DC DCT coefficients by GMM. The HE detection system and its detection algorithm, with ROC curves of the developed classifiers, are explained in Sec. 3. Section 4 concludes the paper.

Statistical Characterization of DC DCT Coefficients
The block DC DCT coefficients obtained by employing the type-II DCT are characterized using GMM in order to classify original and HE images.

Gaussian Mixture Model
A GMM is a pdf which is the weighted sum of the densities of Gaussian components, defined as

\[ p(x \mid \theta) = \sum_{i=1}^{K} w_i \, g(x \mid \mu_i, \Sigma_i), \tag{1} \]

where $x$ is a D-dimensional continuous-valued data vector, $w_i$ for $i = 1, 2, 3, \ldots, K$ are the mixture weights and $K$ is the number of components. In Eq. (1), $g$ is a D-dimensional Gaussian density which can be expressed as

\[ g(x \mid \mu_i, \Sigma_i) = \frac{1}{(2\pi)^{D/2} \, |\Sigma_i|^{1/2}} \exp\left( -\frac{1}{2} X^{T} \Sigma_i^{-1} X \right), \tag{2} \]

where $X = (x - \mu_i)$, $\mu_i$ is the mean vector, and $\Sigma_i$ is the covariance matrix. The mixture weights in Eq. (1) sum to 1 over all components, i.e., $\sum_{i=1}^{K} w_i = 1$. The parameters of the GMM are the mean vectors, the covariance matrices, and the mixture weights. These parameters can be collected to form a feature set $\theta$ given by

\[ \theta = \{ w_i, \mu_i, \Sigma_i \}, \quad i = 1, 2, \ldots, K. \tag{3} \]

The model settings, such as the number of components, the type of covariance matrix, and the sharing of parameters, depend on the application and on the amount of data available for parameter estimation. In our work, a diagonal covariance matrix is used for all components, and the parameters in Eq. (3) are estimated using Maximum Likelihood Estimation (MLE).
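For scalar data such as DC coefficients, the mixture density and its maximum-likelihood fit can be sketched with the standard Expectation-Maximization (EM) iteration. The sketch below is an illustration under stated assumptions rather than the estimation code used in this work: EM is one common way to carry out the MLE, the quantile-based initialization and the function names `gmm_pdf` and `fit_gmm_em` are hypothetical, and the synthetic two-mode data only stands in for real block DC coefficients.

```python
import math
import random


def gaussian(x, mean, var):
    """1-D Gaussian density g(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_pdf(x, weights, means, variances):
    """Mixture density: weighted sum of the component densities."""
    return sum(w * gaussian(x, m, v) for w, m, v in zip(weights, means, variances))


def fit_gmm_em(data, k, iters=100, min_var=1e-6):
    """Fit a 1-D GMM with k components by EM (illustrative sketch)."""
    n = len(data)
    ordered = sorted(data)
    mean_all = sum(data) / n
    var_all = sum((x - mean_all) ** 2 for x in data) / n
    weights = [1.0 / k] * k
    # Assumed initialization: means at evenly spaced sample quantiles.
    means = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    variances = [max(var_all, min_var)] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            joint = [w * gaussian(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means and variances.
        for i in range(k):
            r_sum = sum(r[i] for r in resp)
            if r_sum < 1e-12:
                continue  # leave an empty component unchanged
            weights[i] = r_sum / n
            means[i] = sum(r[i] * x for r, x in zip(resp, data)) / r_sum
            spread = sum(r[i] * (x - means[i]) ** 2 for r, x in zip(resp, data))
            variances[i] = max(spread / r_sum, min_var)
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return weights, means, variances


# Synthetic two-mode data standing in for block DC coefficients.
rng = random.Random(1)
samples = ([rng.gauss(0.0, 1.0) for _ in range(300)]
           + [rng.gauss(10.0, 1.0) for _ in range(300)])
weights, means, variances = fit_gmm_em(samples, k=2)
theta = weights + means + variances  # flattened feature set
```

The flattened list `theta` mirrors the feature set of Eq. (3) for the scalar case, where each covariance matrix reduces to a single variance.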

Statistical Characterization
The DC coefficients of images are calculated in the grayscale domain by employing the type-II 8 × 8 block DCT of the JPEG standard. The type-II DCT is defined as

\[ I^{b}_{u,v} = \alpha_u(u) \, \alpha_v(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f_b(x, y) \cos\left[ \frac{(2x+1) u \pi}{16} \right] \cos\left[ \frac{(2y+1) v \pi}{16} \right], \tag{4} \]

where $x, y$ and $u, v$ represent spatial and frequency domain variables, respectively, and $f_b(x, y)$ denotes the pixel value at position $(x, y)$ of block $b$. The values of $\alpha_u(u)$ and $\alpha_v(v)$ are

\[ \alpha_u(u) = \begin{cases} \sqrt{1/8}, & u = 0 \\ \sqrt{2/8}, & u \neq 0 \end{cases} \qquad \alpha_v(v) = \begin{cases} \sqrt{1/8}, & v = 0 \\ \sqrt{2/8}, & v \neq 0. \end{cases} \]

The DCT coefficient at $u = 0$, $v = 0$ is known as the DC coefficient, which is defined in the type-II DCT as

\[ I^{b}_{0,0} = \frac{1}{8} \sum_{x=0}^{7} \sum_{y=0}^{7} f_b(x, y), \tag{5} \]

where $I^{b}_{0,0}$ is the DC coefficient of an image block and $b$ is the index of the block. It may be noted that $I^{b}_{0,0}$ is eight times the mean value of the 8 × 8 block of the image. The block DC coefficient $I_{0,0}$ can be defined as

\[ I_{0,0} = \left[ I^{1}_{0,0}, I^{2}_{0,0}, \ldots, I^{B}_{0,0} \right], \tag{6} \]

where $B$ is the number of 8 × 8 non-overlapping blocks in the input image. The histograms of $I_{0,0}$ of the original image, the histogram equalized image, and the adaptive histogram equalized image are characterized by GMM as shown in Fig. 1. The statistical characterization of $I_{0,0}$ of CLAHE images for different values of CL and T is shown in Fig. 2. The statistical characterization of $I_{0,0}$ of CLAHE images for different pdfs of histograms is shown in Fig. 3. The effect of the range of gray levels on the characterization of $I_{0,0}$ is shown in Fig. 4. In the legend of Fig. 4, "full" represents the entire available range of gray levels (i.e., 256 for an 8-bit image), and "original" represents the range of gray levels in the original image. The characterizations shown in Fig. 1, Fig. 2, Fig. 3 and Fig. 4 are obtained with the optimum number of components found using the Akaike Information Criterion (AIC) [26]. Further, to measure the efficacy of the characterization, we have employed the Jensen-Shannon (JS) divergence [27] and [28] and the Kolmogorov-Smirnov (KS) statistic [28] and [29]. The values of JS and KS in different cases of HE for a UCID [30] image are shown in the legends of Fig. 1, Fig. 2, Fig. 3 and Fig. 4. The consistency of the JS and KS values indicates that GMM is a good model of fit across original and different types of HE images.
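The remark that each DC coefficient equals eight times the block mean gives a shortcut for building the block DC coefficient without computing a full DCT. The following sketch shows both routes on a nested-list grayscale image. It assumes the orthonormal scaling of the 8 × 8 type-II DCT, ignores partial blocks at the image border, and uses hypothetical function names.

```python
import math


def alpha(k):
    """Orthonormal scaling factor of the 8-point type-II DCT (assumed)."""
    return math.sqrt(1.0 / 8.0) if k == 0 else math.sqrt(2.0 / 8.0)


def dct2_coefficient(block, u, v):
    """Single type-II DCT coefficient of an 8x8 block (list of 8 rows)."""
    total = 0.0
    for x in range(8):
        for y in range(8):
            total += (block[x][y]
                      * math.cos((2 * x + 1) * u * math.pi / 16.0)
                      * math.cos((2 * y + 1) * v * math.pi / 16.0))
    return alpha(u) * alpha(v) * total


def block_dc_coefficients(image):
    """DC coefficient of every non-overlapping 8x8 block, in raster order.

    Partial blocks at the right and bottom borders are ignored
    (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    dc = []
    for r in range(0, rows - rows % 8, 8):
        for c in range(0, cols - cols % 8, 8):
            block_sum = sum(image[r + i][c + j]
                            for i in range(8) for j in range(8))
            dc.append(block_sum / 8.0)  # equals 8 times the block mean
    return dc


# 16x16 test image with a simple gradient; it contains four 8x8 blocks.
image = [[(16 * r + c) % 256 for c in range(16)] for r in range(16)]
dc = block_dc_coefficients(image)
first_block = [row[:8] for row in image[:8]]
```

Evaluating `dct2_coefficient(first_block, 0, 0)` gives the same value as `dc[0]` up to floating-point rounding, which is a quick check that the shortcut and the DCT definition agree.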

Parameter Analysis of CLAHE
To find the optimum number of components (K) for GMM, we have calculated the mean JS and KS values with varying K (1, 3, 5, ..., 10) for all 1338 images of the UCID database. It is worth mentioning that the KS hypothesis test is performed with α = 0.01. The efficacy of GMM for the characterization of $I_{0,0}$ can be observed from Tab. 1, Tab. 2, Tab. 3, Tab. 4 and Tab. 5.
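The two goodness-of-fit measures can be sketched as follows. The sketch makes several assumptions that are not stated in the text: the JS divergence is computed in base 2 between a normalized histogram and the model's bin probabilities, the KS statistic is the one-sample form against the fitted mixture CDF, and the hypothesis test uses the large-sample critical value of about 1.628 / sqrt(n) for α = 0.01. All function names are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) of two discrete distributions."""
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def gmm_cdf(x, weights, means, variances):
    """CDF of a 1-D Gaussian mixture, built from the error function."""
    return sum(w * 0.5 * (1.0 + math.erf((x - m) / math.sqrt(2.0 * v)))
               for w, m, v in zip(weights, means, variances))


def ks_statistic(samples, model_cdf):
    """One-sample KS statistic: largest gap between empirical and model CDF."""
    ordered = sorted(samples)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered):
        f = model_cdf(x)
        gap = max(gap, (i + 1) / n - f, f - i / n)
    return gap


def ks_test_passes(d, n, c_alpha=1.628):
    """Asymptotic KS test; c_alpha = 1.628 corresponds to alpha = 0.01."""
    return d <= c_alpha / math.sqrt(n)


# Toy check: evenly spaced samples against the uniform CDF on [0, 1].
samples = [(i + 0.5) / 100 for i in range(100)]
d_good = ks_statistic(samples, lambda x: x)
d_bad = ks_statistic(samples, lambda x: min(1.0, max(0.0, x - 0.5)))
```

Here `d_good` is small and passes the test, while the deliberately shifted model behind `d_bad` fails it. Averaging such values over a database for each K is one way to produce tables of mean JS and KS values.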

Effect of CL:
With an increase in the value of CL from 0.005 (large clipping) to 1 (no clipping), a shift in the skewness of $p(I_{0,0})$ from right to left is observed (Fig. 2 and Fig. 3). Lower values of CL represent low contrast, and higher values of CL represent higher contrast. Additionally, a decrease in kurtosis among the distributions can also be observed in Fig. 3. The effect of CL on the characterization of $p(I_{0,0})$ is tabulated in Tab. 1, Tab. 2, Tab. 3 and Tab. 4 for different values of T.

Effect of image HE pdfs:
As may be observed from Tab. 5, the maximum number of KS hypothesis tests is passed when the image histogram equalization pdf is a Rayleigh distribution.

Effect of range of gray levels in HE:
The mean values of the JS and KS statistics are the smallest (Tab. 5) when the chosen gray levels are in the full range with uniform and exponential distributions. In the case of the Rayleigh distribution, however, the chosen range of gray levels has no effect on the mean values of the JS and KS statistics.
One can observe from Tab. 1, Tab. 2, Tab. 3, Tab. 4 and Tab. 5 that the optimum values of the JS and KS statistics are obtained with K = 3 and K = 5. In our work, we have chosen K = 5 because the maximum number of KS hypothesis tests for original images is passed with K = 5. It is noteworthy that the mean values of the JS and KS statistics are calculated after removing 5 % outliers from the original images database.

System Model for Analysis and Detection
The system model used for analysis and detection is shown in Fig. 5. The GHE, AHE and CLAHE operations are performed in the spatial domain with 256 gray levels, and the results are saved in TIFF format. For CLAHE, the values of T are 8 × 8, 16 × 16, 32 × 32 and 64 × 64, and the values of CL are 0.005, 0.01, 0.03, 0.05, 0.07, 0.09, 0.1 and 0.3. Additionally, CLAHE images are also generated with exponentially and Rayleigh distributed histograms for all considered values of T and CL. For the range of gray levels in enhanced images, both the "original" and "full" ranges are considered. For feature extraction (Fig. 5), the 2D 8 × 8 block DCT of the grayscale image is computed.
The DC DCT coefficient of each block is collected to generate $I_{0,0}$ (Eq. 6). It is observed (Fig. 1, Fig. 2, Fig. 3, Fig. 4 and Fig. 5) that $p(I_{0,0})$ varies with histogram equalization operations. Therefore, statistics of $I_{0,0}$ can be used as features for classification.
Construction of the feature set involves the calculation of 10-fold cross-validation accuracies of SVM classifiers for the respective feature sets. The accuracies corresponding to different feature sets for classification between original and GHE, original and AHE, and original and CLAHE (CL = 0.05, D = D1, D2, D3, T = 8 × 8, R1) images are shown in Tab. 6.
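As an illustration of how statistics of the block DC coefficient could be turned into a feature vector, the sketch below computes the mean, variance, skewness and kurtosis of a list of DC values. The exact feature sets evaluated in Tab. 6 are not reproduced here, so this particular choice of moments, the non-excess kurtosis convention, and the name `moment_features` are assumptions made for illustration only.

```python
def moment_features(values):
    """Mean, variance, skewness and kurtosis of a list of numbers.

    Population (biased) moments are used; kurtosis is the plain fourth
    standardized moment, so a Gaussian would give a value near 3.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return [mean, 0.0, 0.0, 0.0]  # constant input: no shape information
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return [mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2]


# Hypothetical DC values: a symmetric set and a right-skewed set.
symmetric = moment_features([1, 2, 3, 4, 5])
right_skewed = moment_features([0, 0, 0, 0, 10])
```

Such moments could be appended to the estimated GMM parameters to form one row of the feature matrix per image; which combination works best is an empirical question that cross-validation accuracies help to answer.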

Detection Algorithm and Results
The proposed detection algorithm for classifying original and histogram equalized images is shown in Fig. 6. The achieved TPR is > 95 % with FPR < 5 % in most of the cases. A CL of 0.005 yields little contrast enhancement; consequently, the detection rate at CL = 0.005 falls to 90 % with the exponential distribution. In the case of GHE and AHE, the detection accuracy is more than 99 % with a false alarm rate of less than 1 %, which is comparable to existing methods [6] and [9], as shown in Tab. 8. Detection results for AHE and CLAHE are not reported in [6] and [9]. The proposed method is applicable with high efficacy to all types of histogram equalization operations.

Tab. 8: Accuracy (TPR, FPR) of the proposed method and existing methods in HE-based image forensics.

Conclusion
Histogram equalization is a commonly used contrast enhancement technique. Its adaptive and contrast-limited variant, CLAHE, is also spreading its footprint in application-based image enhancement.
We have developed a novel method to detect the presence of three commonly found   with all variants of histogram equalization. This tool does not rely on image histogram-based methods for the detection of contrast enhancement; instead, it uses DC DCT coefficients, which are least affected by JPEG compression after enhancement.

Effect of T:
The effect of the number of tiles T used in CLAHE is shown in Fig. 2. It may be observed from Fig. 2 that the kurtosis of $p(I_{0,0})$ increases with an increase of T. Tab. 1, Tab. 2, Tab. 3 and Tab. 4 show a variation of up to 2 % in the mean JS and KS statistic values with varying T.

Tab. 1: Mean values of JS and KS and KS hypothesis result for T = 8 × 8 (Org = Original).

Fig. 5: System model. (a) For analysis and detection. (b) Feature extraction block.

To test the efficacy of the proposed algorithm with the important parameters of GHE, AHE and CLAHE, we have created different simulation environments. The database used is UCID (TIFF format) with 1338 images. The performance of the proposed classifiers is shown in Fig. 7, Fig. 8, Fig. 9, Fig. 10 and Fig. 11 using 10-fold cross-validation ROC curves. The images are passed through the GHE and AHE operations, and to prepare the feature matrix for each classification, the original images are mixed with the corresponding HE images. The obtained ROC curves for classification between original images and each of GHE and AHE are shown in Fig. 7. The simulation environments used for CLAHE are tabulated in Tab. 7.
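The evaluation protocol described above, 10-fold cross-validation summarized by ROC curves, can be sketched without any particular classifier. The code below only shows how folds might be formed and how ROC points follow from classifier scores; the interleaved fold assignment, the label convention (1 for histogram equalized, 0 for original), the toy scores, and the function names are all assumptions, and the SVM itself is not reproduced.

```python
def kfold_indices(n, k=10):
    """Split indices 0..n-1 into k interleaved folds of (train, test) lists."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [(sorted(set(range(n)) - set(test)), test) for test in folds]


def roc_points(scores, labels):
    """(FPR, TPR) pairs from sweeping a threshold over the scores.

    labels: 1 for histogram equalized (positive), 0 for original.
    Both classes are assumed to be present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for idx, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        if idx + 1 == len(ranked) or ranked[idx + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def tpr_at_fpr(points, max_fpr):
    """Best TPR among ROC points whose FPR does not exceed max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)


# Toy scores from a hypothetical classifier on six images.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
folds = kfold_indices(25, k=10)
```

Reading off `tpr_at_fpr(points, 0.05)` corresponds to statements of the form "TPR at an FPR below 5 %"; in practice the scores of all held-out folds would be pooled or averaged before the curve is drawn.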

Fig. 8: ROC curves: HE detection for enhancement with varying T for all CL.
CLAHE suppresses the enhancement of noise by limiting the highest value in the image histogram, known as the Clip Limit (CL). The CL can be defined as the highest value allowed in a bin of the histogram of an image tile. The density values which are greater than CL in a given image tile are redistributed. The CL and the number of tiles (T) are two important parameters of CLAHE. In the literature, it is stated that different probability density functions (pdfs), such as uniform, exponential and Rayleigh, are used for the image histograms.

Tab. 2: Mean values of JS and KS and KS hypothesis result for T = 16 × 16.

Tab. 5: Mean values of JS and KS and KS hypothesis result for image HE pdfs (D1 = uniform distribution, D2 = exponential distribution and D3 = Rayleigh distribution) and range of gray levels (R1 = "full" and R2 = "original").
Tab. 3: Mean values of JS and KS and KS hypothesis result for T = 32 × 32.

Tab. 4: Mean values of JS and KS and KS hypothesis result for T = 64 × 64.

Tab. 6: 10-fold cross-validation accuracies of SVM classifiers for different feature sets.

© 2018 ADVANCES IN ELECTRICAL AND ELECTRONIC ENGINEERING