An Adaptive Image Calibration Algorithm for Steganalysis

Abstract: In this paper, a new adaptive calibration algorithm for image steganalysis is proposed. Steganography disturbs the dependence between neighboring pixels and decreases the neighborhood node degree. First, we analyze the effect of steganography on the neighborhood node degree of cover images. Then, the calibratable pixels are marked by analyzing the neighborhood node degree. Finally, a strongly correlated calibration image is constructed by revising the calibratable pixels. Experimental results show that, compared with secondary steganography, the proposed image calibration method significantly increases the detection accuracy for LSB matching steganography at low embedding ratios. The proposed method also performs well against other spatial steganography.

Zhang et al. [Zhang, Hu and Yuan (2009)] use the envelope of the histogram to detect LSB matching. Xia et al. use the neighbourhood node degree histogram to detect LSB matching. Lerch-Hostalot et al. [Lerch-Hostalot and Megias (2013)] proposed LSB matching steganalysis based on patterns of pixel differences. Chen et al. [Chen, Gao, Liu et al. (2016)] use the characteristic function moments of pixel differences to detect LSB matching. Qin et al. [Qin, Xiang and Wang (2010)] present a review of the detection of LSB matching steganography.
The difficulty of steganalysis is that no original image is available for comparison. A good method for constructing a reference image similar to the original cover image would therefore be very helpful for steganalysis. Fridrich [Fridrich (2004)] constructs a "calibrated" image by cropping and recompressing the image. She argues that the cropped stego image is perceptually similar to the cover image and that its DCT coefficients have approximately the same statistical properties as those of the cover image. Ker [Ker (2005)] constructs the calibration image from a downsampled image to improve the detection of LSB matching steganography. Holotyak et al. [Holotyak, Fridrich and Soukal (2005)] use wavelet denoising to estimate the cover image from the stego image and to estimate the secret message length for ±k embedding steganography. These are the more common processing methods at present. However, the wavelet denoising and downsampling methods do not specifically account for the characteristics of steganography, their algorithmic complexity is high, and their actual effect is mediocre, so re-embedding (secondary steganography) has been adopted instead [Yu and Babaguchi (2008); Xia, Sun and Qin (2009); Cancelli, Doerr, Cox et al. (2008)]. Although Xia's method achieves better detection performance than Ker's [Ker (2005)] and Yu's [Yu and Babaguchi (2008)] methods, the applicability of the detection algorithm is very limited.
Therefore, a calibration technique targeted at steganographic images is needed. Recently, researchers have put forward several improved LSB matching steganography schemes [Soleimanpour-Moghadam and Nezamabadi-Pour (2016); Hiary, Sabri, Mohammed et al. (2016); Sahu and Swain (2018); Tan, Qin, Xiang et al. (2019); Liu, Wang, Zhang et al. (2014); Qin, Li, Xiang et al. (2019)]. Tan et al. proposed that channel coding can be used for steganography. Soleimanpour et al. [Soleimanpour-Moghadam and Nezamabadi-Pour (2016)] proposed pair-wise LSB matching steganography, and Hiary et al. [Hiary, Sabri, Mohammed et al. (2016)] proposed a hybrid steganography system. Sahu et al. [Sahu and Swain (2018)] proposed an improved LSB matching scheme that combines bit differencing. Xia et al. [Xia and Li (2017)] even proposed coverless LSB information hiding. Xiang et al. [Xiang, Li, Hao et al. (2018)] use synonym substitution and arithmetic coding to achieve natural language steganography. Li et al. [Li, Qin, Xiang et al. (2018)] proposed an image matching algorithm that can be used for steganography. Liu et al. [Liu, Wang, Zhang et al. (2014)] proposed a feature selection method, and Qin et al. [Qin, Li, Xiang et al. (2019)] proposed an improved Harris algorithm to extract features and used the BOW (Bag of Words) model to generate feature vectors; both can also be used for steganography. Steganalysis is therefore a long-term and arduous task.
In this paper, we propose an adaptive calibration method for image steganalysis. To begin with, we analyze the effect of steganography on the neighborhood degree of the cover image. Next, the calibratable pixels are marked by analyzing the transformation between ordinary pixels and sensitive pixels. Finally, a strongly correlated calibration image is obtained by revising the calibratable pixels. The calibration image is then used to extract relevant features for steganalysis.

The proposed calibration approach

Calibration mechanism
Neighborhood degree: Let $p(i,j)$ be the pixel value of the image at location $(i,j)$. The neighborhood degree of the pixel $(i,j)$ is defined as follows:
$$d(i,j) = |\Delta|$$
where $|\Delta|$ denotes the cardinality of the set $\Delta$, and
$$\Delta = \{(m,n) \mid p(i+m, j+n) = p(i,j)\}$$
is the set of offsets with $-K \le m \le K$, $-K \le n \le K$, where $m$ and $n$ cannot be zero at the same time. That is to say, only the local neighborhood bounded by $K$ is considered. The neighborhood degree $d(i,j)$ indicates the number of neighboring pixels whose value equals $p(i,j)$.
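To make the definition concrete, the following is a minimal Python sketch of the neighborhood degree. The function name `neighborhood_degree`, the list-of-lists image representation, and the choice to skip offsets that fall outside the image are assumptions made for illustration; the paper does not specify boundary handling.

```python
def neighborhood_degree(img, i, j, K=1):
    """Count neighbors at offsets -K..K (excluding the pixel itself)
    whose value equals img[i][j].

    Offsets that fall outside the image are skipped; this boundary
    convention is an assumption of the sketch.
    """
    rows, cols = len(img), len(img[0])
    center = img[i][j]
    count = 0
    for m in range(-K, K + 1):
        for n in range(-K, K + 1):
            if m == 0 and n == 0:
                continue  # m and n cannot both be zero
            r, c = i + m, j + n
            if 0 <= r < rows and 0 <= c < cols and img[r][c] == center:
                count += 1
    return count


img = [
    [10, 10, 11],
    [10, 10, 12],
    [13, 10, 10],
]
print(neighborhood_degree(img, 1, 1))  # 5 of the 8 neighbors equal 10
```

A pixel in a smooth region has a high degree, while a pixel whose value differs from all of its neighbors has degree zero.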

Definition 1: Sensitive pixel set
Let $S$ be the set of sensitive pixels of the image. We define the pixel $(i,j)$ as a sensitive pixel if its neighborhood degree $d(i,j) \ge \delta$; the sensitive pixel set $S$ is then defined as:
$$S = \{(i,j) \mid d(i,j) \ge \delta\}$$

Definition 2: Ordinary pixel set
Let $\Theta$ be the set of ordinary pixels of the image. We define the pixel $(i,j)$ as an ordinary pixel if its neighborhood degree $d(i,j) < \delta$; the ordinary pixel set $\Theta$ is then defined as:
$$\Theta = \{(i,j) \mid d(i,j) < \delta\}$$

Definition 3: Calibratable pixel set
Let $\Omega$ be the set of calibratable pixels of the image. We define the pixel $(i,j)$ as a calibratable pixel if it satisfies the calibration conditions; the calibratable pixel set $\Omega$, defined in Eq. (6), contains all such pixels.
By definition, if the pixel $(i,j) \in \Theta$, then the correlation between the pixel $(i,j)$ and the surrounding pixels is weak; if the pixel $(i,j) \in S$, then the correlation between the pixel $(i,j)$ and the adjacent pixels is very strong. Even minor steganographic changes to sensitive pixels are reflected in the neighborhood degree, and sensitive pixels are very likely to become ordinary pixels after steganography. Therefore, in the calibration process, we consider the transformation between ordinary pixels and sensitive pixels. For ordinary pixels, a subtle change in neighborhood degree is regarded as the result of natural intensity variation in the image.
Spatial steganography algorithms generally modify the lowest bit planes of the image, so the search range of the calibration can be restricted to $p(i,j) \pm K$ instead of covering all possible pixel values. This reduces the amount of searching and the time complexity of the algorithm. After the search range is determined, the calibratable pixel set $\Omega$ can be determined, and the calibration image can then be constructed by the adaptive image calibration algorithm, which is as follows:

Algorithm: Adaptive Image Calibration
    … Eq. (2) and Eq. (3)
    5:          Search the neighborhood pixels of (i,j) by pixel value p(i,j) ± K
    6:          if (i,j) ∈ Ω by Eq. (6) then
    7:              Record the pixel (i,j) and the value p'(i,j) that makes (i,j) a calibratable pixel
    8:              p(i,j) = p'(i,j)
    9:          end if
    10:     end for
    11: end for

The algorithm judges whether each pixel is a calibratable pixel, finds all pixels that may have been modified, and changes each of them to the value that increases its neighborhood degree. The calibration algorithm does not use any complex frequency-domain transform, so it is computationally fast: its time complexity is $O(n^2)$, and its space complexity is also $O(n^2)$. The calibration algorithm is therefore more practical than the other calibration methods.
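The calibration loop described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: because the exact calibratability conditions are not reproduced in the text, the sketch assumes a pixel is calibratable when it is an ordinary pixel (degree below `delta`) and some value within `search` of its current value would raise its degree to at least `delta`. The names `calibrate`, `degree_for_value`, `delta`, `search` and `window` are hypothetical, and degrees are read from the unmodified input rather than updated in place.

```python
def degree_for_value(img, i, j, value, window=1):
    """Number of neighbors of (i, j) within the window equal to `value`."""
    rows, cols = len(img), len(img[0])
    count = 0
    for m in range(-window, window + 1):
        for n in range(-window, window + 1):
            if m == 0 and n == 0:
                continue
            r, c = i + m, j + n
            if 0 <= r < rows and 0 <= c < cols and img[r][c] == value:
                count += 1
    return count


def calibrate(img, delta=3, search=1, window=1):
    """Return (calibrated copy of img, list of modified positions).

    ASSUMPTION: a pixel is calibratable if its degree is below `delta`
    and a value within +/- `search` gives a degree of at least `delta`.
    """
    out = [row[:] for row in img]
    modified = []
    for i in range(len(img)):
        for j in range(len(img[0])):
            p = img[i][j]
            if degree_for_value(img, i, j, p, window) >= delta:
                continue  # already a sensitive pixel, left unchanged
            best_value, best_degree = p, -1
            # Candidate values p - search .. p + search, clipped to 8 bits
            for v in range(max(0, p - search), min(255, p + search) + 1):
                if v == p:
                    continue
                d = degree_for_value(img, i, j, v, window)
                if d > best_degree:
                    best_value, best_degree = v, d
            if best_degree >= delta:
                out[i][j] = best_value
                modified.append((i, j))
    return out, modified


# A flat patch in which one pixel was changed by +1, as LSB matching might do
stego = [[100, 100, 100], [100, 101, 100], [100, 100, 100]]
calibrated, changed = calibrate(stego)
print(calibrated[1][1], changed)  # 100 [(1, 1)]
```

Restricting the candidates to a narrow band around the current value keeps the per-pixel work constant, which matches the motivation given above for limiting the search range.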

Analysis of the calibration
To observe the calibration results, we test the Lena image with 100% LSB matching embedding. The peak signal-to-noise ratio (PSNR) of the LSB matching stego image is 43.62, and the PSNR of the calibrated image is 43.97, so the stego image calibrated by our algorithm is close to the original image. Because the original image is modified by steganography, its neighborhood degree is reduced, so many calibratable pixels may be generated during the calibration process, whereas an unmodified original image has relatively few calibratable pixels. Steganography disturbs the dependence between neighboring pixels and decreases the neighborhood degree, so after calibration a strongly dependent "cover image" can be obtained.
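For reference, PSNR values such as those quoted above are conventionally computed from the mean squared error against the original image. The sketch below uses the standard definition for 8-bit images (peak value 255); the function name `psnr` and the list-of-lists representation are assumptions, and the paper does not state which implementation was used.

```python
import math


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    rows, cols = len(reference), len(reference[0])
    mse = sum(
        (reference[i][j] - test[i][j]) ** 2
        for i in range(rows)
        for j in range(cols)
    ) / (rows * cols)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


cover = [[100, 100], [100, 100]]
stego = [[100, 101], [100, 100]]
print(round(psnr(cover, stego), 2))
```

A higher PSNR for the calibrated image than for the stego image indicates that calibration moved the image closer to the cover.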

Effects of the calibration ratio for cover image and stego image
The tests are carried out on two image libraries, NRCS and FreeFOTO: NRCS contains 3,162 uncompressed images, and FreeFOTO contains 10,408 compressed images. In the experiments, we found that the calibration ratio differs greatly between original and stego images. The average calibration ratio, computed as the proportion of calibratable pixels among all pixels, is shown in Fig. 1 for original and stego images. From Fig. 1, we can see that the ratio of calibratable pixels is larger for stego images than for cover images, especially for compressed images. This is a good feature for steganalysis.
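As a small illustration, the calibration ratio can be computed by comparing an image with its calibrated version. The sketch assumes that the number of calibratable pixels equals the number of pixels changed by calibration; the function name `calibration_ratio` is hypothetical.

```python
def calibration_ratio(image, calibrated):
    """Fraction of pixels whose value was changed by calibration."""
    total = len(image) * len(image[0])
    changed = sum(
        1
        for row_a, row_b in zip(image, calibrated)
        for a, b in zip(row_a, row_b)
        if a != b
    )
    return changed / total


before = [[100, 100, 100], [100, 101, 100], [100, 100, 100]]
after = [[100, 100, 100], [100, 100, 100], [100, 100, 100]]
print(calibration_ratio(before, after))  # one changed pixel out of nine
```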

Effects of the sum of the calibration histogram difference
The histogram is an effective and commonly used statistical feature for steganalysis. Because steganography has a smoothing effect on the image histogram, during calibration we record the pixel values modified by the calibration algorithm and calculate the difference between these pixel values and the surrounding pixel values. We find that the difference for the cover image is greater than for the stego image. This is also a good steganalysis feature. The sum of the calibration histogram difference is shown in Fig. 2.
… $C(h(x))$, $R$, we calculate them using the 3×3 and 5×5 neighborhoods respectively, so that a total of 9 features are used for steganalysis.
After LSB matching embedding, the neighborhood degree is reduced. Let $C'(h(x))$ be the COM (center of mass) of the NDH (neighborhood degree histogram) after embedding, and denote the alteration rate of the NDH COM as $R$. Due to LSB matching steganography, the stego image's alteration rate is greater than the cover image's, thus $R_s > R_c$.
Now the three features $C(h(x))$, $C'(h(x))$ and $R$ are calculated through Eq. (7) and Eq. (8). Additionally, these features are computed twice for a given image, using the 3×3 and 5×5 neighborhoods respectively. The sum of the neighborhood degrees of the image pixels is defined as
$$SumD = \sum_{i}\sum_{j} d(i,j)$$
According to the calibration algorithm, the SumD of an image is reduced after LSB matching, i.e.,
$$SumD_s < SumD_c$$
According to our observation, CR and DCH, the calibration ratio and the calibration histogram difference discussed above, are two effective features that can be added to the feature vector. Finally, a feature vector composed of eight features is constructed for classification.

Classifier
Because of the good classification performance of the support vector machine (SVM), we choose it with a non-linear (RBF) kernel as the classifier in our experiments. Before training the classifier, we normalize the features. For each feature, we calculate its maximum and minimum values over the training images. For any training or test image, the feature $F_i$ is extracted and scaled as
$$\hat{F}_i = \frac{F_i - F_i^{\min}}{F_i^{\max} - F_i^{\min}}$$
where $F_i^{\max}$ represents the maximum value and $F_i^{\min}$ the minimum value of $F_i$, respectively.
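The scaling step can be sketched as follows. The minimum and maximum are fitted on the training features only and then reused for test features, as described above; the function names `fit_min_max` and `scale`, and the handling of constant features, are assumptions of this sketch.

```python
def fit_min_max(train_rows):
    """Per-feature (min, max) pairs computed from the training feature rows."""
    columns = list(zip(*train_rows))
    return [(min(col), max(col)) for col in columns]


def scale(row, bounds):
    """Apply min-max scaling to one feature row using the fitted bounds."""
    scaled = []
    for value, (lo, hi) in zip(row, bounds):
        # A constant feature has hi == lo; map it to 0.0 to avoid
        # dividing by zero (an assumption of this sketch)
        scaled.append((value - lo) / (hi - lo) if hi > lo else 0.0)
    return scaled


train = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
bounds = fit_min_max(train)
print(scale([4.0, 30.0], bounds))  # [0.5, 1.0]
print(scale([8.0, 5.0], bounds))   # [1.5, -0.25]
```

Because the bounds come from the training images alone, features of test images can fall slightly outside the interval [0, 1], as the second call shows.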

Image data sets
The accuracy of steganalysis varies greatly across image sources, so in our experiments we use two image data sets, one uncompressed and one compressed, to test the performance of the proposed algorithm and compare it with other methods.
NRCS Set: 3,162 high-resolution TIFF images downloaded from http://photogallery.nrcs.usda.gov; all the images are uncompressed, with size 2100×1500 or 1500×2100. For testing, we resample the images to 640×418 and convert them to grayscale.
FreeFOTO Set: 10,408 JPEG images downloaded from http://www.freefoto.com. All the images are compressed with quality factor 75, with size 600×400 or 400×600. For testing, we also convert these images to grayscale before use.
All of the above images were used as covers to generate stego images with LSB replacement, 2LSB replacement, LSB matching, BPCS steganography [Spaulding, Noda and Shirazi (2002)] and so on. The message lengths are 100%, 75%, 50% and 25% of the maximal embedding length (i.e., one bit per pixel). Therefore, for every steganographic method, the NRCS Set consists of 3,162×(1+4)=15,810 cover and stego images, and the FreeFOTO Set consists of 10,408×(1+4)=52,040 images.
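For readers unfamiliar with how such stego images are produced, the following is a minimal sketch of LSB matching (±1 embedding) at a chosen embedding rate. It is a generic illustration, not the tool used in the experiments: the function name `lsb_matching_embed`, the flat pixel list, and the seeded pseudo-random embedding path are all assumptions.

```python
import random


def lsb_matching_embed(pixels, bits, seed=0):
    """Embed `bits` into a flat list of 8-bit pixel values by LSB matching.

    When a pixel's LSB already equals the message bit it is left alone;
    otherwise the pixel is randomly incremented or decremented by 1.
    """
    rng = random.Random(seed)
    stego = list(pixels)
    # Pseudo-random embedding path (an assumption of this sketch)
    positions = rng.sample(range(len(pixels)), len(bits))
    for pos, bit in zip(positions, bits):
        value = stego[pos]
        if value & 1 == bit:
            continue
        if value == 0:
            value = 1      # cannot decrement below 0
        elif value == 255:
            value = 254    # cannot increment above 255
        else:
            value += rng.choice((-1, 1))
        stego[pos] = value
    return stego, positions


cover = [0, 255, 100, 101, 50, 51, 200, 201]
message = [1, 0, 1, 0]  # 4 bits in 8 pixels: a 50% embedding rate
stego, path = lsb_matching_embed(cover, message, seed=1)
print(stego)
```

Unlike LSB replacement, the ±1 change can propagate into higher bit planes, which is why LSB matching does not produce the histogram pairing artifact.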

Training and testing image sets
Each image set above was divided into two parts, a training set and a testing set, to train and test the classifiers. The training and testing image sets are composed of 40% of the cover images and the corresponding stego images, randomly selected from the image data sets. For the NRCS Set, the training set contains 1,264 cover images and the corresponding 5,056 stego images, covering the four embedding rates of 100%, 75%, 50% and 25%. The test set includes 1,898 cover images and 7,592 stego images with the four embedding rates. Similarly, for the FreeFOTO Set, the training set is composed of 4,163 cover images and the corresponding 16,652 stego images, and the test set is made up of 4,163×(1+4)=20,815 images.

Detection performance
The Receiver Operating Characteristic (ROC) curve is used to show the detection probability as a function of the false positive probability.
To evaluate the detection performance shown by the ROC curve, the AUC (Area Under the ROC Curve) [Qin, Xiang and Wang (2010)] is defined as follows:
$$AUC = \int_0^1 P_D \, \mathrm{d}P_{FP}$$
where $P_D$ is the probability of detection and $P_{FP}$ is the probability of false positive.
Detection performance is also evaluated by the 'detection reliability' $\rho$, defined as [Fridrich (2004)]
$$\rho = 2A - 1$$
where $A$ is the area under the ROC curve (AUC). In this paper, the ROC curve is obtained by plotting the true detection probability against the false alarm probability.
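Both quantities can be computed from a set of ROC points. The sketch below approximates the area with the trapezoidal rule and then applies the detection reliability formula; the function names and the representation of the ROC curve as (false positive, detection) pairs are assumptions.

```python
def auc_trapezoid(points):
    """Area under a ROC curve given as (P_FP, P_D) pairs, trapezoidal rule."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def detection_reliability(area):
    """Detection reliability: 0 for random guessing, 1 for perfect detection."""
    return 2 * area - 1


roc = [(0.0, 0.0), (0.5, 1.0), (1.0, 1.0)]
area = auc_trapezoid(roc)
print(area, detection_reliability(area))  # 0.75 0.5
```

The rescaling maps the diagonal ROC of a random guess (area 0.5) to a reliability of 0, which makes detectors easier to compare.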

Detection results
In this section, two groups of experiments are reported. The first group compares the proposed adaptive calibration method with the secondary steganography calibration method. The second group evaluates the proposed method against other spatial steganography algorithms.
LSB matching is used to generate images with different embedding rates in the different image libraries, and these images are used to test LSB matching steganalysis with the proposed adaptive calibration and with secondary steganographic calibration. The AUC values for the uncompressed images in NRCS and the compressed images in FreeFOTO are shown in Fig. 3(a) and Fig. 3(b), where the four abscissa points from left to right represent message embedding rates of 100%, 75%, 50% and 25% with LSB matching, respectively. Comparing the proposed adaptive calibration with secondary steganography in Fig. 4 and Fig. 6, the experimental results show that the adaptive calibration algorithm significantly improves the detection results for LSB matching steganography at low embedding ratios in both the compressed and uncompressed image sets.
As can be seen from Fig. 6, the adaptive calibration algorithm can effectively detect steganography in the spatial domain. Whether the steganography is based on bit planes or on visual characteristics, the detection performance for LSB matching is the worst in the uncompressed image database, which indirectly shows that LSB matching is harder to detect than the other steganographic methods. Although 2LSB steganography avoids the histogram pairing phenomenon of LSB substitution, the adaptive calibration loss caused by its larger number of modified pixels is higher than for LSB substitution, so the detection result for 2LSB steganography is the best.

Conclusions
In this paper, a new image construction method using adaptive calibration against spatial steganography is proposed. First, we analyze the effects of LSB matching on the neighborhood degree of the cover image. Second, the calibratable pixels are found by analyzing the neighborhood degree, and the calibration image is reconstructed. Finally, features are extracted and used to train the support vector machine. The proposed adaptive calibration method is effective in detecting LSB matching steganography at low embedding ratios and also in detecting other spatial steganography. Image steganalysis with deep learning is currently a research hotspot. At the same time, coverless steganography has begun to appear, which poses a challenge to steganalyzers. Our future work is to study coverless steganography and steganalysis with deep learning.