The Comparison of Classifiers in Image Steganalysis

In this paper, the proposed steganalytic method for detecting secret messages is based on the extraction of statistical features from cover and stego images in the JPEG file format, combined with a calibration technique. The steganalyzer uses either a Support Vector Machine (SVM) or a Bayes classifier to train a model, which the same steganalyzer later uses to distinguish between clean (cover) and stego images. The aim of the paper was to compare the detection accuracy (ACR) of the trained models for two types of classifier: Support Vector Machines and the Bayes classifier. Five models were tested, each trained on cover images and on stego images produced by one of the steganographic methods nsF5, Model Based 1, Model Based 2, Modulo Histogram Fitting with Dead Zone and Perturbed Quantization.


INTRODUCTION
Steganography is the art of hiding secret information in unsuspicious data (cover data). More precisely, it deals with establishing subliminal channels and transporting confidential messages through them. While steganography was associated with the transfer of physical objects in the past, it now focuses on transferring data in digital form, such as digital images, video, audio and text [1]. This article deals with still images.
The most popular method in image steganography is LSB (Least Significant Bit) embedding. The secret message is embedded into the least significant bits of the coded words, and this substitution is performed either in the spatial domain or in a transform domain. The steganographic methods used in this work embed information in the DCT (Discrete Cosine Transform) domain.
Steganalysis aims to detect the presence of a hidden message inside apparently innocent covers [2]. Detection is performed with a model trained in advance, in the training phase of the steganalytic process. The training phase has a higher computational complexity than the steganographic embedding process.
The training is carried out by machine learning, a scientific discipline that belongs to artificial intelligence. Inspired by the human learning system, it gives machines the ability to learn by themselves. Machine learning is used to solve two main types of problem: classification problems and sequential problems. The former deal with deciding to which of a set of classes a given object belongs; if these classes are presented during training, this is supervised learning. In sequential problems, on the other hand, the learner knows only the start and finish positions and searches for a path between them; in this case we speak of unsupervised learning [3].
If a steganalytic technique is adapted to a particular steganographic method and its characteristics, it can achieve higher detection efficiency. Such a system is called targeted steganalysis. Blind steganalysis, on the other hand, has no information about the steganographic method used. It usually extracts a larger number of statistical features in the spatial and transform domains so that it can detect more than one steganographic tool, and it can even detect new, previously unknown algorithms. Both targeted and blind steganalysis extract features in the training as well as the testing phase. The steganalytic approach used in this paper is based on the extraction of 274 statistical features.
The core of a steganalytic system is the classifier, which works in both the training and the testing phase. Using the model computed in the training phase, the classifier assigns a tested object to the appropriate class. This work compares the efficiency of two well-known classifiers, SVM and the Bayes classifier, in a specific image steganalysis tool.
The paper is organized as follows. Section 2 describes image steganalysis, including a block diagram of the training and testing phases, together with both tested classifiers. Section 3 presents the experimental results, and Section 4 concludes the paper.
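As an illustration of LSB substitution in the DCT domain, the sketch below writes message bits into the least significant bits of quantized DCT coefficients, skipping coefficients equal to 0 or 1 in the style of the early JSteg tool. This is an assumption chosen for simplicity, not one of the methods tested in this paper (nsF5, MB1, MB2, MHF-DZ and PQ use different embedding rules), and the plain integer list stands in for a real JPEG coefficient array; `embed_lsb` and `extract_lsb` are hypothetical names.

```python
from typing import List


def embed_lsb(coeffs: List[int], bits: List[int]) -> List[int]:
    """Replace the LSBs of usable coefficients with message bits (JSteg-style sketch).

    Coefficients 0 and 1 are skipped: changing them could create new 0s or 1s,
    and the receiver would then no longer know which positions carry data.
    """
    stego = list(coeffs)
    k = 0
    for i, c in enumerate(stego):
        if k == len(bits):
            break
        if c in (0, 1):
            continue
        stego[i] = (c & ~1) | bits[k]  # clear the LSB, then set it to the message bit
        k += 1
    if k < len(bits):
        raise ValueError("message longer than the embedding capacity")
    return stego


def extract_lsb(coeffs: List[int], n_bits: int) -> List[int]:
    """Read n_bits message bits back from the usable coefficients, in order."""
    bits: List[int] = []
    for c in coeffs:
        if len(bits) == n_bits:
            break
        if c in (0, 1):
            continue
        bits.append(c & 1)
    return bits


cover = [5, 0, -3, 1, 2, 8, -1]
stego = embed_lsb(cover, [0, 1, 1, 0])
print(stego)                  # prints: [4, 0, -3, 1, 3, 8, -1]
print(extract_lsb(stego, 4))  # prints: [0, 1, 1, 0]
```

Plain LSB replacement of this kind leaves well-known traces in the coefficient histogram, which is one of the weaknesses that later JPEG methods were designed to avoid.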

IMAGE STEGANALYSIS
Steganalysis is a scientific discipline whose primary function is to detect secret messages in multimedia, or to detect subliminal communication between two participants. If steganalysis is able to reveal the secret communication, the steganographic system is considered broken and the purpose of steganography is defeated. A steganalytic method is considered successful when it can distinguish stego images from cover images with a higher probability than random guessing. Steganalysis can be extended to extracting the content of the secret message, which requires a further set of analysis techniques and increases the computational demands [5].
The main idea of steganalysis of still images is to detect changes in the statistical properties of the cover image after a secret message has been embedded. The calculation of these statistical features is therefore very important in the design of a steganalytic method. The features capture the difference between stego and cover images and form the input of the classifier block, as illustrated in Figure 1.

Fig. 1 Block diagram of the proposed image steganalytic method
The image database consists of several thousand images taken by different types of cameras with different camera settings and resolutions. Stego images are created by embedding a secret message with several steganographic methods (e.g. nsF5 [6], MB [7] and others used for JPEG files). In the next step, statistical features are extracted from the stego and cover images, yielding two sets of statistical parameters that are separated according to their class identifier (cover or stego).
The proposed steganalytic method uses 274 statistical features (the reasons for selecting these statistical parameters, and further details, are given in [8]). The next block of the steganalytic scheme is the classifier, whose input is the set of statistical features calculated in the previous step. The result of the classification process is a model trained on cover images and on stego images produced by a specific steganographic method. This paper focuses on the comparison of two classifiers, SVM and the Bayes classifier, which are described in Section 2.1.

2.1. Classifiers
2.1.1. Support Vector Machines
Support Vector Machines (SVM), proposed by Vapnik [9], are a machine learning method used to classify linearly separable and non-separable problems. Based on the input data, the SVM computes the parameters of the separating hyperplane that assigns the data to the appropriate class. The task of training is to find this optimal boundary dividing the characteristic features of cover and stego images (Figure 2). The optimal separating hyperplane is defined as [10]:
$$\mathbf{w}\cdot\mathbf{x} + b = 0 \qquad (1)$$
where x is the input vector, w is the vector of weighting coefficients and b is the offset. The hyperplane lies in the middle of the margin of width 2m, which is determined by the support vectors.
The case above applies to linearly separable problems. If the problem is not linearly separable, the input vector is transformed into a space of higher dimension using a kernel function (see Figure 3) [11]. The classifier then searches for the optimal separating hyperplane in this higher-dimensional space, defined as:
$$\mathbf{w}\cdot\Phi(\mathbf{x}) + b = 0 \qquad (2)$$
where Φ(x) is the transformation of the vector x into the higher-dimensional space induced by the kernel function.
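The training task described above can be sketched as minimizing the regularized hinge loss of a linear SVM in its primal form, here with plain full-batch subgradient descent. This is a minimal pure-Python sketch under stated assumptions, not the solver used by the authors: the labels +1 (stego) and -1 (cover), the regularization weight `lam`, the learning rate `lr`, the epoch count and the function names are all illustrative. With the canonical scaling used here, the distance from the hyperplane to each margin boundary is m = 1/||w||, so the full margin is 2m = 2/||w||.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def train_linear_svm(
    X: List[List[float]], y: List[int],
    lam: float = 0.01, lr: float = 0.1, epochs: int = 500,
) -> Tuple[List[float], float]:
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w.x_i + b))) by subgradient descent.

    Labels are assumed to be +1 (stego) and -1 (cover); hyperparameters are illustrative.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # point inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Classify by the side of the hyperplane w.x + b = 0."""
    return 1 if dot(w, x) + b >= 0 else -1


def margin_width(w: Sequence[float]) -> float:
    """Full margin 2m = 2 / ||w|| between the planes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(dot(w, w))


# Toy 2-D "features": stego points on the right, cover points on the left.
X = [[2.0, 1.0], [3.0, -1.0], [2.5, 0.5], [-2.0, 1.0], [-3.0, -1.0], [-2.5, -0.5]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(w, b, margin_width(w))
```

Working in the primal keeps the sketch short; practical SVM libraries usually solve the dual problem, which is also what makes the kernel substitution in the non-linear case possible.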
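To illustrate why a kernel function avoids computing Φ(x) explicitly, the sketch below compares the homogeneous polynomial kernel of degree 2, K(x, z) = (x·z)², with the dot product of an explicit mapping Φ of 2-D inputs into 3-D space. The choice of kernel is an assumption for illustration only; the experiments in this paper use a linear kernel, for which Φ is the identity.

```python
import math
from typing import Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def phi(x: Sequence[float]) -> Tuple[float, float, float]:
    """Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    """Homogeneous polynomial kernel K(x, z) = (x . z)^2, evaluated in the input space."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, 1.0)
print(poly2_kernel(x, z))   # kernel computed without leaving the 2-D space
print(dot(phi(x), phi(z)))  # same value (up to rounding) via the explicit 3-D mapping
```

An SVM needs only dot products between vectors, so replacing every x·z by K(x, z) yields a separating hyperplane in the mapped space while paying only for kernel evaluations.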

2.1.2. Naive Bayes Classifier
In machine learning, naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. Naive Bayes has been studied extensively since the 1950s [12].
Naive Bayes classifiers are highly scalable, requiring a number of parameters linear in the number of variables (features/predictors) in a learning problem. Maximum-likelihood training can be done by evaluating a closed-form expression, which takes linear time, rather than by the expensive iterative approximation used for many other types of classifiers.
The idea behind a Bayesian classifier is that, if an agent knows the class, it can predict the values of the other features. If it does not know the class, Bayes' rule can be used to predict the class given the feature values. In a Bayesian classifier, the learning agent builds a probabilistic model of the features and uses that model to predict the classification of a new example.
An advantage of naive Bayes is that it requires only a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because the variables are assumed to be independent, only the variances of the variables for each class need to be determined, not the entire covariance matrix [12].
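To make the saving concrete for the 274-dimensional feature vector used in this paper, assume each class is modelled by a Gaussian density, as in the Gaussian configuration of naive Bayes used in the experiments. The count below compares the per-class parameters of the naive model, which stores one mean and one variance per feature, with a full-covariance Gaussian, which stores a mean vector and a symmetric covariance matrix:

```latex
\begin{aligned}
\text{naive (diagonal):}\quad & n + n = 2n = 2 \cdot 274 = 548,\\
\text{full covariance:}\quad & n + \frac{n(n+1)}{2} = 274 + \frac{274 \cdot 275}{2} = 274 + 37\,675 = 37\,949.
\end{aligned}
```

For a binary cover-versus-stego model this amounts to 1 096 versus 75 898 parameters, which is one reason the naive model can be estimated from relatively little training data.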
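The classifier works as follows. Given a set D of n-dimensional vectors $\mathbf{x} = (x_1, x_2, \dots, x_n)$ and m classes $C_1, C_2, \dots, C_m$, the naive Bayes classifier predicts that x belongs to class $C_i$ if and only if
$$P(C_i \mid \mathbf{x}) > P(C_j \mid \mathbf{x}) \qquad \text{for all } 1 \le j \le m,\ j \ne i. \qquad (3)$$
The conditional probability can be expressed using Bayes' theorem:
$$P(C_i \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid C_i)\,P(C_i)}{P(\mathbf{x})}. \qquad (4)$$
As P(x) is the same for all classes, eq. (4) reduces to maximizing
$$P(\mathbf{x} \mid C_i)\,P(C_i). \qquad (5)$$
Because computing $P(\mathbf{x} \mid C_i)$ directly is computationally expensive for large n, "class conditional independence" is assumed to ease the burden, resulting in
$$P(\mathbf{x} \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i). \qquad (6)$$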
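The decision rule above, combined with the Gaussian assumption used in the experiments, can be sketched as a small Gaussian naive Bayes classifier. This is a minimal pure-Python sketch, not the authors' implementation: training computes the class priors and the per-class, per-feature means and variances in closed form (maximum-likelihood estimates), and prediction maximizes P(C_i) times the product of per-feature densities, evaluated as a sum of logarithms so that 274 small densities do not underflow. The variance floor `var_floor`, the function names and the toy data are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# class label -> (prior P(C_i), per-feature means, per-feature variances)
Model = Dict[str, Tuple[float, List[float], List[float]]]


def fit_gaussian_nb(X: List[List[float]], y: List[str], var_floor: float = 1e-9) -> Model:
    """Closed-form maximum-likelihood fit of class priors, means and variances."""
    rows_by_class: Dict[str, List[List[float]]] = defaultdict(list)
    for xi, yi in zip(X, y):
        rows_by_class[yi].append(xi)
    model: Model = {}
    for label, rows in rows_by_class.items():
        n_c = len(rows)
        columns = list(zip(*rows))  # one tuple of values per feature
        means = [sum(col) / n_c for col in columns]
        variances = [
            max(sum((v - mu) ** 2 for v in col) / n_c, var_floor)  # floor avoids division by zero
            for col, mu in zip(columns, means)
        ]
        model[label] = (n_c / len(X), means, variances)
    return model


def log_gaussian(v: float, mu: float, var: float) -> float:
    """Logarithm of the normal density N(v; mu, var)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (v - mu) ** 2 / (2.0 * var)


def predict_gaussian_nb(model: Model, x: Sequence[float]) -> str:
    """Return the class maximizing log P(C_i) + sum_k log P(x_k | C_i)."""
    best_label, best_score = "", -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior) + sum(
            log_gaussian(v, mu, var) for v, mu, var in zip(x, means, variances)
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy two-feature data: cover features near 0, stego features near 3.
X = [[0.1, 0.2], [-0.2, 0.0], [0.0, -0.1], [3.1, 2.9], [2.8, 3.2], [3.0, 3.0]]
y = ["cover", "cover", "cover", "stego", "stego", "stego"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [0.05, 0.1]), predict_gaussian_nb(model, [2.9, 3.1]))  # prints: cover stego
```

Only one pass over the data is needed to fit the model, which is consistent with the short training times reported for the Bayes classifier in the experiments.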

EXPERIMENTAL RESULTS
Our database contained 18 000 real images taken with different camera types (Nikon D3200, Nikon D3100, Nikon D3000, Nikon D60, Olympus FE-115, Olympus X-715, Panasonic Lumix DMC-FZ5, Samsung S730, Samsung Galaxy ACE, Sony Ericsson C702 and Sony Ericsson W580). The images vary in quality and resolution and were taken under different lighting conditions and of various scenes. The database was divided into two parts: a training part and a testing part.
The selection was made on the basis of specific cameras in order to preserve maximum diversity. 2000 images were selected from the training group and 200 images from the testing group. Image spatial resolutions were modified to increase diversity further. The image database included the following resolutions: 320×240 (QVGA), 480×320 (HVGA), 640×480 (VGA), 800×600 (SVGA), 1024×768 (XGA), 1600×1200 (UXGA) and 1920×1080 (HD 1080).
The image database was divided into 8 parts of 250 images each, so that every part included all image resolutions, all camera types, etc. A secret message of variable length was then inserted into the images of each group (8 different message sizes). Variable message sizes were used to increase the sensitivity of the trained model. The size of the secret message was expressed by the parameter Payload [%], where a payload of 100 % corresponds to the maximum size of the secret message for the specific steganographic tool. This process was repeated for every tested steganographic method.
After embedding, the final database consisted of 2000 images for every steganographic tool, from which the statistical features were extracted; these parameters then formed the input of the classifier. Part of the feature extraction is a calibration technique that crops the image by 4 pixels in each direction. The calibrated image has statistical features very similar to those of the cover image. Calibration was applied to the image database in order to obtain difference statistics of the DCT coefficients, which form the feature vector. In the training phase, binary steganalytic models were created for every tested steganographic method, e.g. cover vs. nsF5 stego images, cover vs. MB1 stego images, etc.
In the testing phase, an experiment was carried out to verify the detection accuracy of the created models for the specific steganographic methods using the L-SVM classifier or the Bayes classifier. The L-SVM classifier was configured with a linear kernel function, and the naive Bayes classifier was tested with a normal (Gaussian) distribution.
Steganalyzer performance is highly sensitive to the embedding rate. The tested steganographic methods have unequal embedding capacities, which did not allow us to present comparable final detection accuracies for all secret message sizes.
Table 1 shows the accuracy (ACR) and true positive rate (TPR) of the trained models for different algorithms, payloads and classifiers. Table 1 and Figure 4 show that the SVM classifier achieved better detection accuracy for all tested steganographic tools. On the other hand, the Bayes classifier had the advantage of lower computational complexity and shorter training time. For example, the Bayes classifier trained the cover vs. MB2 model on 2000 images in less than 30 seconds, whereas the SVM classifier needed 10 minutes in the same case. The test was executed on an Intel Core i5 processor with a clock frequency of 2.5 GHz.
The best ACR results in testing were attained by the model trained on cover and PQ (Perturbed Quantization) stego images. Conversely, the model for the MHF-DZ (Modulo Histogram Fitting with Dead Zone) steganographic method achieved the lowest detection accuracy.
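The division of the 2000 selected images into 8 groups that each cover all cameras and resolutions, and the conversion of a payload percentage into a message length, can be sketched as follows. The image records (identifier, camera, resolution), the function names and the capacity values are hypothetical stand-ins; the paper does not list the 8 payload values, and the capacity of a real tool depends on the image and the embedding method.

```python
from typing import List, Tuple

Image = Tuple[str, str, str]  # hypothetical record: (image_id, camera, resolution)


def split_into_groups(images: List[Image], n_groups: int = 8) -> List[List[Image]]:
    """Sort by (camera, resolution) and deal the images round-robin into n_groups.

    Every (camera, resolution) stratum with at least n_groups images then
    contributes to every group, and group sizes differ by at most one.
    """
    ordered = sorted(images, key=lambda im: (im[1], im[2]))
    groups: List[List[Image]] = [[] for _ in range(n_groups)]
    for i, im in enumerate(ordered):
        groups[i % n_groups].append(im)
    return groups


def message_bits(payload_percent: float, capacity_bits: int) -> int:
    """Message length for a payload in %, where 100 % is the tool's maximum capacity."""
    return int(capacity_bits * payload_percent / 100)


# Synthetic example: 2000 records spread over 11 cameras and 7 resolutions.
images = [(f"img{i:04d}", f"cam{i % 11}", f"res{i % 7}") for i in range(2000)]
groups = split_into_groups(images)
print([len(g) for g in groups])  # prints: [250, 250, 250, 250, 250, 250, 250, 250]
```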
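The calibration step can be sketched as follows: the image is cropped by 4 pixels (here, 4 rows from the top and 4 columns from the left), which shifts the 8×8 block grid, the block DCT is recomputed and quantized, and the feature is the difference between a functional of the original and of the cropped image. This is a simplified, hypothetical sketch: it starts from a grayscale pixel array instead of a decoded JPEG, uses a single uniform quantization step `q` instead of the image's quantization table, and uses a toy histogram functional instead of the 274 features of [8].

```python
import math
from typing import List

N = 8  # JPEG block size
# COS[u][x] = cos((2x + 1) * u * pi / 16), shared by rows and columns
COS = [[math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N)] for u in range(N)]
ALPHA = [math.sqrt(1.0 / N)] + [math.sqrt(2.0 / N)] * (N - 1)  # orthonormal scaling


def dct_8x8(block: List[List[float]]) -> List[List[float]]:
    """Orthonormal 2-D DCT-II of one 8x8 block."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += block[x][y] * COS[u][x] * COS[v][y]
            out[u][v] = ALPHA[u] * ALPHA[v] * s
    return out


def quantized_dct(image: List[List[float]], q: float) -> List[int]:
    """Block DCT over the aligned 8x8 grid, quantized with one uniform step q (a simplification)."""
    h, w = len(image) // N * N, len(image[0]) // N * N  # incomplete edge blocks are ignored
    coeffs: List[int] = []
    for r in range(0, h, N):
        for c in range(0, w, N):
            block = [[image[r + i][c + j] - 128.0 for j in range(N)] for i in range(N)]
            d = dct_8x8(block)
            coeffs.extend(round(d[u][v] / q) for u in range(N) for v in range(N))
    return coeffs


def histogram_feature(coeffs: List[int], lo: int = -2, hi: int = 2) -> List[float]:
    """Toy functional: normalized histogram of coefficient values lo..hi."""
    return [sum(1 for c in coeffs if c == k) / len(coeffs) for k in range(lo, hi + 1)]


def crop(image: List[List[float]], k: int = 4) -> List[List[float]]:
    """Remove k rows from the top and k columns from the left, shifting the 8x8 grid."""
    return [row[k:] for row in image[k:]]


def calibrated_feature(image: List[List[float]], q: float = 8.0) -> List[float]:
    """Feature = functional of the image minus the same functional of its calibrated version."""
    f_orig = histogram_feature(quantized_dct(image, q))
    f_cal = histogram_feature(quantized_dct(crop(image), q))
    return [a - b for a, b in zip(f_orig, f_cal)]
```

Because the crop breaks the alignment with the original 8×8 grid, the recomputed coefficients are largely unaffected by changes that embedding made on that grid, so the cropped image serves as an estimate of the cover.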
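For reference, ACR and TPR follow the usual confusion-matrix definitions, ACR = (TP + TN) / (TP + TN + FP + FN) and TPR = TP / (TP + FN). The sketch below assumes that stego images are the positive class, which the text does not state explicitly; the function names and labels are illustrative. On a balanced test set, random guessing gives an ACR of about 0.5, the baseline a successful steganalyzer must exceed.

```python
from typing import List, Tuple


def confusion_counts(
    y_true: List[str], y_pred: List[str], positive: str = "stego"
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), treating `positive` (assumed: stego) as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """ACR = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + fp + tn + fn)


def true_positive_rate(tp: int, fn: int) -> float:
    """TPR = TP / (TP + FN): share of stego images detected as stego."""
    return tp / (tp + fn)


y_true = ["cover"] * 4 + ["stego"] * 4
y_pred = ["cover", "cover", "cover", "stego", "stego", "stego", "stego", "cover"]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(accuracy(tp, fp, tn, fn), true_positive_rate(tp, fn))  # prints: 0.75 0.75
```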

Fig. 4 Comparison of accuracy of the specific steganalytic models using the L-SVM or Bayes classifier
The characteristics of the created steganalytic models can also be illustrated using the ROC (Receiver Operating Characteristic) curve. The basic parameter of this curve is the AUC (Area Under the Curve), which takes values from 0 to 1; the higher the AUC, the better the detection properties of the model. The authors of [15] show, both empirically and formally, that AUC is a statistically consistent and more discriminating measure than accuracy, which means that AUC is a better measure than accuracy for evaluating learning algorithms. Figures 5 and 6 illustrate the ROC curves of the models for every tested steganographic method with SVM and Bayes classification, at the maximum capacity of the secret message.
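An ROC curve is obtained by sweeping a decision threshold over the classifier's output scores (for example the SVM value w·x + b or a naive Bayes posterior), and the AUC equals the probability that a randomly chosen stego image scores higher than a randomly chosen cover image, with ties counted as one half. The sketch computes both the trapezoidal area under the swept ROC points and this rank-based estimate, which agree; the scores and labels (1 = stego, 0 = cover) are made-up toy values, and the function names are illustrative.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points for every threshold; label 1 = stego (positive), 0 = cover."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= thr and lab == 1)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= thr and lab == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC polyline by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


def auc_rank(scores: List[float], labels: List[int]) -> float:
    """P(score of random stego > score of random cover), ties counted as 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]  # toy classifier outputs
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(auc_trapezoid(roc_points(scores, labels)), auc_rank(scores, labels))  # prints: 0.8125 0.8125
```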

Fig. 2 Linearly separable problem classified by SVM

Table 1
Accuracy (ACR) and True Positive Rate (TPR) of the trained models for different algorithms, payloads and classifiers