A NEW APPROACH FOR SPEECH AUDIO STEGANALYSIS USING DELAY VECTOR VARIANCE METHOD

Emrah YÜRÜKLÜ

We investigate the use of delay vector variance-based features for the steganalysis of recorded speech. Since data hiding within a speech signal distorts the properties of the original signal, we design a steganalyzer that uses surrogate-data-based delay vector variance (DVV) features to detect the existence of a stego-signal. We evaluate the performance of the proposed DVV features as a steganalyzer with numerical results.


INTRODUCTION
Steganography is the art and science of hiding the presence of communication by embedding secret messages into innocent-looking digital signals. To achieve secure and undetectable communication, stego-signals, which contain a secret message, should be indistinguishable from cover-signals, which do not. In this respect, steganalysis is a set of techniques that aim to distinguish between cover-signals and stego-signals.
Audio steganalysis methods following various approaches have been proposed in the literature. The work by Westfeld (2003) targets LSB-based embedding, while the work by Fridrich and Goljan (2002) addresses the steganalysis of the MP3Stego algorithm. The steganalysis of LSB-based embedding and the Hide4PGP algorithm is explored by Johnson et al. (2005); spread spectrum watermarking and stochastic modulation steganography are considered by Westfeld and Pfitzmann (1999). Both watermarking and steganographic hiding methods in the time and frequency domains are considered by Westfeld (2003). Chaotic-type features are investigated by Koçal et al. (2008) and found to be very useful for distinguishing stego-signals from cover-signals.
Chaotic phenomena in speech are still the subject of ongoing research. Although linearity and stationarity of the audio signal are inherent assumptions in many of the reported approaches, there is also theoretical and experimental evidence for chaotic phenomena in speech signals that linear modeling cannot capture (Altun et al., 2005; Özer et al., 2003; Kokkinos and Maragos, 2005; Banbrook and McLaughlin, 1999). The delay vector variance (DVV) method, which is based on surrogate time series (Theiler et al., 1999), is one of the techniques that can determine the level of nonlinearity in a signal (Gautama et al., 2003). Assuming that data hiding introduces additive distortion into the speech signal, it is anticipated that this process will change the nonlinear and chaotic structure of the speech signal, and therefore its DVV and chaos-based features.
A change in chaotic structure also implies a change in nonlinear structure; it manifests as a deviation from the DVV values of the cover signal and therefore enables the detection of a possible hidden signal. The details of the effect of data hiding on DVV are given in the following sections. Motivated by the effect of data hiding on the chaotic properties of speech signals, we propose a steganalysis technique that utilizes DVV features together with other chaotic features.
The rest of the paper is organized as follows. In Section 2 we give an overview of DVV features for speech signals and show how data hiding changes these features. DVV feature extraction, feature selection and experimental results are given in Section 3. Conclusions are drawn in Section 4.

DVV METHOD
The "surrogate time series" method, proposed by Theiler et al. (1992), is a widely used technique for assessing the presence of nonlinearity in a time series. A surrogate time series is produced with the same magnitude and a similar phase of the Fourier transform of the original time series. There are several approaches for producing surrogate time series; the most widely used one is the iterative amplitude adjusted Fourier transform (iAAFT) method proposed by Schreiber and Schmitz (1997). In this paper, this approach is used to produce all surrogate time series. For detailed information, please refer to Schreiber and Schmitz (1997).
By comparing the characteristics of the original and surrogate time series, the level of nonlinearity in the time series can be estimated. Several metrics based on surrogate time series are defined for this comparison.
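As an illustration of surrogate generation, a minimal Python sketch of the iAAFT iteration is given below. It is not the implementation used in this study: the function name `iaaft_surrogate`, the fixed iteration count `n_iter` (instead of a convergence test), and the use of NumPy are assumptions made for the example. The iteration alternates between imposing the Fourier magnitudes of the original series and restoring its amplitude distribution by rank ordering.

```python
import numpy as np

def iaaft_surrogate(x, n_iter=100, rng=None):
    """Return one iAAFT surrogate of the 1-D series x (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    sorted_x = np.sort(x)                 # target amplitude distribution
    target_mag = np.abs(np.fft.rfft(x))   # target Fourier magnitudes
    s = rng.permutation(x)                # random shuffle as starting point
    for _ in range(n_iter):
        # Impose the original Fourier magnitudes, keeping the current phases.
        phases = np.angle(np.fft.rfft(s))
        s = np.fft.irfft(target_mag * np.exp(1j * phases), n=len(x))
        # Impose the original amplitude distribution by rank ordering.
        ranks = np.argsort(np.argsort(s))
        s = sorted_x[ranks]
    return s
```

Because the last operation is the rank-ordering step, the surrogate contains exactly the sample values of the original series in a different order, while its spectrum only approximates the original one. Calling the function repeatedly with different random states would yield a set of surrogates.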

Nonlinearity Measurements
Besides the delay vector variance method, there are two other approaches for assessing the nonlinearity of a time series in the study by Schreiber and Schmitz (1997): the third-order autocovariance and the asymmetry due to time reversal (Gautama et al., 2003). The third-order autocovariance is a higher-order extension of the traditional autocovariance and is given by

$$t^{C3}(\tau) = \langle x_k \, x_{k-\tau} \, x_{k-2\tau} \rangle$$

Here $\tau$ is the time lag. A time series is considered time reversible if its probabilistic properties do not change under time reversal. This invariance can be measured by

$$t^{REV}(\tau) = \langle (x_k - x_{k-\tau})^3 \rangle$$

The studies by Gautama et al. (2003) and Schreiber and Schmitz (1997) have shown that, in combination with DVV, these two methods are very helpful in two-tailed tests for nonlinearity.
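The two measures can be sketched in a few lines of Python. The sketch assumes the usual sample-average forms, namely the mean of x[k]·x[k−τ]·x[k−2τ] for the third-order autocovariance and the mean of (x[k]−x[k−τ])³ for the time-reversal asymmetry; the function names are hypothetical and chosen only for this example.

```python
def third_order_autocovariance(x, tau):
    """Sample mean of x[k] * x[k - tau] * x[k - 2*tau]."""
    terms = [x[k] * x[k - tau] * x[k - 2 * tau] for k in range(2 * tau, len(x))]
    return sum(terms) / len(terms)

def time_reversal_asymmetry(x, tau):
    """Sample mean of (x[k] - x[k - tau]) ** 3."""
    terms = [(x[k] - x[k - tau]) ** 3 for k in range(tau, len(x))]
    return sum(terms) / len(terms)
```

A series that looks the same when read backwards gives a time-reversal asymmetry close to zero, so a value far from zero relative to the surrogates hints at nonlinearity.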

DVV-Method
This method was first proposed in 2003 (Gautama et al., 2003) and is a generic nonlinearity analysis method that uses the phase space of a time series. For a given embedding dimension, $D_E$, a set of delay vectors, $\mathbf{x}(n)$, is generated with time lag $\tau$:

$$\mathbf{x}(n) = [x_{n-D_E\tau}, \ldots, x_{n-\tau}]^T$$

The choice of proper values for the embedding dimension, $D_E$, and the time delay, $\tau$, and the effects of choosing improper values, are described in detail in [6] and [10]. For a given embedding dimension, $D_E$, the DVV method computes the mean target variance, $\sigma^{*2}$, over all of the sets $\Omega_n$.

Here every set $\Omega_n$ is the group of delay vectors that lie within a certain distance of $\mathbf{x}(n)$. This distance is varied according to the distribution of pairwise distances between delay vectors (Gautama et al., 2003).

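The DVV method can be summarized as follows:
• For the given embedding dimension, $D_E$, the mean, $\mu_d$, and the standard deviation, $\sigma_d$, are computed over all pairwise Euclidean distances between delay vectors, $\|\mathbf{x}(i) - \mathbf{x}(j)\|$ ($i \neq j$).
• For the given embedding dimension, $D_E$, the sets $\Omega_n(r_d)$, $n = 1, \ldots, N$, are generated as
$$\Omega_n(r_d) = \{\mathbf{x}(i) : \|\mathbf{x}(n) - \mathbf{x}(i)\| \le r_d\}$$
Here $N$ is the number of samples in the time series $x$, and $r_d$ is the distance, which has to be chosen from the interval $[\max\{0, \mu_d - n_d\sigma_d\},\ \mu_d + n_d\sigma_d]$, where $n_d$ is a parameter that controls the span over which the DVV-plot is computed.
• For the given embedding dimension, $D_E$, the variance of the corresponding targets, $\sigma_n^2(r_d)$, is computed for every set $\Omega_n(r_d)$. The average over all sets $\Omega_n(r_d)$ is normalized by the variance of the time series, $\sigma_x^2$. Finally, the measure of unpredictability is computed by
$$\sigma^{*2}(r_d) = \frac{\frac{1}{N}\sum_{n=1}^{N}\sigma_n^2(r_d)}{\sigma_x^2}$$
If a set $\Omega_n(r_d)$ contains fewer than 30 delay vectors, it is not taken into account in the computations (Gautama et al., 2003).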
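The steps summarized above can be sketched in Python as follows. This is a brute-force illustration under stated assumptions, not the code used in this study: the function name `dvv`, the parameter names (`m` for the embedding dimension D_E, `n_r` for the number of distances, `min_set_size` for the threshold of 30), and the choice of the sample following each delay vector as its target are assumptions of the example. All pairwise distances are held in memory, so the sketch is only suitable for short segments.

```python
import numpy as np

def dvv(x, m=3, tau=1, n_d=2.0, n_r=25, min_set_size=30):
    """Return (standardized distances, normalized target variances) for x."""
    x = np.asarray(x, dtype=float)
    idx = np.arange(m * tau, len(x))      # indices n with a complete history
    # Delay vector x(n) = [x[n - m*tau], ..., x[n - tau]]; its target is x[n].
    dvs = np.stack([x[idx - k * tau] for k in range(m, 0, -1)], axis=1)
    targets = x[idx]
    # Pairwise Euclidean distances between all delay vectors.
    dist = np.linalg.norm(dvs[:, None, :] - dvs[None, :, :], axis=2)
    off_diag = dist[np.triu_indices(len(dvs), k=1)]
    mu_d, sigma_d = off_diag.mean(), off_diag.std()
    radii = np.linspace(max(0.0, mu_d - n_d * sigma_d), mu_d + n_d * sigma_d, n_r)
    var_x = x.var()
    sigma_star = np.full(n_r, np.nan)     # NaN marks distances with no valid set
    for i, r in enumerate(radii):
        variances = []
        for row in dist:
            members = row <= r            # the set Omega_n(r_d)
            if members.sum() >= min_set_size:   # ignore sets that are too small
                variances.append(targets[members].var())
        if variances:
            sigma_star[i] = np.mean(variances) / var_x
    return (radii - mu_d) / sigma_d, sigma_star
```

Plotting the second returned array against the first gives a DVV-plot on a standardized distance axis. For white noise the curve stays near 1 at all valid distances, whereas a deterministic signal such as a sine wave produces values near 0 at small distances.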
Because the distance axis is standardized, the DVV-plots are straightforward to interpret. Fig. 1 shows the DVV-plots of four benchmark signals as an illustration. According to the plots, the Hénon map signal (A) is the most predictable, while colored noise (C) is the most unpredictable. Note that all x-axes are standardized distances, which makes comparison easier (Gautama et al., 2004).

Figure 1: Solid curves represent the DVV-plots for the Hénon Map (A), Mackey-Glass (B), colored noise (C), and the laser time series (D). Average DVV-plots computed over 99 surrogates are shown as dashed curves.

USING DVV AS A STEGANALYZER FEATURE
Evaluating Fig. 1 gives an idea of the nonlinear dynamics of a signal. However, this approach is not sufficient, because visual evaluation does not always lead to objective decisions. For this reason, Schreiber and Schmitz devised the following method, which produces numerical results (Schreiber and Schmitz, 1997):
1. The number of surrogates, n_surr, is decided according to the chosen significance level, α (generally 0.05).

In the set $\Omega_n$, the squared neighborhood distance between all pairs of delay vectors is given by eq. (7), and the mean, $\mu_d$, and the variance in the set $\Omega_n$ are given by eq. (8). We assume that hiding data in a cover-signal amounts to adding zero-mean white noise with a given variance, and we rewrite eq. (7) for both cover- and stego-signals. The squared distance term $d_S^2$ used for stego-signals is then expanded. As long as all five pairs of terms in the expansion are independent and identically distributed, the mean values in eq. (12) can be simplified, using the fact that the noise is zero-mean white noise. The variance used in the DVV analysis, given in eq. (8), can then be defined for cover- and stego-signals. Using eq. (14) and (15), and assuming that the mean values µ of the cover- and stego-signals are equal, eq. (17) can be simplified. With this result, we can say that the variance values of stego-signals used in the DVV analysis are always higher than those of cover-signals.
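The effect of additive noise on the distances between delay vectors can be checked numerically. The sketch below is an illustration under assumptions that are not taken from the original experiments: the cover-signal is a synthetic sine wave, the stego-signal is y(n) = x(n) + ε(n) with ε(n) zero-mean Gaussian white noise of standard deviation `noise_std` that is independent of x(n), and the mean is estimated over randomly sampled pairs. Under these assumptions each squared coordinate difference grows by twice the noise variance on average, so the mean squared distance between delay vectors of dimension `m` should grow by about 2·m·noise_std².

```python
import math
import random

def delay_vector(x, n, m, tau=1):
    """Delay vector x(n) = [x[n - m*tau], ..., x[n - tau]]."""
    return [x[n - k * tau] for k in range(m, 0, -1)]

def mean_squared_distance(x, m, pairs, tau=1):
    """Average squared Euclidean distance over the given index pairs."""
    total = 0.0
    for i, j in pairs:
        a, b = delay_vector(x, i, m, tau), delay_vector(x, j, m, tau)
        total += sum((p - q) ** 2 for p, q in zip(a, b))
    return total / len(pairs)

rng = random.Random(0)
m, noise_std, length = 3, 0.3, 20000          # illustrative values only
cover = [math.sin(0.2 * n) for n in range(length)]
stego = [c + rng.gauss(0.0, noise_std) for c in cover]
# Random index pairs (tau = 1, so every index >= m has a complete history).
pairs = [(rng.randrange(m, length), rng.randrange(m, length)) for _ in range(100000)]

d2_cover = mean_squared_distance(cover, m, pairs)
d2_stego = mean_squared_distance(stego, m, pairs)
predicted_increase = 2 * m * noise_std ** 2
print(d2_stego - d2_cover, predicted_increase)
```

The two printed numbers should be close to each other, which matches the qualitative conclusion that stego-signals show larger distance and variance statistics than their cover-signals.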
The first three elements of the proposed F_surr feature vector are shown in Fig. 2.

EXPERIMENTAL RESULTS
We have performed tests on nine different data hiding methods. Four of them are watermarking techniques, while the other five are steganographic techniques. Watermarking methods are included to cover the widest range of cases, in which a stego-signal can be deliberately altered before the receiver gets it (Simmons, 1984). The selected watermarking techniques are direct-sequence spread spectrum (DSSS), frequency hopping spread spectrum (FHSS) and echo hiding (ECHO) by Bender et al. (1996), and the DCT-based watermarking method (COX) by Cox et al. (1997). The five steganographic methods are Steganos (2011), MP3Stego (2011), Steghide (2011), Hide4PGP (2011) and Stochastic Modulation (Fridrich and Goljan, 2003). They were chosen for their popularity, free availability, and wide usage in related work. The performance for each individual embedding method is determined in order to see whether the proposed feature set is useful for audio steganalysis.

Data Set
The database used for the test scenario is a subset of the TIMIT speech database (2011), which is widely used in the evaluation of automatic speech recognition systems. The TIMIT database consists of over 6300 utterances from 630 male and female speakers, sampled at 16 kHz. For our experiments, 2000 speech segments are randomly selected from the TIMIT database, ignoring dialect and gender differences. For every data hiding method, a stego-signal subset is constructed by embedding data into the utterances with that method.

Embedding
We have embedded messages into these excerpts with the nine embedding methods. Half of the set of stego- and cover-signals is randomly selected for training, and the other 50% is left for testing. An objective distortion measure, the Signal to Watermark Ratio (SWR), is used to define the payload level in watermarking methods:

$$\mathrm{SWR} = 10 \log_{10} \frac{\sum_n x^2(n)}{\sum_n [x(n) - y(n)]^2}$$

where x(n) and y(n) denote the cover and watermarked signals, respectively. For steganographic methods, the performance of each individual method is evaluated for two different embedding rates: 100% and 50% of the maximum allowed capacity.
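As a small worked example of the distortion measure, the sketch below computes the SWR in decibels. It assumes the usual definition, ten times the base-10 logarithm of the ratio between the energy of the cover signal x(n) and the energy of the difference x(n) − y(n); the function name is hypothetical.

```python
import math

def signal_to_watermark_ratio(cover, marked):
    """SWR in dB between a cover signal and its watermarked version."""
    signal_energy = sum(x * x for x in cover)
    distortion_energy = sum((x - y) ** 2 for x, y in zip(cover, marked))
    return 10.0 * math.log10(signal_energy / distortion_energy)

# A cover of constant amplitude 1 perturbed by +/-0.1 has an energy ratio of 100.
example_swr = signal_to_watermark_ratio([1, 1, 1, 1], [1.1, 0.9, 1.1, 0.9])
```

In this example the energy ratio is 100, which corresponds to an SWR of 20 dB. A weaker watermark leaves a smaller difference signal and therefore a higher SWR.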

Feature Set
The feature set proposed in this study has 4 elements, all of which are based on DVV values as described in eq. (6). To assess the overall performance of chaos-based features, a combined feature set of 26 elements, built from chaotic features, is also defined. It contains the feature set of FNF (False Neighbour Fraction), which is based on statistics of the FNF method (Kennel and Abarbanel, 2002), together with the Lyapunov exponent (LE) values of the signal (Hilborn, 2000). Using these chaotic features in audio steganalysis is very advantageous, as described by Koçal et al. (2008). Three elements of the FNF feature vector are the fraction of false neighbors, the average size of the neighborhood, and the root-mean-squared size of the neighborhood. The FNF and LE values of the signals are calculated with the TISEAN software package (Hegger et al., 1999).

Feature Selection and Classifiers
To achieve the highest detection rate and to identify redundant features, the sequential forward floating search (SFFS) method of Pudil and Novovicova (1994) is coupled with support vector machine (SVM) classification (OSU-SVM Matlab Toolbox, 2011). A radial basis function with Gamma = 4 is used as the SVM kernel.

Simulation Results
To assess the validity of the proposed feature set, several tests have been carried out, also considering the effect of payload. The performance of the DVV-based steganalyzer on the TIMIT dataset is given in Table I. As performance metrics, the results are given as the performance percentage together with the Missed Detection count (MISS) and the False Alarm count (FA), since there are 1000 samples in the performance tests.
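The three reported metrics can be computed from the classifier decisions as sketched below. The sketch assumes that the performance percentage is the overall classification accuracy and that labels are coded as 1 for stego-signals and 0 for cover-signals; the function name and the label coding are choices made for this example.

```python
def detection_metrics(true_labels, predicted_labels):
    """Return (accuracy in percent, MISS count, FA count); 1 = stego, 0 = cover."""
    pairs = list(zip(true_labels, predicted_labels))
    miss = sum(1 for t, p in pairs if t == 1 and p == 0)          # stego called cover
    false_alarm = sum(1 for t, p in pairs if t == 0 and p == 1)   # cover called stego
    accuracy = 100.0 * (len(pairs) - miss - false_alarm) / len(pairs)
    return accuracy, miss, false_alarm
```

With 1000 test samples, for example, 30 missed detections and 20 false alarms would correspond to a performance of 95%.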
Özer et al. (2006) observed that SWR figures greater than 38 dB for DSSS, 34 dB for FHSS, 20 dB for Cox, and 18 dB for ECHO become noticeable, namely audible. For this reason, 20 dB, 30 dB and 40 dB are selected as payloads. In detail, it can be seen that a 30 dB payload causes the performance to drop to 55% for the Cox method; note, however, that this level of payload is audible. For steganographic methods, the performance of each individual method is evaluated for two different embedding rates: 100% and 50% of the maximum allowed capacity. There is no audibility concern for steganographic methods, because the maximum embedding rate used in the tests is equal to the maximum allowed capacity, which is the audibility threshold. As expected, performance decreases as payload decreases.

CONCLUSION
In conclusion, steganalyzers based on the proposed DVV features achieve high performance with respect to well-known steganography methods (Johnson et al., 2005; Koçal et al., 2008; Özer et al., 2003). When DVV-based features are used together with the other chaos-based features, the steganalyzer performs better than the steganalyzer based on DVV features alone. These results indicate that DVV-based features can be used for speech audio steganalysis as well as for nonlinearity detection.
The feature values in Fig. 2 are calculated over 2000 stego- and cover-signals using the DSSS (Bender et al., 1996) and Stochastic Modulation (Fridrich and Goljan, 2003) techniques. Here n_surr is 20, and the difference between the feature values of cover- and stego-signals can be clearly seen in the figure.

Figure 2: The first 3 elements of F_surr for 2000 cover- and stego-signals produced with the techniques DSSS (a) and Stochastic Modulation (b). n_surr is 20 for F_surr.

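Steps 2 to 5 of the surrogate-based procedure of Schreiber and Schmitz (1997):
2. n_surr surrogate time series, s_1, s_2, …, s_n_surr, are produced from the original time series.
3. DVV curves of the original and surrogate time series are constructed.
4. The difference between the DVV curve of the original and that of each surrogate is calculated as a root-mean-square error (RMSE_1, RMSE_2, …, RMSE_n_surr).
5. For every original time series (sound recording), a feature vector is constituted from the mean, variance, skewness and kurtosis values calculated over the RMSE values of the surrogate time series:
$$F_{surr} = [\mathrm{mean}(RMSE),\ \mathrm{var}(RMSE),\ \mathrm{skewness}(RMSE),\ \mathrm{kurtosis}(RMSE)]$$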
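Steps 4 and 5 can be sketched in Python as follows, taking the DVV curves as plain lists of equal length. The sketch assumes population (biased) moments and non-excess kurtosis, since the exact estimator conventions are not stated here; the function names are hypothetical. At least two distinct RMSE values are needed, otherwise the standard deviation is zero and the skewness and kurtosis are undefined.

```python
import math
import statistics

def rmse(curve_a, curve_b):
    """Root-mean-square error between two DVV curves of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(curve_a, curve_b)) / len(curve_a))

def surrogate_feature_vector(original_curve, surrogate_curves):
    """Mean, variance, skewness and kurtosis of the RMSE values (step 5)."""
    errors = [rmse(original_curve, s) for s in surrogate_curves]
    mean = statistics.fmean(errors)
    variance = statistics.pvariance(errors, mu=mean)   # population variance
    std = math.sqrt(variance)
    skewness = sum((e - mean) ** 3 for e in errors) / (len(errors) * std ** 3)
    kurtosis = sum((e - mean) ** 4 for e in errors) / (len(errors) * std ** 4)
    return [mean, variance, skewness, kurtosis]
```

For one recording, `original_curve` would be its DVV curve and `surrogate_curves` the DVV curves of its n_surr surrogates, giving one four-element feature vector per recording.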