Blind Audio Watermarking in Transform Domain Based on Singular Value Decomposition and Exponential-Log Operations



Introduction
Recent developments in computing and the wide availability of the internet have facilitated the transmission and distribution of multimedia data. As a result, the protection of intellectual property rights of digital data has become a key problem. Digital watermarking has drawn extensive attention and has been a focus in data security. It is a process of embedding a watermark into original data objects such as audio, video, and images. It is widely used for several purposes including copyright protection, broadcast monitoring, fingerprinting, data authentication, and medical safety.
In general, digital watermarking can be broadly classified into robust and fragile (or semi-fragile) watermarking. For robust watermarking schemes, two important issues need to be addressed: one is to provide trustworthy evidence to protect rightful ownership, and the other is to provide a good trade-off among perceptual transparency, robustness, and data payload. A comprehensive survey on audio watermarking can be found in [1], [2]. Most audio watermarking methods utilize either the time domain [3], [4] or a transform domain such as the fast Fourier transform (FFT) [5], discrete cosine transform (DCT) [6], discrete wavelet transform (DWT) [7], [10], and lifting wavelet transform (LWT). Bassia et al. [3] proposed a watermarking scheme in which watermark bits are embedded by modifying the audio samples directly. Lie and Chang [4] introduced a method in which group amplitudes are modified to achieve high robustness. However, both methods have low data payload. Megías et al. [5] suggested a watermarking method that embeds the watermark in the FFT domain, but it has low data payload. Zeng and Qiu [6] introduced a DCT-based watermarking method that embeds a binary image as the watermark, but the subjective quality of the watermarked audio signal was not assessed. In [7], the authors presented a watermarking algorithm based on an energy-proportion scheme which is robust against attacks; however, the SNR results of this algorithm are not satisfactory. Chen et al. [8] proposed an adaptive method based on wavelet-based entropy, but its robustness to re-sampling and low-pass filtering attacks is quite low. In [9], the authors introduced a robust watermarking scheme in the DWT domain which embeds the watermark in the low-frequency DWT coefficients using an optimization-based quantization technique, but no subjective evaluation of the watermarked audio signals was reported. Wang et al. [10] proposed a method using DWT in which watermark bits are embedded into low-middle frequency wavelet coefficients and linear predictive coding (LPC) is utilized in the detection process. However, the SNR is just above 20 dB and the robustness to re-sampling and low-pass filtering attacks is quite low. Erçelebi and Batakçı [11] presented a watermarking method in the LWT domain in which a binary image is embedded as the watermark; however, according to the reported results, robustness against attacks is quite low. Khaldi et al. [12] proposed a watermarking scheme based on empirical mode decomposition (EMD) in which watermark bits are embedded into the intrinsic mode functions (IMFs). However, no subjective evaluation of that scheme was conducted and the data payload is quite low.

Recently, singular value decomposition (SVD) has been utilized as an effective technique in audio watermarking [13]-[16]. In [13], [14], the authors introduced a domain-adaptive audio watermarking algorithm using SVD. In addition, their segment-by-segment approach enhanced the detectability compared to the simple approach utilizing the whole original audio signal directly. However, the signal-to-noise ratio (SNR) is not very good and the detection scheme is non-blind. The most recent SVD-based watermarking methods proposed by Bhat et al. [15] and Lei et al. [16] provide high robustness; however, the data payload of both methods is quite low. Moreover, other techniques such as the audio histogram technique [17], [18] and the time spread (TS) echo method [19], [20] are becoming popular in the audio watermarking field. Dhar et al. [21] introduced an audio watermarking method based on DCT, SVD, and the log-polar transform (LPT), but its robustness against MP3 compression is somewhat low.

The main limitation of the existing audio watermarking techniques is the difficulty of obtaining a favorable trade-off among imperceptibility, robustness, and data payload. To overcome this limitation, we propose an audio watermarking scheme in the discrete cosine transform (DCT) domain based on SVD, exponential operation (EO), logarithm operation (LO), and quantization. The main features of the proposed scheme are: (i) it utilizes DCT, SVD, EO, LO, and quantization jointly; (ii) it uses a Gaussian map with chaotic characteristics to enhance the confidentiality of the scheme; (iii) the watermark is embedded into the largest singular value of the exponential coefficients obtained from the DCT sub band with the highest power; (iv) the watermark extraction process is blind; and (v) it achieves a good trade-off among imperceptibility, robustness, and data payload. Simulation results indicate that the proposed watermarking scheme is highly robust against various attacks such as noise addition, cropping, re-sampling, re-quantization, and MP3 compression. Moreover, it shows good performance in terms of imperceptibility, robustness, and data payload compared with some state-of-the-art methods [7][8][9], [11], [12], [14][15][16][17][18][19], [21]. The SNR of the proposed scheme ranges from 27 to 36 dB, in contrast to the state-of-the-art methods, whose SNRs range from only 12 to 28 dB. In addition, the bit error rate (BER) of the proposed scheme ranges from 0 to 3.1250, whereas the BER of the state-of-the-art methods ranges from 0 to 17.50. Furthermore, the data payload of the proposed scheme is 172.39 bps, which is higher than that of the state-of-the-art methods.
The rest of this paper is organized as follows. Section 2 briefly describes the background information including DCT and SVD. Section 3 introduces our proposed watermarking scheme including the watermark embedding and detection processes. Section 4 compares the performance of our proposed scheme with the state-of-the-art methods in terms of imperceptibility, robustness, and data payload. It also provides the error analysis of the proposed scheme. Lastly, the conclusion of the paper is presented in Section 5.

Background Information

Discrete Cosine Transform
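The DCT has been widely used in signal and image processing, especially for lossy data compression [16]. It represents the signal as a series of coefficients obtained from a sum of cosine functions oscillating at different frequencies and amplitudes. The DCT can be written as
$$X(k) = \alpha(k) \sum_{n=0}^{N-1} x(n) \cos\left[\frac{\pi (2n+1) k}{2N}\right], \quad k = 0, 1, \ldots, N-1,$$
where x(n) is the audio signal with length of N samples and
$$\alpha(k) = \begin{cases} \sqrt{1/N}, & k = 0 \\ \sqrt{2/N}, & 1 \le k \le N-1. \end{cases}$$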
The DCT has the ability to compress the energy of a signal into a few samples, leaving the other samples very small in magnitude. This property can be utilized in audio watermarking to reduce the deterioration of the watermarked signal.
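The energy compaction property can be checked numerically. The following is a minimal sketch, assuming the orthonormal DCT-II form and a hypothetical smooth 256-sample test frame (neither the helper name `dct_matrix` nor the test signal comes from the paper):

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II matrix C, so that coefficients = C @ x and x = C.T @ coefficients."""
    n = np.arange(N)
    k = n.reshape(-1, 1)
    C = np.cos(np.pi * (2 * n + 1) * k / (2 * N))   # row k, column n
    C[0, :] *= np.sqrt(1.0 / N)
    C[1:, :] *= np.sqrt(2.0 / N)
    return C

# Hypothetical test frame: a slowly varying signal of 256 samples.
N = 256
frame = np.cos(2 * np.pi * 3 * np.arange(N) / N) + 0.5

C = dct_matrix(N)
coeffs = C @ frame                  # forward DCT
frame_back = C.T @ coeffs           # inverse DCT (C is orthogonal)

energy = coeffs ** 2
low_band_share = energy[:16].sum() / energy.sum()   # share of energy in the first 16 coefficients
print(f"energy share of the first 16 coefficients: {low_band_share:.4f}")
```

For a smooth frame like this one, almost all of the energy falls into the first few coefficients, which is the behaviour the scheme relies on when it selects a high-power sub band.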

Singular Value Decomposition
SVD is a mathematical tool mainly used to analyze matrices. In the SVD transformation, a given matrix A is decomposed into three matrices. Let A be a p × p matrix with SVD of the form A = U S V^T, where U and V are orthogonal p × p matrices and S is a p × p diagonal matrix with nonnegative elements. The diagonal entries of S are called the singular values (SVs) of A, where S = diag(σ_1, σ_2, …, σ_p), the columns of U are called the left singular vectors of A, and the columns of V are called the right singular vectors of A. The SVD has some interesting properties: (i) the sizes of the matrices from the SVD operation are not fixed, and the matrices need not be square; (ii) changing the SVs slightly does not affect the quality of the signal much; (iii) the SVs are invariant under common signal processing operations; (iv) the SVs satisfy intrinsic algebraic properties.
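Property (ii) can be illustrated with a small numerical sketch. The 4 × 4 block and the 1 % perturbation below are hypothetical values chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(1.0, 2.0, size=(4, 4))     # hypothetical 4 x 4 coefficient block

U, s, Vt = np.linalg.svd(A)                # A = U @ diag(s) @ Vt, s sorted in descending order
A_rebuilt = U @ np.diag(s) @ Vt

s_mod = s.copy()
s_mod[0] *= 1.01                           # perturb the largest singular value by 1 %
A_mod = U @ np.diag(s_mod) @ Vt

# Relative change of the matrix (Frobenius norm) caused by the perturbation.
rel_change = np.linalg.norm(A_mod - A) / np.linalg.norm(A)
print(f"relative change of the matrix: {rel_change:.5f}")
```

Because the Frobenius norm of A equals the Euclidean norm of its singular values, a 1 % change of the largest SV can change the matrix by at most 1 % in this norm.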

Proposed Watermarking Scheme
Let X = {x(n), 1 ≤ n ≤ L} be an original audio signal with L samples, W = {w(k, l), 1 ≤ k ≤ M, 1 ≤ l ≤ M} be a binary logo image to be embedded into the original audio signal, and w(k, l) ∈ {0, 1} be the pixel value at point (k, l).

Watermark Preprocessing
The watermark should be preprocessed first in order to improve the robustness and enhance the confidentiality. This paper uses a Gaussian map, which exhibits chaotic characteristics, to encrypt the binary watermark image. The map generates a sequence y(i) from an initial value y(1) ∈ (0, 1) and two real parameters a and b. A binary sequence z(i) is then calculated from y(i) by comparison with a predefined threshold T. The binary watermark image W is converted into a one-dimensional sequence q, where q = {q(i), i = 1, 2, 3, …, M × M}. Finally, q(i) is encrypted using z(i) by the following rule:
$$u(i) = q(i) \oplus z(i),$$
where ⊕ is the exclusive-or (XOR) operation. After this chaotic encryption, the original watermark is scrambled and cannot be found by random search. In this study, the values of y(1), a, b, and the encrypted watermark sequence u(i) are used as the secret key K.
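The preprocessing chain can be sketched as follows. This is a minimal illustration under two assumptions that are not confirmed by the text: the Gaussian map is taken in its common iterated form y(n+1) = exp(-a·y(n)²) + b, and the threshold rule is taken as z(i) = 1 when y(i) > T and 0 otherwise. The bit pattern `q` is a hypothetical stand-in for a 32 × 32 logo.

```python
import math

def gauss_map_sequence(y1, a, b, length):
    """Iterate y(n+1) = exp(-a * y(n)**2) + b (assumed form of the Gaussian map)."""
    y = [y1]
    for _ in range(length - 1):
        y.append(math.exp(-a * y[-1] ** 2) + b)
    return y

def chaotic_bits(y1, a, b, T, length):
    """Binary key stream: 1 where y(i) > T, else 0 (assumed threshold rule)."""
    return [1 if v > T else 0 for v in gauss_map_sequence(y1, a, b, length)]

def xor_bits(bits, key_bits):
    """Element-wise XOR of two equally long bit sequences."""
    return [p ^ z for p, z in zip(bits, key_bits)]

# Hypothetical 1024-bit watermark standing in for a 32 x 32 binary logo.
q = [(i * 7 + i // 32) % 2 for i in range(1024)]

z = chaotic_bits(0.6, 5.90, -0.39, 0.25, len(q))   # parameter values quoted in the paper
u = xor_bits(q, z)        # encrypted watermark sequence
q_rec = xor_bits(u, z)    # XOR with the same key stream decrypts
```

Since XOR is its own inverse, a detector that regenerates z(i) from the same y(1), a, b, and T recovers q(i) exactly.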

Watermark Embedding Process
The proposed watermark embedding process is shown in Fig. 1. The embedding process is implemented in the following steps:
1) The original audio signal X is first segmented into non-overlapping frames F_i, and the DCT coefficients D_i of each frame are calculated, where i indicates the frame number.
3) The DCT coefficients of each frame F_i are divided into m sub bands B = {B_j, 1 ≤ j ≤ m} with r coefficients in each sub band B_j, where j indicates the sub band number.
4) The power G = {G_j, 1 ≤ j ≤ m} of the sub bands B = {B_j, 1 ≤ j ≤ m} of the DCT coefficients of each frame F_i is calculated, where G_j represents the power of sub band B_j, and V_k and k are the amplitude and the index of the DCT coefficients, respectively, in each sub band. The sub band with the highest power, B_j(Highest), of the DCT coefficients of each frame F_i is then selected, because it is the significant band for embedding watermark data.
7) EO is applied to each element of B_j(Highest) to calculate the corresponding exponential coefficients, denoted by E_j(Highest). The exponential coefficients E_j(Highest) of each frame F_i are rearranged into an N × N square matrix R_i by dividing the coefficient set into N segments of N coefficients.
8) SVD is performed to decompose each matrix R_i into three matrices U_i, S_i, and V_i:
$$R_i = U_i S_i V_i^T.$$
9) The binary watermark image is encrypted using the Gaussian map.
10) The largest singular value S_i(1,1) of each matrix S_i is selected. Watermark information should be embedded into the most significant perceptual components of the audio signal in order to guarantee the robustness and imperceptibility of the proposed method. As S_i(1,1) contains most of the power of the signal, it represents the significant perceptual component of the audio signal. The proposed method embeds watermark bits into the largest singular value S_i(1,1) of each matrix S_i by using a quantization function. Let Y_i = Round(S_i(1,1)/Q), where Q is a predefined quantization coefficient. The embedding equation, in which M = 2C, C is an integer, and mod is the modulo operation, produces the modified largest singular value S_i'(1,1).
11) Each modified largest singular value S_i'(1,1) is re-inserted into matrix S_i and the inverse SVD is applied to obtain the modified matrix R_i':
$$R_i' = U_i S_i' V_i^T.$$
R_i' is then reshaped to create the modified exponential coefficients E'_j(Highest) of each frame F_i by performing the inverse operation of step 5.
12) LO, which is the inverse operation of EO, is performed on each E'_j(Highest) to calculate the coefficients of the modified sub band B'_j(Highest) of each frame F_i.

Watermark Detection Process
The proposed watermark detection process, shown in Fig. 2, does not need the original audio signal to extract the watermark. The detection process is implemented in the following steps:
1) The DCT is performed on each frame F_i* of the attacked watermarked audio signal.
6) Chaotic decryption is performed using the secret key K to recover the hidden binary sequence q*(i).
7) Finally, the watermark image is obtained by rearranging the binary sequence q*(i) into a square matrix W* of size M × M.
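The core of the embedding and blind detection, namely quantizing the largest singular value of the exponential coefficients, can be sketched as below. The paper's exact embedding equation is not reproduced here; the sketch substitutes a generic parity-based quantization rule (Y_i is forced to be even for bit 0 and odd for bit 1), takes EO and LO as the element-wise natural exponential and logarithm, and uses a hypothetical quantization coefficient Q = 0.1 with hypothetical normalized sub band values. All function names are illustrative.

```python
import numpy as np

Q = 0.1  # hypothetical quantization coefficient

def embed_bit(sub_band, bit, q=Q):
    """Embed one bit into the largest singular value of exp(sub_band)."""
    n = int(np.sqrt(sub_band.size))
    R = np.exp(sub_band).reshape(n, n)      # EO, then N x N rearrangement
    U, s, Vt = np.linalg.svd(R)
    Y = int(np.round(s[0] / q))             # Y = Round(S(1,1) / Q)
    if Y % 2 != bit:                        # assumed parity rule, not the paper's equation
        Y += 1
    s[0] = Y * q                            # modified largest singular value
    R_mod = U @ np.diag(s) @ Vt             # inverse SVD
    return np.log(R_mod).reshape(-1)        # LO returns to the sub band domain

def extract_bit(sub_band, q=Q):
    """Blind extraction: only the received sub band and Q are needed."""
    n = int(np.sqrt(sub_band.size))
    R = np.exp(sub_band).reshape(n, n)
    s = np.linalg.svd(R, compute_uv=False)
    return int(np.round(s[0] / q)) % 2

rng = np.random.default_rng(1)
band = rng.uniform(-1.0, 1.0, size=16)      # hypothetical normalized 16-coefficient sub band
recovered = [extract_bit(embed_bit(band, b)) for b in (0, 1)]
max_change = max(np.max(np.abs(embed_bit(band, b) - band)) for b in (0, 1))
```

The extraction never touches the original sub band, which is what makes this kind of quantization-based detection blind. A larger Q tolerates stronger distortion of the singular value but changes the coefficients more.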

Simulation Results and Discussion
In this section, several experiments were conducted to demonstrate the performance of the proposed watermarking scheme. The performance is assessed in terms of imperceptibility, robustness, and data payload. We used 40 audio clips belonging to four different audio groups as original audio signals:
Group 1: 10 clips containing Pop music;
Group 2: 10 clips containing Jazz music;
Group 3: 10 clips containing Classical music;
Group 4: 10 clips containing male and female speech.
All audio files are mono-channel and contain 262,144 samples (duration 5.94 s). They are sampled at 44.1 kHz and quantized with 16 bits. By using a frame size of 256 samples, we have 1024 non-overlapping frames for each audio signal. In each frame, we embed one bit of binary watermark information. Thus, the length of the watermark sequence is 1024, which matches a binary logo image of size M × M = 32 × 32 (Fig. 3). In this study, the selected values for y(1), a, b, and T are 0.6, 5.90, -0.39, and 0.25, respectively. For convenience, the sub band number m and the number of coefficients in each sub band r are both set to 16. These parameters have been selected in order to achieve a good trade-off among the conflicting requirements of imperceptibility, robustness, and data payload.

Imperceptibility Test
Audio watermarking aims to embed an imperceptible and secure watermark into the original audio signal. Therefore, the watermarking scheme should be perceptually transparent. The imperceptibility of the watermarked audio is evaluated in two ways: (1) a subjective listening test and (2) an objective test.

1) Subjective Listening Test
Subjective listening tests are essential for perceptual quality assessment. In the subjective listening test, ten participants were given the original and watermarked audio signals and were asked to report the dissimilarities between the two signals using a five-point subjective difference grade (SDG) ranging from 5.0 to 1.0 (imperceptible to very annoying), as shown in Tab. 1. The average SDG (i.e., mean opinion) scores for different watermarked sounds using the proposed scheme are shown in Tab. 2. The average mean opinion score (MOS) results are within 4.8 to 5.0 for all watermarked sounds, indicating that the watermarked audio signals are perceptually similar to the original audio signals.
A subjective test was also done using the ABX method. The subjects were ten male and female persons with normal hearing. Each subject first listened to the original audio signal (A) and the watermarked audio signal (B), and then to a third audio signal (X), which was randomly chosen to be either A or B. The subjects were asked to identify which of A or B was the same as X. One identification was considered as one trial, and each subject performed five trials. A detection percentage of 50% indicates that the difference between the original and watermarked sounds is imperceptible. The evaluation results of all subjects were summarized in terms of the percentage of correct detection and are shown in Tab. 2. The correct detection scores range from 42% to 52%, indicating the high imperceptibility of the watermarked sound.

2) Objective Test
SNR is widely used to measure the objective quality of the watermarked audio signal and is formulated as
$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{n=1}^{L} x^2(n)}{\sum_{n=1}^{L} \left[x(n) - x^*(n)\right]^2},$$
where x(n) and x*(n) are the original and watermarked audio signals in the time domain, respectively. After embedding the watermark information, the SNRs of the watermarked audio signals using the proposed scheme are above 20 dB, as shown in Tab. 2. According to the International Federation of the Phonographic Industry (IFPI) standard [15], audio watermarking should be imperceptible when the SNR is over 20 dB. From Tab. 2, we observed that the proposed scheme satisfies the IFPI standard.
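A minimal sketch of the SNR measurement, assuming the usual definition as the ratio of signal energy to the energy of the embedding distortion. The sine signal and the 1 % amplitude distortion are hypothetical test inputs:

```python
import numpy as np

def snr_db(x, x_w):
    """SNR in dB between an original signal x and a watermarked signal x_w."""
    x = np.asarray(x, dtype=float)
    x_w = np.asarray(x_w, dtype=float)
    distortion = x - x_w
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum(distortion ** 2))

t = np.arange(1024)
original = np.sin(2 * np.pi * 5 * t / 1024)
watermarked = 1.01 * original            # hypothetical 1 % amplitude distortion
value = snr_db(original, watermarked)    # distortion energy is 1e-4 of the signal energy
```

A distortion whose energy is 10^-4 of the signal energy corresponds to 40 dB, so the 20 dB IFPI limit allows distortion energy of up to 1 % of the signal energy.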
An objective test was also done using the objective difference grade (ODG). The perceptual evaluation of audio quality (PEAQ) measurement technique, which is specified in the ITU-R BS.1387 (International Telecommunication Union-Radiocommunication Sector) standard [21], incorporates a psychoacoustic model. The output of the PEAQ algorithm is the ODG, which corresponds to the subjective grade used to measure the differences between the original and watermarked audio signals. The ODG ranges from 0.0 to -4.0 (imperceptible to very annoying), as shown in Tab. 1. The objective quality of the watermarked audio signals in terms of ODG is shown in Tab. 3. The ODG values range from -0.49 to -1.43, indicating that the original and watermarked audio signals are perceptually similar.
Figure 4 shows the time domain representation of the original audio signal with a watermarked audio signal in which the watermark is imperceptible using the proposed scheme for the signal 'Speech'.

Robustness Test
The normalized correlation (NC) coefficient is used to measure the similarity between the original watermark W and the extracted watermark W*, and is calculated as
$$NC(W, W^*) = \frac{\sum_{k=1}^{M} \sum_{l=1}^{M} w(k,l)\, w^*(k,l)}{\sqrt{\sum_{k=1}^{M} \sum_{l=1}^{M} w(k,l)^2}\, \sqrt{\sum_{k=1}^{M} \sum_{l=1}^{M} w^*(k,l)^2}},$$
where k and l are the indices of the binary watermark image. The correlation between W and W* is very high if NC(W, W*) is close to 1. On the other hand, the correlation between W and W* is very low if NC(W, W*) is close to zero.
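The bit error rate (BER) is used to measure the robustness of a watermarking scheme and is computed as
$$BER(W, W^*) = \frac{\sum_{k=1}^{M} \sum_{l=1}^{M} w(k,l) \oplus w^*(k,l)}{M \times M},$$
where ⊕ is the exclusive-or (XOR) operation. The BER is reported as a percentage in the tables.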
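Both measures are straightforward to compute on flattened bit sequences. The sketch below assumes the NC form normalized by the energies of both watermarks and reports BER as a percentage; the alternating watermark and the 32 flipped bits are hypothetical test data:

```python
import math

def normalized_correlation(w, w_ext):
    """NC between two bit sequences, normalized by the energy of each sequence."""
    num = sum(a * b for a, b in zip(w, w_ext))
    den = math.sqrt(sum(a * a for a in w)) * math.sqrt(sum(b * b for b in w_ext))
    return num / den

def bit_error_rate(w, w_ext):
    """Percentage of positions at which the two bit sequences differ."""
    errors = sum(a ^ b for a, b in zip(w, w_ext))
    return 100.0 * errors / len(w)

# Hypothetical 1024-bit watermark and an extracted copy with 32 flipped bits.
w = [1, 0] * 512
w_ext = [1 - bit for bit in w[:32]] + w[32:]

nc = normalized_correlation(w, w_ext)   # 496 / 512
ber = bit_error_rate(w, w_ext)          # 32 / 1024 as a percentage
```

In this example 32 wrong bits out of 1024 give a BER of 3.125 % and an NC of 0.96875, which shows how a few percent of bit errors still leave NC close to 1.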
The following signal processing attacks were performed to assess the robustness of the proposed scheme.
1) Additive white Gaussian noise (AWGN): AWGN is added to the watermarked audio signal.
2) Cropping: Segments of 1000 samples (10 × 100) are removed from the watermarked audio signal at ten different positions and subsequently replaced by segments of the watermarked audio signal attacked with additive white Gaussian noise.
3) Re-sampling: The watermarked audio signals are down-sampled to 22.05 kHz and then up-sampled back to 44.1 kHz.
4) Re-quantization: Each sample of the watermarked audio signals is re-quantized from 16 bits to 8 bits.
5) Low-pass filtering: A low-pass filter with a 10 kHz cutoff frequency is applied to the watermarked audio signals.
6) MP3 compression: MPEG-1 layer 3 compression is applied. The watermarked audio signal is compressed at a bit rate of 32 kbps and then decompressed back to the wave format.
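Two of these attacks are simple enough to sketch directly. The noise level is not stated in the text, so the 20 dB target below is a hypothetical value, and the re-quantization is modelled as rounding to 8-bit resolution and rescaling to the 16-bit range:

```python
import numpy as np

def add_awgn(x, snr_db, rng):
    """Add white Gaussian noise so that the signal-to-noise ratio is about snr_db."""
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

def requantize_16_to_8(x_int16):
    """Reduce 16-bit samples to 8-bit resolution, then return to the 16-bit scale."""
    x8 = np.round(x_int16.astype(np.float64) / 256.0)
    x8 = np.clip(x8, -128, 127)
    return (x8 * 256).astype(np.int16)

rng = np.random.default_rng(0)
t = np.arange(44_100)
clean = 20_000 * np.sin(2 * np.pi * 440 * t / 44_100)   # hypothetical 1 s test tone

noisy = add_awgn(clean, 20.0, rng)                       # hypothetical 20 dB noise level
measured_snr = 10 * np.log10(np.sum(clean ** 2) / np.sum((noisy - clean) ** 2))

clean_int16 = np.round(clean).astype(np.int16)
requantized = requantize_16_to_8(clean_int16)
max_step_error = np.max(np.abs(requantized.astype(np.int32) - clean_int16.astype(np.int32)))
```

Re-quantization to 8 bits moves each sample by at most half of an 8-bit step (128 on the 16-bit scale), which gives a feel for how large the quantization coefficient Q must be for a bit to survive this attack.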
It is very important to select the significant sub band of the DCT coefficients for embedding the watermark. The sub band with the highest power is the significant sub band for embedding. In this study, 256 DCT coefficients are divided into 16 sub bands (m = 16) with 16 DCT coefficients (r = 16) in each sub band. The first 16 DCT coefficients belong to the 1st sub band, the next 16 DCT coefficients belong to the 2nd sub band, the next 16 DCT coefficients belong to the 3rd sub band, and so on. We observed that in most cases the sub band with the highest power was the 1st sub band. This is because of the energy compaction characteristic of the DCT, which concentrates most of the energy in the low-frequency region. Figure 5 shows the robustness results (NC values) of the proposed method against different attacks for the audio signal "Speech", where the highest power, the 2nd highest power, the 3rd highest power, and the 4th highest power denote embedding watermark bits into the DCT sub band of each frame with the highest, 2nd highest, 3rd highest, and 4th highest power, respectively. We observed that embedding watermark bits into the DCT sub band with the highest power provides better robustness than embedding them into the sub bands with the 2nd, 3rd, or 4th highest power. The DCT sub band with the highest power is therefore taken as the optimal sub band for embedding the watermark. Table 4 shows the robustness results of the proposed scheme in terms of NC and BER against several attacks for the audio signal "Speech". The minimum NC value and the maximum BER value are 0.9481 and 6.4231, respectively. The extracted watermark images are visually similar to the original watermark. This shows a good performance of the proposed scheme against different kinds of attacks. Table 5 shows similar results for the audio signals "Pop", "Jazz", and "Classical". The NC values are all above 0.94 and the BER values are all below 7%, demonstrating the high robustness of our proposed scheme against different attacks.
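The sub band selection described above can be sketched as follows. The power measure is assumed here to be the sum of squared coefficient amplitudes in each sub band, and the decaying coefficient vector is a hypothetical stand-in for the DCT of one audio frame:

```python
import numpy as np

def highest_power_sub_band(dct_coeffs, m=16, r=16):
    """Split m*r DCT coefficients into m sub bands and return the strongest one.

    Returns the zero-based index of the selected sub band, its coefficients,
    and the power of every sub band (assumed: sum of squared amplitudes).
    """
    bands = np.asarray(dct_coeffs, dtype=float).reshape(m, r)
    power = np.sum(bands ** 2, axis=1)
    j = int(np.argmax(power))
    return j, bands[j], power

# Hypothetical frame spectrum whose magnitude decays with frequency.
coeffs = 1.0 / (1.0 + np.arange(256))
j, band, power = highest_power_sub_band(coeffs)
print(f"selected sub band (zero-based): {j}")
```

Index 0 corresponds to the 1st sub band in the paper's numbering. For spectra that decay with frequency, as in this example, the first sub band is selected, in line with the observation reported above.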

Security
For a secure watermarking scheme, robustness against attacks is important. To enhance the security, the proposed method utilizes chaotic encryption. Since the proposed watermark embedding and detection processes depend on the secret key K, the watermark cannot be maliciously detected without this key.

Data Payload
The data payload is defined as the number of bits that can be embedded into the original audio signal within a unit of time and is measured in bits per second (bps). The data payload P can be represented as
$$P = \frac{B}{T},$$
where T is the duration of the original audio signal in seconds and B is the number of watermark bits to be embedded into the original audio signal. Usually, the data payload for any watermarking method should be more than 20 bps [15]. The data payload of the proposed scheme is 172.3906 bps.
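The payload figure follows directly from the experimental setup (262,144 samples at 44.1 kHz, 256-sample frames, one bit per frame). A short sketch of the arithmetic, with variable names chosen only for illustration:

```python
samples = 262_144        # samples per audio clip
sample_rate = 44_100     # Hz
frame_size = 256         # samples per frame, one watermark bit per frame

bits = samples // frame_size            # number of embedded bits B
duration = samples / sample_rate        # exact duration in seconds

payload_rounded = bits / 5.94           # using the rounded duration quoted in the paper
payload_exact = bits / duration         # equals sample_rate / frame_size

print(bits, round(duration, 4), round(payload_rounded, 4), payload_exact)
```

With the rounded duration of 5.94 s the payload is about 172.39 bps; with the unrounded duration it is 44100 / 256 = 172.265625 bps. With one bit per frame, the payload depends only on the sampling rate and the frame size.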

Error Analysis
Two types of error may occur while searching for the watermark sequence: a false positive error (FPE) or a false negative error (FNE). A false positive error occurs when an unwatermarked audio signal is declared as a watermarked audio signal by the detector, whereas a false negative error occurs when a watermarked audio signal is declared as an unwatermarked audio signal by the detector. It is difficult to give an exact probabilistic model for false positive and false negative errors. Here, a simplified model based on the binomial probability distribution is used to calculate the FPE and FNE probabilities, which is the same as the model utilized by the method reported in [15].
Let s be the total number of watermark bits and t be the total number of matching bits. The FPE probability P_fp and the FNE probability P_fn are calculated from binomial sums, given as (18) and (19), respectively, in which C(s, t) is the binomial coefficient and P is the bit error rate (BER) probability of the extracted watermark. We observed that (18) is independent of the BER. This indicates that the FPE is independent of attacks. As we select s = 1024, substituting the value of s into (18) gives P_fp = 2.6209 × 10^-88. Figure 6 shows the FPE probability for s ∈ (0, 100]. It is noted that P_fp approaches 0 when s is larger than 30.
The approximate value of P can be obtained from the BER under different attacks. From Tab. 5 and 6, we observed that all BER values are less than 0.07. Thus P is taken as 0.93. By substituting the values of s and P, (19) gives P_fn = 5.4708 × 10^-42. Figure 7 shows the FNE probability for s ∈ (0, 100]. It is noted that P_fn approaches 0 when s is larger than 30. The main difference between the proposed method and the method reported in [15] is that they give different results for the FPE and FNE probabilities because the values of s, t, and P are different for each method. For example, the P_fn of the proposed method is 5.4708 × 10^-42, whereas the P_fn of the method reported in [15] is 6.47 × 10^-34.
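A binomial model of this kind can be evaluated exactly with integer arithmetic. The sketch below assumes that detection is declared when at least 80 % of the s bits match, a threshold commonly used with this model but not stated in the text above; under that assumption the FPE probability is the upper tail of a fair-coin binomial and the FNE probability is the lower tail of a binomial with per-bit match probability P = 0.93. Function names and the threshold parameters are illustrative.

```python
from math import comb

def detection_threshold(s, num=4, den=5):
    """Smallest number of matching bits that counts as a detection: ceil(num * s / den)."""
    return -(-num * s // den)

def false_positive_probability(s):
    """P(at least t_min of s bits match by chance), each bit matching with probability 1/2."""
    t_min = detection_threshold(s)
    return sum(comb(s, t) for t in range(t_min, s + 1)) / 2 ** s

def false_negative_probability(s, p_num=93, p_den=100):
    """P(fewer than t_min bits match) when each bit matches with probability p_num / p_den."""
    t_min = detection_threshold(s)
    q_num = p_den - p_num
    total = sum(comb(s, t) * p_num ** t * q_num ** (s - t) for t in range(t_min))
    return total / p_den ** s

p_fp = false_positive_probability(1024)
p_fn = false_negative_probability(1024)
print(f"P_fp = {p_fp:.4e}, P_fn = {p_fn:.4e}")
```

Using Python integers for the sums avoids the underflow that floating-point products such as 0.07^1024 would cause; only the final division produces a float.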

Conclusion
In this paper, we have introduced an audio watermarking scheme in the DCT domain based on SVD, EO, and LO. Simulation results demonstrate that the proposed scheme provides high robustness against different attacks such as noise addition, cropping, re-sampling, re-quantization, and MP3 compression. This is because the watermark is embedded into the largest singular value of the exponential coefficients obtained from the DCT sub band with the highest power of each audio frame: slight variations of the largest singular values do not noticeably affect the quality of the sound, and these values change very little under different attacks. In addition, the proposed scheme achieves very low false positive and false negative error probabilities. Moreover, it provides good performance in terms of imperceptibility, robustness, and data payload compared with some recent state-of-the-art watermarking methods. These results verify the effectiveness of our proposed watermarking scheme for audio copyright protection.
There are several directions for future work on the proposed scheme. We will include a synchronization code [5] and error correcting codes [22] to improve the robustness of the proposed scheme. Moreover, a psychoacoustic model can be adopted to improve its imperceptibility. Furthermore, a comparison between the proposed scheme and several state-of-the-art methods will be carried out in terms of imperceptibility and robustness with the same audio signals and the same parameters. In addition, the computational complexity of the proposed scheme will be calculated.

13) After substituting the modified sub band B'_j(Highest) for B_j(Highest), an inverse DCT is performed on D_i' to obtain the watermarked audio frame F_i'.
14) Finally, all watermarked frames are concatenated to obtain the watermarked audio signal X'.

Fig. 3. (a) Binary watermark image. (b) Encrypted watermark image. The binary logo image and the corresponding image encrypted by chaotic encryption have size M × M = 32 × 32 = 1024.

Fig. 5. Robustness results of the proposed scheme where the watermark is embedded in different DCT sub bands for the audio signal "Speech".
Tab. 4. Column headings: Type of attack, NC, BER (%), Extracted watermark.

Fig. 6. Probability of FPE for various values of s.

Fig. 7. Probability of FNE for various values of s.
Tab. 1. Subjective and objective difference grades.