Audio Watermarking using Doubly Iterative Empirical Mode Decomposition

This paper presents an adaptive audio watermarking algorithm based on Doubly Iterative Empirical Mode Decomposition. First, the audio signal is divided into frames of equal length. Each frame is then decomposed adaptively by Doubly Iterative Empirical Mode Decomposition into zero-mean intrinsic oscillatory components called Intrinsic Mode Functions (IMFs). To achieve good performance against various attacks, the watermark bits, together with Synchronization Code (SC) bits, are embedded into the extrema of the last IMF of each frame using Quantization Index Modulation (QIM), which offers a good rate-distortion-robustness trade-off. Simulation results show that the scheme is robust against attacks such as additive noise, resampling, filtering, cropping, requantization, echo addition and MP3 compression.
Keywords: Doubly Iterative Empirical Mode Decomposition; Intrinsic Mode Function; Synchronization Code; Audio Watermarking; Quantization Index Modulation.


I. INTRODUCTION
Copyright protection of digital audio can be achieved by embedding a watermark in the original audio signal, called the host audio signal. According to the International Federation of the Phonographic Industry (IFPI), the main requirements of audio watermarking are imperceptibility, robustness and data payload. Imperceptibility refers to the inaudibility of the watermark within the host audio signal: the quality of the audio should not be degraded by adding the watermark. It is evaluated using both subjective listening tests and objective measures; according to the IFPI recommendation, the watermarked audio signal should maintain a Signal to Noise Ratio (SNR) of more than 20 dB. Robustness is the ability to extract the watermark bits from a watermarked audio signal that has been subjected to various attacks. Data payload is the amount of data that can be embedded into the host audio signal per unit time without degrading its quality; a watermarking algorithm should offer a data payload of more than 20 bps. Applications of watermarking include copyright control, owner identification, transaction tracking, copy control, content authentication and broadcast monitoring. Various watermarking techniques have been proposed [1]-[5]. The spread spectrum audio watermarking scheme in [5] is robust against various attacks, but its transmission bit rate is low. The bit rate can be improved by watermarking in the wavelet domain. In [3], watermark bits are embedded into the low-frequency coefficients in the Discrete Wavelet Transform (DWT) domain, exploiting its time-frequency localization characteristics. In [4], watermarking is performed in the DWT domain based on Singular Value Decomposition (SVD). A limitation of wavelet-domain watermarking is its fixed basis functions, which do not necessarily match all real signals. To overcome this drawback, Empirical Mode Decomposition (EMD), a signal decomposition method that expresses the audio signal as an expansion of signal-dependent basis functions, is adopted.
These basis functions are signal dependent and are estimated by an iterative procedure called sifting. Because no a priori choice of basis functions is involved, the technique is adaptive. EMD decomposes an audio frame into a number of zero-mean signals termed Intrinsic Mode Functions (IMFs), each representing an embedded characteristic oscillation on a separate time scale. Any time-varying audio signal x(t) can be expanded by EMD as

$$x(t) = \sum_{i=1}^{n} \mathrm{IMF}_i(t) + r_n(t) \tag{1}$$

where n is the number of IMFs and $r_n(t)$ is the final residue. The IMFs are zero-mean functions and are approximately orthogonal to each other. In an IMF, the number of extrema and the number of zero crossings must either be equal or differ by at most one [6].
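To make the extrema/zero-crossing condition concrete, the following minimal Python sketch checks only that one condition on a sampled frame. It does not test the zero-mean envelope condition and does not perform sifting; the function names are hypothetical and chosen for illustration.

```python
import math


def count_extrema(x):
    """Count strict local maxima and minima of a sampled signal."""
    return sum(
        1
        for i in range(1, len(x) - 1)
        if (x[i] > x[i - 1] and x[i] > x[i + 1]) or (x[i] < x[i - 1] and x[i] < x[i + 1])
    )


def count_zero_crossings(x):
    """Count sign changes between consecutive nonzero samples."""
    signs = [1 if v > 0 else -1 for v in x if v != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def meets_extrema_condition(x):
    """IMF condition: extrema and zero crossings are equal or differ by at most one."""
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1


# A 128-sample frame holding 4 periods of a zero-mean sinusoid.
tone = [math.sin(2 * math.pi * 4 * n / 128) for n in range(128)]
# The same oscillation riding on a constant offset has no zero crossings.
offset_tone = [v + 2.0 for v in tone]

print(count_extrema(tone), count_zero_crossings(tone))  # 8 7
print(meets_extrema_condition(tone), meets_extrema_condition(offset_tone))  # True False
```

The offset example shows why sifting must remove the local mean: the oscillation is unchanged, but without zero crossings it no longer qualifies as an IMF.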
In the standard version of EMD, the local extrema of the signal are used as interpolation points, and natural cubic splines are used for interpolation during the sifting procedure. The performance of EMD can be enhanced by improving the criterion for selecting interpolation points: instead of the local extrema of the signal, the extrema of the sub-signal with the higher instantaneous frequency are used. Since these extrema are not known in advance, the selection criterion is exploited in a doubly iterative sifting scheme, which improves decomposition performance. This EMD variant, termed Doubly Iterative Empirical Mode Decomposition, is considered here [7]. The decomposition yields a finite number of IMFs, and the frequency decreases from one IMF to the next. Higher-order IMFs are low-frequency, signal-dominated functions and are less vulnerable to attacks; therefore, the last IMF is used to embed the watermark bits. Quantization Index Modulation (QIM) is used for embedding because of its blind nature and good robustness [8], and its parameters are chosen so that the inaudibility constraint is maintained. Experimental results show that the scheme is robust against attacks such as addition of white Gaussian noise, resampling, requantization, filtering, cropping, echo addition and MP3 compression.

II. PROPOSED WATERMARKING ALGORITHM
The host audio signal is segmented into frames of 128 samples each. Doubly Iterative Empirical Mode Decomposition is applied to each frame, and the corresponding IMFs are extracted, as shown in figure 1. The watermark bits are combined with Synchronization Codes placed before and after them (figure 2), and the combined sequence is embedded into the extrema of the last IMFs of consecutive frames, one bit (0 or 1) per extremum. The number of bits embedded in the last IMF varies from frame to frame, because the number of IMFs and the number of extrema depend on the content of each frame. Since the last IMF of a single frame has few extrema compared with the length of the watermark image, the bits are spread over the last IMFs of consecutive frames. The binary sequence to be embedded has length $2N_1 + N_2$, where $N_1 = 16$ is the length of the Synchronization Code and $N_2$ is the number of watermark bits. This sequence is embedded P times in the host audio signal, where P depends on the length of the host audio signal. Each frame is then reconstructed by superposing all its IMFs and the residue, which is the inverse transformation of Doubly Iterative Empirical Mode Decomposition. Finally, all frames are concatenated, as shown in figure 3. If the number of IMFs is the same before and after watermarking, the last IMF is guaranteed to contain the watermark information. The watermark bits are extracted from the extrema of the last IMFs of consecutive frames by searching for Synchronization Codes, as shown in figure 5. Since the host signal is not required to extract the watermark, the scheme is blind.
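The bookkeeping described above (an SC, the watermark, and another SC, repeated P times and spread over the last IMFs of consecutive frames) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the per-frame extrema counts are made-up inputs standing in for the counts that Doubly Iterative EMD would produce for each frame.

```python
# The 16-bit Synchronization Code U used in the paper.
SYNC_CODE = [int(c) for c in "1111100110101110"]


def build_unit(watermark_bits, sync_code=SYNC_CODE):
    """One embedding unit: SC, the watermark bits, then SC (length 2*N1 + N2)."""
    return list(sync_code) + list(watermark_bits) + list(sync_code)


def build_stream(watermark_bits, p):
    """Repeat the unit P times to form the complete bit stream to embed."""
    return build_unit(watermark_bits) * p


def split_over_frames(stream, extrema_per_frame):
    """Give each frame as many bits as its last IMF has extrema (one bit per extremum).

    extrema_per_frame is hypothetical input; in the scheme it would come from the
    last IMF of each frame. Frames after the stream is exhausted get no bits.
    """
    chunks, pos = [], 0
    for count in extrema_per_frame:
        if pos >= len(stream):
            break
        chunk = stream[pos:pos + count]
        chunks.append(chunk)
        pos += len(chunk)
    return chunks


watermark = [1, 0, 1, 1, 0, 0, 1, 0]  # toy watermark, N2 = 8
stream = build_stream(watermark, p=2)
print(len(build_unit(watermark)), len(stream))  # 40 80
chunks = split_over_frames(stream, [7, 5, 9, 6, 8, 7, 9, 6, 8, 7, 10, 6])
print([len(c) for c in chunks])  # [7, 5, 9, 6, 8, 7, 9, 6, 8, 7, 8]
```

Because the number of extrema per frame is data dependent, the number of frames needed per copy of the watermark is not fixed in advance, which is why the Synchronization Codes are required at extraction time.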

A. Synchronization Code
The Synchronization Code is used to locate the position of the embedded watermark bits in the audio signal.
Let $U = \{1111100110101110\}$ be the original Synchronization Code and $V$ an unknown sequence of the same length as $U$. If the number of positions in which $U$ and $V$ differ (their Hamming distance) is at most $\tau$, a predefined threshold, then $V$ is considered a Synchronization Code.
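The detection rule above is a Hamming-distance test. The minimal Python sketch below applies it and also scans an extracted bit stream with a window of length $N_1 = 16$, moving one bit at a time, in the spirit of extraction Steps 5 to 7. The function names are hypothetical; the default $\tau = 4$ follows the value used in the experiments.

```python
SYNC_CODE = [int(c) for c in "1111100110101110"]  # U


def hamming(a, b):
    """Number of positions in which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def is_sync_code(v, tau=4, u=SYNC_CODE):
    """Accept V as a Synchronization Code if it differs from U in at most tau bits."""
    return len(v) == len(u) and hamming(u, v) <= tau


def find_sync_positions(y, tau=4, u=SYNC_CODE):
    """Slide a window of length N1 = len(u) one bit at a time over the extracted
    bits y and return every 0-based start index at which an SC is detected."""
    n1 = len(u)
    return [i for i in range(len(y) - n1 + 1) if is_sync_code(y[i:i + n1], tau, u)]


four_errors = [1 - b if i < 4 else b for i, b in enumerate(SYNC_CODE)]
five_errors = [1 - b if i < 5 else b for i, b in enumerate(SYNC_CODE)]
print(is_sync_code(four_errors), is_sync_code(five_errors))  # True False

y = [0, 0, 0] + SYNC_CODE + [0, 0, 0, 0, 0]
print(find_sync_positions(y, tau=0))  # [3]
```

A larger $\tau$ tolerates more bit errors caused by attacks, at the cost of a higher chance of accepting a random window as an SC.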

B. Watermark Embedding
A binary sequence, whose $i$th bit is denoted by $m_i \in \{0,1\}$, is formed by combining the Synchronization Code with the watermark image before embedding. The embedding steps are as follows:
Step 1: The audio signal is segmented into frames of 128 samples each.
Step 2: Each frame is decomposed into IMFs by Doubly Iterative Empirical Mode Decomposition.
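Step 3: The binary sequence $\{m_i\}$ is embedded P times into the extrema of the last IMF by Quantization Index Modulation (QIM) using the rule [8]:

$$e_i^* = \begin{cases} \left\lfloor e_i / S \right\rfloor S + \dfrac{3S}{4}, & m_i = 1 \\[4pt] \left\lfloor e_i / S \right\rfloor S + \dfrac{S}{4}, & m_i = 0 \end{cases} \tag{2}$$

where $e_i$ and $e_i^*$ are the extrema of the last IMF of the audio signal before and after embedding the watermark, and $\lfloor \cdot \rfloor$ denotes the floor function. S is the embedding strength, chosen so that inaudibility is maintained.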
Step 4: Each audio frame is reconstructed using its modified last IMF, and all frames are concatenated to obtain the watermarked signal.
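Step 3 can be illustrated with a minimal Python sketch of QIM on extrema. It assumes the common form in which bit 1 moves an extremum to the 3S/4 point and bit 0 to the S/4 point of its quantization cell of width S, with a decision threshold of S/2 at extraction. This is an assumption about the exact rule in [8], and the function names and toy values are hypothetical.

```python
import math


def qim_embed(e, bit, s):
    """Move extremum e to the S/4 (bit 0) or 3S/4 (bit 1) point of its cell of width S."""
    base = math.floor(e / s) * s
    return base + (0.75 * s if bit else 0.25 * s)


def qim_extract(e_star, s):
    """Return 1 if the offset of e_star inside its cell is at least S/2, else 0."""
    offset = e_star - math.floor(e_star / s) * s
    return 1 if offset >= 0.5 * s else 0


def embed_bits(extrema, bits, s):
    """Embed one bit per extremum; extrema beyond len(bits) are left unchanged."""
    marked = list(extrema)
    for i, bit in enumerate(bits[:len(marked)]):
        marked[i] = qim_embed(marked[i], bit, s)
    return marked


extrema = [0.31, -0.12, 0.05, -0.44]  # toy extrema of a last IMF
bits = [1, 0, 0, 1]
s = 0.1                               # toy step, not the paper's S = 0.98
marked = embed_bits(extrema, bits, s)
print([qim_extract(e, s) for e in marked])  # [1, 0, 0, 1]

# A perturbation smaller than S/4 keeps each extremum on the correct side of S/2.
noisy = [e + d for e, d in zip(marked, [0.02, -0.02, 0.02, -0.02])]
print([qim_extract(e, s) for e in noisy])  # [1, 0, 0, 1]
```

The sketch makes the trade-off behind S visible: each extremum moves by less than S, so a larger S adds more distortion, while the decision margin of S/4 grows with S and makes the bits harder to flip.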

C. Watermark Extraction
The watermark extraction steps are as follows:
Step 1: The watermarked audio signal is segmented into frames.
Step 2: Each frame is decomposed into IMFs by Doubly Iterative Empirical Mode Decomposition.
Step 3: The extrema $\{e_i^*\}$ of the last IMF are extracted.
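Step 4: The binary bits $\{m_i^*\}$ are extracted from the extrema using the rule [8]:

$$m_i^* = \begin{cases} 1, & e_i^* - \left\lfloor e_i^* / S \right\rfloor S \ge \dfrac{S}{2} \\[4pt] 0, & \text{otherwise} \end{cases} \tag{3}$$

Step 5: A sliding window of size $L = N_1 = 16$ is used, and the start index of the extracted binary data Y is set to INDEX = 1.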
Step 6: The extracted segment $V = Y(\mathrm{INDEX} : \mathrm{INDEX} + L - 1)$ is compared with $U$. If they differ in at most $\tau$ bits, V is considered a Synchronization Code; go to Step 8.
Step 7: Otherwise, INDEX is increased by 1, i.e., the window of length L = 16 slides forward by one bit; go to Step 6.
Step 10: The P extracted copies of the watermark are compared bit by bit to correct errors, and the final watermark is obtained. The complete embedding and extraction processes are shown in figure 6.

The performance is evaluated in terms of data payload, Signal to Noise Ratio (SNR) between the host and watermarked audio signals, probabilities of error, Bit Error Rate (BER), and Normalized cross-Correlation (NC). According to the IFPI, the SNR of the watermarked audio should be more than 20 dB. The SNR is calculated as

$$\mathrm{SNR}(X, X^*) = 10 \log_{10} \frac{\sum_{n} X^2(n)}{\sum_{n} \left( X(n) - X^*(n) \right)^2} \tag{4}$$

where X and $X^*$ are the host and watermarked audio signals. Watermark detection accuracy is evaluated using BER and NC, defined by

$$\mathrm{BER} = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} W(i,j) \oplus W^*(i,j) \tag{5}$$

where $\oplus$ denotes the exclusive-OR (XOR) operation, $M \times N$ is the watermark image size, W is the original watermark image and $W^*$ is the extracted one. NC, which measures the similarity between the original and extracted watermarks, is calculated by

$$\mathrm{NC}(W, W^*) = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} W(i,j)\, W^*(i,j)}{\sqrt{\sum_{i=1}^{M} \sum_{j=1}^{N} W^2(i,j)}\; \sqrt{\sum_{i=1}^{M} \sum_{j=1}^{N} W^{*2}(i,j)}} \tag{6}$$

An NC value close to 1 indicates the presence of the watermark, while a low NC value indicates its absence.

While searching for Synchronization Codes, two types of error can occur: False Positive Error (FPE) and False Negative Error (FNE). The probability of FPE is the probability that an SC is detected at a wrong position; the probability of FNE is the probability that a watermarked signal is declared unwatermarked. The associated probabilities, $P_{FPE}$ and $P_{FNE}$, are given in [4], where p is the length of the SC and $\tau$ is the threshold.

The data payload, or information embedding rate, of an audio watermarking scheme is the number of bits embedded into one second of audio, calculated as

$$\mathrm{DP} = \frac{M}{L} \ \text{bps} \tag{9}$$

where L is the length of the audio signal in seconds and M is the size of the watermark data in bits.
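The evaluation quantities above, together with the bit-by-bit correction of Step 10, can be sketched in Python. Two points are assumptions rather than details from the text: Step 10 is modelled as a majority vote over the P copies (ties resolved to 1, an arbitrary choice), and the watermark image is handled as its flattened 1D bit sequence. The function names and toy values are hypothetical.

```python
import math


def majority_vote(copies):
    """Step 10 sketch: combine P extracted copies bit by bit (ties go to 1)."""
    p = len(copies)
    return [1 if 2 * sum(column) >= p else 0 for column in zip(*copies)]


def snr_db(x, x_star):
    """Eq. (4): SNR between host samples x and watermarked samples x_star."""
    noise = sum((a - b) ** 2 for a, b in zip(x, x_star))
    signal = sum(a * a for a in x)
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


def ber(w, w_star):
    """Eq. (5): fraction of differing bits (XOR) over the flattened M x N watermark."""
    return sum(a ^ b for a, b in zip(w, w_star)) / len(w)


def nc(w, w_star):
    """Eq. (6): normalized cross-correlation of original and extracted watermarks."""
    num = sum(a * b for a, b in zip(w, w_star))
    den = math.sqrt(sum(a * a for a in w) * sum(b * b for b in w_star))
    return num / den if den else 0.0


def data_payload(m_bits, l_seconds):
    """Eq. (9): embedded watermark bits per second of audio."""
    return m_bits / l_seconds


copies = [[1, 0, 1, 1], [1, 1, 1, 0], [0, 0, 1, 1]]  # P = 3 toy copies
w = [1, 0, 1, 1]
w_star = majority_vote(copies)
print(w_star)                     # [1, 0, 1, 1]
print(ber(w, w_star), nc(w, w_star))  # 0.0 1.0
print(ber(w, [1, 0, 0, 1]))       # 0.25
print(round(snr_db([1.0, -1.0, 1.0, -1.0], [1.1, -1.0, 1.0, -1.0]), 2))  # 26.02
print(data_payload(1296, 10))     # 129.6
```

In the last line, 10 s is only an illustrative clip length; it shows how DP scales with the 1296-bit watermark, not a reported result. With an odd P the vote never ties, which is one reason to prefer an odd number of repetitions.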

IV. EXPERIMENTAL RESULTS
Simulations are performed on four audio signals in WAVE format, namely FOLK.wav, JAZZ.wav, CLASSICAL.wav and LATIN.wav, each sampled at 44.1 kHz and about 10 seconds long. A binary logo image of size $M \times N = 36 \times 36 = 1296$ bits, denoted W and shown in figure 7, is used as the watermark. The 2D binary image is converted into a 1D sequence of 1296 bits. The 16-bit sequence 1111100110101110 is used as the Synchronization Code (SC) and is placed before and after the watermark bits, as shown in figure 2. Each audio signal is segmented into frames of 128 samples, and the threshold $\tau$ is set to 4. The parameter S is set to 0.98 to achieve imperceptibility. Figure 8 shows a portion of the audio signal FOLK.wav and its watermarked version, which cannot be visually distinguished from the original. Perceptual quality is assessed subjectively through listening tests with 15 participants and objectively by measuring the SNR of the watermarked audio signals. Participants were asked to grade the dissimilarity between the original and watermarked audio files on the 5-grade impairment scale [11] shown in Table I, where Grade 5 corresponds to excellent audio quality and Grade 1 to bad quality. According to the IFPI, an SNR above 20 dB denotes good audio quality. Table II shows the SNR values of the test audio signals, all above 20 dB in conformance with the IFPI requirement, together with their average Mean Opinion Score (MOS).

A. Robustness test
The watermarked audio signal is subjected to the following attacks.
Noise: White Gaussian Noise (WGN) is added to the watermarked signal so that the SNR of the resulting signal is 20 dB.
Cropping: Segments of 512 samples each are taken from 15 random positions, contaminated with WGN, and then put back into the watermarked signal.
Resampling: The watermarked signal, originally sampled at 44.1 kHz, is resampled at 22.05 kHz and at 11.025 kHz and then restored to 44.1 kHz.
Filtering: A second-order low-pass Butterworth filter with a cut-off frequency of 11.025 kHz is applied.
Requantization: The signal is requantized from 16 bits/sample down to 8 bits/sample and then back to 16 bits/sample.
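Two of these attacks are simple enough to simulate directly. The Python sketch below adds WGN scaled to a target SNR and models requantization as dropping the 8 least significant bits of each signed 16-bit sample and scaling back. Truncation is one simple choice (rounding is another), and the function names, test tone and seed are hypothetical; the authors' exact tooling is not specified.

```python
import math
import random


def add_wgn(x, target_snr_db, seed=0):
    """Add white Gaussian noise whose power gives roughly the requested SNR."""
    rng = random.Random(seed)
    signal_power = sum(v * v for v in x) / len(x)
    sigma = math.sqrt(signal_power / 10 ** (target_snr_db / 10))
    return [v + rng.gauss(0.0, sigma) for v in x]


def requantize(samples, bits=8, orig_bits=16):
    """Keep the top `bits` of each signed integer sample, then return to the 16-bit scale."""
    shift = orig_bits - bits
    return [(s >> shift) << shift for s in samples]


def snr_db(x, y):
    """SNR of y relative to reference x, in dB."""
    noise = sum((a - b) ** 2 for a, b in zip(x, y))
    return 10 * math.log10(sum(a * a for a in x) / noise)


fs = 44100
x = [0.5 * math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]  # 1 s test tone
noisy = add_wgn(x, 20.0)
print(round(snr_db(x, noisy), 1))  # close to 20.0; the exact value depends on the seed

pcm16 = [int(32767 * v) for v in x]
pcm8 = requantize(pcm16)
print(max(s - q for s, q in zip(pcm16, pcm8)) < 256)  # True: error below one 8-bit step
```

Both attacks mainly perturb the low-amplitude detail of the signal, which is consistent with the motivation for embedding in the low-frequency, signal-dominated last IMF.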

Compression (64 kbps and 32 kbps):
The watermarked signal is compressed and then decompressed using MPEG-1 Audio Layer III (MP3).
Echo addition: An echo with a decay of 40% and a delay of 100 ms is added to the watermarked audio signal.

Figure 9 shows the variation of the probability of False Positive Error, $P_{FPE}$, with the SC length p. $P_{FPE}$ tends to 0 for p ≥ 16, which supports the chosen SC length. Figure 10 shows the variation of $P_{FNE}$ with the length of the embedded bits.
For embedded bit lengths ≥ 25, $P_{FNE}$ tends to 0. Since the watermark image used here has 1296 bits, the probability of FNE is very low.

V. CONCLUSION
In this paper, an adaptive and efficient watermarking scheme based on Doubly Iterative Empirical Mode Decomposition is proposed. Because the watermark bits are embedded in the extrema of the last IMF, good performance against different kinds of attacks is achieved. The binary data to be embedded consist of the Synchronization Code and the watermark bits, and are embedded into the extrema using QIM. Experimental results demonstrate that the original and watermarked audio signals are indistinguishable. The watermark extraction algorithm is efficient and blind, since the host audio signal is not required during extraction. The proposed scheme is robust against attacks such as addition of WGN, cropping, resampling, filtering, requantization, MP3 compression and echo addition. It has a higher information embedding rate than other schemes and involves simple calculations. The probabilities of False Positive Error and False Negative Error are very low for the chosen lengths of the Synchronization Code and the watermark. Since the embedded data carry information about the owner of the audio file, the proposed algorithm is useful in applications such as copyright protection and owner tracking. The embedding strength S is kept constant during the experiments; choosing S adaptively, depending on the magnitude of the host audio signal, could further improve performance. Future work includes designing a watermarking method with adaptive embedding strength.