Robust audio watermarking based on transform domain and SVD with compressive sampling framework

ABSTRACT


INTRODUCTION
Information and communication technologies are growing rapidly, as indicated by the large volume of data traffic on the Internet. The Internet is easily accessible to users and has even become a daily need. This ease of use makes the Internet a source for accessing all kinds of information and digital content, so the spread of music, songs, and other audio files cannot be controlled. This has drawn the attention of certain parties because it can lead to losses such as illegal distribution, copyright infringement, and illegitimate sharing. It has become a necessity to protect the intellectual property of digital content. Watermarking is one solution to this problem. A watermark is a signal or digital pattern embedded into digital images, video, or audio [1]. The result of watermarking is not always as perfect as expected; in general, this is caused by noise that can interfere with or alter the audio file. This study proposes a watermarking scheme for audio.
Several studies have addressed audio watermarking using a single transformation method. A robust watermark embedded in an audio signal using the LWT method can resist various attacks, such as low-pass filtering, Gaussian noise, resampling, salt-and-pepper noise, and compression; this method can secure the copyright and preserve the integrity of the audio signal [2]. Dhar and Simamora [3] combined the LWT method with SVD and the Fast Walsh-Hadamard Transform; their analysis showed that these methods provide high resilience against attacks including resampling, noise, MP3 compression, and cropping. However, the resulting audio quality is not optimal. Other related research concerns the DCT method: the method in [4] has quite high resilience against MP3 compression attacks. Another related study uses the lifting wavelet method, which lacks self-adaptability; without using SNR, that scheme has not yet achieved resistance to low-pass filtering, additive noise, and noise removal [5]. Experimental results in [6] prove that watermark embedding does not badly degrade the audio signal, but the extracted image still needs improvement. The scheme in [7] is highly resistant to attacks thanks to the LWT method, but its drawback is that the resulting SNR only approaches, rather than reaches, the desired value. Many research results have proven SVD effective in digital watermarking, including image watermarking techniques [8, 9] and audio watermarking [10-13].

ISSN: 1693-6930. TELKOMNIKA Telecommun Comput El Control, Vol. 18, No. 2, April 2020: 1079-1088.
Ozer et al. [10] proposed an audio watermarking scheme using the short-time Fourier transform (STFT) based on SVD. In this method, the system has difficulty detecting watermarks that are attacked by simple noise. Bhat et al. [11] introduced an audio watermarking method based on DWT and quantization index modulation (QIM). The technique is robust against several attacks; however, the resulting signal-to-noise ratio (SNR) is only slightly above 20 dB. The authors of [11, 12] proposed audio watermarking schemes in the transform domain based on SVD; the systems apply chaotic sequences to the binary watermark to increase confidentiality. Al-Nuaimy et al. [13] applied a watermarking method to a Bluetooth-based automatic speaker identification system; however, improvements in robustness are needed. An audio watermark achieves good robustness and good SNR values when the DWT and DCT methods are combined; in that comparison LWT-DCT-SVD does not outperform DWT-DCT, and the shortcomings of the paper are that the incremental bit synchronization and the CS processes on the watermark still produce errors during extraction [14]. In a subsequent paper, the proposed scheme is robust against hybrid and desynchronization attacks, but it still has many imperfections [15]. The scheme proposed in [16] is robust against volumetric scaling attacks, which are a crucial drawback of conventional QIM-based watermarking algorithms; it is also sufficiently resistant to common signal processing operations, as attested by the test results reported in that paper. In addition, its extractor does not require the original audio signal for watermark extraction. The paper [17] also tries to improve the robustness of the method by refining the adaptive arrangement of the efficiency coefficients and by improving the synchronization of the extractor during watermark extraction.
In this paper, the LWT-DCT-SVD method is selected in order to obtain a better watermarking result, improving audio watermarking performance in terms of audio quality, robustness to attacks, and capacity. The watermarking process is divided into two stages: embedding and extraction. The embedding stage inserts the watermark bits into the host audio. The first process reads the host audio signal, which is combined with synchronization bits. Then the LWT process is applied for subband frequency selection. After that, the DCT process is applied to the selected subband, transforming the host audio signal from the time domain to the frequency domain. The output of the DCT process enters an SVD process, which decomposes it into three matrices: U, S, and V. The U and V matrices are forwarded to the SVD reconstruction, while the S matrix is modified by the watermark in the embedding process. In the QIM step, the watermark is inserted into the S matrix; beforehand, the two-dimensional watermark matrix is converted to one dimension in pre-processing. After that, compressive sampling acquisition is applied to reduce the original matrix to a smaller one. The extraction stage retrieves the watermark bits from the watermarked audio. In the extraction stage, the watermarked audio is read, synchronization-bit detection is applied, and the result is processed in the same way as in the embedding stage until the Ŝ matrix is obtained from the SVD process. Next, the watermark is extracted from the Ŝ matrix by the QIM extraction process. After that, the CS reconstruction process recovers the watermark from its compressed form.
The rest of this paper is organized as follows: section 2 describes the basic theory of audio watermarking and the embedding methods, section 3 describes the audio watermarking model with its embedding and extraction processes, section 4 presents the results and analysis of several performance parameters, and section 5 concludes the paper.

RESEARCH METHOD

2.1. Lifting wavelet transform (LWT)
TELKOMNIKA Telecommun Comput El Control: Robust audio watermarking based on transform domain and SVD with.... (Ledya Novamizanti), 1081

The wavelet transform is a linear transformation similar to the Fourier transform, with one important difference: the wavelet transform localizes in time the different frequency components of the given signal. It is a time-frequency decomposition method, and the lifting wavelet transform is one type of wavelet transform [2]. The lifting scheme is a simple method for adjusting the design of biorthogonal wavelets. The lifting wavelet transform process is divided into three steps:

Split
The split step divides the data s_j into two smaller subsets: the even-indexed sequence even_{j-1} and the odd-indexed sequence odd_{j-1}. The length of each subset is half that of the original. The split process in LWT is calculated as follows:

even_{j-1}[k] = s_j[2k],  odd_{j-1}[k] = s_j[2k+1]

In the ILWT, the corresponding merge step returns the signal to its original form:

s_j[2k] = even_{j-1}[k],  s_j[2k+1] = odd_{j-1}[k]

Prediction
The prediction step predicts the odd subset from the even subset based on the local correlation in the original data, and replaces the detail with the difference between the data and the prediction. If the prediction is reasonable, the detail d_{j-1} will be small and will contain much less information than the original subset s_{j-1}. The prediction process is calculated as follows:

d_{j-1} = odd_{j-1} − P(even_{j-1})

where P(·) is the prediction operator, chosen to fit the data. The prediction process in the ILWT is calculated as follows:

odd_{j-1} = d_{j-1} + P(even_{j-1})

Update
The update step is applied after the prediction step to maintain some global properties of the data relative to the original data. The average of the even subset alone may not match the characteristics of the original data, so an update process is needed to preserve them. The update process is calculated as follows:

s_{j-1} = even_{j-1} + U(d_{j-1})

where s_{j-1} is the low-frequency (approximation) part of s_j and U(·) is the update operator. The corresponding update step in the ILWT is calculated as follows:

even_{j-1} = s_{j-1} − U(d_{j-1})
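As an illustration, the three lifting steps above can be sketched in Python using the Haar wavelet, for which the prediction operator is P(even) = even and the update operator is U(d) = d/2. This particular wavelet is an assumption for the example, not necessarily the one used in the paper.

```python
import numpy as np

def lwt_haar(s):
    """One lifting level: split, predict, update (Haar operators)."""
    even, odd = s[0::2], s[1::2]   # split into even/odd subsets
    d = odd - even                 # predict: d_{j-1} = odd - P(even)
    a = even + d / 2               # update:  s_{j-1} = even + U(d)
    return a, d

def ilwt_haar(a, d):
    """Inverse lifting: undo update, undo predict, merge."""
    even = a - d / 2
    odd = d + even
    s = np.empty(a.size + d.size)
    s[0::2], s[1::2] = even, odd
    return s

x = np.array([4.0, 2.0, 5.0, 7.0])
approx, detail = lwt_haar(x)        # approx = [3., 6.], detail = [-2., 2.]
restored = ilwt_haar(approx, detail)
```

Note that with this update operator the mean of the approximation equals the mean of the original signal, which is exactly the "maintain global properties" goal described above.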

Discrete cosine transform (DCT)
DCT is a transform function that is very popular in signal processing. The DCT transforms a signal from the time domain to the frequency domain and represents segments of the audio signal as a sum of cosine functions at different frequencies; that is, DCT converts the data into a sum of cosine waves of different frequencies [18]. The DCT process can be defined as follows:

X(k) = α(k) Σ_{n=0}^{N−1} x(n) cos(π(2n+1)k / 2N),  k = 0, 1, ..., N−1

with N the number of samples and x(n) the audio signal, where α(0) = √(1/N) and α(k) = √(2/N) for k ≥ 1. DCT can reduce distortion in the watermarked signal because it concentrates the signal energy in a few samples. The IDCT process is defined by formula (12):

x(n) = Σ_{k=0}^{N−1} α(k) X(k) cos(π(2n+1)k / 2N)
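The forward and inverse transforms can be implemented directly from these formulas. The numpy sketch below assumes the orthonormal normalization given above (the paper does not state which normalization it uses).

```python
import numpy as np

def dct(x):
    """DCT-II: X(k) = a(k) * sum_n x(n) * cos(pi*(2n+1)*k / (2N))."""
    N = x.size
    n = np.arange(N)
    C = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    a = np.full(N, np.sqrt(2.0 / N))
    a[0] = np.sqrt(1.0 / N)
    return a * (C @ x)

def idct(X):
    """IDCT: x(n) = sum_k a(k) * X(k) * cos(pi*(2n+1)*k / (2N))."""
    N = X.size
    n = np.arange(N)
    C = np.cos(np.pi * (2 * n[:, None] + 1) * n[None, :] / (2 * N))
    a = np.full(N, np.sqrt(2.0 / N))
    a[0] = np.sqrt(1.0 / N)
    return C @ (a * X)

x = np.random.default_rng(1).standard_normal(16)
X = dct(x)
```

For a smooth signal, most of the energy of X concentrates in the first few coefficients, which is the energy-compaction property the text refers to.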

Singular value decomposition (SVD)
Singular value decomposition is a method used to decompose a matrix. For example, given a matrix A for the SVD process, A is decomposed into three components: two orthogonal matrices and a diagonal matrix containing the singular values. SVD has several main properties. First, the singular values of audio are stable: when the audio suffers a slight attack, its singular values do not change significantly. Second, the audio quality is not noticeably affected even by a small change to the singular values. Finally, the singular values represent intrinsic properties of the audio, which are preserved under general operations such as transpose, flip, rotation, and scaling [19]. Given any m × n matrix A, the matrices U, S, and V are calculated using

A = U S Vᵀ

with U an orthogonal matrix, S a diagonal matrix, and V an orthonormal matrix [20].
The inverse singular value decomposition (ISVD) recomposes the matrix from its (possibly modified) factors:

Â = U Ŝ Vᵀ
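A short numpy illustration of the decomposition, the ISVD recomposition, and the stability property (by Weyl's inequality, |sᵢ(A+E) − sᵢ(A)| ≤ ‖E‖₂, so a small perturbation barely moves the singular values); the matrix here is arbitrary example data.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])                          # any m x n matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)    # A = U S V^T
A_rec = U @ np.diag(s) @ Vt                         # ISVD: recompose the factors

E = 1e-3 * np.ones_like(A)                          # a slight "attack" on A
s_pert = np.linalg.svd(A + E, compute_uv=False)     # singular values barely change
```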

Compressive sampling (CS)
Compressive sampling is a compression method that takes a minimal number of random samples, followed by a transform-projection process. Basically, this technique is a way to simplify a signal: a signal of millions of bytes can, for example, be reduced to only a few hundred kilobytes. CS can reconstruct a signal from a small number of random measurements, provided the sampling matrix is suitable and the signal is sparse [21]. A signal x ∈ ℝᴺ is k-sparse when at most k of its elements are non-zero. For a measurement vector y and an M × N sampling matrix Φ, the acquisition process is calculated using the following formula:

y = Φx

Basically, CS rests on two properties: recoverability and stability. Recoverability concerns the type of measurement matrix and the recovery procedure; it ensures correct restoration of all k-sparse signals (k non-zeros) and determines how many measurements are needed to guarantee recovery. Stability, on the other hand, addresses the robustness of the recovery when the measurements are noisy and/or the signal is not exactly sparse [22]; the stability theory of CS studies how accurately the CS approach can restore a signal in such situations. The recovery model established for ℓ₁ minimization can be defined as follows [22]:

x̂ = arg min ‖x‖₁  subject to  y = Φx

where ‖·‖₁ denotes the ℓ₁ norm, i.e., the sum of the absolute values of the elements of the vector. The output is the minimum-ℓ₁-norm sparse signal consistent with the measurements [22].
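A compact sketch of the acquisition y = Φx and of sparse recovery. The paper's recovery uses ℓ₁ minimization; as a stand-in that needs no external solver, the sketch below uses greedy orthogonal matching pursuit (OMP), and the dimensions, seed, and support are illustrative assumptions.

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal matching pursuit: greedily recover a k-sparse x from y = Phi @ x."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(Phi.T @ residual))))  # best-matching column
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef                     # update residual
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(0)
M, N, k = 50, 100, 3
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # random M x N sampling matrix
x = np.zeros(N)
x[[5, 40, 77]] = [1.5, -2.0, 0.8]                # a k-sparse signal
y = Phi @ x                                      # acquisition: M measurements, M << N
x_hat = omp(Phi, y, k)
```

With M well above the k·log(N/k) threshold, a random Gaussian Φ makes exact recovery of the 3-sparse signal overwhelmingly likely.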

Quantization index modulation (QIM)
Quantization index modulation (QIM) is a computationally efficient method for watermarking with side information. It can be applied in the time or frequency domain, or after a transformation process. QIM embedding has two stages. First, the watermark bit is used to index a quantizer. Second, the host signal at a specific frequency is quantized with the quantizer value that matches the watermark bit being embedded [23].
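A minimal scalar QIM sketch of the two stages: the bit indexes one of two dithered quantizers (dither 0 for bit 0 and delta/2 for bit 1 — an assumed convention), and extraction picks the quantizer lattice closest to the received sample.

```python
import numpy as np

DELTA = 0.5  # quantization step (assumed value)

def qim_embed(s, bit):
    """Quantize sample s onto the lattice selected by the watermark bit."""
    d = 0.0 if bit == 0 else DELTA / 2
    return DELTA * np.round((s - d) / DELTA) + d

def qim_extract(s):
    """Minimum-distance decoding: the nearer lattice decides the bit."""
    return int(abs(qim_embed(s, 1) - s) < abs(qim_embed(s, 0) - s))

bits = [1, 0, 1, 1, 0]
host = [0.37, -1.12, 2.81, 0.05, -0.66]           # example host coefficients
marked = [qim_embed(s, b) for s, b in zip(host, bits)]
decoded = [qim_extract(s) for s in marked]
```

The embedding distortion is bounded by DELTA/2 per sample, which is the usual imperceptibility/robustness knob for QIM.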

Spread spectrum
Spread spectrum is a communication method in which the communication signal is distributed across the available frequency spectrum, spreading the information signal over a wider bandwidth to prevent interruption of the information. The term spread spectrum is used because in this system the transmitted signal has a bandwidth wider than that of the information signal. In SS watermarking, the embedded watermark information linearly adds a modulated SS sequence to the host audio signal [24].
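A minimal additive SS sketch: a bipolar pseudo-noise (PN) sequence is modulated by the watermark bit, scaled by a strength alpha, and linearly added to the host; the detector correlates with the same PN sequence. The frame length, alpha, and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1024
host = rng.standard_normal(n)              # host audio frame
pn = rng.choice([-1.0, 1.0], size=n)       # pseudo-noise spreading sequence
ALPHA = 0.2                                # embedding strength (assumed)

def ss_embed(host, pn, bit):
    """Linearly add the bit-modulated PN sequence to the host."""
    return host + ALPHA * (1.0 if bit else -1.0) * pn

def ss_detect(signal, pn):
    """Correlate against the PN sequence; the sign of the sum decides the bit."""
    return int(np.dot(signal, pn) > 0)
```

Correlation works because the PN term contributes ±alpha·n while the host contributes only a zero-mean term of order √n, so for long enough frames the bit dominates the correlator output.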
The collaboration of LWT, DCT, and SVD as pre-processing for the host audio is an effective transform method for producing a signal with strong energy, which provides a very good medium for hiding information with the QIM method. At the same time, the watermark is compressed by CS before being hidden by QIM, so the watermark payload increases. The SS method in this paper is used to make the audio watermarking method robust against delay attacks.

PROPOSED ALGORITHM
In the proposed audio watermarking technique, the system is divided into three processes: watermark processing, embedding, and extraction. The watermark is the data inserted into the host audio. Embedding is the process of inserting the watermark into the host audio signal, and extraction is the process of retrieving that information. The three stages are explained in this section.

Processing stage for watermark
The watermark processing stage is presented as follows:
Step 1: Read the watermark, which is an image matrix W.
Step 2: Convert the matrix from two dimensions to one dimension in pre-processing, producing w(n).
Step 3: Apply the compressive sampling acquisition process to reduce the one-dimensional matrix to a smaller size. Using (15), this produces w_cs(n).

Embedding stage
The proposed watermark embedding process is shown in Figure 1 and is presented as follows:
Step 1: Read the host audio signal x(n) into a one-dimensional matrix.
Step 2: Add synchronization bits to the host audio. This process uses the spread spectrum method.
Step 3: Apply the LWT process for subband frequency selection.
Step 4: Apply the DCT process, which transforms the selected subband from the time domain to the frequency domain, producing x_DCT(n). Calculate using formulas (10) and (11).
Step 5: Apply the SVD process, decomposing x_DCT(n) into the U, S, and V matrices. The U and V matrices are forwarded to the SVD reconstruction, while the S matrix undergoes the QIM process. Calculate using formula (13).
Step 6: Embed the watermark, which has already passed through the preceding stages, into the S matrix; the result is the modified matrix Ŝ.
Step 7: Apply the ISVD process to recombine the matrices U, V, and Ŝ, producing x̂_DCT(n).
Step 8: Apply the IDCT process to convert the frequency-domain signal x̂_DCT(n) back to the time domain.
Step 9: Apply the ILWT process to restore the signal to its original form. The result of this process is the watermarked audio signal x̂(n).
Step 10: Combine the synchronization signals with the host audio.
Step 11: Calculate the signal-to-noise ratio (SNR) and objective difference grade (ODG) after obtaining x̂(n).
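The transform chain of Steps 3-9 can be condensed into a toy single-frame sketch: Haar lifting stands in for the LWT, an orthonormal DCT is applied to the approximation band, and QIM with an assumed step delta acts on the largest singular value. The frame size, delta, and the 8 × 4 reshape are illustrative assumptions, and the synchronization and CS steps are omitted.

```python
import numpy as np

DELTA = 0.05  # QIM step (assumed value)

def lwt(s):
    even, odd = s[0::2], s[1::2]          # split
    d = odd - even                        # predict (Haar)
    return even + d / 2, d                # update -> approximation, detail

def ilwt(a, d):
    even = a - d / 2
    odd = d + even
    s = np.empty(a.size * 2)
    s[0::2], s[1::2] = even, odd
    return s

def dct_mat(N):
    """Orthonormal DCT-II matrix: C @ C.T == I, so C.T is the IDCT."""
    n = np.arange(N)
    C = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N)) * np.sqrt(2.0 / N)
    C[0] /= np.sqrt(2.0)
    return C

def embed_bit(frame, bit):
    a, d = lwt(frame)                             # LWT: keep the approximation band
    C = dct_mat(a.size)
    A = (C @ a).reshape(8, -1)                    # DCT, shaped into a matrix for SVD
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    t = 0.0 if bit == 0 else DELTA / 2            # QIM on the largest singular value
    s[0] = DELTA * np.round((s[0] - t) / DELTA) + t
    A_hat = U @ np.diag(s) @ Vt                   # ISVD
    return ilwt(C.T @ A_hat.ravel(), d)           # IDCT, then ILWT

def extract_bit(frame):
    a, _ = lwt(frame)
    C = dct_mat(a.size)
    s0 = np.linalg.svd((C @ a).reshape(8, -1), compute_uv=False)[0]
    q = [DELTA * np.round((s0 - t) / DELTA) + t for t in (0.0, DELTA / 2)]
    return int(np.argmin([abs(v - s0) for v in q]))
```

Because every stage except the QIM quantization is exactly invertible, the embedded bit survives the round trip while the distortion stays bounded by the quantization step.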

Extraction stage
The proposed watermark extraction process is shown in Figure 2 and is presented as follows:
Step 1: Read the watermarked audio x̂(n) into a one-dimensional matrix.
Step 2: Apply synchronization-bit detection to prevent misalignment with the embedding.
Step 3: Apply the LWT and DCT processes, as in the embedding stage. Calculate using formula (12).
Step 4: Apply the SVD process, which decomposes the signal into the U, S, and V matrices; the U and V matrices are forwarded to the SVD reconstruction, while the S matrix carries the watermark. Calculate using formula (14).
Step 5: Take the S matrix and extract the watermark bits ŵ_cs(n) by QIM extraction.
Step 6: Apply the CS reconstruction process to recover the watermark bits from their compressed form.
Step 7: Convert the bits from one dimension to two dimensions in post-processing, obtaining the watermark ŵ(n).
Step 8: Calculate the bit error rate (BER).

Figure 2. Extraction process

RESULT AND ANALYSIS
This section presents the CS performance on the watermark, the performance of the audio watermarking scheme before and after attacks, and a comparison with other research. The analysis considers the BER, SNR, and ODG values obtained from a MATLAB simulation. The audio files used in this watermarking system are *.wav files with a sampling frequency of 44100 Hz and stereo channels. The audio consists of piano, guitar, bass, drums, and conversational sounds accompanied by light music.

CS performance on watermark
The system is tested with different watermark sizes and different compression ratios; the best compression ratio is then chosen for the subsequent test scenarios. This study uses a binary watermark image with a size of 10 × 10 pixels. Figure 3 shows the watermark image used in the system, and Table 1 displays the results of the CS testing. Based on Table 1, the greater the compression ratio, the shorter the time required. Based on Figure 4, the higher the pixel count, the better the image quality. If the pixel count is large but the watermark is compressed with a small compression ratio, the embedded watermark will be larger. In this study, a 32-pixel side and a compression ratio of 0.03 are used, so the embedded watermark is smaller and the processing time is shorter. Thus, the greater the compression ratio and pixel count used, the better the quality of the extracted watermark. Figure 4 shows the watermark results for different compression ratios.

System performance before attack
There are several parameters in the system: frame length (Nframe), quantization bit depth (nbit), subband threshold (thr), and wavelet decomposition level (N). All parameters are optimized to reach optimal performance, i.e., to find the parameter values that yield the best trade-off. Figure 5 displays the trade-off between SNR, capacity C, and BER as the number of samples per frame (Nframe) varies over [128, 256, 512, 1024]. SNR is displayed on a y-axis range of 62-80 dB, BER on a range of 0-0.3, and payload on a range of 0-350 bps. The figure shows the watermarking trade-off: as SNR gets better (higher), BER gets worse (higher), and the payload also gets worse. Table 2 shows the range of the optimized parameters and the optimal parameters resulting from the optimization process.

Performance system after attacks
Five songs of different genres are tested with five attacks. The optimization process obtains BER, SNR, ODG, and C while all parameters are varied; the optimal parameters are those giving the highest SNR, the lowest BER, and the highest payload. This optimization is a mandatory step in audio watermarking because imperceptibility, robustness, and capacity are not directly proportional to one another; we usually call this a trade-off. If the imperceptibility improves, the robustness and capacity worsen, and vice versa. Table 3 gives the average robustness (BER) of the best-parameter test under various attacks. Table 4 shows the system performance after attacks using the optimal parameters displayed in Table 2; Table 4 also displays a performance comparison with previous research. Balanced performance means meeting the standard targets for audio watermarking: SNR above 20 dB, ODG above -1, BER below 20%, and a watermark payload set as high as possible.
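The SNR and BER figures reported here are computed in the usual way; the sketch below shows both (ODG requires a full PEAQ model and is omitted). The signals are toy example data.

```python
import numpy as np

def snr_db(host, watermarked):
    """SNR in dB: ratio of host energy to embedding-distortion energy."""
    host, watermarked = np.asarray(host, float), np.asarray(watermarked, float)
    noise = host - watermarked
    return 10.0 * np.log10(np.sum(host ** 2) / np.sum(noise ** 2))

def ber(sent_bits, received_bits):
    """BER: fraction of watermark bits that differ after extraction."""
    sent, got = np.asarray(sent_bits), np.asarray(received_bits)
    return float(np.mean(sent != got))

host = np.array([1.0, -2.0, 3.0, -4.0])
marked = host + 0.01                      # a tiny embedding distortion
```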

CONCLUSION
An SVD-based robust audio watermarking scheme with a compressive sampling framework on the watermark has been proposed. In this scheme, compressive sampling is used to compress the watermark so that more information can be inserted into the host audio. The system is able to produce high robustness with a BER of 0 and a capacity of 127.26. The best parameters for this work are a frame length of 256, 2 quantization bits, a bit depth of 16, an alpha of 0.003, a threshold of 0.9, and a wavelet level of 1. The audio watermarking algorithm is highly robust against LPF, resampling, and linear speed change attacks.


Figure 4. Watermark image with compression ratio 0.025 to 0.03

Table 1. CS performance with 32 pixels

Table 2. Performance result of optimal parameters

Table 4. Performance comparison