Ternary Echo Hiding in Echo Files

We propose a new family of echo hiding procedures designed for audio files with ternary watermarks. Most attention is paid to the case where an image is used as the watermark. The possibility of using a melody for this purpose is also mentioned. A human is supposed to act as the detector that confirms the presence of a watermark in the audio file. The approach employs infinite impulse response (IIR) filters of a particular form that makes it possible to insert several symbols into a single fragment of the container. The payload of the suggested method and its resistance to various attacks exceed those of the classical implementation of echo hiding.


I. INTRODUCTION
Enforcing the author's rights is still an ongoing problem in multimedia. An author who is only starting a career is induced to upload the created composition in open access. To protect the work from fraud, the creator embeds watermarks in the product. The goal is to do this gently, without significantly distorting the host signal. Various methods for inserting watermarks into audio files can be found in the review [1] as well as in the books [2], [3], [4]. It should be noted that the embedded data must not be sensitive to attacks through general transformations of the encoded audio signal, such as filtering, resampling, or lossy data compression.
Echo hiding is a well-known method for embedding data into an audio signal [5]. The method has various uses, including copyright protection and information integrity. Echo hiding watermarking has become popular recently [6], [7], [8]. The reasons are the simplicity of watermark insertion and the fact that the clean file is not needed for extraction of the watermark. Single echo hiding, bipolar echo hiding, backward-forward echo hiding, bipolar backward-forward echo hiding, and time-spread echo hiding methods have been developed. In the echo hiding audio watermarking method, data are embedded into the cover audio by adding delayed versions of the audio signal back to itself. In digital signal processing terms, this process corresponds to finite impulse response (FIR) filtering with an impulse response that consists of a delta impulse at time zero and a time-shifted, weighted delta impulse. To embed multiple echoes, one can use more than one delayed delta impulse.
Typically, before embedding, the watermark signal is converted to a binary sequence [1]. In [9], [10] it is shown that the ternary form of the watermark is a practical and natural choice in certain cases. These are watermarks that can be recognized by the human auditory and visual systems, since the watermarks are fragments of music and images converted into ternary sequences. The conversion of a music file into a ternary form is rather simple. Let $Music[k]$, $k = 0, 1, \ldots, L-1$, be a fragment of a musical file written in wav format. Let us choose a threshold $Thr$ and convert $Music$ into a ternary sequence according to (1). When playing the new sequence, we get poor quality, but the main tune of the original music can be recognized. There is a problem of optimally setting the value of $Thr$ in (1). This issue is investigated in [9]. It should be pointed out that (1) is not the only method of converting a music file into a ternary form.
Another natural example is an image used to construct the watermark. The transformation of a grayscale image into a binary form is a standard procedure. The development of an algorithm that converts grayscale pictures into ternary images is the subject of the paper [10]. The idea of the method is as follows. There are two thresholds $Thr_0$, $Thr_1$, which define the ternary matrix $TPict[u, v]$ by (2).
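As an illustration of the conversion of a music fragment into ternary form, the following is a minimal sketch. It assumes that (1) is a symmetric dead-zone rule (samples above the threshold become 1, samples below its negative become −1, everything else becomes 0); that rule, the function name `ternarize`, and the sample values are assumptions made for illustration, not the exact definition from [9].

```python
import numpy as np

def ternarize(music, thr):
    """Map samples to {-1, 0, 1} using an assumed symmetric dead zone [-thr, thr]."""
    music = np.asarray(music, dtype=float)
    trits = np.zeros(music.shape, dtype=int)
    trits[music > thr] = 1      # strong positive samples
    trits[music < -thr] = -1    # strong negative samples
    return trits

# Hypothetical normalized samples, for illustration only.
samples = np.array([0.9, 0.05, -0.4, -0.01, 0.3])
print(ternarize(samples, thr=0.2))
```

A larger threshold produces more zero trits, which is one way to see why the choice of the threshold affects how recognizable the tune remains.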
However, the matrix $TPict[u, v]$ is not suitable as a ternary representation of a picture used as a watermark, since the values $Thr_0$, $Thr_1$ depend on the picture. Instead, we use the standard representation of the picture by means of the matrix $SPict[u, v]$ defined by (3). When embedding a picture as a watermark, we convert the picture into the standard form and then replace the values 0, 127, 255 with −1, 0, 1, respectively. An example of an image and its standard form is shown in Fig. 1. One can see that the standard form of the picture in Fig. 1 has more than three levels of pixel brightness. This is an artifact of the viewer used to insert pictures into documents.
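The mapping from the standard form to trits can be sketched as follows. The replacement of 0, 127, 255 by −1, 0, 1 is taken from the text; the two-threshold rule in `to_standard_form` is only an assumed reading of how the thresholds split the brightness range, and both function names are hypothetical.

```python
import numpy as np

def to_standard_form(gray, thr0, thr1):
    """Assumed rule: below thr0 -> 0, at or above thr1 -> 255, otherwise 127."""
    gray = np.asarray(gray)
    std = np.full(gray.shape, 127, dtype=np.uint8)
    std[gray < thr0] = 0
    std[gray >= thr1] = 255
    return std

def standard_to_trits(std):
    """Replace the values 0, 127, 255 with -1, 0, 1, respectively."""
    std = np.asarray(std)
    if not np.isin(std, (0, 127, 255)).all():
        raise ValueError("standard form may contain only 0, 127, 255")
    trits = np.zeros(std.shape, dtype=int)
    trits[std == 0] = -1
    trits[std == 255] = 1
    return trits

# Hypothetical 2x2 grayscale picture and thresholds, for illustration only.
gray = np.array([[10, 100], [200, 250]])
print(standard_to_trits(to_standard_form(gray, thr0=64, thr1=192)))
```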
In this paper, we present a new kind of echo hiding in audio files that fits the ternary form of watermarks. Recall the basic ideas realized in the echo hiding procedure [5]. Let
$$Fragm = s_0, s_1, \ldots, s_{N-1} \qquad (4)$$
be a fragment of the audio file where the watermark is embedded. With the introduction of the echo, an element $s_k$ changes to $\tilde{s}_k = s_k + a \cdot s_{k+p}$, where $p$ is an integer and $a$ is a small value. The position $p$ in the modified fragment can be revealed through cepstral analysis or autocorrelation of the fragment. This method can be generalized by using an arbitrary finite impulse response (FIR) filter
$$\tilde{s}_k = s_k + \sum_{i \in S} b_i \cdot s_{k+i}. \qquad (5)$$
Here $b_i$ are small values and $S$ is the set of indices that are used for coding a watermark. In this paper, we suggest expanding this approach by replacing the FIR filter in (5) with a special-shape infinite impulse response (IIR) filter. In what follows, we treat trit and symbol of a ternary sequence as synonyms. Throughout, we use the following notation: if $A$ is an array, then $FA$ is the result of the discrete Fourier transform (DFT) of $A$. Suppose that a watermark is represented as a ternary sequence $Watr = a_0, a_1, \ldots, a_M$, where $a_i \in \{-1, 0, 1\}$. Our goal is to embed one or more symbols of the watermark into the fragment $Fragm$ (4) of the host file. We start with the case where only one symbol of the watermark is embedded into the fragment.
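The classical single-echo step and its cepstral footprint can be sketched as follows. The sketch assumes that indices are taken modulo the fragment length (a circular echo), which keeps the DFT relation exact; the white-noise host, the seed, and the parameter values are illustrative assumptions.

```python
import numpy as np

def add_echo(fragm, a, p):
    """Single echo: element k becomes s_k + a * s_{k+p}, indices modulo N (assumption)."""
    return fragm + a * np.roll(fragm, -p)

def cepstrum(fragm):
    """Real cepstrum: inverse DFT of the log-magnitude of the DFT."""
    return np.fft.ifft(np.log(np.abs(np.fft.fft(fragm)))).real

rng = np.random.default_rng(0)
fragm = rng.standard_normal(4096)   # stand-in for an audio fragment
a, p = 0.4, 100

# Change of the cepstrum caused by the echo: splashes appear at p and N - p.
diff = cepstrum(add_echo(fragm, a, p)) - cepstrum(fragm)
print(diff[p], diff[len(fragm) - p])  # both close to a / 2
```

In practice the detector sees only the modified fragment, so the splash has to stand out against the cepstrum of the host itself; the difference is used here only to isolate the effect of the echo.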
A. Simple IIR filter in echo hiding procedure
The simple IIR filter is given by the difference equation (6), where $p$, $q$ are natural numbers and $a$, $b$ are the filter coefficients. The equation defines how the output signal of the IIR filter is related to the input signal. To find the transfer function of the filter, we take the DFT of each part of (6) and obtain (7). In the resulting formula, $w = 2 \cdot \pi \cdot j$, where $j$ is the imaginary unit. Let $FFragm(n)$, $n = 0, 1, \ldots, N-1$, be the result of the DFT of the whole fragment (4).
Then the output of filtering $FFragm$ is the modified fragment $MFragm$ of the form (8). Here, the operator • denotes the elementwise product of two sequences of the same length, and IDFT means the inverse transform for the DFT. The fragment $MFragm$ replaces $Fragm$ in the host file. To extract the watermark, we use the cepstral transform. The standard cepstral transform applied to the modified fragment $MFragm$ produces (9). For small values of $c$ we have $\log(1 + c) \approx c$, so the cepstrum of $MFragm$ has four splashes at the points $p$, $N-p$, $q$, $N-q$. Assuming $p = q$ and $b = -a$ in (9), we denote the resulting cepstral function by $Cepstr$ and the splash at the point $p$ by $Splash$; its value is given by (10). The sign of $Splash$ coincides with the sign of the parameter $a$, provided that $|IDFT(\log |FFragm|)|$ is small at this point. The value of $Splash$ is twice the value in the cepstrum corresponding to the hiding procedure (5). A special case of (10) is the situation where $p = q = N/2$ and $b = -a$. In this case, $Splash \approx 4 * Cepstr(N/2)$ and is four times the splash in the cepstrum related to (5).
If $Symbol \in \{-1, 0, 1\}$, then the embedding of $Symbol$ into a fragment is realized by the IIR filter (6) with $p = q$, $a = c \cdot Symbol$, and $b = -a$. The coefficient $c$ controls the transparency of the embedding procedure.
All these assertions are demonstrated in Fig. 2. All the symbols are embedded into the same fragment of the host.
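The embedding of a single trit can be sketched in the frequency domain as follows. The sketch assumes that, for p = q and b = −a, the transfer function has the form (1 + a·z)/(1 − a·z) with z = exp(−w·p·n/N); this form is chosen because it satisfies Tr(n, p, c, N) = 1/Tr(n, p, −c, N) and reproduces the doubled splash, but the sign convention of the exponent and the function names are assumptions.

```python
import numpy as np

def transfer(p, a, n_len):
    """Assumed transfer function of the simple IIR filter with p = q, b = -a."""
    n = np.arange(n_len)
    z = np.exp(-2j * np.pi * p * n / n_len)
    return (1 + a * z) / (1 - a * z)

def embed_symbol(fragm, symbol, p, c):
    """MFragm = IDFT(FFragm * Tr) with a = c * symbol; symbol 0 leaves the fragment intact."""
    tr = transfer(p, c * symbol, len(fragm))
    return np.fft.ifft(np.fft.fft(fragm) * tr).real

def cepstrum(fragm):
    return np.fft.ifft(np.log(np.abs(np.fft.fft(fragm)))).real

rng = np.random.default_rng(0)
fragm = rng.standard_normal(300)    # stand-in for a host fragment
c, p = 0.1, 106

base = cepstrum(fragm)
for symbol in (-1, 0, 1):
    splash = cepstrum(embed_symbol(fragm, symbol, p, c))[p] - base[p]
    print(symbol, splash)           # close to c * symbol

# Special case p = N/2: the two splashes merge and the value doubles again.
half = len(fragm) // 2
print(cepstrum(embed_symbol(fragm, 1, half, c))[half] - base[half])  # close to 2 * c
```

The change of the cepstrum at p is about c here, compared with about c/2 for a single FIR echo with the same coefficient, which matches the factor of two stated above.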

B. Embedding a few symbols of the watermark into a single fragment
At this point, we present an extension of the method for embedding several symbols into a single fragment of the host. Suppose that among the $M$ symbols of a watermark that must be inserted into the same fragment, only $K$ symbols are nonzero. For simplicity, suppose these are the first $K$ symbols of the watermark. Let $Positions = p_0, p_1, \ldots, p_{K-1}$ be the positions in the fragment spectrum where the nonzero symbols will be placed, and let $p_K, p_{K+1}, \ldots, p_{M-1}$ be the positions assigned to the zero symbols. The data corresponding to the nonzero items are used in the construction of the IIR filter. The insertion procedure is presented in Algorithm 1. Here the transfer function $Tr(n, p, c, N)$ is given by (11). It follows from (11) that $Tr(n, p, c, N) = 1/Tr(n, p, -c, N)$.
The IIR filter used to embed the nonzero symbols is a cascade of the simple IIR filters described in the previous section. An example of embedding the four symbols 1, 1, 0, −1 at the positions 106, 108, 110, 112 of a fragment of length 240 is shown in Fig. 3.
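The cascade of simple filters can be sketched as a product of transfer functions, one factor per nonzero symbol. This is not a transcription of Algorithm 1; the form of each factor, (1 + c·s·z)/(1 − c·s·z) with z = exp(−w·p·n/N), is an assumption consistent with the property Tr(n, p, c, N) = 1/Tr(n, p, −c, N), and the function names are hypothetical. The symbols, positions, and fragment length are those of the example in the text.

```python
import numpy as np

def embed_symbols(fragm, symbols, positions, c):
    """Multiply the fragment spectrum by one simple-filter factor per nonzero symbol."""
    n_len = len(fragm)
    n = np.arange(n_len)
    total = np.ones(n_len, dtype=complex)
    for s, p in zip(symbols, positions):
        if s == 0:
            continue                      # zero symbols leave their position untouched
        z = np.exp(-2j * np.pi * p * n / n_len)
        total *= (1 + c * s * z) / (1 - c * s * z)
    return np.fft.ifft(np.fft.fft(fragm) * total).real

def cepstrum(fragm):
    return np.fft.ifft(np.log(np.abs(np.fft.fft(fragm)))).real

rng = np.random.default_rng(0)
fragm = rng.standard_normal(240)          # stand-in for a host fragment
symbols, positions, c = (1, 1, 0, -1), (106, 108, 110, 112), 0.1

diff = cepstrum(embed_symbols(fragm, symbols, positions, c)) - cepstrum(fragm)
print([float(diff[p]) for p in positions])  # close to c * symbol at each position
```

Because the logarithm turns the product into a sum, each factor contributes its own splash, and with positions chosen with step 2 the neighbouring cepstral values stay essentially unchanged.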

III. TRANSPARENCY OF EMBEDDING

A. Simple IIR filter
If $p = q$ and $b = -a$, then the transfer function has the form (11). We obtain the estimates (12) and (13). An example comparing the real value of SNR with the theoretical one is presented in Table I.
From Table I, it follows that the estimate (13) is very close to the real value.
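The comparison of a measured SNR with an estimate can be sketched as follows. The estimate used here, SNR ≈ −10·log10(4c²/(1 − c²)), is a flat-spectrum approximation derived under the assumption that the transfer function is (1 + c·z)/(1 − c·z); it is offered as an illustration and is not claimed to coincide with (13). The white-noise host and all names are assumptions.

```python
import numpy as np

def embed_symbol(fragm, symbol, p, c):
    """Assumed simple IIR embedding with p = q, a = c * symbol, b = -a."""
    n = np.arange(len(fragm))
    z = np.exp(-2j * np.pi * p * n / len(fragm))
    tr = (1 + c * symbol * z) / (1 - c * symbol * z)
    return np.fft.ifft(np.fft.fft(fragm) * tr).real

def snr_db(original, modified):
    """Signal-to-noise ratio in dB, treating the embedding distortion as noise."""
    err = modified - original
    return 10 * np.log10(np.sum(original**2) / np.sum(err**2))

def flat_spectrum_estimate_db(c):
    """|Tr - 1|^2 averaged over frequency equals 4c^2 / (1 - c^2) for |c| < 1."""
    return -10 * np.log10(4 * c**2 / (1 - c**2))

rng = np.random.default_rng(1)
fragm = rng.standard_normal(300)    # stand-in for a host fragment
c = 0.1
real = snr_db(fragm, embed_symbol(fragm, 1, 106, c))
estimate = flat_spectrum_estimate_db(c)
print(round(real, 2), round(estimate, 2))
```

For c = 0.1 the estimate is about 13.9 dB; for real audio, whose spectrum is far from flat, the measured value may deviate more than it does for the white-noise stand-in.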

B. Series of simple IIR filters
It can be expected that transparency depends on the embedded symbols. We cannot present a simple formula for evaluating the SNR of the full file, since the symbols inserted in various fragments differ and the SNR varies. To demonstrate the situation, we present some experimental results collected in Table II. Here $M = 4$, and the table uses short notations for the SNR after embedding the corresponding symbols. One can see that the distortion of a fragment differs significantly depending on the inserted symbols.

IV. EXTRACTION OF WATERMARK
Let $a$ be a symbol that is inserted at the position $p$ of the spectrum. From (10) we obtain $|Cepstr(p) - \widetilde{Cepstr}(p)| \approx 2 \cdot c$, where $Cepstr(p)$ and $\widetilde{Cepstr}(p)$ denote the cepstra associated with $a \neq 0$ and $a = 0$, respectively. Since the positions $p_0, p_1, \ldots, p_{K-1}$ are arbitrary, we select them with step 2. We assume that $Cepstr(p-1)$ and $Cepstr(p+1)$ are close to the unmodified value $\widetilde{Cepstr}(p)$. That is the basic idea realized in Algorithm 2. The values of the cepstrum are randomly distributed, and choosing $c/2$ as the $Bound$ value provides better results. Using Algorithm 2, one can extract only a single symbol from the fragment at a given position of the spectrum. The algorithm can be extended to the case where several trits are inserted into a fragment: it must be applied to each position. An alternative approach to the problem is implemented in Algorithm 3, where the value of $c$ is excluded from the calculation. Here $AllMFragms$ is the list containing all modified fragments of the container, and $Pos$ denotes the positions in the spectrum used for inserting $M$ symbols. We use auxiliary notation for the routines that realize the mentioned procedures.
Since all hiding procedures are based on the modification of random spectrum values, one cannot expect all watermarks to be restored accurately. We need a criterion for evaluating the quality of an algorithm intended to extract the watermark. The most natural numerical measure is the trit error rate (TER), that is, the ratio of the number of incorrectly extracted trits to the total number of symbols in the watermark. Let us measure the quality of the two algorithms presented above. To this end, we use the image in standard form in Fig. 1. The length of a fragment is 300; the results for various numbers of trits embedded in a fragment are collected in Table III. One can see that both algorithms have the same quality.
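The idea behind the threshold-based extraction and the TER can be sketched as follows. This is one possible reading of the idea, not a transcription of Algorithm 2: the baseline is taken as the mean of the two neighbouring cepstral values, and the deviation at p is compared with Bound = c/2. The embedding form, the long white-noise fragment (chosen so that the cepstral noise is small), and all function names are assumptions.

```python
import numpy as np

def embed_symbol(fragm, symbol, p, c):
    """Assumed simple IIR embedding with p = q, a = c * symbol, b = -a."""
    n = np.arange(len(fragm))
    z = np.exp(-2j * np.pi * p * n / len(fragm))
    tr = (1 + c * symbol * z) / (1 - c * symbol * z)
    return np.fft.ifft(np.fft.fft(fragm) * tr).real

def cepstrum(fragm):
    return np.fft.ifft(np.log(np.abs(np.fft.fft(fragm)))).real

def extract_symbol(mfragm, p, bound):
    """Compare Cepstr(p) with the mean of Cepstr(p - 1) and Cepstr(p + 1)."""
    cep = cepstrum(mfragm)
    deviation = cep[p] - 0.5 * (cep[p - 1] + cep[p + 1])
    if deviation > bound:
        return 1
    if deviation < -bound:
        return -1
    return 0

def trit_error_rate(embedded, extracted):
    """Share of incorrectly extracted trits among all trits of the watermark."""
    embedded, extracted = np.asarray(embedded), np.asarray(extracted)
    return float(np.mean(embedded != extracted))

rng = np.random.default_rng(0)
c, p = 0.1, 1000
watermark = [1, 0, -1, 1]
# One hypothetical host fragment per trit.
fragments = [rng.standard_normal(8192) for _ in watermark]
recovered = [extract_symbol(embed_symbol(f, s, p, c), p, bound=c / 2)
             for f, s in zip(fragments, watermark)]
print(recovered, trit_error_rate(watermark, recovered))
```

With short fragments the random fluctuations of the host cepstrum become comparable to the splash, which is why a nonzero TER is to be expected in practice.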

V. ATTACKS
The advantage of using a picture as a watermark is shown in Fig. 4. Although the TER is 27% (more than a quarter of the trits are restored incorrectly), the image is recognized without problems. We continue our experiments with the same container and picture, $M = 3$, $c = 0.1$, and a fragment length of 300.

A. Filtering attack
Since filtering of the container significantly changes the spectrum of the signal, the inserted watermark cannot be recognized: TER > 55% in this case.

B. Additional noise attack
Random noise with a uniform distribution, produced by the random function from [11], is added to the watermarked container. The watermark is extracted by Algorithm 2 and Algorithm 3. The results are collected in Table IV.
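The noise attack at a prescribed SNR can be sketched as follows. The sketch uses NumPy's random generator in place of the random function from [11], and the scaling of the noise to hit a target SNR is an assumption about how the noise level is controlled; the function names and the stand-in signal are hypothetical.

```python
import numpy as np

def snr_db(original, modified):
    err = modified - original
    return 10 * np.log10(np.sum(original**2) / np.sum(err**2))

def add_uniform_noise(signal, target_snr_db, rng):
    """Add zero-mean uniform noise scaled so that the resulting SNR equals the target."""
    noise = rng.uniform(-1.0, 1.0, size=signal.shape)
    ratio = 10 ** (target_snr_db / 10)                  # signal power / noise power
    scale = np.sqrt(np.sum(signal**2) / (np.sum(noise**2) * ratio))
    return signal + scale * noise

rng = np.random.default_rng(0)
container = rng.standard_normal(44100)   # stand-in for the watermarked container
attacked = add_uniform_noise(container, target_snr_db=20.0, rng=rng)
print(snr_db(container, attacked))
```

The attacked signal is then passed to the extraction procedure, and the TER is recorded for each noise level.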

C. Compression of container attack
This is the most straightforward attack. The container written in wav format is converted to mp3 with various bitrates and then converted back to wav format. The initial bitrate of the container is 705 kilobits per second (kbps). We use the PyDub package to manipulate the container [12]. The results are given in Table V. This is the first case where Algorithm 3 shows its advantage over Algorithm 2.
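The compression round trip can be sketched with PyDub as follows. The file names are hypothetical, the helper names are assumptions, and PyDub needs an ffmpeg (or libav) installation to handle mp3; the import is placed inside the function so that the sketch can be loaded without PyDub installed.

```python
import os
import tempfile

def bitrate_arg(kbps):
    """Format a bitrate in kbps the way PyDub's export expects it, e.g. '128k'."""
    return f"{kbps}k"

def mp3_round_trip(wav_in, wav_out, kbps):
    """Convert wav -> mp3 at the given bitrate -> wav again."""
    from pydub import AudioSegment   # third-party package; requires ffmpeg
    with tempfile.TemporaryDirectory() as tmp:
        mp3_path = os.path.join(tmp, "attacked.mp3")
        AudioSegment.from_wav(wav_in).export(
            mp3_path, format="mp3", bitrate=bitrate_arg(kbps))
        AudioSegment.from_mp3(mp3_path).export(wav_out, format="wav")

# Hypothetical usage:
# mp3_round_trip("watermarked.wav", "attacked_128.wav", kbps=128)
```

The decoded file may differ slightly in length from the original because of encoder padding, so fragment boundaries may need to be checked before extraction.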

Fig. 1: Example of the original picture and its standard form

TABLE I: Comparison of the Real SNR with the Theoretical One in the Case of a Single Filter

TABLE II: SNR Calculated in the Case of Inserting Four Symbols into a Fragment

TABLE III: Comparison of the Quality of Algorithm 2 and Algorithm 3

TABLE IV: TER after Additional Noise Attack (dependence on the level of noise; columns: SNR (dB); Alg2, TER (%))

TABLE V: TER after Compression