Quality Scalability Aware Watermarking for Visual Content

Scalable coding-based content adaptation poses serious challenges to traditional watermarking algorithms, which do not consider the scalable coding structure and hence cannot guarantee correct watermark extraction in the media consumption chain. In this paper, we propose a novel concept of scalable blind watermarking that ensures more robust watermark extraction at various compression ratios while not affecting the visual quality of the host media. The proposed algorithm generates a scalable and robust watermarked image code-stream that allows the user to constrain the embedding distortion for target content adaptations. The watermarked image code-stream consists of hierarchically nested joint distortion-robustness coding atoms. The code-stream is generated by a new wavelet domain blind watermarking algorithm guided by a quantization based binary tree. The code-stream can be truncated at any distortion-robustness atom to generate the watermarked image with the desired distortion-robustness requirements. A blind extractor is capable of extracting the watermark data from the watermarked images. The algorithm is further extended to incorporate a bit-plane discarding-based quantization model used in scalable coding-based content adaptation, e.g., JPEG2000. This improves the robustness against the quality scalability of JPEG2000 compression. The simulation results verify the feasibility of the proposed concept, its applications, and its improved robustness against quality scalable content adaptation. The proposed algorithm also outperforms existing methods, showing a 35% improvement. In terms of robustness to quality scalable video content adaptation using Motion JPEG2000 and wavelet-based scalable video coding, the proposed method shows a major improvement for video watermarking.


I. INTRODUCTION
Scalable coding has become a de facto functionality in recent image and video coding schemes, e.g., JPEG2000 [1] for images, and the scalable extensions of Advanced Video Coding (AVC) [2] and High Efficiency Video Coding (HEVC) [3] for video. The scalable coders produce scalable bit streams representing content in hierarchical layers of increasing audiovisual quality and increasing spatio-temporal resolutions. Such bit streams may be truncated accordingly in order to satisfy variable network data rates, display resolutions, display device resources and usage preferences. These adapted bit streams may be transmitted, further adapted, or decoded using a universal decoder for playback. An example of scalable coding-based multimedia usage is shown in Fig. 1. In scalable image/video coding, the input media is coded in such a way that the main host server keeps bit streams that can be decoded to the highest quality and to the full resolution of the content. When the content needs to be delivered to a less capable display (D) or via a lower bandwidth network, the bit stream is adapted at different nodes (N_1, N_2, ..., N_x, as shown in Fig. 1) using different scaling parameters to match those requirements. At each node the adaptation parameters may be different and a new bit stream may be generated for decoding.
Fig. 1. Multimedia usage scenarios using scalable coded content.
D. Bhowmik is with the School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh EH14 4AS, United Kingdom (e-mail: d.bhowmik@hw.ac.uk).
C. Abhayaratne is with the Department of Electronic and Electrical Engineering, The University of Sheffield, Sheffield S1 3JD, United Kingdom (e-mail: c.abhayaratne@sheffield.ac.uk).
Such bit stream truncation-based content adaptations also affect any content protection data, i.e., watermarks, embedded in the original content. Watermarking for scalable coded content is far more challenging than traditional watermarking [4]-[20], where the performance of the algorithms is measured by traditional metrics, such as 1) distortion due to embedding (commonly used metrics are the Peak Signal to Noise Ratio (PSNR), weighted PSNR or, more recently, the Structural Similarity Index Measure (SSIM) [21]), 2) data capacity and 3) robustness against signal processing, geometric or compression attacks as listed in the Stirmark [22] or Checkmark [23] benchmarks. In these algorithms, the watermark is embedded with the goal of minimising the embedding distortion while trying to maximise the robustness for a given attack.
Many of these algorithms demonstrated high robustness against traditional non-scalable compression schemes, such as JPEG and MPEG-2. It is also common that most of these watermarking schemes are based on such compression schemes. Often the same watermarking algorithms have been naively extended to propose watermarking algorithms robust against scalable content adaptation attacks, e.g., JPEG2000. Examples of such image watermarking algorithms include [7], [11]-[15], [24]-[30]. As the discrete wavelet transform (DWT) is the underlying technology of the JPEG2000 compression standard, we consider the watermarking schemes that use wavelet-based techniques in addition to the algorithms mentioned above. In these schemes, different approaches have been used as follows: a) Choosing coefficients in a specific subband for embedding the watermark: e.g., embedding in high frequency subbands for better imperceptibility [7]-[10]; embedding in the low frequency subband to achieve high robustness [11], [12]; and balancing imperceptibility and robustness with spread spectrum embedding in all subbands [13], [14].
An in-depth review of the above schemes with respect to robustness to scalable content adaptation can be found in [24]. None of these algorithms can guarantee correct watermark extraction from the adapted (scaled) content, as they do not have access to scaling information during embedding. In scalable coding, the original media is encoded once and the resulting bit stream is then truncated with any scaling parameter during the content adaptation steps in the media consumption chain to obtain the target data rates and resolutions. This creates a situation where no information about the target data rates is available to the watermark embedder to make the watermarking robust to such content adaptations. Even where traditional watermarking methods are robust enough to cope with content adaptation, the watermark is embedded only once, at a strength predetermined to survive high compression. The embedding distortion is therefore also higher, which is not always necessary when the target application requires higher quality content, i.e., less compression.
Different from traditional schemes, we propose the novel concept of scalable watermarking by creating nested distortion-robustness atoms (defined in Section II-B1) to allow flexibility to the user. The proposed algorithm encodes watermarks hierarchically and embeds them at the point of scalable compression. The watermarking scalability here refers to hierarchical watermark embedding, where more embedded information corresponds to better robustness. The concept of scalable watermarking is particularly useful in watermarking for scalable coded images, where the watermark can also be scaled according to the heterogeneous network capacity and the end user's requirement for a target application where it is assumed that the media life-cycle finishes. For example, for a high bandwidth network and a high resolution display, a highly imperceptible but less robust watermarked image can be transmitted, since in this scenario high quality media is desirable and the watermark can still be extracted reliably due to the lower compression. In contrast, for a low network bandwidth and a low resolution display, the distribution server can choose a highly robust watermarking stream: due to the higher compression, the watermarking imperceptibility is less important, but high robustness is required for reliable watermark extraction (refer to Fig. 2). Similarly, based on any other combination of the network's capability and the user's requirement, the scalable watermarked media code-stream can be truncated and distributed accordingly.
With the increased use of scalable coded media, scalable watermarking is very important. To the best knowledge of the authors, little work has been proposed in the current literature in this context. Most of the available algorithms are proposed either as a joint progressive scalable compression and watermarking scheme [25], [26] or as efficient coefficient selection methods which are robust against resolution or quality scalable attacks [27], [28]. These algorithms are primarily focused on two main robustness issues [29]: 1) detection of the watermark after an acceptable scalable compression and 2) graceful improvement of the extracted watermark as more quality or resolution layers are received at the image decoder. The extraction of a complete watermark is only possible when all quality layers are available at the detection. In a practical scenario, not all quality layers are always available to the end user (as discussed earlier), and this poses a risk of losing important watermark information.
Distinctively, we propose a novel scalable watermarking concept resulting in a distortion constrained watermarked code-stream to generate the watermarked image with the desired distortion-robustness requirements. Therefore the extraction of the complete watermark is ensured at various stages of scalable compression. This work addresses the two-fold problem of 1) obtaining the least distortion at a given watermark embedding rate and 2) achieving the best robustness in a scalable fashion by hierarchically encoding lower and higher embedded distortion-robustness atoms, respectively. In designing the algorithm, we have considered the propositions for embedding distortion, i.e., in order to minimize the distortion, the coefficient modification must be minimized; and the concept of the bit plane discarding model [30] to emulate scalable content adaptation for improving the robustness against quality scalable content adaptation. The main contributions of this work are as follows:
• The theoretical foundation for generating a code-stream that can be truncated at any distortion-robustness atom level to generate the watermarked image with the desired distortion-robustness requirements;
• A novel watermarking algorithm that generates a scalable embedded code-stream using hierarchically nested joint distortion-robustness atoms; and
• Improving the watermarking robustness by modelling the bit plane discarding used in quality scaling based content adaptation in scalable coded image and video.
These contributions are demonstrated by proposing a new wavelet domain binary tree guided rules-based blind watermarking algorithm. Following the scalable coding concepts, a universal blind watermark extractor is proposed. It is capable of extracting watermark data from the watermarked images created using any truncated code-stream. As no such idea has been explored yet in the literature, in order to evaluate this work, we introduce a new embedding distortion metric shown in Eq. (3) and report the robustness results to support the claim. The initial concept and early results of this algorithm were reported as conference publications in [31] and [32]. This work reports the detailed algorithm and discusses extended results with application to both image and video watermarking. Within the scope of this work, we focus on the scalable watermarking concept and restrict ourselves to the robustness against quality scalability.

A. The proposed algorithm
In proposing the new algorithm we aim to address two significant challenges related to robust watermarking techniques for scalable coded image and video: 1) scalability of the watermarking and 2) robustness against scalable media compression. As opposed to traditional algorithms, which fail to comply with the scalability requirements, we introduce a new watermarking algorithm that creates a hierarchical watermarked image/video code-stream of distortion-robustness atoms and allows quantitative embedding-distortion measurement at the level of individual distortion-robustness atoms. The system block diagram of the proposed watermarking algorithm and the scalable content adaptation scheme is shown in Fig. 3. The major steps for embedding include the forward discrete wavelet transform (FDWT) and coefficient modification using the embedding algorithm, followed by the inverse discrete wavelet transform (IDWT). The content is then scalable coded and may be adapted during the media consumption. Watermark authentication includes the FDWT, recovery of the watermark and comparison with the original watermark. The proposed embedding algorithm follows a non-uniform quantization based index modulation and the process is divided into three parts: 1) quantized binary tree formation, 2) embedding by index modulation and 3) extraction & authentication. The embedding and extraction are performed in the wavelet domain and therefore we use the term coefficients to refer to wavelet coefficients in the rest of the paper.
1) Quantized binary tree formation: This step defines how a coefficient (C) chosen for embedding a watermark bit is recursively quantized to form a binary tree. The coefficients for embedding the watermark data may be chosen based on their magnitude, sign or texture information, randomly, or by any other selection criterion. While the selection criterion is a user defined parameter, we have chosen all coefficients for the experiments in this manuscript.
Firstly, C is indexed ($b_0$) as 0 or 1 using an initial quantizer λ:
$$b_0 = \left\lfloor \frac{|C|}{\lambda} \right\rfloor \bmod 2, \tag{1}$$
where mod denotes the modulo operation. Assuming $n = \lfloor |C|/\lambda \rfloor$, we can identify the position of C between the quantized clusters (n) and (n + 1), which can alternatively be described as bit plane clusters as shown in Fig. 4. The coefficient, C, is then quantized more precisely within a smaller cluster using a smaller quantizer, λ/2, and the corresponding index is computed as $b_1 = \lfloor |C|/(\lambda/2) \rfloor \bmod 2$. The index tree formation is continued recursively by halving the quantizer at each step, as long as the condition $\lambda/2^i \geq 1$ is true. During this tree formation process the signs of the coefficients are preserved separately. Based on the index values calculated at the various quantization steps, a binary tree (b(C)) of each selected coefficient can easily be formed:
$$b(C) = (b_0)(b_1)\ldots(b_{i-1})(b_i), \tag{2}$$
where $(b_0), (b_1), \ldots, (b_i)$ are binary values (bits) in the most significant bit (MSB) to the least significant bit (LSB) positions, respectively, with the tree depth i + 1. For example, if C = 135 and the initial λ = 30, the binary tree b(C) will be b(C) = 01000.
An illustration of the tree formation scenario is shown in Fig. 5. The number of tree nodes, i.e., the number of bits in any binary tree, is decided by the initial quantizer λ and is defined as the depth of the tree.
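The recursive indexing described above can be sketched in a few lines of Python. This is a minimal illustration rather than the reference implementation; the function names `binary_tree` and `tree_string` are hypothetical, and the sketch assumes that each index is the floor of |C| divided by the current quantizer, taken modulo 2, which reproduces the worked example C = 135, λ = 30.

```python
def binary_tree(coefficient, lam):
    """Return the quantization index bits of one coefficient, MSB first."""
    magnitude = abs(coefficient)  # the sign is preserved separately
    bits = []
    step = float(lam)
    while step >= 1:  # continue while lambda / 2**i >= 1
        bits.append(int(magnitude // step) % 2)
        step /= 2  # refine with a quantizer half the size
    return bits


def tree_string(coefficient, lam):
    """Render the binary tree as a bit string such as '01000'."""
    return "".join(str(bit) for bit in binary_tree(coefficient, lam))


print(tree_string(135, 30))       # 01000, the example given in the text
print(len(binary_tree(135, 32)))  # 6, the tree depth for lambda = 32
```

With λ = 32 the quantizers are 32, 16, 8, 4, 2 and 1, so the tree depth is 6, which matches the setting used later in the experiments.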
2) Embedding by index modulation: The binary tree described in Section II-A1 is used to embed binary watermark information based on symbol-based embedding rules. To introduce the watermarking scalability, we chose the 3 most significant bits, which represent 8 different states corresponding to 6 different symbols. Although any other number of bits (> 1) can be chosen, the use of more bits (> 3) results in more states and thus increases the complexity, while fewer bits (< 3) reduce the watermark scalability. The 3 most significant bits of any binary tree represent 6 symbols (EZ = Embedded Zero, CZ = Cumulative Zero, WZ = Weak Zero, EO = Embedded One, CO = Cumulative One and WO = Weak One). The signs of the coefficients are kept separately and added once the coefficients are reconstructed. However, quantifying the embedding distortion for the proposed method is challenging with any of the traditional embedding measurement metrics, such as PSNR or SSIM. This is due to the dynamic nature of the algorithm where, depending on the scaling parameters, the distortion measurement may vary for a given data capacity on the same coefficient. Traditional metrics are generic for many image processing applications and do not consider the data capacity in their calculation, leaving it ambiguous to the user whether the distortion is due to a low capacity (randomly chosen pixels or coefficients) but high strength watermark, or a high capacity (all coefficients/pixels) but low strength watermark. Therefore we propose a new metric, the embedding distortion rate Φ defined in Eq. (3), which combines the data capacity and the embedding distortion. In Eq. (3), I and I′ are the original and watermarked images, respectively, with dimensions X × Y, and L is the number of watermark bits embedded, i.e., the data capacity. For completeness we compared our new metric Φ with traditional PSNR values and the results are discussed in Section IV.
3) Extraction & Authentication: A universal blind extractor is proposed for the watermark extraction and authentication process. The term universal signifies the capability to extract the watermark information irrespective of the watermark embedding or content adaptation parameters. The wavelet coefficients are generated using the FDWT on the host image, followed by the tree formation process used during embedding. Based on the recovered tree structure, symbols are re-generated to decide on the extraction of a watermark bit, 0 or 1. The extracted watermark is then authenticated by computing its Hamming distance, $H \in [0, 1]$, to the original watermark (often referred to as the Bit Error Rate in the literature), as described in Eq. (4):
$$H(W, W') = \frac{1}{L} \sum_{i=0}^{L-1} W_i \oplus W'_i, \tag{4}$$
where W and W′ are the original and the extracted watermarks, respectively, L is the length of the sequence and ⊕ represents the XOR operation between the respective bits.
A lower value of Hamming distance corresponds to higher robustness.
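The authentication measure can be illustrated with a short Python sketch. It assumes the watermarks are equal-length sequences of 0/1 integers and computes the fraction of positions at which the XOR of the two bits is 1; the function name `hamming_distance` is hypothetical.

```python
def hamming_distance(original, extracted):
    """Normalised Hamming distance (bit error rate) between two bit sequences."""
    if len(original) != len(extracted):
        raise ValueError("watermarks must have the same length")
    if not original:
        raise ValueError("watermarks must not be empty")
    # XOR is 1 exactly where the two bits differ.
    differing = sum(a ^ b for a, b in zip(original, extracted))
    return differing / len(original)


original = [1, 0, 1, 1, 0, 0, 1, 0]
extracted = [1, 0, 0, 1, 0, 1, 1, 0]
print(hamming_distance(original, extracted))  # 0.25 (2 of 8 bits differ)
```

A value of 0 means the watermark was recovered without error, while a value near 0.5 is what a random guess of each bit would give on average.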

B. Designing watermarking scalability
This section discusses how to obtain the scalable watermark using the binary tree presented in Section II-A. The term watermarking scalability refers to the embedding of watermarks in a hierarchical manner, so that more embedding information leads to better robustness. The proposed algorithm is independent of any specific media coding scheme and hence can be used to design a new joint scalable watermarking and image/video coding-decoding scheme. In the proposed algorithm, the symbols shown in Table I are ranked based on the improvement in robustness associated with them. The MSB in the binary tree corresponds to a coarse-grained quantization index, whereas the LSB represents a fine-grained quantization index. To extract the watermark bit successfully, all three most significant bits of any binary tree must be unaltered in the case of WO, CO or WZ, CZ, whereas only the two most significant bits are required for EO or EZ. Therefore, two consecutive 0s (EZ) or 1s (EO) provide the strongest association with 0 or 1, respectively, and hence provide high robustness. The symbol pair WO and CO offers the same level of robustness. Similarly, the robustness level associated with the symbol pair WZ and CZ is the same. Thus, the robustness ranking of the symbols can be defined as EO > CO, WO and EZ > CZ, WZ for embedding 1s and 0s, respectively. At the same time, the collective embedding distortion rate, Φ, for these scenarios can be computed as in Eq. (3). In designing the scalable watermarking concept, the rest of this section exploits these two properties: a) the rule based robustness rank order and b) the embedding distortion rate Φ. The complete process is divided into three separate modules: 1) encoding module, 2) embedded watermarking module and 3) extractor module.
1) Encoding module: The main functionality of this module is to generate a hierarchical embedded code-stream. An example scalable watermarking system model is shown in Fig. 7. The sequential activities within the encoding module are described in the following steps:
Tree formation: Binary trees are formed as in Section II-A for each coefficient selected for embedding the watermark. Every tree is then assigned a symbol according to the rules in Table I.
Main pass: Based on the input watermark stream, the trees are altered to create the right association for the required robustness, as shown in Fig. 6. Hence, all selected coefficients are correctly associated with at least the basic WZ/WO symbol. We therefore call this the base layer. The embedding distortion is calculated progressively at each level in the tree.
Refinement passes: The goal of the refinement passes is to increase the watermarking strength progressively and thereby increase the robustness progressively. The base layer provides the basic minimum association with the watermark bits. In a refinement pass, the watermarking strength is increased by modifying the symbols and the corresponding trees to the next available level, i.e., WZ → EZ, CZ → EZ, WO → EO and CO → EO, as shown in the state diagram in Fig. 6. At the end of this pass, all trees are modified and associated with the strongest watermark embedding, EZ/EO. Similar to the previous step, the distortion is calculated as the refinement levels progress.
Hierarchical distortion-robustness atom and code-stream generation: During the previously described passes, the binary trees are modified according to the input watermark-robustness association and the embedding distortion is calculated at each individual tree. We call an individual tree, or a group of trees, a distortion-robustness atom, where each atom contains two pieces of information: 1) the embedding distortion rate, Φ; and 2) the modified tree values. Φ is defined for the whole image, and the algorithm stores Φ with every atom in a hierarchical (cumulative) manner rather than storing the distortion contributed by the individual atom. For example, if the distortion is 3 for atom1, 7 for atom2 and 2 for atom3, the generated code-stream will contain the distortion information 3, 10 and 12 for atom1, atom2 and atom3, respectively. A specific level of distortion is achieved by finding the atom that is the nearest match for the target distortion. Once the algorithm finds the match, it truncates at that point and a code-stream is generated by concatenating the atoms that qualify for the target distortion. The structure of the code-stream is shown in Fig. 8. One set of header information is also included at the beginning of the stream to identify the input parameters, such as the wavelet kernel, the number of decomposition levels and the depth of the binary tree. The header information is included here as a proof of concept for a truly blind watermark extractor. However, the header information can be part of standard compression metadata, or can be passed as side information. One can also use a decoder that already has such information.
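The cumulative bookkeeping and the nearest-match truncation can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, at least one atom is always kept, and a tie between two equally close atoms is resolved in favour of the earlier one, a choice the text does not specify.

```python
from itertools import accumulate


def cumulative_distortion(atom_distortions):
    """Distortion stored with each atom: the running total up to that atom."""
    return list(accumulate(atom_distortions))


def truncation_point(atom_distortions, target):
    """Number of leading atoms to keep for the nearest match to the target."""
    totals = cumulative_distortion(atom_distortions)
    # min() returns the first index on ties, i.e. the shorter code-stream.
    nearest = min(range(len(totals)), key=lambda i: abs(totals[i] - target))
    return nearest + 1


atoms = [3, 7, 2]                    # per-atom distortion from the example
print(cumulative_distortion(atoms))  # [3, 10, 12]
print(truncation_point(atoms, 9))    # 2: keep atom1 and atom2 (total 10)
```

Because the stored values are cumulative, the truncation point can be found from the stored distortion values alone, without decoding the tree values of any atom.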
2) Embedded watermarking module: The embedded watermarking module truncates the code-stream at any distortion-robustness atom level to generate the watermarked image with the distortion-robustness requirements desired by the user. Including more distortion-robustness atoms before truncation increases the robustness of the watermarked image but consumes a greater embedding-distortion rate. The code-stream truncation at a given distortion-robustness atom level provides flexibility towards watermarking scalability. To reconstruct the watermarked coefficients, the truncated code-stream is de-quantized by reversing the steps in Section II-A1. Applying the IDWT on these coefficients generates the watermarked media with the required visual quality and watermark robustness.
3) Watermark extractor module: The extractor module consists of a blind extractor similar to the one described in Section II-A3. Any test image/video after the content adaptation process is passed to this module for watermark extraction and authentication. During the extraction, the FDWT is applied on the test media and the coefficients are used to form the binary tree. Based on the rules stated in Table I, each tree is then assigned to a symbol and the corresponding watermarking association. The association of 0 or 1 indicates the extracted watermark value. The extracted watermark bits are then authenticated using Eq. (4).
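The symbol assignment performed by the extractor can be sketched as below. The bit patterns EZ = 00x, EO = 11x and CO = 101 follow the discussion in Section III-B; the remaining assignments (CZ = 010, WZ = 100, WO = 011) are an assumption made for this illustration, chosen so that every symbol follows the majority of its three bits. The names `SYMBOLS`, `msb_tree` and `extract_bit` are hypothetical, and the sketch requires λ ≥ 4 so that three quantization levels exist.

```python
# Assumed symbol table: each pattern maps to the majority value of its bits.
SYMBOLS = {
    "000": "EZ", "001": "EZ", "010": "CZ", "100": "WZ",
    "111": "EO", "110": "EO", "101": "CO", "011": "WO",
}


def msb_tree(coefficient, lam, count=3):
    """First `count` bits of the quantized binary tree, MSB first."""
    magnitude = abs(coefficient)  # the sign carries no watermark information
    return "".join(
        str(int(magnitude // (lam / 2 ** i)) % 2) for i in range(count)
    )


def extract_bit(coefficient, lam):
    """Return (symbol, watermark bit) for one wavelet coefficient."""
    symbol = SYMBOLS[msb_tree(coefficient, lam)]
    return symbol, 0 if symbol.endswith("Z") else 1


print(extract_bit(135, 30))  # ('CZ', 0): the tree 01000 starts with 010
print(extract_bit(60, 32))   # ('EO', 1): the three MSBs are 111
```

Because the decision uses only the coefficient and λ, the extractor needs neither the original image nor the embedding parameters beyond those carried in the header, which is what makes it blind.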

III. ROBUSTNESS TO JPEG2000 BIT PLANE DISCARDING
This section extends the scalable watermarking presented in Section II to improve its robustness against compression caused by scalable coding schemes, such as JPEG2000. Quality scalability in compressed bit streams has been of particular interest due to the Quality of Service (QoS) requirements in the media consumption chain. As reported in Section I, there exist some algorithms in the literature that offer robustness to compression in general. However, in their algorithmic development, most of these algorithms do not consider the effect of the JPEG2000 quantization process on the robustness of the watermark retrieval. This section shows that a special case of the proposed scalable watermarking algorithm in Section II incorporates the JPEG2000 quantization process, which leads to a bit-plane discarding model for achieving quality scalability in content adaptations. Firstly, we briefly discuss the bit-plane discarding model, followed by a discussion on the fitness of the proposed algorithm for achieving watermark robustness to quality scalability-based content adaptation.

A. Scalable coding-based content adaptation
JPEG2000 uses the DWT as its core technology and offers scalable decoding with quality and resolution scalability. The scalable coders encode the image by performing the DWT followed by embedded quantization and entropy coding. The coefficient quantization, in its simplest form, can be formulated as follows:
$$C_q = \left\lfloor \frac{C}{Q} \right\rfloor, \tag{5}$$
where $C_q$ is the quantized coefficient, C is the original coefficient and Q is the quantization factor. Embedded quantizers often use $Q = 2^N$, where N is a non-negative integer. Such a quantization parameter with downward rounding (i.e., using floor) can also be interpreted as bit plane discarding, as it is commonly known within the image coding community. At the decoder side, the reverse process of the encoding is followed to reconstruct the image. The de-quantization process is formulated as follows:
$$\hat{C} = Q \, C_q + \frac{Q - 1}{2}, \tag{6}$$
where $\hat{C}$ is the de-quantized coefficient. In such a quantization scheme, the original coefficient values in the range $kQ \leq C < (k+1)Q$, where $k \in \{\pm 1, \pm 2, \pm 3, \ldots\}$ and $Q = 2^N$ for bit plane wise coding, are mapped to $\hat{C} = C_k$, which is the center value of the concerned region, as shown in Fig. 9 and in Eq. (7):
$$C_k = kQ + \frac{Q - 1}{2}. \tag{7}$$
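The bit-plane discarding model can be emulated with the short Python sketch below. It assumes non-negative integer coefficient magnitudes (the sign being handled separately) and takes the center value of a cluster to be kQ + (Q − 1)/2, an assumption that is consistent with the 11x → 101 behaviour analysed in Section III-B. The function name `discard_bit_planes` is hypothetical.

```python
def discard_bit_planes(magnitude, n):
    """Quantize with Q = 2**n using floor, then map to the cluster center."""
    q = 2 ** n
    k = magnitude // q           # quantized index after dropping n bit planes
    return k * q + (q - 1) / 2   # assumed center of the cluster [kQ, (k+1)Q)


# Every magnitude in 128..135 shares its bits above the 3 discarded planes,
# so all of them are reconstructed as the same center value.
print(discard_bit_planes(135, 3))                                # 131.5
print({discard_bit_planes(c, 3) for c in range(128, 136)})       # {131.5}
```

Discarding zero bit planes (n = 0) leaves the magnitude unchanged, since the cluster then contains a single value.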

B. Incorporating quantization in scalable watermarking
To improve the robustness against quality scalable compression, at this point we incorporate the bit plane discarding within the proposed algorithm by restricting the initial quantizer (λ) value to an integer power of two. Therefore the quantization cluster in the tree formation (Section II-A) can now alternatively be described as a bit-plane cluster. Because of this, every value in the binary trees corresponds to the bit-planes of the selected group of coefficients. Therefore, based on the depth parameter in the embedding algorithm, the selected coefficient can retain the watermark even after bit-plane discarding.
Assuming $C'$ and $\hat{C}'$ as the watermarked coefficient before and after bit plane discarding, respectively, we examine the effect of discarding N bit planes on every bit in the binary tree during the watermark extraction. Considering an initial $\lambda = 2^M$, where M corresponds to the depth of the tree, at the extractor, using Eq. (1), the bit ($b_i$) in the binary tree can be calculated as:
$$b_i = \left\lfloor \frac{|C'|}{2^M} \right\rfloor \bmod 2 = k_1 \bmod 2, \tag{8}$$
where $k_1$ is the cluster index as shown in Fig. 10. Using the bit-plane discarding model in Section III-A, the watermarked coefficients, $C'$, are quantized and mapped to the center value, $C_k$, within a bit-plane cluster with an index value of $k_2$, as shown in Fig. 10. At this point, we consider the following three cases to investigate the effect of this quantization and de-quantization process:
1) Case 1 (M > N): In this case, the binary tree cluster ($\lambda = 2^M$) is bigger than the bit-plane discarding cluster. Hence for any bit-plane discarding where M > N, the $C_k$ value remains within the binary tree cluster, $k_1 \cdot 2^M \leq C_k < (k_1 + 1) \cdot 2^M$, and therefore
$$b'_i = b_i, \tag{9}$$
where $b_i$ and $b'_i$ represent the bit in a binary tree without bit-plane discarding and after bit-plane discarding, respectively.
2) Case 2 (M = N): This case considers the same cluster size in the binary tree and in the bit-plane discarding, and therefore $C_k$ remains in the same cluster of the binary tree during watermark extraction, as shown in Fig. 10(b), and hence $b'_i = b_i$.
3) Case 3 (M < N): In this scenario, the number of bit-planes being discarded is greater than the depth of the binary tree. Due to bit-plane discarding, any watermarked coefficient, $C'$, in the cluster ($k_2 \cdot 2^N \leq C' < (k_2 + 1) \cdot 2^N$) is mapped to the center value, $C_k$. In terms of the binary tree clustering this range can be defined as ($k_2 \cdot 2^{N-M} \cdot 2^M \leq C' < (k_2 + 1) \cdot 2^{N-M} \cdot 2^M$), where (N − M) is a positive integer. Hence during watermark extraction the index of the binary tree cluster can change and effectively $b'_i = b_i$ is not guaranteed.
So far we have explained the effect of bit-plane discarding on individual bits of a binary tree. Since the algorithm generates the watermark association symbols using the three most significant bits of the binary tree (Table I), we can define the necessary condition for the coefficients to retain the watermark as follows:
$$d \geq N + 3, \tag{10}$$
where d is the depth of the binary tree and N is the number of bit planes assumed to be discarded.
After the second refinement pass in the code-stream, all modified coefficients are associated with either EZ or EO. In that case only the two most significant bits are required to be preserved. Hence, when embedding with the highest robustness is considered, the criterion in Eq. (10) becomes:
$$d \geq N + 2. \tag{11}$$
Nonetheless, in this case, the second most significant bit in the binary tree does not need to be preserved, as long as the MSB is preserved in combination with the supporting decision from the third most significant bit, i.e., EZ and EO are allowed to be extracted as CZ and CO, respectively. We now examine the effect of bit-plane discarding in these cases when d = N + 1.
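Case EZ: Considering $\lambda = 2^M$ in this case, after the second refinement pass, the coefficients, $C'$, are associated to embedded zero (EZ → 00x), i.e., $k_1 \cdot 2^M \leq C' < k_1 \cdot 2^M + \frac{2^M}{2}$, where $k_1 \bmod 2 = 0$, as shown in Fig. 11(a). After N number of bit-planes discarding, $C'$ is modified to the center value $C_k = k_2 \cdot 2^N + \frac{2^N - 1}{2}$. For M = N (i.e., d = N + 1), $k_2$ becomes $k_1$ and therefore:
$$b'_1 = \left\lfloor \frac{C_k}{2^M / 2} \right\rfloor \bmod 2 = \left\lfloor 2k_1 + \frac{2^M - 1}{2^M} \right\rfloor \bmod 2 = 2k_1 \bmod 2 = 0, \tag{12}$$
which results in the second MSB remaining 0 in the binary tree. Hence, for d = N + 1, after N bit-planes are discarded, the coefficient association with EZ remains the same and the watermark information can be successfully recovered.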
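Case EO: Referring to Fig. 11(b), for embedded one (EO → 11x), the condition for coefficient association becomes $k_1 \cdot 2^M + \frac{2^M}{2} \leq C' < (k_1 + 1) \cdot 2^M$, where $k_1 \bmod 2 = 1$. Similar to the previous case, after N number of bit-planes discarding, $C'$ is modified to the center value of the corresponding cluster, $C_k = k_2 \cdot 2^N + \frac{2^N - 1}{2}$. Considering M = N, similar to Eq. (12), we can write:
$$b'_1 = \left\lfloor \frac{C_k}{2^M / 2} \right\rfloor \bmod 2 = 2k_1 \bmod 2 = 0. \tag{13}$$
Therefore, the two MSBs of the binary tree are now changed as 11x → 10x. At this point, we aim to extract the third MSB, $b'_2$, as:
$$b'_2 = \left\lfloor \frac{C_k}{2^M / 4} \right\rfloor \bmod 2. \tag{14}$$
With the quantizer $2^M / 4$, the quantization index in Eq. (13) becomes
$$\left\lfloor \frac{C_k}{2^M / 4} \right\rfloor = \left\lfloor 4k_1 + 2 - \frac{2}{2^M} \right\rfloor = 4k_1 + 1. \tag{15}$$
Combining Eq. (14) and Eq. (15), the extracted third MSB becomes $b'_2 = 1$ and hence 11x → 101. Therefore, for d = N + 1, after N bit-planes are discarded, the coefficient association with EO becomes CO and the watermark information can still be successfully extracted.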
Combining the above mentioned cases, we can modify Eq. (11) and conclude that for EZ or EO the relationship between the embedding depth, d, and the maximum number of discarded bit-planes, N, is as follows:
$$d \geq N + 1. \tag{16}$$
Therefore, using the above mentioned conditions, the proposed new algorithm ensures the reliable detection of the watermark against quality scalable content adaptation which follows the bit-plane discarding model.
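The relationship between the tree depth and the number of discardable bit planes can be checked by brute force. The sketch below is an illustration under stated assumptions, not the reference implementation: it takes λ = 2^M (so that d = M + 1), maps a discarded cluster to the center kQ + (Q − 1)/2, and assigns the watermark bit of a three-bit pattern by majority (EZ, CZ, WZ → 0 and EO, CO, WO → 1). All names are hypothetical.

```python
# Assumed association of each three-MSB pattern with a watermark bit.
SYMBOL_BIT = {"000": 0, "001": 0, "010": 0, "100": 0,
              "111": 1, "110": 1, "101": 1, "011": 1}


def msbs(value, m, count=3):
    """Three most significant tree bits for lambda = 2**m (requires m >= 2)."""
    return "".join(str(int(value // 2 ** (m - i)) % 2) for i in range(count))


def discard(value, n):
    """Drop n bit planes and reconstruct at the assumed cluster center."""
    q = 2 ** n
    return (value // q) * q + (q - 1) / 2


def max_discardable(m, strongest_only):
    """Largest N for which all tested coefficients survive n = 0..N."""
    limit = -1
    for n in range(m + 3):
        for c in range(4 * 2 ** m):  # covers several binary tree clusters
            before = msbs(c, m)
            if strongest_only and before[:2] not in ("00", "11"):
                continue  # after refinement only EZ / EO coefficients exist
            after = msbs(discard(c, n), m)
            if strongest_only:
                survived = SYMBOL_BIT[after] == SYMBOL_BIT[before]
            else:
                survived = after == before  # all three MSBs must be unaltered
            if not survived:
                return limit
        limit = n
    return limit


m = 5  # lambda = 32, tree depth d = 6
print(max_discardable(m, strongest_only=False))  # 3, i.e. N = d - 3
print(max_discardable(m, strongest_only=True))   # 5, i.e. N = d - 1
```

For d = 6 the sketch reports that the three MSBs survive up to 3 discarded bit planes and that the EZ/EO association survives up to 5, which agrees with the limits reported for Φ min and Φ max in Section IV.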

IV. EXPERIMENTAL RESULTS AND DISCUSSION
This section provides the experimental verification of the proposed scalable watermarking scheme for images as well as video. It is also evaluated for its robustness to scalable content adaptation attacks. The source code of the proposed algorithm is available from https://github.com/dbhowmik/scalableWM2 .

A. Scalable watermarking for images
The experimental simulations are grouped into four sets: 1) proof of the concept, 2) verification of the scheme for the bit-plane discarding model, 3) robustness performance against JPEG2000 and 4) robustness comparison with existing blind watermarking schemes. For all experiments, a 3 level 9/7 wavelet decomposition is used as the FDWT. The low frequency subband is then selected to embed a binary logo based watermark. The initial quantization value λ is set to 32, resulting in a tree depth of d = 6. d is a user defined parameter, which we vary for further investigation; the results are reported in the following sections. In generating the code-stream, atoms are defined by grouping every 16 consecutive binary trees. The code-stream is generated by organizing hierarchically nested distortion-robustness atoms, generated in 2 individual passes.
1) Proof of the concept: Once the code-stream is generated, a set of watermarked images is produced by truncating the code-stream at different embedding-distortion rate points, Φ, defined in Eq. (3). The results for four test images, Boat, Barbara, Blackboard and Light House, are shown in Fig. 12. As the embedding process creates a hierarchical code-stream, the watermark strength varies with Φ, i.e., a higher Φ corresponds to a higher watermarking strength for a given data capacity. As a result, with an increased value of Φ, a higher embedding distortion is introduced in the watermarked images and hence the visual image quality degrades, as shown in Fig. 12. However, with higher watermarking strength, the robustness performance also improves. The overall embedding distortion performance for the test images, measured by PSNR, and the robustness performance (Hamming distance) at various Φ values are shown in Fig. 13. The x-axis of the plots shows Φ. The y-axis shows the PSNR in the plots in Row 1 and the Hamming distance in the plots in Row 2.
It is evident from these plots that a higher embedding-distortion rate, i.e., higher watermarking strength, results in lower PSNR but offers higher robustness. However, a trade-off can be made based on the application scenario by selecting an embedding-distortion rate that balances visual quality and robustness. The proposed algorithm is sensitive to image content and does not force coefficients to achieve the highest robustness. For example, at the minimum Φ the Hamming distance (H), which measures robustness, may not be 0, but it gradually improves as Φ increases. This is useful for preserving image quality by controlling the embedding-distortion rate. However, the algorithm ensures that the robustness stays within a permissible limit. The interpretation of Hamming distances for practical use is discussed in [24], which proposed that H < 0.2 ensures correct extraction of the watermark.
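The two metrics used in these plots can be sketched as follows. This is a minimal illustration, assuming 8-bit images (peak value 255) and a Hamming distance normalized by the watermark length so that it lies in [0, 1]; the function names are hypothetical and are not taken from the authors' source code.

```python
import math


def psnr(original, watermarked, peak=255.0):
    """PSNR in dB between two equally sized images given as nested lists.

    Assumption: 8-bit pixel data, hence the default peak value of 255.
    """
    flat_o = [p for row in original for p in row]
    flat_w = [p for row in watermarked for p in row]
    if len(flat_o) != len(flat_w):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(flat_o, flat_w)) / len(flat_o)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def hamming_distance(embedded_bits, extracted_bits):
    """Fraction of watermark bits that differ (0 = identical, 1 = all flipped)."""
    if len(embedded_bits) != len(extracted_bits):
        raise ValueError("bit sequences must have the same length")
    differing = sum(a != b for a, b in zip(embedded_bits, extracted_bits))
    return differing / len(embedded_bits)


def extraction_ok(h, threshold=0.2):
    """Apply the H < 0.2 criterion that the text attributes to [24]."""
    return h < threshold
```

For instance, a four-bit watermark with one flipped bit gives H = 0.25, which would not satisfy the H < 0.2 criterion.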
2) Verification of the scheme against bit-plane discarding: The proposed watermarking scheme incorporates the bit-plane discarding model, and the experimental verification is shown in Fig. 14. The y-axis shows the robustness in terms of Hamming distance against the number of bit planes discarded (p), shown on the x-axis. Different depth (d) values with the minimum embedding-distortion rate, Φ_min, and the maximum embedding-distortion rate, Φ_max, are chosen to verify our arguments in Eq. (10) and Eq. (16). At Φ_min, the condition for correct watermark extraction is given in Eq. (10), and this is evident from the results shown in Fig. 14. At Φ_max, all coefficients are associated with EZ or EO and the necessary condition to extract the watermark is discussed in Eq. (16), which is supported by the simulation results shown in Fig. 14. For example, at d = 6, correct watermark extraction is possible up to p = 3 for Φ_min and up to p = 5 for Φ_max, as shown in these plots.
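The effect of bit-plane discarding on extraction can be explored with a small toy model. This sketch is not the derivation behind Eq. (10) or Eq. (16). It rests on three assumptions: discarded bit planes are reconstructed at the midpoint of the quantization bin, the extractor takes a majority vote over the three most significant bits of the tree, and EZ and EO correspond to the all-zero and all-one patterns in those three bits. All function names are hypothetical.

```python
def discard_bit_planes(c, p):
    """Zero the p least significant bits of c and reconstruct at the bin midpoint.

    Assumption: midpoint reconstruction; other decoders may reconstruct differently.
    """
    if p == 0:
        return c
    return ((c >> p) << p) + (1 << (p - 1))


def majority_bit(c, d):
    """Majority vote over the three most significant bits of a depth-d tree."""
    top3 = (c >> (d - 3)) & 0b111
    return 1 if bin(top3).count("1") >= 2 else 0


def max_safe_p(coeffs, d):
    """Largest p such that the majority bit of every coefficient survives
    discarding 1, 2, ..., p bit planes."""
    safe = 0
    for p in range(1, d + 1):
        if all(majority_bit(discard_bit_planes(c, p), d) == majority_bit(c, d)
               for c in coeffs):
            safe = p
        else:
            break
    return safe


d = 6
all_coeffs = list(range(2 ** d))  # every possible depth-6 tree
# Assumed EZ/EO patterns: top three bits all zero or all one.
extreme_coeffs = [c for c in all_coeffs if (c >> (d - 3)) in (0b000, 0b111)]

print(max_safe_p(all_coeffs, d), max_safe_p(extreme_coeffs, d))
```

Under these assumptions the search gives 3 for arbitrary symbols and 5 for the EZ/EO-only case at d = 6, which agrees with the limits quoted above for Φ_min and Φ_max.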
3) Robustness performance against JPEG2000: Fig. 15 and Fig. 16 show the robustness performance of the proposed watermarking scheme against JPEG2000 scalable compression. We first verify the scheme's robustness against JPEG2000 compression using different values of the depth parameter, d, as shown in Fig. 15, followed by the watermark scalability at a given depth, as shown in Fig. 16. An ITU-T.804 JPEG2000 standard reference software is used for the experiments. These results compare the robustness for various Φ at a given d. In all the figures the x-axis represents the JPEG2000 quality compression ratio while the y-axis shows the corresponding Hamming distance.
It is evident from the plots that a higher depth, and a higher Φ at a given depth, offer higher robustness to scalable content adaptation attacks. The watermark scalability is achieved by truncating the distortion-constrained code-stream at various rate points with respect to Φ. With increased Φ, more coefficients are associated with EZ or EO, which improves the robustness by successfully retaining the watermark information at higher compression rates. The results show more than 35% improvement in robustness between two consecutive depth levels, d, and more than 60% improvement between Φ_min and Φ_max at a given depth.
The trade-offs between embedding distortion performance and robustness against JPEG2000 at various Φ are shown in Fig. 17. The x-axis represents Φ, the left y-axis represents the Hamming distance (H) against JPEG2000 content adaptation at various compression ratios (CR), the right y-axis represents PSNR (dB), the vertical red line represents the average Φ and the horizontal black line shows a Hamming distance of 0.1. These graphs are useful for indicating the robustness performance of a given image at various embedding-distortion rates. For better imperceptibility one may choose a suitable value of Φ from the left section of the graph, while for a target compression ratio any value of Φ can be selected as long as a target Hamming distance is achieved (the horizontal black line at H = 0.1 is used here for illustration). A trade-off can be made around the region where the two perpendicular lines meet.
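The selection rule described above can be written as a short helper. This is only a sketch: the function name is hypothetical, and the (Φ, H) pairs in the usage example are made-up placeholders, not values read from Fig. 17.

```python
def smallest_phi_meeting_target(curve, target_h=0.1):
    """Return the smallest Φ whose Hamming distance is at or below target_h.

    `curve` is an iterable of (phi, hamming_distance) pairs measured at one
    compression ratio. A smaller Φ means lower embedding distortion, so the
    smallest feasible Φ favours imperceptibility. Returns None if no Φ
    reaches the target.
    """
    feasible = [phi for phi, h in curve if h <= target_h]
    return min(feasible) if feasible else None


# Placeholder measurements for illustration only (not results from the paper).
example_curve = [(0.1, 0.30), (0.2, 0.18), (0.3, 0.09), (0.4, 0.04)]
print(smallest_phi_meeting_target(example_curve, target_h=0.1))
```

Choosing the smallest feasible Φ is one possible policy; an application that expects heavier compression could instead pick a larger Φ from the feasible set.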
4) Robustness performance comparison with existing non-scalable watermarking methods: This is the first scalable watermarking work of its kind and therefore comparisons are made only with the next best available non-scalable watermarking algorithms in the literature. In this section we compare our proposed algorithm with a popular blind requantization-based watermarking scheme (non-scalable) used in [9]-[11], [14]. These algorithms share a common embedding model [24], which relies on modifying various coefficients towards a specific quantization step. Herein we call it the existing algorithm. As proposed in [11], the existing algorithm modifies the median coefficient towards the step size, δ, using a running non-overlapping 3×1 window. After the modification, the altered coefficient must remain the median of the three coefficients within the window. The step size δ is calculated from C_min and C_max, the minimum and maximum coefficients, respectively. The median coefficient, C_med, is quantised towards the nearest step, depending on the binary watermark bit, b. The watermark bit, b′, is extracted for each window position following [11]. For a fair comparison, we first calculate Φ for the existing watermarking algorithm and then set the same Φ for the experiments with the proposed method. The embedding performance is reported in Table II and the robustness against JPEG2000 compression is shown in Fig. 18.
In the embedding distortion comparison, for similar Φ, the existing method shows better overall embedding performance in terms of PSNR. However, the data capacity of the proposed algorithm is 3 times higher than that of the existing one. Therefore, the new embedding-distortion metric, Φ, which combines embedding distortion and data capacity into a single metric, allows a fair comparison of the robustness of the two schemes at a given Φ. The results show that, despite having three times more data capacity, the proposed algorithm outperforms the existing blind algorithm by an average of 25% to 35% at higher compression ratios. This confirms that the new algorithm, coupled with the bit-plane discarding model, improves robustness against scalable compression over the existing algorithm, which does not use the model.
The proposed algorithm adds a new avenue to watermarking strategies by offering a flexible scalable watermarking approach, i.e., to achieve higher robustness at a high compression ratio, one can choose a higher Φ, and the effect on embedding distortion is neutralized by quantization in compression. An example is shown in Fig. 2 for the Barbara image, where we compare the embedding distortion of the watermarked image after compression. The PSNR values of the watermarked and the un-watermarked images are comparable at various compression points, while the watermarked image offers authenticity of the image with the desired robustness H.

B. Scalable watermarking for video
Finally, we extend the proposed scalable watermarking scheme to video using a 2D+t+2D motion compensated temporal filtering (MCTF) based video watermarking framework [33]. 2D+t+2D refers to a 3D video decomposition scheme where 2D and t represent spatial and temporal decompositions, respectively. For example, t+2D is achieved by performing a temporal decomposition followed by a spatial transform, whereas in the case of 2D+t the temporal filtering is done after the spatial 2D transform. Issues related to video watermarking, such as flicker and residue error propagation, were addressed in the framework by proposing motion compensated temporal filtering that considers object motion within frames. It is therefore appropriate to choose the same framework to extend the proposed algorithm. In this work the watermarking code-stream is generated using the 2D+t decomposed host video, as described in the framework. The binary tree is formed using the motion compensated filtered wavelet coefficients.
Similar to the image watermarking case, the watermarked video is generated at a given embedding-distortion rate, Φ, either at the individual frame level or at the group of pictures (GOP) level. For the experiments shown in this work, Φ is calculated for every GOP, with a size of 8 frames per GOP. To extract the watermark data, the test video is first decomposed using the 2D+t+2D framework with blind motion estimation, i.e., without any reference to the original video or motion vectors, and then the binary tree is formed for the selected coefficients. The watermark extraction decision is made using the association rules described in Table I.
As recommended in the original framework, the experimental simulations in this work use the 230 spatio-temporal subband decomposition, where a 2-level 9/7 spatial decomposition is performed, followed by a 3-level MCTF-based temporal decomposition. In subband selection, the LL_s spatial subband is used in two different temporal subbands: LLL and LLH. In all cases, normalization is used during spatio-temporal decomposition. In the embedding procedure, the depth parameter d is set to 6 with a data capacity of 6336. The performance of the algorithm is evaluated for various Φ by comparing the embedding distortion and the robustness against scalable compression.
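The reported data capacity can be related to the decomposition by simple arithmetic. The sketch below assumes dyadic subsampling at each spatial and temporal level and uses the CIF resolution (352 × 288) of the test sequences in this section together with the 8-frame GOP. That the capacity equals the size of one low-frequency subband is an observation about the numbers, not a statement made in the text, and the function names are hypothetical.

```python
def low_frequency_subband_size(width, height, levels):
    """Size of the low-frequency spatial subband after `levels` dyadic splits.

    Assumption: each level halves both dimensions, rounding up for odd sizes.
    """
    for _ in range(levels):
        width = (width + 1) // 2
        height = (height + 1) // 2
    return width, height


def temporal_low_pass_frames(gop_size, levels):
    """Number of temporal low-pass frames left after `levels` dyadic temporal splits."""
    for _ in range(levels):
        gop_size = (gop_size + 1) // 2
    return gop_size


w, h = low_frequency_subband_size(352, 288, levels=2)  # CIF, 2 spatial levels
coefficients_per_subband = w * h
low_pass_frames = temporal_low_pass_frames(8, levels=3)  # 8-frame GOP, 3 temporal levels
print(w, h, coefficients_per_subband, low_pass_frames)
```

With these assumptions the low-frequency subband measures 88 × 72, i.e., 6336 coefficients, and a single temporal low-pass frame remains per GOP, which matches the stated data capacity of 6336.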
The embedding distortion is measured using PSNR and the results are shown in Fig. 19 for the LLL and LLH temporal subbands for the CIF resolution (352 × 288) test sequences Crew, Foreman and News. The x-axis shows the frame number while the y-axis shows the corresponding PSNR. The robustness performance is evaluated by comparing the Hamming distance against scalable compression schemes, such as Motion JPEG2000 and Motion Compensated Embedded Zero Block Coding (MC-EZBC) [34]. The results are shown in Fig. 20, Fig. 21 and Fig. 22 for the Crew, Foreman and News sequences, respectively.
From the results, it is evident that the concept of scalable watermarking is successfully realised within a video watermarking framework. As the embedding-distortion rate, Φ, increases, the robustness improves by 30% to 70% between Φ_min and Φ_max, while the embedding distortion also increases. Conceptually, as described before, a high Φ can be chosen where high compression is expected and a low Φ can be chosen for high resolution video distribution, based on the end user's need. Therefore, a combined scalable watermarking and video encoding scheme can ensure secure multimedia distribution within scalable content adaptation scenarios.
V. CONCLUSIONS
In this paper, we have proposed a novel concept of scalable watermarking and extended it to make watermarking robust against quality scalable content adaptation attacks. To generate a scalable watermark, a distortion-constrained code-stream is first generated by concatenating hierarchically nested joint distortion-robustness coding atoms. The code-stream is then truncated at various embedding-distortion rate points to create watermarked images, based on the distortion-robustness requirements. The extraction and authentication are done using a blind extractor. The algorithm is developed based on the bit-plane discarding model used in scalable content adaptation.

Fig. 2. Application example where distortion due to watermark embedding is adaptively negated at various JPEG2000 compression ratios (CR) without compromising robustness. H stands for Hamming distance (H ∈ [0, 1]), a commonly used metric of watermarking robustness. A lower value of H indicates higher robustness.
EO = Embedded One, CO = Cumulative One and WO = Weak One) to identify the original coefficient's association with a 0 or 1. The rationale for associating symbols with 0 or 1 relies on the total number of 0s or 1s in the symbol, i.e., two or more 0s correspond to 0 and vice versa for 1s. The bits in a binary tree, the symbols and the corresponding associations are shown in Table I for a tree depth of 7. Depending on the input watermark stream, a new association is made, if required, by altering the chosen 3 most significant bits in the tree to reach the nearest symbol, as shown in the state diagram in Fig. 6. Assuming the current state of the binary tree is EZ, no change in state is required to embed the watermark bit 0, while to embed the watermark bit 1 a new value of the binary tree must be assigned. The new value of the tree can be associated with WO, CO or EO. However, to minimize the distortion, the nearest state change must occur, as shown in the state diagram. Other state changes in the binary tree follow the same rule. Finally, the watermarked image or video is obtained by de-quantizing the modified binary tree followed by an inverse transformation. For example, for a modified binary tree b(C) = 101101 of depth d = 6, the embedded coefficient will be C = 1·2^5 + 0·2^4 + 1·2^3 + 1·2^2 + 0·2^1 + 1·2^0 = 45.
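The association and embedding rule described above can be illustrated with a short sketch. It uses only what the text states: the association is a majority vote over the three most significant bits, and embedding alters only those three bits. The "nearest symbol" is approximated here as the candidate value closest to the original coefficient, which is an assumption and may differ from the exact transitions in the state diagram of Fig. 6. The function names are hypothetical.

```python
def bits_to_coefficient(bits):
    """Interpret a binary-tree bit string, most significant bit first, as an integer."""
    return int(bits, 2)


def association(c, d):
    """Return 0 or 1 by majority vote over the three most significant bits."""
    top3 = (c >> (d - 3)) & 0b111
    return 1 if bin(top3).count("1") >= 2 else 0


def embed_bit(c, bit, d):
    """Embed one watermark bit by changing only the three most significant bits.

    If the coefficient is already associated with `bit`, it is left untouched.
    Otherwise the lower d - 3 bits are kept and the top three bits are replaced
    by the pattern that yields the closest value with the required association
    (an assumed reading of the nearest state change).
    """
    if association(c, d) == bit:
        return c
    shift = d - 3
    low = c & ((1 << shift) - 1)
    candidates = [(top << shift) | low
                  for top in range(8)
                  if association(top << shift, d) == bit]
    return min(candidates, key=lambda v: abs(v - c))


c = bits_to_coefficient("101101")  # the worked example in the text
print(c, embed_bit(c, 1, 6), embed_bit(c, 0, 6))
```

Here the coefficient 45 (101101) is already associated with 1, so embedding a 1 leaves it unchanged, while embedding a 0 yields 37 (100101) under the nearest-value assumption.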

Fig. 6. The state diagram of the watermark embedding rule based on the tree-symbol-association model.

Fig. 9. Quantization in the compression scheme considering N level bit-plane discarding.

Fig. 10. Effect of bit-plane discarding in watermark extraction; λ = 2^M and N is the number of bit planes being discarded.

Fig. 11. Effect of bit-plane discarding in watermark extraction for the special case of EZ and EO; λ = 2^M and N is the number of bit planes being discarded.

Fig. 14. Robustness against discarding of p bit planes for various d at minimum and maximum Φ.

Fig. 17. PSNR and robustness against JPEG2000 versus Φ at d = 6. The x-axis represents Φ, the left y-axis represents the corresponding Hamming distance (H) against JPEG2000 content adaptation at various compression ratios (CR), the right y-axis represents PSNR (dB), the vertical red line represents the average Φ and the horizontal black line shows a Hamming distance of 0.1.
In Fig. 20, Fig. 21 and Fig. 22, which correspond to the Crew, Foreman and News sequences, respectively, the left and right columns represent the robustness performance against Motion JPEG2000 and MC-EZBC, respectively. Results for the LLL subband are shown in Columns 1 and 3 and for LLH in Columns 2 and 4. In all cases the x-axis shows the compression ratio or bit rate and the y-axis shows the corresponding Hamming distance. The Hamming distances are calculated by averaging the individual frame-level Hamming distances of each test sequence.

TABLE I
TREE-BASED WATERMARKING RULES