
Artificial intelligence (AI) has great potential for improving the performance of image processing and its applications. In this study, we incorporate two AI techniques, namely, the grey wolf optimizer (GWO) and the denoising convolutional neural network (DnCNN), within a framework developed based on the quaternion discrete cosine transform (QDCT). Binary embedding is formulated according to the attributes of each QDCT component and the distinctive properties of the available modulation schemes. The GWO is responsible for performance optimization, while the DnCNN makes the extracted binary watermark more visually recognizable. Experimental results demonstrate the efficacy of the proposed scheme in resisting a variety of image processing attacks. The proposed scheme outperforms existing ones in terms of the robustness and intelligibility of the retrieved watermarks under the same payload capacity.

INDEX TERMS Blind color image watermarking; Grey wolf optimizer; Denoising convolutional neural network; Quaternion discrete cosine transform; Mixed modulation.


I. INTRODUCTION
The widespread availability of editing tools for digital content has led to increasing levels of theft and manipulation of intellectual property. Digital watermarking is a common approach to protecting the rights of digital content owners. In case the digital content has been misappropriated or plagiarized, an examination of retrieved watermarks via the publicly available algorithm can easily reveal illicit attempts. Apart from the application to the authentication of audio, video, image, and data files [1][2][3][4], watermarking is also important for new imaging modalities such as light field, hologram [5], and point cloud [6]. Watermarks are judged according to their ability to resist tampering or removal while remaining imperceptible to the casual observer [7].
Image watermarking techniques can be divided into those implemented in the spatial domain and those implemented in the transform domain. Watermarking in the spatial domain is performed by adjusting image pixels directly. Watermarking in the transform domain involves converting image pixel values in the spatial domain into transform coefficients, which are then used in watermark embedding and extraction. Watermarking schemes in the transform domain impose higher computational requirements than those in the spatial domain; however, they tend to be far more robust against common attacks for a comparable amount of embedded information. Typical methods in the transform domain include those based on the discrete cosine transform (DCT) [8][9][10][11], discrete wavelet transform (DWT) [12,13], and discrete Fourier transform (DFT) [14,15].
The quaternion is a useful mathematical tool that enables algebraic operations in a four-dimensional vector space. Because a color image can be regarded as a two-dimensional array with an extra depth constituted by three primary colors, the algebraic properties of the quaternion make it attractive for color image watermarking. When the quaternion is applied to image watermarking, there are two ways to process the image content: grayscale and color. In grayscale image watermarking, the quaternion wavelet transform (QWT) [16,17] renders a shift-invariant magnitude and three accompanying components (with two of them representing local image shifts and the third one delivering textural information). The watermark is then embedded in the coefficients of the magnitude component to improve imperceptibility and robustness. In color image watermarking, a color image can be cast into a real component along with three imaginary components. The watermark is then embedded into the selected components. Such quaternion-based watermarking makes it possible to take the three color channels into account as a whole and achieve a balanced improvement in payload capacity, imperceptibility, and robustness. This sort of quaternion-based approach includes the quaternion discrete cosine transform (QDCT) [18], the quaternion discrete Fourier transform (QDFT) [3,19], and the quaternion discrete fractional random transform (QDFRT) [20].
Most watermarking schemes rely on the control of adjustable parameters to achieve the desired performance. Optimization is the most popular approach for this purpose. Methods such as teaching-learning-based optimization (TLBO) [21], the support vector machine (SVM) [22,23], particle swarm optimization (PSO) [24], and the grey wolf optimizer (GWO) [25] are known for their capability of exploiting the collective behavior of organized systems to iteratively work through a large number of candidate solutions.
The DCT-based watermarking scheme proposed by Moosazadeh and Ekbatanifard [26] adopted the low-frequency DCT coefficients as the basis for watermark embedding. The TLBO aimed at determining the optimal position and embedding strength for watermark embedding. Chen et al. [20] developed a blind color watermarking scheme based on the QDFRT. Their scheme exploited the properties of the human vision system (HVS) to adaptively adjust the watermark strength, along with the employment of a random number matrix to enhance security. The SVM was responsible for acquiring the watermark through the relationship between the embedding position and the neighboring coefficient. The scheme proposed by Li et al. [18] transformed the host image into the QDCT domain, where the watermark was embedded in the coefficient of the unitary matrix after singular value decomposition (SVD). In their scheme, the PSO helped to find the appropriate matrix embedding intensity factor for each image to obtain better imperceptibility and robustness. Similar to the preceding approach, Hsu and Hu [3] developed a blind watermarking scheme based on the QDFT. Their scheme used multi-bit partly sign-altered mean modulation to embed watermarks in each QDFT block. The PSO also played the role of optimizing the embedding strength and coefficients of selected components in the QDFT domain.
This paper is focused on the development of an efficient and effective blind color image watermarking scheme that exploits the merits of QDCT and optimization modeling. To promote the performance even further, we incorporate a denoising convolutional neural network (DnCNN) into the proposed scheme to enhance the retrieved watermark. Convolutional neural networks (CNNs) have proven highly effective for data analysis in high-dimensional spaces, such as image classification [27], image denoising [28], and image recognition [29]. Zhang et al. [30] demonstrated the use of a feed-forward denoising convolution neural network (DnCNN) to eliminate Gaussian noise of unknown levels [31]. In this study, we resort to the DnCNN to render a watermark that is visually more recognizable.
The remainder of this paper is organized as follows. Section II outlines the technical backgrounds involved in implementing blind color image watermarking using QDCT. Section III details the procedures involved in the proposed watermark embedding and extraction processes. Section IV compares the performance of the proposed scheme with that of other DCT-related schemes in terms of imperceptibility and robustness. Section V addresses the watermarking enhancement through the DnCNN denoising. Finally, concluding remarks are summarized in Section VI.

II. QUATERNION DISCRETE COSINE TRANSFORM (QDCT) OF A COLOR IMAGE
The outstanding ability of the DCT to compact the energy of an image signal into low-frequency coefficients makes it ideal for image compression and watermarking. For a monochrome image of size $M \times N$, the DCT conversion pair between the spatial-domain signal (termed $f(m,n)$) and the spectral-domain representation (termed $F(p,q)$) is given as follows:

$$F(p,q) = \alpha(p)\,\alpha(q) \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m,n) \cos\left[\frac{\pi(2m+1)p}{2M}\right] \cos\left[\frac{\pi(2n+1)q}{2N}\right] \quad (1)$$

$$f(m,n) = \sum_{p=0}^{M-1} \sum_{q=0}^{N-1} \alpha(p)\,\alpha(q)\, F(p,q) \cos\left[\frac{\pi(2m+1)p}{2M}\right] \cos\left[\frac{\pi(2n+1)q}{2N}\right] \quad (2)$$

$$\alpha(p) = \begin{cases} \sqrt{1/M}, & p = 0 \\ \sqrt{2/M}, & p \neq 0 \end{cases} \qquad \alpha(q) = \begin{cases} \sqrt{1/N}, & q = 0 \\ \sqrt{2/N}, & q \neq 0 \end{cases} \quad (3)$$

A color image is a composite representation of red, green, and blue components, the values of which indicate the light intensity required to describe the digital image. Accordingly, a digital color image is usually stored as a matrix of size $M \times N \times 3$. In a quaternion representation, a color image can be expressed as

$$f(m,n) = f_R(m,n)\,i + f_G(m,n)\,j + f_B(m,n)\,k \quad (4)$$

where $f_R(m,n)$, $f_G(m,n)$, and $f_B(m,n)$ correspond to the $(m,n)$th pixel values of the red, green, and blue components, respectively. Note that $i$, $j$, and $k$ denote three imaginary operators, which can be interpreted as unit vectors pointing along the three spatial axes. These three unit vectors hold the following relationships:

$$i^2 = j^2 = k^2 = ijk = -1, \quad ij = -ji = k, \quad jk = -kj = i, \quad ki = -ik = j \quad (5)$$

Note that Eq. (4) distributes the color image into three imaginary parts and leaves the scalar part at zero. According to the formulation in [32], the quaternion discrete cosine transform (QDCT), derived from quaternion algebra and the classical two-dimensional DCT, is expressed by Eq. (6). The $(p,q)$th quaternion coefficient consists of a scalar component $A(p,q)$ and three vector components $C(p,q)$, $D(p,q)$, and $E(p,q)$. Following the formulation in [33], the inverse conversion from the QDCT coefficients back to the spatial domain is given by Eqs. (12)-(15), which return the scalar part $f_A(m,n)$ and the red, green, and blue components. Theoretically, $f_A(m,n)$ shall remain at zero.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
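To make the transform concrete, the following minimal sketch computes a left-sided QDCT of a small square color block in plain Python. It assumes the unit pure quaternion axis (i + j + k)/√3 and an orthonormal DCT; both are common choices but are assumptions here, and the helper names `qmul`, `dct2`, `idct2`, `qdct`, and `iqdct` are hypothetical. Because the DCT is a real linear operation, the sketch applies an ordinary DCT to each color channel and then left-multiplies every coefficient by the axis.

```python
import math

def qmul(a, b):
    """Hamilton product of two quaternions stored as (real, i, j, k)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

S = 1.0 / math.sqrt(3.0)
MU = (0.0, S, S, S)           # assumed transform axis; MU * MU = -1
NEG_MU = (0.0, -S, -S, -S)    # multiplicative inverse of MU

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix T, so that F = T f T^T."""
    return [[math.sqrt((1.0 if p == 0 else 2.0) / n)
             * math.cos(math.pi * (2 * m + 1) * p / (2 * n))
             for m in range(n)] for p in range(n)]

def matmul(a, b):
    return [[sum(a[r][t] * b[t][c] for t in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dct2(block):
    t = dct_matrix(len(block))
    return matmul(matmul(t, block), transpose(t))

def idct2(coef):
    t = dct_matrix(len(coef))
    return matmul(matmul(transpose(t), coef), t)

def qdct(r, g, b):
    """Return four n-by-n arrays (A, C, D, E) for a pure-quaternion block."""
    fr, fg, fb = dct2(r), dct2(g), dct2(b)
    n = len(r)
    out = [[[0.0] * n for _ in range(n)] for _ in range(4)]
    for p in range(n):
        for q in range(n):
            coef = qmul(MU, (0.0, fr[p][q], fg[p][q], fb[p][q]))
            for c in range(4):
                out[c][p][q] = coef[c]
    return out

def iqdct(a, c, d, e):
    """Invert qdct; returns (f_A, f_R, f_G, f_B) in the spatial domain."""
    n = len(a)
    parts = [[[0.0] * n for _ in range(n)] for _ in range(4)]
    for p in range(n):
        for q in range(n):
            coef = qmul(NEG_MU, (a[p][q], c[p][q], d[p][q], e[p][q]))
            for k in range(4):
                parts[k][p][q] = coef[k]
    return [idct2(x) for x in parts]
```

Under these assumptions the three vector components of every coefficient sum to zero, and the inverse returns a scalar part of zero together with the original color channels, which agrees with the requirement that the scalar part of the spatial-domain quaternion stay at zero.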

III. PROPOSED BLIND WATERMARKING SCHEME
In this section, we discuss how to implement binary watermarking using suitable modulation schemes under the QDCT framework, hereafter referred to as watermark QDCT (wQDCT). The out-of-range error problem encountered in watermark embedding is settled through extreme pixel adjustment (EPA). The GWO is then employed to optimize the performance in terms of imperceptibility and robustness. Finally, a DnCNN is introduced to make the extracted binary watermark more visually recognizable. Figure 1 illustrates the architecture of the proposed watermarking scheme.

A. BINARY WATERMARKING
Analogous to DCT-based watermarking, watermarking in the QDCT domain involves partitioning the host image into non-overlapping blocks of 8×8 pixels, taking the QDCT of each block via Eq. (6), and manipulating the selected coefficients according to prescribed rules. While converting the modified QDCT coefficients back to the spatial-domain representation, it is important to ensure that the scalar part $f_A(m,n)$ remains at zero.
Table I lists the statistical means and standard deviations of the QDCT coefficients situated in the second to fourth anti-diagonals, which roughly cover the low-to-medium frequency range. In general, a watermark embedded in the low-frequency region is robust against high-frequency attacks such as lowpass filtering and image compression; however, it remains vulnerable to low-frequency attacks such as unsharp filtering and histogram equalization. The third and fourth anti-diagonals in the mid-frequencies appear to be an acceptable compromise for a wide variety of attacks. As shown in Table I, the statistical means of the QDCT coefficients under investigation are close to zero. Among the four components, $A(p,q)$ presented the largest standard deviation for each QDCT location. Overall, the three imaginary components $C(p,q)$, $D(p,q)$, and $E(p,q)$ have similar standard deviations, with the smallest one associated with $C(p,q)$. In light of these observations, we adopted different schemes to cope with the corresponding characteristics of the QDCT components.
There are two popular modulation schemes for DCT-based watermarking: quantization index modulation (QIM) [35] and relative modulation (RM) [26]. The QIM dichotomizes the selected DCT coefficients into alternating zones, whereas RM manipulates the coefficients according to a reference level. RM has proven highly effective in terms of robustness; however, it occasionally results in excessive distortion. Hu and Hsu [36] proposed a remedial scheme referred to as mixed modulation (MM), which grafts QIM onto RM to overcome the deficiency of RM.
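The contrast between the two modulation schemes can be illustrated with a minimal sketch. It assumes a textbook parity-based QIM (the bit is carried by the parity of the quantization index) and a sign-based RM with a safety margin around zero; the function names and the parameters `step` and `margin` are hypothetical and are not the formulas of [35], [26], or [36].

```python
import math

def qim_embed(coef, bit, step):
    """Move coef to the nearest multiple of step whose index parity equals bit."""
    index = math.floor(coef / step + 0.5)
    if index % 2 != bit:
        # pick the neighbouring index that lies closer to the original value
        index += 1 if coef >= index * step else -1
    return index * step

def qim_extract(coef, step):
    """Read the bit back from the parity of the nearest quantization index."""
    return math.floor(coef / step + 0.5) % 2

def rm_embed(coef, bit, margin):
    """Push coef to the positive side for bit 1 and the negative side for bit 0."""
    if bit == 1:
        return max(coef, margin)
    return min(coef, -margin)

def rm_extract(coef):
    """A positive value is read as 1, anything else as 0."""
    return 1 if coef > 0 else 0
```

In this sketch the alteration caused by QIM never exceeds `step`, whereas RM must move a coefficient all the way across zero when it lies far on the wrong side (for example, from -50 to `margin`). This is one way to picture why RM occasionally results in excessive distortion and why grafting QIM onto RM can limit it.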
The observation on the statistical distribution of the quaternion components allows us to choose among the available modulation schemes to optimize overall performance. As revealed by Eq. (12), $A(p,q)$ plays no role in the formation of $f_A(m,n)$, so $f_A(m,n)$ is unaffected when $A(p,q)$ changes. This makes $A(p,q)$ an ideal choice for watermarking in the QDCT domain. Nonetheless, Eqs. (13)-(15) also indicate that any modification of $A(p,q)$ simultaneously alters the RGB values. In view of the large standard deviation associated with $A(p,q)$, we employed MM to embed binary bits into $A(p,q)$. The use of MM to embed the watermark bit $w_b$ (i.e., 0 or 1) into the coefficient is implemented in two steps. The first step involves the execution of RM and QIM using Eqs. (16) and (17), respectively, where $\lfloor \cdot \rfloor$ and $\lceil \cdot \rceil$ respectively denote the floor and ceiling functions and $(\hat{p},\hat{q})$ denotes the chosen location. The remaining parameters are the quantization step of QIM, the ground level in the upward direction, and the threshold separating RM from QIM; the latter two were set at 2 and 1.5 times the quantization step in this study. As shown in Eqs. (16) and (17), as long as $A(\hat{p},\hat{q})$ falls within the range bounded by the threshold, RM is used; otherwise, QIM is used. Once the two candidate values of $A(\hat{p},\hat{q})$ are available, the next step is to select the one that induces the smaller alteration to the block.
Apart from the use of the $A$ component in the MM, the other components $C$, $D$, and $E$ can also be used to embed binary bits as long as the constraint $f_A(m,n) = 0$ is satisfied. The small standard deviation and mean of $C(p,q)$ make it an ideal candidate for RM, since the damage imposed by RM is mild for a QDCT component with a rather concentrated distribution. The regular patterns in Eqs. (13)-(15) provide a convenient pathway to implement RM on a selected $C(\hat{p},\hat{q})$ subject to the constraint $f_A(m,n) = 0$ via compensatory adjustments on $D(\hat{p},\hat{q})$ and $E(\hat{p},\hat{q})$. The resulting RM formula involves a clearance threshold.
Watermark extraction for the RM is relatively straightforward. Once the QDCT coefficients of the watermarked color image are obtained, we examine the sign of the designated imaginary component. A positive value indicates the embedding of a binary "1", whereas a negative value corresponds to a binary "0". This decision rule is given in Eq. (21), where a tilde indicates that the variable was acquired from a watermarked image following an attack. By contrast, watermark extraction for the MM is more complicated. The watermark bit is determined by examining the retrieved $\tilde{A}(\hat{p},\hat{q})$ using a quadruple branching process: RM is used when $\tilde{A}(\hat{p},\hat{q})$ falls within the range bounded by the threshold; otherwise, QIM is employed. This procedure is expressed in Eq. (22), where $\mathrm{mod}(\cdot, 2)$ denotes the modulo function with a modulus of 2.
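The interplay between the four QDCT components and the zero-scalar constraint can be checked numerically. The sketch below assumes a left-sided QDCT with the axis (i + j + k)/√3, under which the spatial scalar part is proportional to the sum of the three vector components. It also assumes that a change in C is offset by splitting the opposite change equally between D and E; this equal split, the parameter `margin`, and both function names are hypothetical and do not reproduce the paper's RM formula.

```python
import math

S = 1.0 / math.sqrt(3.0)

def spatial_from_coeff(a, c, d, e):
    """Left-multiply (a + c i + d j + e k) by -(i + j + k)/sqrt(3).

    Returns the (scalar, red, green, blue) contributions of one coefficient,
    i.e. the values that the ordinary inverse DCT would then be applied to.
    """
    scalar = S * (c + d + e)
    red = S * (-a + d - e)
    green = S * (-a + e - c)
    blue = S * (-a + c - d)
    return scalar, red, green, blue

def rm_with_compensation(c, d, e, bit, margin):
    """Sign-based RM on c, keeping c + d + e unchanged (assumed equal split)."""
    new_c = max(c, margin) if bit == 1 else min(c, -margin)
    delta = new_c - c
    return new_c, d - delta / 2.0, e - delta / 2.0

# One coefficient whose vector components sum to zero (illustrative values).
a, c, d, e = -100.0, -1.5, 4.0, -2.5
new_c, new_d, new_e = rm_with_compensation(c, d, e, bit=1, margin=2.0)
base = spatial_from_coeff(a, new_c, new_d, new_e)
shifted = spatial_from_coeff(a + 10.0, new_c, new_d, new_e)
```

In this example `base[0]` and `shifted[0]` are both zero, so neither the compensated change in C nor the change in A disturbs the scalar part, while the change in A shifts the red, green, and blue contributions by the same amount.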

B. EXTREME PIXEL ADJUSTMENT (EPA)
After a watermark is embedded, all of the pixels are converted to 8-bit unsigned integer values between 0 and 255. Nonetheless, out-of-range errors can occur during the process of storing the watermarked image [37]. Pixels that are close to the extreme values in the original image block can easily exceed the legal interval [0, 255] after the watermark is embedded. The clipping of these values to 0 or 255 when the watermarked image is stored as a digital file can lead to the loss of watermark information. Hsu and Hu [3] resolved the out-of-range error by adopting an extreme pixel adjustment (EPA) function, in which $I$ denotes the pixel value in one channel and two parameters respectively account for the adjustment range and the boundary reduction. With the activation of the EPA, both parameters are set at 1 initially, and the adjustment range is increased by 0.5 iteratively until a faultless watermark retrieval is achieved.
During extraction, if the watermark bit is embedded in the real coefficient component $A$, then MM is used to extract the watermark bit in accordance with Eq. (22). If the watermark bit is embedded in the imaginary coefficient component $C$ (or $D$, $E$), then RM is used to extract the watermark bit per Eq. (21). The watermark image is then reconstructed using the inverse scrambling algorithm together with the secret key.
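The out-of-range problem itself is easy to reproduce. The sketch below is not the EPA function of [3]; it assumes a simplified pre-adjustment, here called `pull_inward` with a hypothetical `margin`, that keeps pixels a fixed distance away from 0 and 255 before embedding. The embedding alteration of 4 grey levels is likewise an illustrative value.

```python
def store_as_uint8(value):
    """Round and clip a real-valued pixel to the 8-bit range, as done on saving."""
    return min(255, max(0, int(round(value))))

def pull_inward(pixel, margin):
    """Hypothetical pre-adjustment: keep pixel at least `margin` from 0 and 255."""
    return min(255 - margin, max(margin, pixel))

change = 4.0  # alteration caused by embedding (illustrative value)

# Without adjustment, a pixel of 254 overflows and part of the change is clipped.
lost = (254 + change) - store_as_uint8(254 + change)

# With the pixel pulled inward first, the full change survives storage.
adjusted = pull_inward(254, 4)
kept = (adjusted + change) - store_as_uint8(adjusted + change)
```

Here `lost` is the part of the embedding alteration that clipping removes, and `kept` is the corresponding loss after the pixel has been pulled inward.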

D. PERFORMANCE OPTIMIZATION VIA THE GWO
The GWO algorithm was inspired by the hunting mechanism and leadership hierarchy of grey wolves, in which the population is divided into four types: Alpha, Beta, Delta, and Omega [25]. In the original GWO, the most suitable solution is the Alpha, followed respectively by Beta, Delta, and Omega. The hunting mechanism involves searching for prey, encircling it, and then attacking. In searching for prey, the GWO seeks to discover new parts of the searching space by applying sudden changes to the solution. In encircling and attacking the prey, the main goal is to improve the estimated solution obtained during the exploration process by discovering the neighborhood of each solution.
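The hierarchy and the search behaviour described above can be sketched in a few lines. The following is a minimal, generic GWO for minimization in which the three best wolves of the current pack act as Alpha, Beta, and Delta; the toy objective `sphere` merely stands in for the watermarking objective, and the function name, bounds, and seed are assumptions made for illustration.

```python
import random

def sphere(x):
    """Toy objective used only for illustration."""
    return sum(v * v for v in x)

def gwo_minimize(objective, dim, lower, upper, wolves=20, iterations=100, seed=0):
    rng = random.Random(seed)
    pack = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(wolves)]
    best, best_score = None, float("inf")
    for it in range(iterations):
        ranked = sorted(pack, key=objective)
        alpha, beta, delta = ranked[0], ranked[1], ranked[2]
        score = objective(alpha)
        if score < best_score:
            best, best_score = list(alpha), score
        a = 2.0 * (1.0 - it / iterations)  # decreases linearly from 2 towards 0
        new_pack = []
        for wolf in pack:
            position = []
            for k in range(dim):
                estimate = 0.0
                for leader in (alpha, beta, delta):
                    a_coef = 2.0 * a * rng.random() - a   # |a_coef| > 1 favours exploration
                    c_coef = 2.0 * rng.random()
                    distance = abs(c_coef * leader[k] - wolf[k])
                    estimate += leader[k] - a_coef * distance
                # average of the three leader-guided moves, kept inside the bounds
                position.append(min(upper, max(lower, estimate / 3.0)))
            new_pack.append(position)
        pack = new_pack
    final = min(pack, key=objective)
    if objective(final) < best_score:
        best, best_score = list(final), objective(final)
    return best, best_score
```

A call such as `gwo_minimize(sphere, dim=3, lower=-10.0, upper=10.0)` returns the best position found and its score. Large values of the control parameter in early iterations let wolves jump away from the leaders (searching for prey), while small values in late iterations contract the pack around them (encircling and attacking).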
In the present work, each wolf in the GWO is represented by a position vector that collects the spacing parameters and the embedding coefficient locations for the real component and the three imaginary components. The spacing parameters and the embedding coefficient locations determine the overall performance of the watermarking scheme. The optimal set of parameters can be pursued by designating each candidate as a wolf in the GWO and monitoring a combined objective function $f(X_z)$ of imperceptibility and robustness, in which three constants respectively represent the acceptable lower-bound values of PSNR (peak signal-to-noise ratio), MSSIM (mean structural similarity), and BER (bit error rate). The objective function comprises PSNR and MSSIM (to estimate the image distortion between the original and watermarked images) and BER (to determine the discrepancy between the original and extracted watermarks).
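Two of the three measures entering the objective have simple standard definitions, sketched below for flat sequences of pixel values and bits. The function names are hypothetical, the peak value of 255 assumes 8-bit images, and MSSIM is omitted because it requires a windowed computation that does not fit a short example.

```python
import math

def psnr(original, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally long pixel sequences."""
    squared = [(o - d) ** 2 for o, d in zip(original, distorted)]
    mse = sum(squared) / len(squared)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)

def ber(sent_bits, received_bits):
    """Bit error rate: fraction of positions where the extracted bit is wrong."""
    errors = sum(1 for s, r in zip(sent_bits, received_bits) if s != r)
    return errors / len(sent_bits)
```

For example, `ber([1, 0, 1, 1], [1, 1, 1, 0])` counts two wrong bits out of four, and `psnr` grows as the watermarked image approaches the original.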
In the proposed wQDCT, the GWO seeks to identify the optimal spacing parameters and coefficient locations with the aim of achieving a suitable tradeoff among imperceptibility, robustness, and payload. The implementation of the GWO in wQDCT for blind watermarking of color images is illustrated in Fig. 4. Note that the acceptable lower bounds of PSNR, MSSIM, and BER were respectively set at 0.925 (PSNR = 37 dB), 0.95, and 0.8. The optimal spacing parameters and coefficient locations obtained through the GWO were then installed in the public algorithm for subsequent watermark extraction. To enhance security, the watermarks were permuted using the Arnold transform [38] with chaotic mapping [39]; the encryption keys shall be agreed upon in advance by the sender and receiver.
To seek the optimal spacing parameters and coefficient locations of wQDCT, we examined the outcomes of four trial images (namely, Lena, Baboon, Avion, and Peppers) in the presence of three types of image attacks, namely JPEG compression with a quality factor (QF) of 30, 1% S&P noise, and histogram equalization, while simulating the score of each wolf. On a hardware platform equipped with an Intel® Core (TM) i9-9900K CPU @ 3.60GHz, 32GB RAM, and an RTX 2080 graphics card under the Matlab® environment, the average computation time required for each wolf to process the abovementioned four images and three attacks in each iteration was 69.51 seconds. With the number of search agents (i.e., wolves) set at 20, the total time required by the GWO for 100 iterations was thus 69.51 × 20 × 100 = 139,020 seconds (approximately 38.6 hours). Figure 5 depicts the convergence curve of the GWO search. As seen in Fig. 5, although we adopted the results of the 100th iteration, the objective function converged quickly after 20 iterations.

E. DENOISING CONVOLUTIONAL NEURAL NETWORK (DnCNN)
The DnCNN is one of the most popular deep learning architectures used for image denoising. The DnCNN consists of a series of Conv, BN, and ReLU layers. Conv refers to a convolutional layer used for the automatic extraction of features. BN denotes batch normalization, which is aimed at speeding up convergence in training and reducing the sensitivity of the network to its initialization. ReLU signifies an activation function commonly used in artificial neural networks. The DnCNN model creates various combinations of the Conv, BN, and ReLU layers to perform image denoising tasks, such as Gaussian denoising, single image super-resolution, and JPEG image deblocking [30]. The DnCNN is not intended to further improve the robustness of the proposed wQDCT. Rather, its function is to enhance the texture of the extracted watermark following a possible attack, thus making the watermark more recognizable. The DnCNN considered here adopts a residual learning strategy to draw the latent clean image from a noisy observation. During the training stage, the output of the network is just a residual image, and the optimization goal is to reduce the binary cross-entropy between the actual residual image and the network output. When a well-trained DnCNN model is used in the testing phase, the information (i.e., weights and biases) stored in the model is used to eliminate noise in the noisy input image, thereby reconstructing an image closer to the original. The inputs to the DnCNN were 81,184 image patches of size 64×64 acquired by partitioning binary images of plain-text articles into non-overlapping blocks of the prescribed size. Noise with three different densities (i.e., 0%, 5%, and 10%) was then added to these patches. Note that the training data set did not contain the watermarks later used in the testing phase. We trained the modified DnCNN for binary denoising using the following architecture and parameters.
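The preparation of training pairs can be sketched as follows. The text specifies only the noise densities, so this example assumes that noise means inverting a fixed fraction of randomly chosen bits and that the residual of a binary patch is the exclusive-or of the noisy and clean versions; both choices, and the function names, are assumptions.

```python
import random

def flip_bits(patch, density, seed=0):
    """Return a copy of a binary patch with round(density * size) bits inverted."""
    rng = random.Random(seed)
    rows, cols = len(patch), len(patch[0])
    count = round(density * rows * cols)
    noisy = [list(row) for row in patch]
    for index in rng.sample(range(rows * cols), count):
        r, c = divmod(index, cols)
        noisy[r][c] ^= 1
    return noisy

def residual(noisy, clean):
    """Residual map: 1 wherever the noisy patch differs from the clean one."""
    return [[n ^ c for n, c in zip(noisy_row, clean_row)]
            for noisy_row, clean_row in zip(noisy, clean)]
```

With a 64×64 patch, densities of 0%, 5%, and 10% invert 0, 205, and 410 bits respectively in this sketch, and the residual map marks exactly those positions, which is the target a residual-learning network would be trained to predict.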
The network depth was set at 16, and each layer included 64 filters. Figure 6 illustrates the DnCNN network employed to denoise the noisy watermark. We employed adaptive moment estimation (Adam) to optimize the DnCNN, with the learning rate set at 0.001 for epochs 1-30, 0.0001 for epochs 31-60, and 0.00005 for epochs 61-200. Note that the modeling parameters involved in the above DnCNN were all tuned offline in advance. Once the DnCNN had passed through the training and verification stages, it could cooperate online with the wQDCT to render a clearer and more distinguishable watermark recovered from the watermarked image.
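The training configuration quoted above can be written down compactly. The learning-rate schedule follows the stated epoch ranges; the layer layout, in contrast, assumes the arrangement of the original DnCNN in [30] (one Conv+ReLU, then Conv+BN+ReLU blocks, then a final Conv) with a single output channel, which the modified network used here may not match exactly. Both function names are hypothetical.

```python
def learning_rate(epoch):
    """Piecewise-constant schedule quoted in the text (epochs counted from 1)."""
    if not 1 <= epoch <= 200:
        raise ValueError("epoch must lie in 1..200")
    if epoch <= 30:
        return 1e-3
    if epoch <= 60:
        return 1e-4
    return 5e-5

def layer_plan(depth=16, filters=64):
    """Assumed layout: Conv+ReLU, (depth - 2) x Conv+BN+ReLU, final Conv."""
    plan = [("Conv", filters), ("ReLU", None)]
    for _ in range(depth - 2):
        plan += [("Conv", filters), ("BN", filters), ("ReLU", None)]
    plan.append(("Conv", 1))  # single-channel output (assumption)
    return plan
```

With the default arguments the plan contains 16 convolutional layers, 14 of which are followed by batch normalization.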

IV. PERFORMANCE EVALUATION OF THE PROPOSED wQDCT
The proposed watermarking algorithm was evaluated in terms of invisibility, robustness, and embedding capacity. The test dataset in the experiments included 64 different 512×512 24-bit color images obtained from [34]. In this study, we used binary images as the watermarks so that performance evaluation can also be achieved via visual inspection. The sizes of the watermark images were 64×64, 192×64, and 128×128, which respectively correspond to 1-, 3-, and 4-bit embedding in every color image block of size 8×8. The 1-bit watermark images and the corresponding scrambled versions are shown in Fig. 7. As shown in Fig. 7, there were similar numbers of "1s" and "0s" in the scrambled watermarks; nonetheless, the distributions of the scrambled binary bits differed from one another. Figure 8 illustrates the original and watermarked "Lena" images based on the proposed wQDCT. In Fig. 8, visual inspection with the naked eye cannot discern whether the image contains a watermark. A variety of attacks were implemented in the experiment to examine the robustness of the proposed watermarking algorithm, as shown in Table II.
The performance of the proposed wQDCT watermarking scheme with the GWO incorporated was compared with that of different transform-based watermarking schemes. To facilitate the exposition in the subsequent discussion, we adopted three symbols, with $P$ standing for the proposed scheme, to denote the proposed, non-optimized, and optimized watermarking schemes, respectively. The subscripts 1 and 3 respectively denote the cases of embedding 1 and 3 watermark bits in each block using the proposed wQDCT. Table III summarizes the average imperceptibility and robustness obtained from 64 images using various blind watermarking schemes. As shown by the PSNR values in Table III, the image quality of the abovementioned schemes was similar under deliberate control. The PSNR of the 3-bit DCT-based scheme was good when entropy selection was enabled but dropped when entropy selection was disabled.
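The correspondence between watermark size and per-block payload stated above follows from simple counting, shown here as a worked check for a 512×512 host image divided into 8×8 blocks:

```latex
\begin{align*}
\text{blocks per image} &= \frac{512 \times 512}{8 \times 8} = 4096,\\
64 \times 64 &= 4096 = 1 \times 4096 \quad \text{(1 bit per block)},\\
192 \times 64 &= 12288 = 3 \times 4096 \quad \text{(3 bits per block)},\\
128 \times 128 &= 16384 = 4 \times 4096 \quad \text{(4 bits per block)}.
\end{align*}
```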
The resultant MSSIMs generally remained at roughly the same level as the PSNRs.
Figure panels: (e) 1-bit in component C, (f) 3-bit in component C, and (g) 4-bit in component C; attacks as listed in Table II, except for JPEG compression (B1-B4).
Note that the CS attack (D1-D4) considered in the experiment is the version developed by Metzler et al. [41]. As long as the CS was assigned higher ratio measurements, all compared watermarking schemes were able to withstand the CS attack. However, the proposed wQDCT showed effective resistance against the CS with mid-to-high ratio measurements.
Table III also presents the results after the employment of wQDCT and DnCNN. For convenience, we used the abbreviation "wQDCT-D" to signify the presence of the DnCNN during watermark extraction. Except for the extreme cases where the BER was nearly 0 or around 50%, the wQDCT-D considerably reduced the BER values attainable by the wQDCT, with the largest reduction reaching around 10%. Nevertheless, the DnCNN might contribute a slight degradation of BER if the retrieved watermark was already error-free. In the category of wQDCT-D, both $P^{\mathrm{QDCT}}_{3A(\mathrm{GWO})}$ and $P^{\mathrm{QDCT}}_{3C(\mathrm{GWO})}$ suffered a tiny increase in BER from 0.00% to 0.01%.
The normalized cross-correlation (NCC) metric represents another aspect of the robustness of the watermarking scheme against various attacks [42]. The larger the NCC value, the better the robustness. The NCC exhibited a tendency similar to that of the BER, being roughly equal to $1 - 2\,\mathrm{BER}$. Overall, the proposed wQDCT-D outperformed the other compared schemes in most attacks. After applying the DnCNN, the wQDCT-D consistently showed better NCC values.
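The near-linear relation between NCC and BER can be verified with a small sketch. It assumes that the watermark bits are mapped to ±1 before correlation, in which case the relation NCC = 1 - 2·BER holds exactly; the NCC definition of [42] is not reproduced here and may use a different mapping, which would make the relation only approximate. The function names are hypothetical.

```python
def ber(original_bits, extracted_bits):
    """Fraction of bit positions that differ."""
    errors = sum(1 for o, e in zip(original_bits, extracted_bits) if o != e)
    return errors / len(original_bits)

def ncc_bipolar(original_bits, extracted_bits):
    """Normalized cross-correlation after mapping bits {0, 1} to {-1, +1}."""
    a = [2 * b - 1 for b in original_bits]
    e = [2 * b - 1 for b in extracted_bits]
    numerator = sum(x * y for x, y in zip(a, e))
    denominator = (sum(x * x for x in a) * sum(y * y for y in e)) ** 0.5
    return numerator / denominator

original = [1, 0, 1, 1, 0, 0, 1, 0]
extracted = [1, 1, 1, 1, 0, 0, 0, 0]  # two of eight bits are wrong
```

For this pair, each matching bit contributes +1 and each mismatch contributes -1 to the numerator, so two errors in eight bits give an NCC of (6 - 2)/8.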
To conclude the above discussion, we note that the SVM-assisted 1-bit QDFRT scheme uses a random number matrix to enhance security, but this matrix has a considerable impact on different images. Apart from JPEG compression attacks, watermarks embedded in the imaginary components are more reliable than those embedded in the real component. The incorporation of the DnCNN with the wQDCT yields lower BERs, which subsequently allows the extracted watermarks to be more distinguishable.

V. WATERMARK ENHANCEMENT THROUGH DENOISING
In this experiment, a 4-bit watermark composition was adopted to demonstrate the competence of the DnCNN and to explore the feasibility of embedding the watermark into different sorts of components. Table V lists the results of this experiment. The proposed wQDCT-D was shown to suppress the BER, especially in the range between 10% and 40%. However, when the BER was above 40% or below 1%, the performance of the DnCNN deteriorated. The reason is that there was hardly any useful information left in the watermark once the BER exceeded 40%. On the other hand, artifact errors tended to occur when the BER fell below 1%. The results in Table V point to allocating slightly more bits to the real component than to the imaginary component when multiple watermark bits are embedded.
Finally, Figure 9 illustrates the effect of incorporating the DnCNN. Within each grid, the left half highlighted with a blue box shows the watermark extracted by the wQDCT from Lena, while the right half enclosed by a cyan box displays the result of $P^{\mathrm{QDCT}}_{2A2C(\mathrm{GWO})}$ after denoising. Clearly, most of the text images are easier to recognize after denoising for all attacks except JPEG compression.

VI. CONCLUSION
This paper presents a QDCT-based watermarking scheme that jointly exploits the EPA, MM, and RM together with two artificial intelligence techniques, the GWO and the DnCNN. The GWO enables the proposed wQDCT scheme to identify the optimal control parameters as well as the QDCT coefficient locations to achieve a satisfactory balance between robustness and imperceptibility. The DnCNN acts as an auxiliary tool to refine the extracted watermark. In a series of experiments, the proposed wQDCT-D scheme not only demonstrated superior robustness but also enhanced the comprehensibility of the retrieved watermarks. Watermark embedding in the real component of the QDCT proved effective in withstanding JPEG compression, while embedding in the imaginary components (i.e., C, D, and E) proved effective against unsharp filtering and histogram equalization. Distributing the embedding across the real and imaginary components makes it possible to defend against a variety of attacks when embedding multiple watermarks. Overall, the proposed scheme outperforms other DCT-based schemes in terms of robustness when the embedding strength and payload capacity are set at the same level.