1 Introduction

Reversible data hiding (RDH) [1], a special branch of information hiding, has received considerable attention in the past few years. It is concerned not only with the embedded data but also with the carrier itself: RDH ensures that both the cover data and the embedded message can be extracted from the marked content precisely and losslessly. This property is valuable in many fields where the cover must not be damaged by data embedding and extraction, such as medical image protection, authentication and tampered-carrier recovery, digital media copyright protection, and military and legal imagery. A framework of RDH for digital images is illustrated in Fig. 1.

Fig. 1. Framework of RDH embedding/extraction.

To improve the efficiency of RDH, researchers have proposed many methods over the past decades. Generally speaking, prediction and sorting play important roles in reversible data hiding. Prediction focuses on exploiting inter-pixel correlations to derive a sharply distributed prediction-error histogram, while sorting exploits the correlation between neighboring pixels to optimize the embedding order.

Improving the prediction is important for both histogram-shifting and difference-expansion based RDH schemes. Several predictors [4] have been proposed, such as median edge detection (MED), gradient adjusted prediction (GAP), and differential adaptive run coding (DARC). The MED predictor tends to select the vertical (lower) neighbor when a vertical edge exists to the right of the current location, the right neighbor when a horizontal edge exists below it, and a linear combination of the context pixels when no edge is detected. GAP weighs the neighboring pixels according to the local gradient, classifies edges into three classes (sharp, normal, and weak), and uses seven neighboring pixels to estimate the unknown pixel value. DARC is a non-linear adaptive predictor that uses three neighboring pixels to estimate the unknown pixel. Dragoi and Coltuc [3] proposed extended gradient-based selection and weighting (EGBSW) for RDH; EGBSW [3] uses four linear predictors and computes the predicted value as a weighted sum of the predictions corresponding to the selected gradients.
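For concreteness, the MED predictor can be sketched in a few lines. This is a minimal Python sketch; the variable names, context orientation and border handling are illustrative and not taken from [4].

```python
import numpy as np

def med_predict(img):
    """Median edge detection (MED) predictor sketch.

    For each pixel x, let a be its left neighbour, b the upper neighbour and
    c the upper-left neighbour.  MED predicts min(a, b) when c >= max(a, b),
    max(a, b) when c <= min(a, b), and a + b - c otherwise (no edge detected).
    Border pixels are left unpredicted here for simplicity.
    """
    img = img.astype(np.int32)
    pred = img.copy()
    h, w = img.shape
    for i in range(1, h):
        for j in range(1, w):
            a, b, c = img[i, j - 1], img[i - 1, j], img[i - 1, j - 1]
            if c >= max(a, b):
                pred[i, j] = min(a, b)
            elif c <= min(a, b):
                pred[i, j] = max(a, b)
            else:
                pred[i, j] = a + b - c
    return pred
```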

Sorting [5, 6] exploits the correlation between neighboring pixels to optimize the embedding order, and is therefore a fundamental step for enhancing the embedding capacity and visual quality. Kamstra and Heijmans [7] introduced sorting and obtained a significant performance advantage over previous methods; they reduced the location-map size by arranging pairs of pixels in order. Sachnev et al. [5] sorted the prediction errors by local variance, processing cells in ascending order of local variance so that the smoother cells are embedded first. However, this measure is not always accurate. Afsharizadeh [6] extended Sachnev et al.'s work [5] with a more accurate sorting measure derived from refined sorting procedures. Ou [8] proposed a simple and efficient sorting method based on local complexity, computed as the sum of absolute differences between diagonal blank pixels in a 4 \(\times \) 4 neighborhood; a small local complexity indicates that the pair is located in a smooth image region and should be used preferentially for data embedding. However, none of these algorithms considers the characteristics of the prediction-error distribution.
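A Sachnev-style local-variance ranking might look as follows. This is only a sketch of the idea; the exact variance definition and neighborhood used in [5] may differ, and interior positions are assumed.

```python
import numpy as np

def local_variance_order(img, positions):
    """Rank candidate pixel positions by the local variance of their four
    rhombus neighbours and return them in ascending order, so that the
    smoothest cells are used first for embedding."""
    img = img.astype(np.float64)
    scores = []
    for (i, j) in positions:              # interior positions assumed
        nbrs = np.array([img[i - 1, j], img[i + 1, j],
                         img[i, j - 1], img[i, j + 1]])
        scores.append(np.var(nbrs))
    # ascending order: smooth regions (small variance) come first
    return [p for _, p in sorted(zip(scores, positions), key=lambda t: t[0])]
```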

We also notice that existing research on RDH mostly focuses on gray-scale images, whereas in practice color images are far more widely used; reversible data hiding for color images remains a rarely studied topic [9, 10]. Considering that the color channels are correlated with each other, Li et al. [11] proposed an RDH algorithm based on prediction-error expansion that enhances the prediction accuracy in one color channel by exploiting the edge information of another channel, which statistically decreases the entropy of the prediction-error. Building on this inter-channel correlation [11], we examine a novel inter-channel prediction method and propose a corresponding reversible data hiding algorithm.

In the proposed method, the prediction error is obtained by an inter-channel secondary prediction that uses the prediction errors of the current channel and a reference channel. Experiments show that this prediction method produces a sharper second order prediction-error histogram. We then introduce a novel second order prediction-error sorting (SOPS) algorithm, which makes full use of the edge information obtained from another color channel and of the high correlation between adjacent pixels, and thus better reflects the texture complexity around the current pixel. Experimental results demonstrate that the proposed method significantly outperforms state-of-the-art counterparts in terms of both prediction accuracy and overall embedding performance.

The paper is organized as follows. Section 2 describes the proposed second order prediction-error sorting algorithm for reversible data hiding. Simulations using the proposed technique and the obtained results are presented in Sect. 3. Conclusions are drawn in Sect. 4.

2 Second Order Prediction-Error Sorting (SOPS) for Color Images

2.1 Second Order Prediction-Error Based on Correlation Among Color Channels

Edges, which appear as jumps in intensity, play a critical role in the human visual system (HVS), and exploiting this property for prediction is important. Predictors proposed for RDH such as median edge detection (MED) and gradient adjusted prediction (GAP) [2] may be ineffective in rough regions: there, the prediction error usually differs greatly from the true pixel value because of the large differences between pixels along the gradient direction. A more precise prediction can be obtained from the pixels along the edge direction. For color images in particular, we can make full use of the edge information obtained from another color channel together with the high correlation between adjacent pixels. Li et al. [11] pointed out that the edge information drawn from different color channels is similar; hence, if an edge is detected in one color channel, an identical edge is expected at the same position in the other channels.

In each channel, we use the double-layered embedding method proposed by Sachnev et al. [13], with all pixels divided into the shadow set and the blank set (see Fig. 2). In the first round, the shadow set is used for embedding data and the blank set for computing predictions; in the second round, the blank set is used for embedding and the shadow set for computing predictions. Since the two embedding layers are similar in nature, we only describe the shadow layer.
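A minimal sketch of this checkerboard split follows; which parity is called the shadow set is an assumption here, since only the interleaving of the two sets matters.

```python
import numpy as np

def checkerboard_sets(h, w):
    """Split the pixel coordinates of an h x w channel into the 'shadow'
    and 'blank' sets of the double-layered embedding scheme [13]."""
    ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')
    shadow = np.argwhere((ii + jj) % 2 == 0)   # even parity assumed to be 'shadow'
    blank = np.argwhere((ii + jj) % 2 == 1)
    return shadow, blank
```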

Fig. 2. The current pixel and its neighboring pixels.

Let \(p_c\) and \(p_r\) denote the samples of a pixel p in the current channel and the reference channel, respectively. To determine whether the pixel is located on an edge, we calculate two parameters: the average distance \(D_{avg}\) and the direction distance \(D_{dir}\). The average distance is given by

$$\begin{aligned} D_{avg}= \biggl | \sum _{k=1}^{8} \alpha _{avg}^k p_r^k-p_r \biggr | \end{aligned}$$
(1)

where \(\alpha _{avg}^k\) \((k=1,2,\ldots ,8)\) are the coefficients of the eight neighbors \(p_r^1=p_r^{nw}\), \(p_r^2=p_r^{n}\), \(p_r^3=p_r^{ne}\), \(p_r^4=p_r^{w}\), \(p_r^5=p_r^{e}\), \(p_r^6=p_r^{sw}\), \(p_r^7=p_r^{s}\) and \(p_r^8=p_r^{se}\). If \(p_r\) is located in a smooth region, the average distance \(D_{avg}\) is small; if \(p_r\) is located in a rough region, \(D_{avg}\) is large. To determine the direction of the edge, we compute the direction distance \(D_{dir}\) as follows

$$\begin{aligned} D_{dir}&= min \biggl \{ \biggl | \frac{p_r^w+p_r^e}{2}-p_r \biggr |, \biggl | \frac{p_r^n+p_r^s}{2}-p_r \biggr |, \\ \nonumber&\quad \qquad \biggl | \frac{p_r^{nw}+p_r^{se}}{2}-p_r \biggr |,\biggl | \frac{p_r^{ne}+p_r^{sw}}{2}-p_r \biggr | \biggr \} \end{aligned}$$
(2)

where \(|(p_r^w+p_r^e)/2-p_r|\), \(|(p_r^n+p_r^s)/2-p_r|\), \(|(p_r^{nw}+p_r^{se})/2-p_r|\), and \(|(p_r^{ne}+p_r^{sw})/2-p_r|\) correspond to the four edge directions: horizontal, vertical, diagonal and antidiagonal. The smallest of the four is taken as \(D_{dir}\); since edges appear as jumps in intensity, the direction with the smallest distance indicates the edge along which the pixel lies.
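The two distances of Eqs. (1)-(2) can be computed per pixel as in the sketch below. The weights \(\alpha _{avg}^k\) are not specified above, so uniform weights of 1/8 are assumed here purely for illustration; interior pixels are assumed.

```python
import numpy as np

def avg_and_dir_distance(ref, i, j):
    """Return D_avg (Eq. 1), D_dir (Eq. 2) and the minimizing direction
    for the reference-channel pixel ref[i, j]."""
    p = float(ref[i, j])
    nw, n, ne = float(ref[i-1, j-1]), float(ref[i-1, j]), float(ref[i-1, j+1])
    w_, e     = float(ref[i, j-1]),   float(ref[i, j+1])
    sw, s, se = float(ref[i+1, j-1]), float(ref[i+1, j]), float(ref[i+1, j+1])
    # uniform weights 1/8 assumed for alpha_avg^k
    d_avg = abs((nw + n + ne + w_ + e + sw + s + se) / 8.0 - p)
    candidates = {
        'horizontal':   abs((w_ + e) / 2.0 - p),
        'vertical':     abs((n + s) / 2.0 - p),
        'diagonal':     abs((nw + se) / 2.0 - p),
        'antidiagonal': abs((ne + sw) / 2.0 - p),
    }
    direction = min(candidates, key=candidates.get)
    return d_avg, candidates[direction], direction
```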

The quantity \(|D_{avg}-D_{dir}|\) indicates whether the reference sample is located in an edge region. Considering that all color channels have a similar edge distribution [11], the prediction should take into account the edge information obtained from the other channel. We therefore use \(|D_{avg}-D_{dir}|\) to classify the location of the current sample \(p_c\). If \(|D_{avg}-D_{dir}|\) is close to zero, the current sample \(p_c\) lies in a smooth region of the image; in this case the edge information from the other channel is of little use, and we simply exploit the neighbors of \(p_c\). On the other hand, if \(|D_{avg}-D_{dir}|\) is larger than a predefined threshold \(\rho \), \(p_c\) is very likely located on or near an image edge, and the edge information obtained from the other channel should be taken into account. The prediction is thus

$$\hat{p}_c=\left\{ \begin{array}{ll} \lfloor (p_c^w+p_c^e+p_c^n+p_c^s)/4+0.5 \rfloor &{}{|D_{avg}-D_{dir}| \le \rho },\\ \lfloor P(p_c^k|D_{dir})+0.5 \rfloor &{}{|D_{avg}-D_{dir}| > \rho }. \end{array}\right. $$

where \(P(p_c^k|D_{dir})\) is selected according to \(D_{dir}\): it is the average of the two current-channel neighbors along the direction that minimizes \(D_{dir}\). For instance, when \(D_{dir}=|(p_r^w+p_r^e)/2-p_r|\), then \(P(p_c^k|D_{dir})=(p_c^w+p_c^e)/2\); when \(D_{dir}=|(p_r^n+p_r^s)/2-p_r|\), then \(P(p_c^k|D_{dir})=(p_c^n+p_c^s)/2\); when \(D_{dir}=|(p_r^{nw}+p_r^{se})/2-p_r|\), then \(P(p_c^k|D_{dir})=(p_c^{nw}+p_c^{se})/2\); and when \(D_{dir}=|(p_r^{ne}+p_r^{sw})/2-p_r|\), then \(P(p_c^k|D_{dir})=(p_c^{ne}+p_c^{sw})/2\).
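A sketch of the resulting predictor, combining Eqs. (1)-(2) with the rule above. The concrete threshold value \(\rho \), the helper avg_and_dir_distance from the previous sketch, and the use of current-channel neighbors along the reference-channel direction are assumptions of this illustration.

```python
def predict_current(cur, ref, i, j, rho=5):
    """Predict the current-channel pixel cur[i, j] using the edge direction
    detected in the reference channel ref (interior pixels assumed)."""
    d_avg, d_dir, direction = avg_and_dir_distance(ref, i, j)
    n, s = float(cur[i-1, j]), float(cur[i+1, j])
    w_, e = float(cur[i, j-1]), float(cur[i, j+1])
    nw, se = float(cur[i-1, j-1]), float(cur[i+1, j+1])
    ne, sw = float(cur[i-1, j+1]), float(cur[i+1, j-1])
    if abs(d_avg - d_dir) <= rho:             # smooth region: rhombus average
        return int((w_ + e + n + s) / 4.0 + 0.5)
    pairs = {'horizontal': (w_, e), 'vertical': (n, s),
             'diagonal': (nw, se), 'antidiagonal': (ne, sw)}
    a, b = pairs[direction]
    return int((a + b) / 2.0 + 0.5)           # edge region: average along the edge
```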

The first order prediction-errors based on the correlation among color channels are then

$$\begin{aligned} \varDelta e_c=p_c-\hat{p_c} \end{aligned}$$
(3)
$$\begin{aligned} \varDelta e_r=p_r-\hat{p_r} \end{aligned}$$
(4)

where \(\varDelta e_c\) is the first order prediction-error in the current channel and \(\varDelta e_r\) is the first order prediction-error in the reference channel. The second order prediction-error is then computed by

$$\begin{aligned} \varDelta ^2 e=\varDelta e_c-\varDelta e_r \end{aligned}$$
(5)

When the pixels lie in a smooth region of the image, they are similar to each other and the first order prediction-errors \(\varDelta e_c\) and \(\varDelta e_r\) are close to zero; therefore the second order prediction-error is also close to zero. On the other hand, when the pixels are located in a rough region, the first order prediction-errors \(\varDelta e_c\) and \(\varDelta e_r\) are relatively large; however, since all color channels have a similar edge distribution and the edge information from the other channel is taken into account, the two errors are correlated and the second order prediction-error becomes small. In this way the second order prediction-error sequence \(\varDelta ^2 e=(\varDelta ^2e_1,\cdots ,\varDelta ^2e_N)\) is derived.

To evaluate the prediction method, we use the entropy of the prediction-error: the smaller the entropy, the better the prediction. In our experiments we employ the UCID (Uncompressed Colour Image Database), which contains over 1300 uncompressed color images. As shown in Fig. 3, the entropy obtained by the proposed method is smaller than that of traditional methods such as MED, rhombus prediction, and the first order prediction-error based on the correlation among color channels.
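The entropy measure can be computed directly from the empirical prediction-error histogram, for example:

```python
import numpy as np

def prediction_error_entropy(errors):
    """Shannon entropy (in bits) of the empirical prediction-error
    distribution, used as the quality measure for a predictor:
    smaller entropy means a sharper histogram and better prediction."""
    errors = np.asarray(errors).ravel()
    _, counts = np.unique(errors, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```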

Fig. 3. The entropy value of the prediction-error obtained by the proposed prediction method and by traditional methods.

2.2 Second Order Prediction-Error Sorting Based on the Generalized Normal Distribution

The generalized error distribution is a generalization of the normal distribution: it possesses a natural multivariate form, has a parametric kurtosis that is unbounded above, and includes the normal and double exponential distributions as special cases [21]. Since the probability density function (PDF) of the prediction-error follows a generalized normal (or Gaussian-like) distribution, we use this model to describe the prediction-errors of Fig. 3. The generalized normal density function is defined by Nadarajah [9] as

$$\begin{aligned} f(\varDelta e|u,\alpha ,\beta )=\frac{\beta }{2\alpha \varGamma (\frac{1}{\beta })}exp \biggl \{-\biggl |\frac{(\varDelta e-u)}{\alpha }\biggr |^\beta \biggr \}, \end{aligned}$$
(6)

where \(\varDelta e\) is the prediction-error with mean u and variance \(\sigma ^2\). \(\alpha =\sqrt{\sigma ^2 \varGamma (1/\beta ) / \varGamma (3/\beta ) }\) is a scale parameter, playing the role of a variance that determines the width of the PDF, while \( \beta >0 \), called the shape parameter, controls the fall-off rate in the vicinity of the mode (the higher \( \beta \), the lower the fall-off rate). \( \varGamma (\cdot ) \) denotes the Gamma function, \( \varGamma (t)=\int _{0}^{\infty } x^{t-1}\exp (-x)dx\). Equation (6) reduces to the normal distribution for \( \beta = 2 \) and to the Laplacian distribution for \( \beta = 1 \).
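For reference, Eq. (6) can be evaluated numerically as below (a short sketch; beta = 2 recovers the normal case and beta = 1 the Laplacian case):

```python
import numpy as np
from math import gamma

def gnd_pdf(x, u, sigma2, beta):
    """Generalized normal density of Eq. (6) with mean u, variance sigma2
    and shape parameter beta."""
    alpha = np.sqrt(sigma2 * gamma(1.0 / beta) / gamma(3.0 / beta))
    coef = beta / (2.0 * alpha * gamma(1.0 / beta))
    return coef * np.exp(-np.abs((np.asarray(x, dtype=float) - u) / alpha) ** beta)
```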

Fig. 4. The distribution of \(f(\varDelta ^2 e)\).

Based on Eq. (6), we have \(\varDelta e_c \sim GND (u_c,\alpha _c,\beta _c) \) and \(\varDelta e_r \sim GND (u_r,\alpha _r,\beta _r) \). The density of \(\varDelta ^2 e=\varDelta e_c-\varDelta e_r\) is then given by

$$\begin{aligned} f(\varDelta ^2 e) =&\int _{-\infty }^{+\infty } f_{\varDelta e_c}(\varDelta e_c)f_{\varDelta e_r}(\varDelta e_c-\varDelta ^2 e)d\varDelta e_c \nonumber \\ =&\int _{-\infty }^{+\infty } \frac{\beta _c\beta _r}{4\alpha _c\alpha _r\varGamma (1/\beta _c)\varGamma (1/\beta _r)} \nonumber \\&{\times } \, \exp \biggl (-\biggl |\frac{\varDelta e_c-u_c}{\alpha _c} \biggr |^{\beta _c}-\biggl |\frac{\varDelta e_c-\varDelta ^2 e-u_r}{\alpha _r} \biggr |^{\beta _r}\biggr ) d\varDelta e_c \nonumber \\ \end{aligned}$$
(7)

Equation (7) does not appear to admit a closed-form expression. To simplify the problem we take \( \beta =1 \), a particular case of the generalized normal corresponding to the Laplace distribution, which gives

$$\begin{aligned} f(\varDelta ^2 e) =&\int _{-\infty }^{+\infty } f_{c}(\varDelta e_c)f_{r}(\varDelta e_c-\varDelta ^2 e) d\varDelta e_c \nonumber \\ =&\frac{\alpha _c}{2(\alpha _c^2-\alpha _r^2)} \exp \Bigl (-\frac{|\varDelta ^2 e-(u_c-u_r)|}{\alpha _c}\Bigr ) \nonumber \\&{-}\,\frac{\alpha _r}{2(\alpha _c^2-\alpha _r^2)} \exp \Bigl (-\frac{|\varDelta ^2 e-(u_c-u_r)|}{\alpha _r}\Bigr ) \nonumber \\ \end{aligned}$$
(8)

It can be seen that \(f(\varDelta ^2 e)\) is the difference of two Laplace densities with the same mean \(u_c-u_r\) and different weights. The PDF of the second order prediction-error is shown in Fig. 4. As shown there, the distance from \(u_c-u_r\) to the y axis reflects the accuracy of the second order prediction-error: the smaller the distance, the better the prediction. Since this distance measures the prediction accuracy, we define the function \( \varPhi (\varDelta ^2 e)=|f(\varDelta ^2 e)| \) (Fig. 5). The expectation \( E[\varPhi (\varDelta ^2 e)] \) is positively correlated with \(u_c-u_r\) and can therefore be used to characterize the accuracy of the second order prediction-error: a larger \( E[\varPhi (\varDelta ^2 e)] \) means that the pixels in the region are more random or unpredictable and hence hard to predict accurately. The prediction-errors can thus be rearranged by sorting according to \( E[\varPhi (\varDelta ^2 e)] \). Let \( u=u_c-u_r \), \( a=\frac{\alpha _c}{2(\alpha _c^2-\alpha _r^2)} \) and \( b=\frac{\alpha _r}{2(\alpha _c^2-\alpha _r^2)} \). Then \(\varPhi (\varDelta ^2 e)\) is given by

$$\begin{aligned} \varPhi (\varDelta ^2 e) =&\frac{a}{2\alpha _c} (exp(-\frac{|u-\varDelta ^2 e|}{\alpha _c})+exp(-\frac{u+\varDelta ^2 e}{\alpha _c})) \nonumber \\&{-}\, \frac{b}{2\alpha _r} (exp(-\frac{|u-\varDelta ^2 e|}{\alpha _r})+exp(-\frac{u+\varDelta ^2 e}{\alpha _r})) \end{aligned}$$
(9)
Fig. 5. The distribution of \(\varPhi (\varDelta ^2 e)\).

where \( \varDelta ^2 e, u \ge 0 \). Then \( E[\varPhi (\varDelta ^2 e)] \) is obtained as follows

$$\begin{aligned} E(\varPhi (\varDelta ^2 e))&= \int _{0}^{+\infty } \varDelta ^2 e\,\varPhi (\varDelta ^2 e) d\varDelta ^2 e \nonumber \\&=a\times \exp \Bigl ({-\frac{u}{\alpha _c}}\Bigr )-b\times \exp \Bigl ({-\frac{u}{\alpha _r}}\Bigr )+u \nonumber \\ \end{aligned}$$
(10)

As shown in Fig. 2, the parameters u and \(\alpha \) can be estimated by

$$\begin{aligned} u_c=min \{(d_1+d_3)/2, (d_2+d_4)/2 \} \end{aligned}$$
(11)
$$\begin{aligned} \alpha _c= \sqrt{\frac{1}{4\times 2} \sum _{k=1}^{4}(d_k-u_c)^2} \end{aligned}$$
(12)

where \(d_1=p_c^w-p_c^n\), \(d_2=p_c^n-p_c^e\), \(d_3=p_c^e-p_c^s\) and \(d_4=p_c^s-p_c^w\).
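Equations (10)-(12) can be turned into a per-pixel sorting score, for example as in the sketch below. Applying the same estimator of Eqs. (11)-(12) to the reference channel, using \(|u_c-u_r|\), and guarding the case \(\alpha _c=\alpha _r\) are assumptions made here for numerical robustness.

```python
import numpy as np

def laplace_params(ch, i, j):
    """Estimate u and alpha for one channel from the four neighbour
    differences d_1..d_4 of Eqs. (11)-(12); interior pixels assumed."""
    w_, n = float(ch[i, j - 1]), float(ch[i - 1, j])
    e, s = float(ch[i, j + 1]), float(ch[i + 1, j])
    d = np.array([w_ - n, n - e, e - s, s - w_])        # d_1 .. d_4
    u = min((d[0] + d[2]) / 2.0, (d[1] + d[3]) / 2.0)   # Eq. (11)
    alpha = np.sqrt(np.sum((d - u) ** 2) / 8.0)         # Eq. (12)
    return u, alpha

def sort_score(cur, ref, i, j, eps=1e-6):
    """E[Phi(Delta^2 e)] of Eq. (10): smaller scores indicate smoother,
    more predictable positions and are used first for embedding."""
    u_c, a_c = laplace_params(cur, i, j)
    u_r, a_r = laplace_params(ref, i, j)
    u = abs(u_c - u_r)
    a_c, a_r = max(a_c, eps), max(a_r, eps)
    denom = 2.0 * (a_c ** 2 - a_r ** 2)
    if abs(denom) < eps:                                # alpha_c == alpha_r guard
        denom = eps
    a = a_c / denom
    b = a_r / denom
    return a * np.exp(-u / a_c) - b * np.exp(-u / a_r) + u
```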

As observed above, \( E[\varPhi (\varDelta ^2 e)] \) is an increasing function of \(u_c, u_r\) and \(\alpha _c, \alpha _r\): the smaller these parameters, the better the prediction accuracy. Hence \( E[\varPhi (\varDelta ^2 e)] \) characterizes well both the local context complexity of a pixel and the prediction accuracy of its prediction-error.

By setting a threshold \( \lambda \), the prediction-errors satisfying \(E[\varPhi (\varDelta ^2 e)] \le \lambda \) are utilized in data embedding while the others are skipped. For a specific payload R, \( \lambda \) is chosen as the smallest value that provides enough capacity. The embedding process thus starts from the prediction-error with the smallest \( E[\varPhi (\varDelta ^2 e)] \) value in the sorted row and moves on to the next prediction-error until the last bit of data is embedded. Figure 6 shows the prediction-errors of the Lena image: on the left, before sorting, the errors fluctuate strongly; on the right, after sorting by our method, the prediction-errors with small error and entropy are placed in front. Because the message is embedded in the most suitable prediction-errors, the image quality is improved significantly.
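The threshold selection can be sketched as follows, assuming, for illustration only, one bit of capacity per selected prediction-error; the actual capacity per position depends on the expansion and shifting rule of [11].

```python
import numpy as np

def select_embedding_positions(scores, payload_bits):
    """Sort candidate positions by their E[Phi(Delta^2 e)] score and keep the
    smallest threshold lambda that provides enough capacity."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)            # ascending: most predictable first
    chosen = order[:payload_bits]         # smallest scores cover the payload
    lam = scores[chosen[-1]] if payload_bits > 0 else 0.0
    return chosen, lam
```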

Fig. 6. The prediction-error of Lena.

3 Application, Experiment and Analysis

In this section, we apply the second order prediction-error sorting (SOPS) algorithm to the method of Li et al. [11]. The embedding and extraction procedures are the same as in [11]; we only replace or add the prediction and sorting steps. The framework of the proposed SOPS-based RDH scheme is presented in Fig. 7. As shown in Fig. 7, we first hide data in the red and blue channels, taking the green channel as the reference; when hiding data in the green channel itself, the reference channel is the marked red one. The embedding and extraction procedures of second order prediction-error sorting for color images are as follows:

Fig. 7. The framework of data hiding for the three channels.

Fig. 8. Test images Lena, Barbara, Kodak-01 and Kodak-24.

Figures a and b: the embedding procedure and the extraction procedure of SOPS for color images (listings not reproduced here).
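The overall channel order of Fig. 7 can be outlined as below. This is a high-level sketch of the description above, not the full procedures of figures a and b; embed_channel is a hypothetical callable wrapping prediction, sorting and expansion for one channel.

```python
def embed_color_image(red, green, blue, payload, embed_channel):
    """Channel order of Fig. 7: red and blue are embedded first with green as
    the reference channel, then green is embedded with the already-marked red
    channel as its reference.  payload maps channel names to bit strings;
    embed_channel(cover, reference, bits) is a hypothetical helper."""
    marked_red = embed_channel(red, green, payload['red'])
    marked_blue = embed_channel(blue, green, payload['blue'])
    marked_green = embed_channel(green, marked_red, payload['green'])
    return marked_red, marked_green, marked_blue
```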

We implemented these methods on a computer with an Intel Core i3 CPU and 4 GB RAM; the development environment is MATLAB R2011b on Microsoft Windows 7. In the experiments, we employ four color images (Fig. 8) to test the performance of the proposed RDH algorithm via capacity-distortion curves. The first two standard images in Fig. 8 are stored in TIFF format with size \(512 \times 512\); the two Kodak images are the first and the last ones of the database (http://r0k.us/graphics/kodak/), stored in PNG format with size \(512 \times 768\). Our method is evaluated against six recent works: Li et al. [11], Li et al. [12], Sachnev et al. [13], Alattar [14], Hu et al. [15], and Yang and Huang [16]. For our method, we vary the payload from 100,000 bits to 300,000 bits or 600,000 bits with a step size of 50,000 bits.
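For completeness, the distortion measure used in the comparison (PSNR over the three channels with an 8-bit peak value, i.e. the usual definition, assumed here) can be computed as:

```python
import numpy as np

def color_psnr(cover, marked):
    """PSNR in dB between the cover and marked color images, computed over
    all three channels with a peak value of 255."""
    cover = np.asarray(cover, dtype=np.float64)
    marked = np.asarray(marked, dtype=np.float64)
    mse = np.mean((cover - marked) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
```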

The comparison results are shown in Fig. 9(a)-(d). According to the experimental results, the proposed method outperforms these state-of-the-art works: it provides a larger PSNR regardless of the test image or capacity. Compared with Li et al. [11], our method provides an average PSNR increase of 0.65 dB for Lena, 3.63 dB for Barbara, 2.66 dB for Kodak-01 and 0.91 dB for Kodak-24, i.e., an average gain of 1.96 dB; compared with Sachnev et al. [5], the PSNR gain is even higher (Fig. 10).

Fig. 9. (a) and (b): performance comparison between our method and the six methods of Li et al. [11], Li et al. [12], Sachnev et al. [13], Alattar [14], Hu et al. [15], and Yang and Huang [16].

Fig. 10. (a) and (b): performance comparison between our method and the six methods of Li et al. [11], Li et al. [12], Sachnev et al. [13], Alattar [14], Hu et al. [15], and Yang and Huang [16].

4 Conclusion

In this paper, we propose a novel second order prediction and sorting technique for reversible data hiding. First, the prediction error is obtained by an inter-channel secondary prediction that uses the prediction errors of the current channel and a reference channel. When the pixels lie in a smooth region of the image, they are similar to each other and the first order prediction-errors are close to zero, so the second order prediction-errors are also close to zero. When the pixels are located in a rough region, the first order prediction-errors are relatively large; however, since all color channels have a similar edge distribution and the edge information from the other channel is taken into account, the second order prediction-errors become small. Experiments show that this prediction method produces a sharper second order prediction-error histogram. Second, we introduce a novel second order prediction-error sorting (SOPS) algorithm, which makes full use of the edge information obtained from another color channel and of the high correlation between adjacent pixels, and thus reflects both the local context complexity of a pixel and the prediction accuracy of its prediction-error. Experimental results show that the proposed method outperforms the six recent works of Li et al. [11], Li et al. [12], Sachnev et al. [13], Alattar [14], Hu et al. [15], and Yang and Huang [16].