Distortion Function for Emoji Image Steganography

Abstract: Emoji images are widely used in social networks. To achieve covert communication in emoji images, this paper proposes a distortion function for emoji image steganography. The proposed distortion function takes the profile of the image content and the intra- and inter-frame correlation into account to fit the unique properties of emoji images. The three parts are combined to measure the risk of detection caused by modifying the cover data. With the popular syndrome-trellis coding (STC), the distortion of the stego emoji image is minimized under the proposed distortion function. As a result, fewer detectable artifacts remain in the stego images. Experimental results show that the proposed distortion function achieves much higher undetectability than the current state-of-the-art distortion function HILL, which is designed for natural images.

A distortion function is used to assign an embedding cost to each cover element to quantify the effect of modification. Many distortion functions have been designed for spatial images [Holub, Fridrich and Denemark (2014); Li, Wang, Huang et al. (2014); Sedighi, Cogranne and Fridrich (2016)] and JPEG images [Guo, Ni and Shi (2014); Guo, Ni, Su et al. (2015); Wang, Zhang and Yin (2016); Wei, Yin, Wang et al. (2018); Du, Yin and Zhang (2018)]. For other kinds of images, new distortion functions should be proposed to fit their unique properties. In the current age of big data, many kinds of digital images [Guan, Zhang, Wu et al. (2019); Wu, Dong, Ota et al. (2018)] have emerged. In particular, emoji images are widely used in social networks, e.g., Twitter and Facebook, and in instant messaging systems, e.g., Skype and WeChat, to express emotion vividly. Different from natural images, as shown in Fig. 1, an emoji image is constituted by several curves with a legible profile. The correlation between its pixels differs from that of natural images, which can be modeled as a Markov chain. To save storage space, the usual format of emoji images is the palette format (a typical example: the graphics interchange format, GIF). For vividness of expression, most emoji images are animated, i.e., each emoji image contains more than one frame. In this case, the correlation between frames should also be considered for steganography. Furthermore, this inter-frame correlation differs from the correlation in natural images, which are motionless.

Figure 1: Several emoji images
Existing distortion functions [Holub, Fridrich and Denemark (2014); Li, Wang, Huang et al. (2014); Sedighi, Cogranne and Fridrich (2016); Guo, Ni and Shi (2014); Guo, Ni, Su et al. (2015); Wang, Zhang and Yin (2016); Wei, Yin, Wang et al. (2018)] are designed for natural images and aim to restrain embedding changes to textured and complex regions to conceal the modification traces [Wang, Yin and Zhang (2018)]. Although these distortion functions perform well on natural images, they are not suitable for emoji images since the content profile is not sufficiently exploited. Therefore, it is necessary to develop a customized distortion function for emoji images. To the best of our knowledge, no distortion function has been designed for emoji images. To fill this gap, we propose a distortion function for steganography in emoji images. Different from existing distortion functions designed for natural images, the proposed distortion function combines the profile of the image content with the intra- and inter-frame correlation to measure the risk of detection caused by modifying the cover data. In this way, the unique properties of emoji images are considered. When secret data is embedded with syndrome-trellis coding, the obtained stego emoji exposes fewer detectable artifacts.

Structure
The structure of the proposed method is shown in Fig. 2. To fit the properties of emoji images, the profile of the image content and the intra- and inter-frame correlation are employed to form the profile, texture, and variation costs respectively. The three parts are then combined to measure the risk of detection caused by modifying the cover data.

Figure 2: Structure of the proposed method

Emoji image
The usual format of emoji images is the palette format. As shown in Fig. 3, each palette image is composed of a color palette and a color index matrix. The color palette is a list of entries of representative colors in the image, and the elements in the color index matrix are pointers to those palette entries that specify the red-green-blue (RGB) colors [Tzeng, Yang and Tsai (2004)].


Figure 3: Demonstration of the palette format
Since more than one frame (color index matrix) is contained in each emoji image, each emoji image is composed of a color palette and several color index matrices corresponding to the frames. In other words, an emoji image is composed of several palette images sharing a single color palette. The color palette is shared by all color index matrices.
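This "one shared palette, several index matrices" layout can be made concrete with a minimal in-memory model (the 4-color palette and two 3×3 frames below are synthetic, for illustration only; they are not taken from the paper's dataset):

```python
# A minimal model of the palette format described above: one shared color
# palette and several color index matrices (frames).  All values are
# synthetic, chosen only to illustrate the structure.
palette = [
    (255, 255, 255),  # white
    (0, 0, 0),        # black
    (255, 200, 0),    # emoji yellow
    (200, 120, 0),    # outline brown
]

# An "emoji image": several frames, each a matrix of indices into `palette`.
frames = [
    [[2, 3, 2], [3, 1, 3], [2, 3, 2]],   # frame X1
    [[2, 3, 2], [3, 0, 3], [2, 3, 2]],   # frame X2
]

def frame_to_rgb(frame):
    """Dereference each index through the shared palette to recover RGB."""
    return [[palette[i] for i in row] for row in frame]

rgb1 = frame_to_rgb(frames[0])  # center pixel of X1 is black, of X2 white
```

Only the index matrices change from frame to frame; the palette is stored once, which is exactly why modifying an index can cause a large color jump unless the palette is arranged carefully (the motivation for Algorithm 1 below).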

Distortion function design
According to the properties of the emoji images, a new distortion function is designed.
For an emoji image with k colors and l color index matrices, denote the i-th entry in the color palette as ci, i∈{1, …, k}, and the j-th color index matrix with size M×N as Xj = {xj(u,v)} ∈ {ci}^(M×N), j∈{1, …, l}. The proposed distortion function assigns an embedding cost to each xj(u,v). The details are as follows.
Denote the RGB color values corresponding to ci as Ri, Gi, and Bi respectively. To minimize the color-value distortion caused by the modifications made on xj(u,v) during steganography, similar values of xj(u,v) should correspond to similar (Ri, Gi, Bi) values.
To achieve this, all the Xj are modified using Algorithm 1.

Algorithm 1 Color index matrix adjustment
Input: the j-th color index matrix Xj and the color palette {ci}. Output: the adjusted Xj.
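The body of Algorithm 1 is not reproduced in this excerpt. One common realization of the stated goal (numerically close indices should refer to visually close colors) is to reorder the palette by luminance and remap every index matrix accordingly; the sketch below follows that assumption and is not the paper's exact procedure:

```python
# Hedged sketch of Algorithm 1's intent: sort the palette by luminance so
# that value-similar indices correspond to similar RGB colors, then remap
# all index matrices.  The exact procedure in the paper may differ.

def luminance(rgb):
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b  # ITU-R BT.601 weights

def adjust_indices(palette, frames):
    # New palette order: ascending luminance.
    order = sorted(range(len(palette)), key=lambda i: luminance(palette[i]))
    new_palette = [palette[i] for i in order]
    # remap[old_index] -> new_index
    remap = {old: new for new, old in enumerate(order)}
    new_frames = [[[remap[i] for i in row] for row in f] for f in frames]
    return new_palette, new_frames

palette = [(255, 255, 255), (0, 0, 0), (255, 200, 0), (40, 40, 40)]
frames = [[[0, 2], [1, 3]]]
new_palette, new_frames = adjust_indices(palette, frames)
```

After the remap, every pixel still points at the same RGB color (the image is visually unchanged), but a ±1 change to an index now moves to a neighboring luminance level instead of an arbitrary color.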
The first part of the proposed distortion function aims to make use of the intra-frame correlation. This purpose is similar to that of the distortion functions for natural images. The current state-of-the-art distortion function is HILL, which is constituted by a high-pass filter and two low-pass filters to make the modifications cluster in textured regions. We employ the approach of HILL to assign the texture cost ρjT(u,v) to each xj(u,v). The details are as follows. Let Fh be a high-pass filter; the residuals Rj of Xj are first calculated using Eq. (1).
Nonexistent pixels outside the image boundary are obtained by symmetric padding. For example, xj(u+1,v) is obtained by copying xj(u-1,v) when it is out of the block boundary, and vice versa. Then two low-pass filters Fl1 and Fl2 are employed to obtain the cost matrix ρjH = {ρjH(u,v)}M×N of HILL [Li, Wang, Huang et al. (2014)] using Eq. (3). Thus, the texture cost ρjT(u,v) for each xj(u,v) is defined in Eq. (5), where the cost ρjH(u,v) is calculated using Eq. (3) and is the same as the cost in HILL.
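Since Eqs. (1), (3), and (5) are not reproduced in this excerpt, the sketch below follows the standard HILL construction from the original HILL paper (KB high-pass filter, then a 3×3 and a 15×15 averaging filter); the filter choices are taken from that paper, not from this one:

```python
import numpy as np

# Standard HILL cost, used here as the texture cost.  The KB high-pass
# filter and the 3x3 / 15x15 averaging filters follow the original HILL
# paper; this excerpt does not confirm the exact filter sizes it uses.
KB = np.array([[-1,  2, -1],
               [ 2, -4,  2],
               [-1,  2, -1]], dtype=float)

def conv2_sym(x, k):
    """2-D convolution with symmetric (mirror) padding, 'same' output size."""
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)), mode="symmetric")
    kf = k[::-1, ::-1]  # flip kernel for true convolution
    out = np.empty_like(x, dtype=float)
    for u in range(x.shape[0]):
        for v in range(x.shape[1]):
            out[u, v] = np.sum(xp[u:u + k.shape[0], v:v + k.shape[1]] * kf)
    return out

def hill_cost(x, eps=1e-10):
    avg3 = np.ones((3, 3)) / 9.0
    avg15 = np.ones((15, 15)) / 225.0
    r = conv2_sym(x, KB)                       # high-pass residuals
    smoothed = conv2_sym(np.abs(r), avg3)      # first low-pass filter
    return conv2_sym(1.0 / (smoothed + eps), avg15)  # second low-pass filter
```

Flat (smooth) regions yield near-zero residuals and therefore very large costs, while textured regions yield small costs, which pushes embedding changes into texture as the text describes.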
To extract the content profile of frame Xj, it is first transformed into a grayscale image by replacing xj(u,v) with the corresponding real luminance value. Then the obtained grayscale image is further transformed into a binary image Yj = {yj(u,v)}M×N, so the elements of Yj take values 0 or 1. The content profile of Xj is the set of locations corresponding to "0" in Yj.
The locations corresponding to "1" in Yj belong to the background of Xj, which is so smooth that any modification would be discovered. Therefore, these locations are not suitable for steganography. Accordingly, the profile cost ρjP = {ρjP(u,v)}M×N for each xj(u,v) can be obtained using Eq. (6).
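Since Eq. (6) is not reproduced in this excerpt, the sketch below shows one plausible form: background locations get a prohibitively large ("wet") cost and profile locations a small one. The constant WET and the fixed threshold are hypothetical stand-ins, as is common in cost-based steganography:

```python
import numpy as np

# Hypothetical profile cost in the spirit of Eq. (6): background pixels
# (binary value 1) are practically forbidden, profile pixels (value 0)
# are cheap.  WET and the threshold 128 are illustrative assumptions.
WET = 1e8

def profile_cost(gray, threshold=128):
    """Binarize the grayscale frame, then forbid the smooth background."""
    y = (gray >= threshold).astype(int)   # binary image Y_j
    return np.where(y == 1, WET, 1.0)

gray = np.array([[10, 200],
                 [30, 250]], dtype=float)
rho_p = profile_cost(gray)   # dark profile pixels -> 1.0, background -> WET
```

In practice a fixed threshold could be replaced by an adaptive one (e.g., Otsu's method); the excerpt does not specify which binarization the authors use.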
In order to utilize the inter-frame correlation among different Xj, we consider the color difference between Xj and Xj-1. Both Xj and Xj-1 are transformed into grayscale images to calculate the color difference. To avoid an out-of-range subscript for Xj-1, no embedding is performed on the first frame X1; that is, the first frame is kept unchanged during data embedding. Denote the (u,v)-th pixels of the grayscale Xj and Xj-1 as pj(u,v) and pj-1(u,v) respectively; the color difference Dj = {dj(u,v)} between Xj and Xj-1 is given by Eq. (7). Then the variation cost ρjV = {ρjV(u,v)}M×N for each xj(u,v) is defined in Eq. (8) to decrease the modifications on frames that are similar to the previous frame.
where the values 1.3 and 15 are determined empirically. Finally, the three parts (texture cost ρjT(u,v), profile cost ρjP(u,v), and variation cost ρjV(u,v)) are combined by multiplication. The final embedding cost ρj(u,v) assigned to xj(u,v) is defined in Eq. (9).

Experimental results
Several experiments are conducted to verify the effectiveness of the proposed distortion function. First, we describe the experimental setup. Then, we analyze the quality of the stego images. Finally, we present the undetectability results compared with HILL.

Experimental setup
To build the cover image set, we collected 560 emoji images that are widely used in social networks. These images are in palette format, and each image contains 256 colors and several frames. The 560 images contain 2557 frames in total. We have uploaded all 560 images to https://pan.baidu.com/s/1nOsn_eoI8vLpgqo8ue8nOQ. We compare the proposed distortion function with the popular distortion function HILL, which achieves state-of-the-art undetectability. Since HILL is designed for spatial images, each frame is first transformed into a grayscale image when embedding with HILL. In other words, for HILL embedding, 2557 grayscale images are used as covers. The capacity of secret data embedded in each frame is set to 600, 700, 800, 900, 1000, and 1100 bits respectively. All embedding tasks are performed by the embedding simulator [Pevný, Filler and Bas (2010)], since it is widely used to simulate optimal embedding. For steganalysis, the feature sets SPAM proposed by Pevný et al. [Pevný, Bas and Fridrich (2010)] and SRMQ1 proposed by Fridrich et al. [Fridrich and Kodovsky (2012)] are employed in our experiments. The ensemble classifier proposed by Kodovsky et al. [Kodovsky, Fridrich and Holub (2012)] is used to measure the performance of the feature sets. In detail, half of the cover and stego feature sets are used as the training set while the remaining half are used as the testing set. The criterion to evaluate the performance is the minimal total error PE under equal priors achieved on the testing set [Kodovsky, Fridrich and Holub (2012)]: PE = min_{PFA} (PFA + PMD)/2, where PFA is the false-alarm rate and PMD is the missed-detection rate. The performance is evaluated using the average value of PE over ten random tests.
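The PE criterion above can be computed directly from detector scores; the sketch below uses synthetic scores (hypothetical values, for illustration only) and searches over decision thresholds:

```python
# Minimal total error P_E under equal priors:
#   P_E = min over thresholds of (P_FA + P_MD) / 2,
# where P_FA is the false-alarm rate on covers and P_MD the
# missed-detection rate on stegos.  Scores below are synthetic.

def p_e(cover_scores, stego_scores):
    best = 0.5  # random guessing is always achievable
    for t in sorted(set(cover_scores) | set(stego_scores)):
        p_fa = sum(s >= t for s in cover_scores) / len(cover_scores)
        p_md = sum(s < t for s in stego_scores) / len(stego_scores)
        best = min(best, (p_fa + p_md) / 2)
    return best

cover = [0.1, 0.2, 0.3, 0.4]   # perfectly separable toy scores
stego = [0.6, 0.7, 0.8, 0.9]
print(p_e(cover, stego))       # prints 0.0
```

A PE of 0.5 means the steganalyzer does no better than random guessing (perfect undetectability), while 0.0 means every stego image is detected; higher PE values therefore indicate a more secure embedding scheme.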

Image quality demonstrations
The demonstrations of the proposed method are shown in Fig. 4, where Fig. 4(a) is a cover emoji image composed of several frames. After each frame is embedded with 600, 800, and 1000 bits respectively, the obtained stego images are shown in Figs. 4(b), 4(c), and 4(d) correspondingly. It is clear that the stego images are close to the cover image, which means the visual quality of the stego images is satisfactory regardless of the capacity. Thus, the usability of emoji images is preserved after embedding adequate secret data using the proposed method.

Figure 4: Demonstrations of (a) cover and corresponding stego emoji images using the proposed method with capacity (b) 600 bits, (c) 800 bits, (d) 1000 bits

It is clear that the security performance of the proposed method is much better than that of HILL in all cases, regardless of the steganalytic tools and capacity. Specifically, the PE values of the proposed method are more than twice those of HILL. For the cases of large capacity, e.g., 900, 1000, and 1100 bits, the PE values of the proposed method are nearly three times those of HILL. The large improvement in undetectability arises because the proposed distortion function is designed to follow the unique properties of emoji images, while HILL is not. In addition, inter-frame correlation is the most distinctive property of motion images, as mentioned in Usui et al. [Usui, Takano and Yamamoto (2017)]. The modifications of steganography should avoid destroying this correlation as far as possible. The correlation is reflected in the differences between image frames. For this reason, we also give the undetectability comparisons on difference images in Fig. 6 to further demonstrate the superiority of the proposed method. Tab. 2 lists the corresponding numerical values.
Since the first frame is kept unchanged during data embedding, as mentioned in Subsection 2.2, the difference images are obtained by calculating the differences between the first frame and each of the other frames in every cover and stego emoji image. As shown in Fig. 6, the undetectability tested on the difference images of the proposed method is still better than that of HILL.

Conclusion
A distortion function for emoji image steganography is proposed in this paper. To fit the properties of emoji images, the profile of the image content and the intra- and inter-frame correlation are considered in the proposed distortion function. The three parts are combined by multiplication to resist steganalysis. Experimental results prove the effectiveness of the proposed distortion function. For further study, it would be significant to develop theoretically optimal embedding for emoji images by uniting steganographic methods for palette images.