Automation of reversible steganographic coding with nonlinear discrete optimisation

Authentication mechanisms are at the forefront of defending the world from various types of cybercrime. Steganography can serve as an authentication solution through the use of a digital signature embedded in a carrier object to ensure the integrity of the object and simultaneously lighten the burden of metadata management. Nevertheless, despite being generally imperceptible to human sensory systems, any degree of steganographic distortion might be inadmissible in fidelity-sensitive situations such as forensic science, legal proceedings, medical diagnosis and military reconnaissance. This has led to the development of reversible steganography. A fundamental element of reversible steganography is predictive analytics, for which powerful neural network models have been effectively deployed. Another core element is reversible steganographic coding. Contemporary coding is based primarily on heuristics, which offers a shortcut towards sufficient, but not necessarily optimal, capacity-distortion performance. While attempts have been made to realise automatic coding with neural networks, perfect reversibility is unattainable via such learning machinery. Instead of relying on heuristics and machine learning, we aim to derive optimal coding by means of mathematical optimisation. In this study, we formulate reversible steganographic coding as a nonlinear discrete optimisation problem with a logarithmic capacity constraint and a quadratic distortion objective. Linearisation techniques are developed to enable iterative mixed-integer linear programming. Experimental results validate the near-optimality of the proposed optimisation algorithm when benchmarked against a brute-force method.


Introduction
Steganography is the art and science of concealing information within a carrier object (Anderson & Petitcolas, 1998). The term encompasses a wide range of techniques and applications, including but not limited to covert communications (Fridrich, Goljan, Lisonek, & Soukal, 2005), ownership identification (Cox, Kilian, Leighton, & Shamoon, 1997), copyright protection (Barni, Bartolini, Cappellini, & Piva, 1998), broadcast monitoring (Depovere et al., 1999) and traitor tracing (He & Wu, 2006). An important application of steganography is data authentication, which plays a vital role in cybersecurity. The advent of data-centric artificial intelligence has been accompanied by cybersecurity concerns (Boden et al., 2017). It has been reported that intelligent systems are vulnerable to adversarial attacks such as poisoned data collected for retraining during deployment (Muñoz-González et al., 2017), malware hidden in neural network parameters (Liu et al., 2020) and invisible perturbations crafted to cause erroneous decisions (Goodfellow, Shlens, & Szegedy, 2015). A proper authentication mechanism must ensure that the integrity of data has not been undermined and that the identity of users has not been forged, thereby protecting against these insidious threats.
Digital signatures are a type of authentication technology that is based upon modern cryptography (Rivest, Shamir, & Adleman, 1978). This technology can be incorporated into a trustworthy surveillance camera in such a way that photographs are taken and stored along with digital signatures (Friedman, 1993). However, storing such auxiliary metadata as a separate file entails the risk of accidental loss and mismanagement during the data lifecycle. Steganography can allow auxiliary information about the data to be embedded invisibly within the data itself. Nevertheless, although generally imperceptible to human sensory systems, any degree of steganographic distortion might not be admissible in some fidelity-sensitive situations such as forensic science, legal proceedings, medical diagnosis and military reconnaissance. This is where the notion of reversible computing comes into play (Alattar, 2004; Chang, Li, & Shi, 2018; Coatrieux, Pan, Cuppens-Boulahia, Cuppens, & Roux, 2013; De Vleeschouwer, Delaigle, & Macq, 2003; Fridrich, Goljan, & Du, 2001; Lee, Yoo, & Kalker, 2007; Wu & Zhang, 2020).
A fundamental element of reversible steganography, in common with lossless compression, is predictive modelling (Rissanen, 1984; Shannon, 1948; Weinberger & Seroussi, 1997). Prediction error modulation is a cutting-edge reversible steganographic technique composed of a predictive analytics module and a reversible coding module (Celik, Sharma, Tekalp, & Saber, 2005; Dragoi & Coltuc, 2014; Fallahpour, 2008; Hwang, Kim, & Kim, 2016; Li, Yang, & Zeng, 2011; Sachnev, Kim, Nam, Suresh, & Shi, 2009; Thodi & Rodriguez, 2007). The recent development of deep learning has advanced the frontier of reversible steganography. It has been shown that deep neural networks can be applied as powerful predictive models (Chang, 2020, 2021, 2022; Hu & Xiang, 2021). Despite inspiring progress in the analytics module, the design of the coding module is still based largely on heuristics. While there are studies on end-to-end deep learning that use neural networks for automatic reversible computing, perfect reversibility cannot be guaranteed (Duan et al., 2019; Lu, Wang, Zhong, & Rosin, 2021; Zhang, Fu, Di, Li, & Liu, 2019). From a certain point of view, it is hard for a neural network, as a monolithic black box, to follow the intricate procedures of reversible computing (Castelvecchi, 2016). While deep learning is adept at handling the complex nature of the real world (LeCun, Bengio, & Hinton, 2015), reversible computing is more of a mechanical process in which procedures have to be conducted in accordance with rigorous algorithms. Therefore, at the time of writing, it seems advisable to follow a modular framework.
The essence of reversible steganographic coding is determining how values change to represent different message digits. Different solutions can lead to different trade-offs between capacity and distortion. Instead of relying on heuristics, this study pursues the development of optimal coding for reversible steganography in order to attain optimal capacity-distortion performance. We model reversible steganographic coding as a mathematical optimisation problem and propose an optimisation algorithm for addressing the nonlinearity of this problem. In particular, the task is to minimise steganographic distortion subject to a capacity constraint, where both objective and constraint are nonlinear functions. We propose linearisation techniques for addressing this nonlinear discrete optimisation problem. The remainder of this paper is organised as follows. Section 2 outlines the background regarding reversible steganography. Section 3 formulates the nonlinear discrete optimisation problem and discusses the complexity of a brute-force search algorithm. Section 4 presents linearisation techniques for tackling the nonlinear discrete optimisation problem. Section 5 analyses the optimality of solutions through simulation experiments. Section 6 provides concluding remarks.

Background
Prediction error modulation is a reversible steganographic technique that consists of an analytics module and a coding module. The analytics module begins by splitting a cover image into context and query sets, denoted by $\mathbf{c}$ and $\mathbf{q}$ respectively. A conventional method is to arrange pixels into two groups according to a chequered pattern. A predictive model is then applied to predict the intensities of the query pixels from the intensities of the context pixels. A contemporary practice of predictive modelling is to employ an artificial neural network originally designed for computer vision tasks. The coding module embeds a message $\omega$ into the cover image by modulating the prediction errors $\boldsymbol{\varepsilon} = \mathbf{q} - \hat{\mathbf{q}}$, where $\hat{\mathbf{q}}$ denotes the predicted intensities. The modulated errors $\boldsymbol{\varepsilon}'$ are then added to the predicted intensities, causing distortion of the query pixels. The stego image is created by merging the context set $\mathbf{c}$ and the modulated query set $\mathbf{q}' = \hat{\mathbf{q}} + \boldsymbol{\varepsilon}'$. The decoding procedure mirrors the encoding procedure. It begins by predicting the query pixel intensities. Since the context set is kept unchanged, the prediction in the decoding phase is guaranteed to be identical to that in the encoding phase, given the same predictive model. The message is extracted and the query set is recovered by demodulating the prediction errors. The image is restored to its original state by merging the context and recovered query sets. The procedures for encoding and decoding are depicted schematically in Figure 1 and also provided in Algorithms 1 and 2. Note that the message may also contain auxiliary information for handling pixel intensity overflow. This paper does not go into detail about every aspect of the stego-system; instead, our study focuses on the mathematical optimisation of reversible steganographic coding.
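The split-and-predict step can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's pipeline: the four-neighbour mean predictor is a hypothetical stand-in for the neural network, and the image is a plain list of lists. It checks the property the decoder relies on: changing query pixels leaves every prediction unchanged, because predictions read only context pixels.

```python
def query_positions(h, w):
    """Interior pixels on the 'odd' squares of a chequerboard form the query set.

    Border pixels and the 'even' interior squares form the context set.
    """
    return [(r, c) for r in range(1, h - 1) for c in range(1, w - 1) if (r + c) % 2 == 1]


def predict(img, r, c):
    # Hypothetical stand-in for the neural predictor: rounded mean of the four
    # neighbours, all of which are context pixels by construction.
    s = img[r - 1][c] + img[r + 1][c] + img[r][c - 1] + img[r][c + 1]
    return (s + 2) // 4


def predictions(img):
    return {(r, c): predict(img, r, c) for r, c in query_positions(len(img), len(img[0]))}


def prediction_errors(img):
    # epsilon = q - q_hat for every query pixel
    return {pos: img[pos[0]][pos[1]] - q_hat for pos, q_hat in predictions(img).items()}


cover = [[(7 * r + 13 * c) % 256 for c in range(8)] for r in range(8)]
stego = [row[:] for row in cover]
for r, c in query_positions(8, 8):
    stego[r][c] = min(255, stego[r][c] + 1)   # arbitrary change to query pixels only

# The decoder sees the same context, hence the same predictions.
assert predictions(stego) == predictions(cover)
```

In a real stego-system the arbitrary `+ 1` would be replaced by the coding module's modulation of the errors returned by `prediction_errors`.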

Nonlinear Discrete Optimisation
The essence of reversible steganographic coding is designating one or more error values as the carrier and determining how these values change to represent different message digits. A rule of thumb for reversible steganographic coding is to choose the most frequent prediction error (the peak of the histogram) as the carrier. While the peak offers the highest capacity for a single carrier, this capacity-greedy strategy is not necessarily optimal in terms of minimising distortion.
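The simplest case, a single carrier with one extra stego value, already shows why the peak need not be optimal. Values below the carrier stay put, the carrier value moves by one with probability 1/2 (expected squared deviation 1/2 per occurrence), and every larger value shifts by one. A minimal sketch, assuming a hypothetical absolute error histogram `a`; the function names are illustrative only.

```python
def single_carrier(a, i):
    """Capacity (bits) and expected squared distortion when value i is the only carrier."""
    capacity = a[i]                            # one bit per occurrence of the carrier
    distortion = 0.5 * a[i] + sum(a[i + 1:])   # carrier moves w.p. 1/2; larger values shift by 1
    return capacity, distortion


def best_single_carrier(a, kappa):
    """Carrier with the least distortion whose capacity meets the requirement kappa."""
    candidates = [(single_carrier(a, i)[1], i) for i in range(len(a)) if a[i] >= kappa]
    return min(candidates)[1] if candidates else None


a = [30, 40, 20, 10]                 # hypothetical histogram; the peak is at value 1
print(single_carrier(a, 1))          # (40, 50.0)
print(single_carrier(a, 2))          # (20, 20.0)
print(best_single_carrier(a, 20))    # 2, not the peak
```

When only 20 bits are required, the peak costs 50 units of squared distortion while value 2 costs 20, because choosing a larger carrier leaves fewer values to shift.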

Problem Definition
According to the first law of error, the frequency of an error can be expressed as an exponential function of its numerical magnitude, disregarding sign (Wilson, 1923). In other words, the frequency distribution of prediction errors is expected to centre around zero, and a smaller absolute error generally tends to occur more often. A notable exception is that the frequency of zero may be lower than that of a small non-zero absolute error, since the latter combines the occurrences of both the positive and the negative error. Consider an absolute error histogram as shown in Figure 2. The problem of reversible steganographic coding is to establish a mapping between the values in $[0, n]$ and the values in $[0, n + \vartheta]$, where $\vartheta$ denotes the extra quota and is typically set no larger than the number of consecutive empty bins in the absolute error histogram. Encoding is a one-to-many mapping that links a cover value to one or more stego values. A cover value can represent a message digit only if it is linked to more than one stego value. Different cover values can never yield the same stego value, since such an overlap would make decoding ambiguous. Therefore, a cover value may have to be changed to a different stego value even if it does not represent any message digit. We further require that each cover value be mapped only to the nearest available stego values; this non-crossing mapping drastically reduces the dimension of the problem. An example of a cover/stego mapping is illustrated in Figure 3.
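The non-crossing cover-to-stego mapping can be made concrete with a short sketch. Assuming `x[i]` holds the number of extra links for cover value `i`, each cover value receives a contiguous run of stego values that starts right after the previous run. The helper names are hypothetical. Note how cover value 1, which carries no digit, still moves from 1 to 2 because value 0 occupies two stego values.

```python
def build_mapping(x):
    """Assign each cover value i a contiguous run of x[i] + 1 stego values, in order."""
    mapping, shift = {}, 0                 # shift is the cumulative deviation
    for i, xi in enumerate(x):
        mapping[i] = list(range(i + shift, i + shift + xi + 1))
        shift += xi
    return mapping


def invert(mapping):
    """Stego value -> cover value; fails if two cover values share a stego value."""
    inverse = {}
    for cover, stegos in mapping.items():
        for s in stegos:
            if s in inverse:
                raise ValueError(f"stego value {s} is ambiguous")
            inverse[s] = cover
    return inverse


def embed(i, digit, mapping):
    return mapping[i][digit]               # digit in {0, ..., x[i]}


def extract(s, mapping, inverse):
    i = inverse[s]
    return i, s - mapping[i][0]            # recovered cover value and digit


x = [1, 0, 2, 0]                           # n = 3, all extra links sum to 3
m = build_mapping(x)                       # {0: [0, 1], 1: [2], 2: [3, 4, 5], 3: [6]}
inv = invert(m)
assert max(inv) == len(x) - 1 + sum(x)     # stego values stay within [0, n + sum(x)]
```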

Model Formulation
Let us denote by $a_i$ the frequency of the value $i$ and by $x_i$ the number of extra cover-to-stego links for the value $i$. The total number of links for the value $i$ equals $x_i + 1$. The number of bits that can be represented by modifying the value $i$ is $\log_2(x_i + 1)$ and thus the capacity is computed by
$$C = \sum_{i=0}^{n} a_i \log_2(x_i + 1).$$
In fact, the number of bits that can be represented by modifying 0 equals $\log_2(2x_0 + 1)$ because 0 can be mapped to both positive and negative values. For example, there are three different states, $0$ and $\pm 1$, when $x_0 = 1$, and five different states, $0$, $\pm 1$ and $\pm 2$, when $x_0 = 2$. For simplicity, we map 0 randomly to either a positive or a negative value so that the capacity computation for 0 is identical to that for other values, at the cost of slightly underestimating the capacity that 0 offers. Assuming uniformly distributed message digits, the probability of changing a cover value to each of its stego values is $1/(x_i + 1)$. The deviations of the first to the last stego value are $0 + y_i$ to $x_i + y_i$ respectively, where $y_i$ denotes the sum of all the previous extra links (i.e. the cumulative deviation). Hence, the expected distortion in terms of the squared deviations is computed by
$$D = \sum_{i=0}^{n} \frac{a_i}{x_i + 1} \sum_{k=0}^{x_i} (k + y_i)^2,$$
where
$$y_i = \sum_{j=0}^{i-1} x_j, \quad y_0 = 0.$$
Using $\sum_{k=0}^{x_i} k = x_i(x_i + 1)/2$ and $\sum_{k=0}^{x_i} k^2 = x_i(x_i + 1)(2x_i + 1)/6$, we can simplify the algebraic expression to
$$D = \sum_{i=0}^{n} a_i \left( \frac{2x_i^2 + x_i}{6} + x_i y_i + y_i^2 \right).$$
The reason for computing squared rather than absolute deviations is that image quality is often measured by the peak signal-to-noise ratio (PSNR), which is defined via the mean squared error (MSE). Our goal is to solve for the decision variables $x_i \in \{0, \ldots, \vartheta\}$ that minimise the distortion objective subject to the capacity constraint $C \geq \kappa$, where $\kappa$ denotes the required capacity. The sum of all the extra cover-to-stego links is not allowed to exceed the quota $\vartheta$. To summarise, the mathematical optimisation problem for reversible steganographic coding is
$$\begin{aligned}
\min_{x_0, \ldots, x_n} \quad & \sum_{i=0}^{n} a_i \left( \frac{2x_i^2 + x_i}{6} + x_i y_i + y_i^2 \right) \\
\text{s.t.} \quad & \sum_{i=0}^{n} a_i \log_2(x_i + 1) \geq \kappa, \\
& \sum_{i=0}^{n} x_i \leq \vartheta, \\
\text{var.} \quad & x_i \in \{0, \ldots, \vartheta\}, \quad \forall i = 0, \ldots, n.
\end{aligned}$$
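The capacity formula and the simplified distortion $\sum_i a_i \left( (2x_i^2 + x_i)/6 + x_i y_i + y_i^2 \right)$ can be checked numerically. A minimal sketch, assuming a small hypothetical histogram; exact rational arithmetic (`fractions.Fraction`) confirms that the closed form matches the direct expectation over the $x_i + 1$ equally likely stego values.

```python
import itertools
import math
from fractions import Fraction


def cumulative(x):
    """y_i = x_0 + ... + x_{i-1}, with y_0 = 0."""
    y, total = [], 0
    for xi in x:
        y.append(total)
        total += xi
    return y


def capacity(a, x):
    return sum(ai * math.log2(xi + 1) for ai, xi in zip(a, x))


def distortion_direct(a, x):
    # Average of (k + y_i)^2 over the x_i + 1 equally likely deviations k = 0..x_i
    return sum(Fraction(ai, xi + 1) * sum((k + yi) ** 2 for k in range(xi + 1))
               for ai, xi, yi in zip(a, x, cumulative(x)))


def distortion_closed(a, x):
    return sum(ai * (Fraction(2 * xi * xi + xi, 6) + xi * yi + yi * yi)
               for ai, xi, yi in zip(a, x, cumulative(x)))


a = [30, 40, 20, 10]                          # hypothetical histogram
print(capacity(a, [0, 1, 0, 1]))              # 50.0
print(distortion_closed(a, [0, 1, 0, 1]))     # 65

# The closed form agrees with the direct expectation for every x with sum(x) <= 3
for x in itertools.product(range(4), repeat=len(a)):
    if sum(x) <= 3:
        assert distortion_direct(a, x) == distortion_closed(a, x)
```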

Brute-Force Search
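Brute-force search is a baseline method for benchmarking optimisation algorithms. The solution space that exhausts all possible combinations of the decision variables has size $(\vartheta + 1)^{n+1} \in O(c^n)$ for some constant $c$. By taking account of the quota constraint, we can reduce the solution space from the number of possible combinations to the number of feasible combinations. In number theory and combinatorics, the partition function $\mathrm{part}(t)$ counts the number of ways of writing $t$ as a sum of positive integers in $[1, t]$. Let $\Lambda_t$ denote a matrix of $\mathrm{part}(t)$ rows and $t$ columns which enumerates all possible partitions:
$$\Lambda_t = \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_{\mathrm{part}(t)} \end{pmatrix}, \quad \lambda_\ell = (\lambda_{\ell,1}, \ldots, \lambda_{\ell,t}), \quad \sum_{k=1}^{t} k \, \lambda_{\ell,k} = t.$$
Each row vector $\lambda_\ell$ represents a possible partition in which the $k$-th element is the quantity of the candidate integer $k$ (i.e. the summand). In terms of our problem, $\lambda_{\ell,k}$ is the number of decision variables that take the value $k$ when the extra links sum to $t$. For example, $\Lambda_2$, $\Lambda_3$ and $\Lambda_4$ are
$$\Lambda_2 = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}, \quad \Lambda_3 = \begin{pmatrix} 3 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \Lambda_4 = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
The total number of feasible solutions can be calculated by adding up the number of feasible solutions given by each individual partition matrix from $\Lambda_1$ to $\Lambda_\vartheta$; that is, $\sum_{t=1}^{\vartheta} \mathrm{feasible}(\Lambda_t, n^*)$, where $n^* = n + 1$ denotes the number of integers in $[0, n]$. For each matrix $\Lambda_t$, the number of feasible solutions is computed by summing the number of possible combinations given by each partition vector $\lambda_\ell$:
$$\mathrm{feasible}(\Lambda_t, n^*) = \sum_{\ell=1}^{\mathrm{part}(t)} \mathrm{comb}(\lambda_\ell, n^*).$$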
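A brute-force baseline of this kind can be sketched by enumerating only the feasible vectors (entries summing to at most $\vartheta$) and keeping the least-distortion one that meets the required capacity $\kappa$. This is an illustrative sketch with hypothetical names (`feasible_solutions`, `brute_force`), not the authors' implementation. It uses the closed-form distortion $\sum_i a_i \left( (2x_i^2 + x_i)/6 + x_i y_i + y_i^2 \right)$ in exact arithmetic and skips the all-zero vector, which carries no message.

```python
import math
from fractions import Fraction


def feasible_solutions(n_star, theta):
    """Yield every x in {0, ..., theta}^n_star whose entries sum to between 1 and theta."""
    def extend(prefix, budget):
        if len(prefix) == n_star:
            if budget < theta:                 # skip the all-zero vector
                yield tuple(prefix)
            return
        for v in range(budget + 1):
            yield from extend(prefix + [v], budget - v)

    yield from extend([], theta)


def capacity(a, x):
    return sum(ai * math.log2(xi + 1) for ai, xi in zip(a, x))


def distortion(a, x):
    """Closed-form expected squared distortion, computed exactly."""
    total, y = Fraction(0), 0
    for ai, xi in zip(a, x):
        total += ai * (Fraction(2 * xi * xi + xi, 6) + xi * y + y * y)
        y += xi
    return total


def brute_force(a, theta, kappa):
    """Least-distortion feasible solution with capacity >= kappa, or None."""
    best = None
    for x in feasible_solutions(len(a), theta):
        if capacity(a, x) >= kappa:
            d = distortion(a, x)
            if best is None or d < best[0]:
                best = (d, x)
    return best


print(brute_force([30, 40, 20, 10], theta=2, kappa=20))   # (Fraction(20, 1), (0, 0, 1, 0))
```

For $n = 55$ and $\vartheta = 4$ the generator visits fewer than half a million vectors, which is why exhaustive search remains usable as a benchmark at these settings.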
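A combination is a selection of values from the set of $n^*$ values based on a given partition vector, and hence the number of combinations is computed by
$$\mathrm{comb}(\lambda_\ell, n^*) = \prod_{i=1}^{t} \binom{n^* - \sum_{j=1}^{i-1} \lambda^*_j}{\lambda^*_i},$$
where $\lambda^*_i = \lambda_{\ell,i}$ is a convenient shorthand that omits the index of the partition vector (for reducing verbosity). The number of combinations is a product of $t$ binomial coefficients, and each term chooses (and removes) an unordered subset of $\lambda^*_i$ values from the values remaining in the set of $n^*$ values. Let us take $\Lambda_3$ for example. The numbers of combinations for the partition vectors $\lambda_1 = (3, 0, 0)$, $\lambda_2 = (1, 1, 0)$ and $\lambda_3 = (0, 0, 1)$ are computed as follows:
$$\begin{aligned}
\mathrm{comb}(\lambda_1, n^*) &= \binom{n^*}{3}\binom{n^* - 3}{0}\binom{n^* - 3}{0} = \binom{n^*}{3}, \\
\mathrm{comb}(\lambda_2, n^*) &= \binom{n^*}{1}\binom{n^* - 1}{1}\binom{n^* - 2}{0} = n^*(n^* - 1), \\
\mathrm{comb}(\lambda_3, n^*) &= \binom{n^*}{0}\binom{n^*}{0}\binom{n^*}{1} = n^*.
\end{aligned}$$
Since the total counts the non-negative integer vectors of length $n^*$ whose entries sum to between 1 and $\vartheta$, the complexity of this brute-force algorithm is approximately equal to
$$\sum_{t=1}^{\vartheta} \mathrm{feasible}(\Lambda_t, n^*) = \binom{n^* + \vartheta}{\vartheta} - 1 \approx \frac{(n^*)^{\vartheta}}{\vartheta!} \in O(n^{\vartheta}).$$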
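The partition-based count can be cross-checked against the stars-and-bars closed form $\binom{n^* + \vartheta}{\vartheta} - 1$. A minimal sketch with hypothetical helper names: each row of `partition_matrix(t)` is a multiplicity vector $\lambda_\ell$ (sorted in descending lexicographic order), and `comb_count` multiplies the binomial coefficients term by term.

```python
from math import comb


def partition_matrix(t):
    """Rows of Lambda_t: row[k - 1] counts summands equal to k, so sum_k k * row[k - 1] == t."""
    def parts(remaining, largest):
        if remaining == 0:
            yield ()
            return
        for k in range(min(remaining, largest), 0, -1):
            for rest in parts(remaining - k, k):
                yield (k,) + rest

    rows = [tuple(p.count(k) for k in range(1, t + 1)) for p in parts(t, t)]
    return sorted(rows, reverse=True)


def comb_count(lam, n_star):
    """Product of binomials: choose which decision variables take the value 1, 2, ..., t."""
    total, remaining = 1, n_star
    for count in lam:
        if count > remaining:           # not enough decision variables left
            return 0
        total *= comb(remaining, count)
        remaining -= count
    return total


def feasible(t, n_star):
    return sum(comb_count(lam, n_star) for lam in partition_matrix(t))


def total_feasible(theta, n_star):
    return sum(feasible(t, n_star) for t in range(1, theta + 1))


print(partition_matrix(3))      # [(3, 0, 0), (1, 1, 0), (0, 0, 1)]
print(total_feasible(4, 56))    # 487634, i.e. n = 55 with quota 4
```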

Linearisation
The difficulty of our optimisation problem lies in the nonlinear nature of the capacity constraint and the distortion objective. To apply off-the-shelf optimisation tools, we have to tackle these nonlinearities.

Logarithmic Capacity Constraint
The capacity constraint involves the logarithm of the decision variables, $\log_2(x_i + 1)$, which is a nonlinear function. A useful linearisation trick is to re-model the problem with binary-integer variables. We binarise each decision variable $x_i$ with the domain $[0, \vartheta]$ into a 0/1 vector (or a one-hot vector) of length $\vartheta + 1$, as illustrated in Figure 4. The vector consists of 0s with the exception of a single 1 whose position indicates the value of $x_i$; that is,
$$\mathbf{x}_i = (x_{i,0}, x_{i,1}, \ldots, x_{i,\vartheta}) \in \{0, 1\}^{\vartheta + 1}$$
such that
$$\mathbf{1} \cdot \mathbf{x}_i = 1, \quad \forall i = 0, \ldots, n.$$
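We can retrieve $x_i$ using the dot product of vectors
$$x_i = (0, 1, \ldots, \vartheta) \cdot \mathbf{x}_i.$$
Accordingly, the quota constraint becomes
$$\sum_{i=0}^{n} (0, 1, \ldots, \vartheta) \cdot \mathbf{x}_i \leq \vartheta.$$
In a similar manner, the logarithm can be derived using the dot product of vectors
$$\log_2(x_i + 1) = (\log_2 1, \log_2 2, \ldots, \log_2(\vartheta + 1)) \cdot \mathbf{x}_i.$$
Hence, we rewrite the capacity constraint, which is now linear in the binary variables, as
$$\sum_{i=0}^{n} a_i \, (\log_2 1, \log_2 2, \ldots, \log_2(\vartheta + 1)) \cdot \mathbf{x}_i \geq \kappa,$$
where $\kappa$ denotes the required capacity.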
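The binarisation can be checked with plain Python. A minimal sketch (the helper names are hypothetical) showing that a one-hot vector recovers both $x_i$ and $\log_2(x_i + 1)$ through dot products with fixed coefficient vectors, so that the capacity of a whole solution becomes a weighted sum of 0/1 variables.

```python
import math


def one_hot(value, theta):
    """Binarise x_i in {0, ..., theta} into a 0/1 vector of length theta + 1."""
    vec = [0] * (theta + 1)
    vec[value] = 1
    return vec


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


theta = 4
values = list(range(theta + 1))                        # (0, 1, ..., theta)
logs = [math.log2(k + 1) for k in range(theta + 1)]    # (log2 1, ..., log2(theta + 1))

for x in range(theta + 1):
    b = one_hot(x, theta)
    assert sum(b) == 1                                    # 1 . x_i = 1
    assert dot(values, b) == x                            # recovers x_i
    assert math.isclose(dot(logs, b), math.log2(x + 1))   # recovers log2(x_i + 1)

# Capacity of a whole solution as a linear function of the binary variables
a = [30, 40, 20, 10]                                   # hypothetical histogram
x = [0, 1, 0, 1]
linear_capacity = sum(ai * dot(logs, one_hot(xi, theta)) for ai, xi in zip(a, x))
assert math.isclose(linear_capacity, 50.0)
```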

Quadratic Distortion Objective
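The distortion objective involves three nonlinear terms, $x_i^2$, $y_i^2$ and $x_i y_i$, all of which are quadratic functions of the variables. The first term can be handled using the dot product as before; that is,
$$x_i^2 = (0, 1, 4, \ldots, \vartheta^2) \cdot \mathbf{x}_i.$$
The remaining two terms contain the partial sum of variables $y_i$, which is computed by
$$y_i = \sum_{j=0}^{i-1} (0, 1, \ldots, \vartheta) \cdot \mathbf{x}_j.$$
To linearise the univariate quadratic term $y_i^2$ and the bivariate quadratic term $x_i y_i$, we introduce two non-negative continuous slack variables $z_{y_i^2} \geq 0$ and $z_{x_i y_i} \geq 0$. Replacing the quadratic terms with the dot product and the slack variables results in a linear distortion objective
$$\min \sum_{i=0}^{n} a_i \left( \frac{2\,(0, 1, 4, \ldots, \vartheta^2) \cdot \mathbf{x}_i + (0, 1, \ldots, \vartheta) \cdot \mathbf{x}_i}{6} + z_{x_i y_i} + z_{y_i^2} \right).$$
We begin by solving this mixed-integer linear programming problem, which does not yet reflect the quadratic terms regarding cumulative distortion, and obtain an initial solution comprising $\tilde{x}_i$, $\tilde{z}_{y_i^2}$ and $\tilde{z}_{x_i y_i}$. The initial slack variables are zero because the objective is to minimise distortion and nothing yet bounds them from below. To make the slack variables reflect the quadratic terms properly, we add the constraints
$$z_{y_i^2} \geq y_i^2, \quad z_{x_i y_i} \geq x_i y_i, \quad \forall i = 0, \ldots, n.$$
In this way, we reformulate a problem with a nonlinear objective into a problem with a linear objective and nonlinear constraints. We make use of the solution obtained previously to linearise these nonlinear constraints and solve the mixed-integer linear programming problem iteratively. To begin with, we express the variables in terms of the previous solution:
$$x_i = \tilde{x}_i + \Delta x_i, \quad y_i = \tilde{y}_i + \Delta y_i,$$
where $\tilde{x}_i$ and $\tilde{y}_i$ are treated as constants. Then, we apply the first-order Taylor series to approximate the univariate quadratic term as
$$y_i^2 \approx \tilde{y}_i^2 + 2\tilde{y}_i \Delta y_i = 2\tilde{y}_i y_i - \tilde{y}_i^2$$
and similarly the bivariate quadratic term as
$$x_i y_i \approx \tilde{x}_i \tilde{y}_i + \tilde{y}_i \Delta x_i + \tilde{x}_i \Delta y_i = \tilde{y}_i x_i + \tilde{x}_i y_i - \tilde{x}_i \tilde{y}_i.$$
As a result, the nonlinear constraints are transformed into linear constraints
$$z_{y_i^2} \geq 2\tilde{y}_i y_i - \tilde{y}_i^2, \quad z_{x_i y_i} \geq \tilde{y}_i x_i + \tilde{x}_i y_i - \tilde{x}_i \tilde{y}_i, \quad \forall i = 0, \ldots, n.$$
To recapitulate, the nonlinear discrete optimisation problem is approached by means of an iterative method that solves a mixed-integer linear programming problem with binary-integer variables and non-negative continuous slack variables:
$$\begin{aligned}
\min \quad & \sum_{i=0}^{n} a_i \left( \frac{2\,(0, 1, 4, \ldots, \vartheta^2) \cdot \mathbf{x}_i + (0, 1, \ldots, \vartheta) \cdot \mathbf{x}_i}{6} + z_{x_i y_i} + z_{y_i^2} \right) \\
\text{s.t.} \quad & \sum_{i=0}^{n} a_i \, (\log_2 1, \log_2 2, \ldots, \log_2(\vartheta + 1)) \cdot \mathbf{x}_i \geq \kappa, \\
& \sum_{i=0}^{n} (0, 1, \ldots, \vartheta) \cdot \mathbf{x}_i \leq \vartheta, \\
& \mathbf{1} \cdot \mathbf{x}_i = 1, \quad \forall i = 0, \ldots, n, \\
& z_{y_i^2} \geq 2\tilde{y}_i y_i - \tilde{y}_i^2, \quad \forall i = 0, \ldots, n, \\
& z_{x_i y_i} \geq \tilde{y}_i x_i + \tilde{x}_i y_i - \tilde{x}_i \tilde{y}_i, \quad \forall i = 0, \ldots, n, \\
\text{var.} \quad & \mathbf{x}_i \in \{0, 1\}^{\vartheta + 1}, \ z_{y_i^2} \geq 0, \ z_{x_i y_i} \geq 0, \quad \forall i = 0, \ldots, n,
\end{aligned}$$
where $x_i$ and $y_i$ are the linear expressions in $\mathbf{x}_i$ given above and $\kappa$ is the required capacity.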
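One way to realise the iteration is with SciPy's HiGHS-based `scipy.optimize.milp` (SciPy 1.9 or later, plus NumPy). The sketch below is an assumption-laden illustration, not the authors' implementation, and `iterative_milp` and `true_distortion` are hypothetical names. Three choices are assumptions: the first solve uses $\tilde{x}_i = \tilde{y}_i = 0$, which reduces the linearised constraints to $z \geq 0$ (the initial problem); each later solve replaces, rather than accumulates, the linearised constraints; and the loop stops when a solution repeats or after `max_iter` rounds, returning the best solution under the exact quadratic distortion.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def true_distortion(a, x):
    """Exact objective: sum_i a_i * ((2 x_i^2 + x_i) / 6 + x_i y_i + y_i^2)."""
    total, y = 0.0, 0
    for ai, xi in zip(a, x):
        total += ai * ((2 * xi * xi + xi) / 6 + xi * y + y * y)
        y += xi
    return total


def iterative_milp(a, theta, kappa, max_iter=20):
    m, K = len(a), theta + 1
    nb = m * K                        # one-hot binaries b[i, k], stored row by row
    nv = nb + 2 * m                   # then z_y2[i] at nb + i and z_xy[i] at nb + m + i
    v = np.arange(K, dtype=float)     # (0, 1, ..., theta)

    # Linear objective: a_i * ((2 x_i^2 + x_i) / 6 + z_xy[i] + z_y2[i])
    c = np.zeros(nv)
    for i in range(m):
        c[i * K:(i + 1) * K] = a[i] * (2 * v ** 2 + v) / 6
        c[nb + i] = c[nb + m + i] = a[i]

    # Fixed rows: one-hot, quota and capacity constraints
    rows, lo, hi = [], [], []
    for i in range(m):
        r = np.zeros(nv)
        r[i * K:(i + 1) * K] = 1
        rows.append(r)
        lo.append(1.0)
        hi.append(1.0)
    r = np.zeros(nv)
    r[:nb] = np.tile(v, m)
    rows.append(r)
    lo.append(-np.inf)
    hi.append(theta)
    r = np.zeros(nv)
    r[:nb] = np.concatenate([a[i] * np.log2(v + 1) for i in range(m)])
    rows.append(r)
    lo.append(kappa)
    hi.append(np.inf)

    integrality = np.r_[np.ones(nb, dtype=int), np.zeros(2 * m, dtype=int)]
    bounds = Bounds(np.zeros(nv), np.r_[np.ones(nb), np.full(2 * m, np.inf)])

    def linearised_rows(xt, yt):
        """Rows for z_y2 >= 2 y~ y - y~^2 and z_xy >= y~ x + x~ y - x~ y~."""
        cuts, cut_lo = [], []
        for i in range(m):
            r = np.zeros(nv)
            r[nb + i] = 1
            r[:i * K] -= 2 * yt[i] * np.tile(v, i)    # y_i = sum_{j<i} v . b_j
            cuts.append(r)
            cut_lo.append(-yt[i] ** 2)
            r = np.zeros(nv)
            r[nb + m + i] = 1
            r[:i * K] -= xt[i] * np.tile(v, i)
            r[i * K:(i + 1) * K] -= yt[i] * v         # x_i = v . b_i
            cuts.append(r)
            cut_lo.append(-xt[i] * yt[i])
        return cuts, cut_lo

    xt, yt = [0] * m, [0] * m         # zero start: the cuts reduce to z >= 0
    best, seen = None, set()
    for _ in range(max_iter):
        cuts, cut_lo = linearised_rows(xt, yt)
        cons = LinearConstraint(np.vstack(rows + cuts), np.r_[lo, cut_lo],
                                np.r_[hi, np.full(len(cuts), np.inf)])
        res = milp(c, constraints=cons, integrality=integrality, bounds=bounds)
        if not res.success:
            break
        b = np.round(res.x[:nb]).reshape(m, K)
        x = tuple(int(k) for k in b.argmax(axis=1))
        d = true_distortion(a, x)
        if best is None or d < best[0]:
            best = (d, x)
        if x in seen:                 # the iteration has returned to a known solution
            break
        seen.add(x)
        xt, yt = list(x), [sum(x[:i]) for i in range(m)]
    return best


print(iterative_milp([30, 40, 20, 10], theta=2, kappa=19.5))
```

Because the first-order approximation of $x_i y_i$ is not a bound, the returned solution is only a candidate; comparing it with an exhaustive search, as in the simulation below, is the natural check.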

Simulation
We carry out an experimental analysis of the optimality of the proposed method, benchmarked against the brute-force method. The experimental setup is as follows. For the predictive model, we use the residual dense network (RDN), which has its origins in low-level computer vision tasks such as super-resolution imaging (Zhang, Tian, Kong, Zhong, & Fu, 2018) and image restoration (Zhang, Tian, Kong, Zhong, & Fu, 2021). This neural network model is characterised by a tangled labyrinth of residual and dense connections. It is trained on the BOSSbase dataset (Bas, Filler, & Pevný, 2011), which originated from an academic competition for digital steganography and comprises a large collection of greyscale photographs covering a wide variety of subjects and scenes. The algorithms are tested on selected images from the USC-SIPI dataset (Weber, 2006). All the images are resized to a resolution of 256 × 256 pixels via Lanczos resampling (Duchon, 1979). The border pixels, along with half of the remaining pixels, are designated as the context. Accordingly, the number of query pixels equals (254 × 254)/2 = 32,258. We report both distortion and capacity divided by the number of query pixels. Figure 5 shows the absolute error distribution for each test image. Most absolute errors fall below roughly 30 to 50, depending on the image. We conservatively set $n = 55$ so that nearly every absolute error value with a non-zero frequency is included. We run the algorithms under different quota settings ($\vartheta = 1, 2, 3, 4$). Figures 6 to 9 show performance evaluations of the proposed optimisation algorithm. Each point on a curve indicates the minimum distortion of a solution under a specific capacity constraint. In the vast majority of cases, the solutions found by the proposed method are identical to those given by the brute-force method. In the remaining cases, the objective values reached by the proposed method are close to the optimal ones.
Hence, even though optimal solutions cannot always be guaranteed, the results suggest that the proposed method can attain near-optimal performance.
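The per-pixel bookkeeping behind the reported figures can be sketched as follows. This assumes the query set is the interior chequerboard (which reproduces the stated count of 32,258 query pixels for a 256 × 256 image); the helper names are hypothetical, and the PSNR helper only illustrates the standard 8-bit definition via the MSE and is not part of the reported results.

```python
import math


def query_mask(h, w):
    """True where a pixel is a query pixel: interior and on the odd chequerboard squares."""
    return [[0 < r < h - 1 and 0 < c < w - 1 and (r + c) % 2 == 1 for c in range(w)]
            for r in range(h)]


def per_query_pixel(total, num_query):
    """Normalise a total capacity (bits) or total squared distortion by the query count."""
    return total / num_query


def psnr(total_sq_error, num_pixels, peak=255):
    """Standard PSNR in dB from the mean squared error over the whole image."""
    mse = total_sq_error / num_pixels
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


h = w = 256
num_query = sum(map(sum, query_mask(h, w)))
print(num_query)    # 32258 == (254 * 254) // 2
```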

Conclusion
This paper studies a mathematical optimisation problem applied to reversible steganography. We formulate automatic coding in prediction error modulation as a nonlinear discrete optimisation problem. The objective is to minimise distortion under a constraint on capacity. We discuss the complexity of a brute-force search algorithm and the linearisation techniques for the logarithmic capacity constraint and the quadratic distortion objective. The problem is transformed into an iterative mixed-integer linear programming problem with binary-integer variables and slack variables. Our simulation results validate the near-optimality of the proposed algorithm.

Disclosure statement
No potential conflict of interest is reported by the author.