Recovering Sign Bits of DCT Coefficients in Digital Images as an Optimization Problem

Recovering unknown, missing, damaged, distorted, or lost information in DCT coefficients is a common task in multiple applications of digital image processing, including image compression, selective image encryption, and image communication. This paper investigates the recovery of sign bits in DCT coefficients of digital images, by proposing two different approximation methods to solve a mixed integer linear programming (MILP) problem, which is NP-hard in general. One method is a relaxation of the MILP problem to a linear programming (LP) problem, and the other splits the original MILP problem into some smaller MILP problems and an LP problem. We considered how the proposed methods can be applied to JPEG-encoded images and conducted extensive experiments to validate their performances. The experimental results showed that the proposed methods outperformed other existing methods by a substantial margin, both according to objective quality metrics and our subjective evaluation.

DCT can be used to convert a discrete input signal with n samples into n frequency components, which are also called DCT coefficients.The first of all DCT coefficients is normally called the DC (direct current) coefficient, representing the zero-frequency component or the average amplitude of the input signal.Other DCT coefficients all represent the amplitude of a specific non-zero frequency component, and are normally called AC (alternate current) coefficients.Although DCT is not the optimal signal decorrelation transform, it is able to approximate the optimal solution well under a wide range of conditions and can be very easily implemented, therefore it has been widely utilized in many signal processing applications [2]- [4].Especially, 2-D blockwise DCT has been widely adopted in image and video compression standards including JPEG [5], MPEG-1/2, MPEG-4, AVC/H.264[6], and HEVC [7].Meanwhile, researchers extensively investigated characteristics of DCT to facilitate its applications [8]- [14].
Sensitive multimedia data should be encrypted before transmission to prevent unauthorized access and protect individual privacy.Due to the large volume of multimedia data and some other needs, such as format-compliance and perceptual encryption [15], selectively encrypting a small part of important data becomes a natural choice among researchers [16]- [19].However, some special properties of multimedia data, such as strong correlation of neighboring pixels, the more stable distribution of DCT coefficients, make many proposed selective encryption schemes of digital images and videos insecure against various attacking methods in different settings [15].Among all the attacks, one group of them work under the ciphertextonly condition and try to recover an encrypted image/video to reveal more visual information than simple error-concealment attacks can recover, where "error-concealment attacks" refer to attacks that simply replace encrypted information by some simple values (e.g., zeros).For instance, in [20], Uehara et al. demonstrated that encrypted DC coefficients can be approximately restored from known AC ones by exploiting the strong correlation between adjacent pixels, resulting in recovered images with satisfactory visual quality.This work was further improved by Li et et al. by using an optimizationbased approach to minimize the so-called under/over-flow rate of pixel values caused by error propagation [21].Soon after, Li et al. proposed a more general method for restoring an arbitrary set of missing DCT coefficients from known ones, and formulated the problem as a linear programming (LP) problem that aims to minimize the sum of absolute differences between neighboring pixels, which was shown to outperform all previously proposed methods significantly [22].Note that the DCT coefficient recovery method can be used not only for attacking selective encryption schemes, but also in other less security-related applications, such as image compression.
As the most significant bit (MSB) of a DCT coefficient, the sign bit plays an important role in DCT-based image and video coding and encryption.In many image and video coding standards, sign bits are separately encoded.Researchers have also looked at how to more efficiently encode sign bits to reduce informational redundancies in the encoded image and video bitstream [23]- [26].In the context of image and video encryption, random flipping sign bits has been widely adopted as a general component in selective encryption schemes due to its capability to maintain format-compliance and also sizepreservation while being very easy to implement and highly efficient [18], [19], [27]- [31].Despite the wide use of sign bits, we have not seen any past research investigating advanced methods for recovering unknown sign bits of DCT coefficients, where advanced methods refer to those that are not based on simple error-concealment strategies.This gap leads to a lack of understanding on the security of selective image and video encryption schemes based on sign bit encryption and also on more efficient sign bit prediction and compression methods for image and video coding.
In this paper, we fill the above-mentioned research gap by formulating the sign bit recovery problem as an optimization problem that aims to minimize the sum of absolute differences between neighboring pixels, following a similar vein of the DCT coefficient recovery model reported in [22].Different from and more challenging than the work in [22], a sign bit is a binary value so the optimization model is a mixed integer linear programming (MILP) problem, which is NPhard in general.To solve the problem efficiently, we propose two approximation methods that can obtain reasonably good visual quality with a manageable time complexity.In the first method, we relax the MILP problem to an LP problem by replacing the binary unknown variables with continuous values and then estimate the sign bits based on the solution of the relaxed LP problem.In the second method, we divide the image into sufficiently small sub-images as [32], then solve an MILP problem for each sub-image independently, and finally refine the merged result by solving a global LP or MILP optimization problem.We extended the proposed methods to handle JPEG images used in real-world applications so that the methods take special encoding rules about sign bits and DCT coefficients.To demonstrate the performance of the proposed sign bit methods and how they work with different encoding parameters of the JPEG standard, we conducted extensive experiments with a set of 30 standard test images.We also compared the performance of the proposed methods with other native ones and a simplified version based on a relaxed LP model alone.The experimental results showed that the proposed methods were able to achieve satisfying visual quality with a practical time complexity, and remarkably outperformed other base-line methods.Our work can not only provide more insights on designing and evaluating selective multimedia encryption schemes, but also guidance on how to design more efficient image and video coding methods and how to recover lost, damaged or distorted sign bits in error-prone environments.
The rest of the paper is organized as follows.Section II reviews the related work.Section III explains our two proposed sign bit recovery methods with details.Experimental results are given in Section IV.The last section concludes the paper.

II. RELATED WORK
This section is organized to show closely related work in four areas: image and video encryption schemes using sign bit encryption; simple (not optimization-based) methods for recovering DCT coefficients; optimization-based methods for recovering DCT coefficients; and known methods for recovering sign bits of DCT coefficients.

A. Sign bit encryption for image and video encryption and privacy protection
As mentioned in Section 1, random flipping sign bits has been widely used for image and video encryption [27]- [29], [31].Here, we briefly introduce some representative work.Wang et al. designed a tunable encryption scheme for H.264/AVC videos, in which the encryption strength is adjusted by selecting some encryption objects: intra prediction modes, sign bits of non-zero coefficients, and sign bits of motion vectors [28].Hofbauer et al. proposed an encryption scheme for HEVC videos that solely encrypts some sign bits of AC coefficients in the encoded bitstream to distort the visual information strongly while maintaining full format-compliance and size-preservation [29], but also acknowledged that encrypting AC coefficients' sign bits alone cannot guarantee full confidentiality based on some analysis [33].Hofbauer et al., however, did not propose a method to recover encrypted sign bits to improve the visual quality of recovered images.Some researchers also proposed to use sign bit encryption for video surveillance systems for privacy protection purposes [18], [19], [30].For instance, Dufaux et al. proposed a privacy protection algorithm that encrypts only privacy-sensitive regions through random flipping sign bit while keeping the surveilled scenes comprehensible [18].

B. Simple image and video recovery against transmission errors and encryption
If errors occur during transmission and storage of image data, there may exist some incomprehensible or simply blank areas in the decoded version.To address this problem, some error-concealment techniques were proposed to recover such corrupted areas via exploiting the strong correlation between the target area and its surrounding areas [34]- [36].Bingabr and Varshney designed a recovery algorithm that can accurately correct corrupted coefficients in one DCT block with the assistance of reference information accurately received from an extra channel [36].Some researchers also proposed to use supervised machine learning methods to recover corrupted DCT coefficients, e.g., using a trained neural network [34].Park et al. proposed an estimation algorithm for corrupted DCT coefficients based on projections onto convex sets, in which the surrounding undamaged blocks are extracted to form a convex hull for reference [35].For errors occurring in video data, in addition to spatial information, the temporal redundancy between frames also can be exploited to recover corrupted DCT coefficients [37], [38].For example, Tsekeridou et al. devised an error-concealment method for MPEG-2 videos based on spatio-temporal video redundancy and blockmatching principles [38].
In multimedia selective encryption, there are always components that are excluded from encryption to avoid encrypting the whole bitstream.In this case, such unencrypted data can be potentially used to estimate encrypted ones [15], [17].Some sketch attacking methods were proposed to obtain an estimated image of low visual quality [39]- [42].In [39], Li and Yuan highlighted that the number of zero coefficients in each block is closely related to image texture and edge information, which generally remains unchanged to preserve compression ratio in existing JPEG encryption algorithms.Taking advantage of this observation, they proposed a nonzero count attack on encrypted JPEG images, generating a rough sketch.Following a similar idea, Minemura et al. proposed three improved sketch attacks that do not require manual adjustment of thresholds and can generate images of higher visual quality [40].Subsequently, these attack methods were extended to attack encryption schemes for H.264/AVC videos [41], [42].

C. Optimization-based DCT coefficient recovery
Some DCT coefficients might be absent due to selective encryption, transmission errors, multimedia coding, or malicious removal.When only DC coefficients are missing, they can be estimated sequentially from the known AC ones by exploiting the strong correlation in multimedia data [20]- [22].This is the so-called DC recovery problem.In [20], Uehara et al. summarized two properties of most natural images, which were the cornerstone of their recovery method.They estimated the missing DC coefficients one by one through minimizing the sum of the absolute difference of boundary pixels between the current DCT block and its neighboring ones.Then, they adjusted the estimated DC coefficients to keep pixel values within the valid range.The final output image was the average of four images obtained by four different scanning directions.Li et al. pointed out that there was a serious error propagation effect in the estimation process of Uehara et al.'s method, and suggested immediately making adjustment after estimating each DC coefficient to alleviate such error propagation [21].Moreover, they proposed to minimize the so-called under/overflow rate of pixel values to improve the visual quality of recovered images.Qiu et al. designed a similar method to improve the error resistance for JPEG image transmission [43].
Generalizing the DC recovery problem, Li et al. defined a more general problem of recovering an arbitrary set of missing DCT coefficients from other known ones, which was formed as an optimization problem that can be solved as an LP problem [22].The optimization aims to minimize the sum of the absolute differences of all neighboring pixels of the recovered image.In addition to being much more general, the new general method can also significantly surpass previous methods on the DC recovery problem.Later, a partition strategy [44] was introduced to reduce the time complexity of solving the optimization problem.The image is divided into multiple groups by image segmentation, each of which is separately recovered using LP, and then the brightness of each group is adjusted to minimize the discontinuing artifacts.As the special case of DCT recovery, DC recovery problem can be modeled as the dual of a min-cost flow problem and then be solved with a min-cost flow algorithm, where the time complexity is drastically reduced from O(n 2 ) to O(n 3/2 ) [45].
The LP model in [22] was also extended to recover undecoded coefficients in distributed video coding scheme [46], in which an additional temporal smoothness maximization is introduced into the optimization process.In addition, Wang et al. modified the objection function of the optimization model in [22] and added a regularization term to restore part of the image from the structured side information [47].In [48], Chen et al. proposed an optimization method to estimate DC coefficients for further compression, which exploits directional texture information of neighboring blocks and solves an optimal offset in a closed-form.

D. Sign bit recovery of DCT coefficients
So far, sign bit recovery mainly exists in decompression of image and video data [7], [23]- [26], [49], [50].In image and video compression standards based on the blockwise DCT, the redundancy among pixels within each DCT block is exploited to a large extent, while the redundancy between blocks especially between non-neighboring blocks is mostly unexplored.More specifically, the high correlation among pixels at the boundary between adjacent blocks is underutilized, which can actually be used to predict the signs of coefficients to further improve compression performance.The reason of such under-exploration is the need for the encoding process to be in real time, so a more complicated global optimization process is often considered too heavy.
Ponomarenko et al. proposed a sign bit prediction algorithm for lossy image compression, in which some sign bits are selected for compression in such a way that the border pixels reconstructed from inverse DCT are closest to those estimated from the previously-decoded blocks in spatial domain [23].To eliminate the expensive computational costs of transform between spatial and frequency domains, Rad et al. proposed a sign bit recovery method that can operate in the frequency domain solely [49].They estimated the sign bits of five lowfrequency coefficients and categorize DCT blocks into five patterns, each of which is treated with a different predictor.In [25], Lakhani integrated their proposed sign bit prediction algorithm for some significant coefficients into a modified JPEG codec.In fact, sign bit data hiding technique was introduced in the HEVC standard, in which the sign bit of a non-zero coefficient is omitted under some conditions [7].
Although recovery of DCT coefficients has been actively researched, the possibility of recovering encrypted, unknown, missing or damaged sign bits from a digital image has been much less studied in the literature.This paper fills this research gap by proposing the first sign bit recovery methods by modeling the recovery problem as an optimization problem, which can achieve a good recovery performance while having a practically small computational complexity.

III. PROPOSED METHODS
In this section, we first introduce the primary optimization model for the sign bit recovery problem.Then, two approximation methods, one based on linear programming (LP) and the other on hierarchical mixed integer linear programming (MILP), are presented to solve the problem efficiently.

A. The model
To facilitate the description and establishment of the optimization model, we first present two properties on pixel values of digital images first summarized in [20].
Property 1.The difference between any two neighboring pixel values of a natural image is a Laplacian variate with a zero mean and a small variance.
Property 2. For each block, pixel values of AC coefficients constrain the value of its DC coefficient.
Property 1 is well-known for natural images.Figure 1 shows the distribution of differences between neighboring pixel values of the test image "Lenna" of size 512 × 512.With the real distribution (red), a Laplacian distribution Laplace(µ, b) estimated from the real data is also shown, where the parameter µ is set to 0 and the bandwidth parameter b was obtained by the maximum likelihood estimation algorithm.Property 2 is derived from the definition of the 2-D DCT.In the rest of the paper, we use x = Ay to denote the relationship between image pixel values {x(i, j) ∈ [x min , x max ]} 0≤i,j≤N −1 and DCT coefficients {y(k, l)} 0≤k,l≤N −1 in block-wise N × N 2-D DCT, which can be written as where and C(k) is 1/N when k = 0 and 2/N when k > 0.
As shown in Eq. ( 1), the DC coefficient y(0, 0) is obviously constrained by the sum term without DC value and interval [x min , x max ].Note that Eq. ( 1) is a linear equation and the indices are relative to each block.
In [22], a similar DCT coefficient recovery problem is addressed, where missing DCT coefficients are estimated based on known ones.The main difference between the DCT coefficient recovery problem and the sign bit recovery problem is that the unknown variables in the latter are binary (1 or −1), but they are continuous in the former.Following the way how the DCT coefficient recovery problem is modeled, we can model the sign bit recovery problem as follows: where y * (k, l) is the known DCT coefficient, s(y, l) is the sign bit of the unknown DCT coefficient y(k, l) and f (x) is the objective function defined based on some properties of the image.We use the same objective function defined for the DCT coefficient recovery problem defined by Li et al. in [22]: where (i, j) and (i , j ) are coordinates of neighboring pixels.The above objective function is based on Property 1 and Fact 1, and its actual effect is Equation ( 2) uses a set of auxiliary variables h i,j,i ,j to linearize the above nonlinear objective function.The fact that the unknown variables s(y, l) are binary means that the above model becomes a mixed integer linear programming (MILP) problem 1 , rather than an LP problem for the DCT coefficient recovery problem.It has been known that as a general problem the MILP problem is NP-hard so cannot be solved using a polynomial time algorithm.This means that we have to seek more efficient approximation methods.We propose two such methods described below.

B. Two approximation methods 1) Method 1 -LP with relaxation:
We apply a linear relaxation to the constraint y(k, l) = s(y, l)|y(k, l)| so that the DCT coefficient y(k, l) with unknown sign bit can be assigned any value between −|y(k, l)| and −|y(k, l)|.In other words, the constraint becomes a normal LP linear condition with an upper and lower bounds: This change converts the MILP problem to a standard LP problem that can be solved in polynomial time.After the estimated DCT coefficient ŷ(k, l) is obtained, we take ŷ(k, l)'s sign bit to determine the value of s(y, l) and the final estimation of the coefficient as ỹ(k, l) = sign(ŷ(k, l)) • |y(k, l)|, where the function sign(•) extracts the sign of a real number as a value 1 or −1.Note that if ŷ(k, l) = 0 its sign bit is undefined, so we need a strategy to handle such cases.Four possible strategies are: 1) setting the coefficient to zero; 2) always assigning 1; 3) always assigning -1; 4) randomly assigning two possible sign bit values by following Bernoulli distribution (i.e., assigning 1 with probability p and −1 with probability q = 1 − p).
2) Method 2 -Hierarchical MILP or Hybrid MILP and LP: In this method, we adopt a "divide-and-conquer" (DAC) strategy to reduce the time complexity of the overall MILP problem without significantly compromising the visual quality of recovered images.The main idea is to reduce the MILP problem of the whole image into a number of MILP problems of smaller regions, and then to solve a smaller global MILP problem or a global LP problem to align the results of all the smaller region-wise MILP problems by refining DC coefficients (brightness) of all blocks in all regions.The regions need to be sufficiently small to make the smaller region-wise MILP problems solvable with a practically small time complexity.This DAC strategy could work because other than pixels on the boundary of each region, the smaller MILP problem should still be able to accurately recover all inner pixel values.The less accurate pixel values on the boundary can then be partially fixed using the final global optimization step.
The global LP step is done by fixing estimated AC coefficients and focus only on adjusting values of all blocks' DC coefficients.In other words, the aim of this step is to globally align the brightness of all blocks in all regions such that the resulting image is as smooth as possible.We have two strategies for this step: 1) allowing different blocks within each region to have different DC coefficients; 2) assigning the same DC coefficient to all blocks in the same region, since the internal smoothness of each region should have been addressed by solving the corresponding region-specific MILP problem.These two strategies are called "block LP" and "region LP", respectively.
Normally, MILP problems are solved using a branch-andbound algorithm, whose worst-case complexity is simply the size of the solution space.Assuming we have u unknown sign bits, the worst-case complexity will be O(2 u ).Now, assuming we divide an H ×W image into regions of fixed size H ×W and the u unknown sign bits are distributed uniformly across all regions, the m = HW H W smaller MILP problems will have an overall worst-case complexity of O 2 u m m .The global MILP problem will have a complexity of O(2 W H/N 2 ), and the global LP problem will have a polynomial-time complexity of O HW/N 2 1.5 (block LP) or O m 1.5 (region LP) if the most efficient known LP solving algorithm is used.By fixing H W to be a sufficiently small number, e.g., 64 × 64 or 32 × 256, we can effectively control the overall complexity of the whole process to be effectively polynomial time.

C. Extension to different encoding methods
In real-world applications, digital images are always encoded following a specific image encoding standard such as JPEG, PNG and GIF.Considering JPEG is the most widely used image encoding standard, we also considered special encoding features of JPEG images in our approximation methods for solving the optimization problem.Such features include DC level shifting, quantization of DCT coefficients, two's complement encoding of the DCT coefficients, and different DC encoding schemes.
From Eq. ( 1), one can see that DC coefficients are always non-negative since they represent average brightness of a block, which will be always non-negative.To more effectively encode DC coefficients, in JPEG, pixel values are first subtracted by half of the range (e.g., 128 for 8-bit images) before the 2-D DCT is applied.In this case, the sign bit of a DC coefficient can be positive or negative.
In JPEG, each coefficient y(k, l) is divided by a quantized step and rounded to the nearest integer.Due to the error caused by quantization, we may need to relax the variable bounds further to make reasonable predictions.The relaxation for coefficient y(k, l) is where Y (k, l) is the quantized DCT coefficient and Q(k, l) denotes the corresponding quantization step defined in the quantization table.The quantization of DCT coefficients means the range of pixel values calculated from such coefficients may go outside the valid range (e.g., [0,255] for 8-bit images), therefore, we need to consider how to relax the lower and upper bounds of pixel values by considering the effect of quantization errors.Considering the extreme case that quantization errors of all DCT coefficients have the same sign bit as the corresponding element in the 2-D DCT matrix A, the maximum quantization error of the pixel value is Considering the additional quantization errors that can be introduced in the solving process of the LP and MILP algorithm, we increase (i, j) by 1 to ensure such additional errors will still be tolerated.As a whole, the consideration of such quantization errors leads to a new condition for x(i, j): Another coding feature in JPEG is the encoding of DCT coefficients using a standard table similar to two's complement format.Accordingly, in the LP with relaxation method, the relaxed upper and lower bounds of a coefficient with an unknown sign bit will be asymmetric since the non-sign bits are encoded differently for positive and negative values.
Finally, in JPEG each DC coefficient is encoded following the differential pulse code modulation (PCM) method, i.e., what is encoded is the difference of the current DC coefficient and a previously encoded one.There are several different ways to define the previous DC coefficient2 : • DC prediction mode 1: the previously coded block in the same row; • DC prediction mode 2: the previously coded block as scanned in the raster order; • DC prediction mode 3: the average of two previously coded blocks immediately above and/or to the left scanned in the raster order.To deal with the above encoding features, a new variable z is introduced to represent the result of the DC differential encoding.Since we only know the absolute value of an encoded difference z, the constraint on z is z ∈ {−|z|, |z|}, and the corresponding linear relaxation is −|z| ≤ z ≤ |z|.The relationship between z and the DC coefficient y(0, 0) is defined according to the differential encoding scheme.For example, if prediction mode 1 is adopted, one can deduce y(0, 0) − y (0, 0) = z, where y (0, 0) is the DC coefficient of the previous block in the same row.Apparently, DC coefficient y(0, 0) is determined by the variable z.For comparison, we also define DC prediction mode 0 to be the case where the original DC coefficients are encoded (i.e., without using the differential encoding scheme).Moreover, the dependency introduced by the differential encoding may cause error propagation to occur.This can seriously downgrade the visual quality of restored images.
When the region-wise MILP method is used with DC differential encoding, some DC coefficients of the current region are dependent on DC coefficients from one or more previous regions.We propose the following three different strategies to solve such inter-regional dependencies.
• Dependency mode 0 -Removing the dependency completely.Then, the corresponding encoded difference is actually not exploited, although the dependency within the region is still maintained.• Dependency mode 1 -Solving all regions following the raster order so that all previously encoded DC coefficients are available for the current region.For DC prediction mode 2, the leftmost block of each row relies on the rightmost block of the above row, so if the region covers more than one block row, the region has to be as wide as the whole image.Otherwise, some regions to the left will have dependencies on some other regions to the right that have not been previously solved.In other words, the valid region size of this and the next modes can only be In this mode, the MILP problem of each region is limited to pixel values and DCT coefficients of the current region.• Dependency mode 2 -The same as dependency mode 2 except for each MILP problem of a region we also consider pixel value differences between the current region and the one or more adjacent region(s) that have been previously solved.Since DCT coefficients and pixel values of all previously solved regions are already solved, they can be considered known so that the size of the MILP problem remains the same in terms of the number of unknown variables.

IV. EXPERIMENTAL RESULTS
In this section, we first give the results of the extensive experiments conducted on 30 typical images (22 of size 256 × 256 and 8 of size 384 × 256) to verify the performance of the two recovery methods 3 .The objective metrics PSNR and SSIM were used as the main quantitative indicators for evaluating the visual quality of reconstructed images.We also conducted subjective quality evaluation of selected images to ensure the objective quality indicators match the actual visual quality perceived by us as expert observers.Then, we briefly compare our methods with some naive recovery methods.Note that all experiments were implemented in Matlab with IBM ® ILOG ® CPLEX ®4 .

A. Performance of the relaxed LP method
In this subsection, we show our thorough analysis of the impact of several key parameters and different implementation strategies on the recovery performance of the relaxed LP method.
1) Accelerating computation by ignoring small DCT coefficients: To speed up the computation, we directly set an unknown DCT coefficient to zero if its absolute value is lower than a certain threshold T .Since the coefficients close to zero contribute less to pixel values, ignoring them may not affect the visual quality of recovered results much.To find an appropriate threshold that balances the computational complexity and the recovery quality, we evaluated the computational time and visual quality of results at various threshold values under different DC prediction modes.
When there are fewer missing coefficients, we found that the speed acceleration effect is weak because only a small number of coefficients have closer-to-zero values and therefore are ignored.Note that we increase more unknown coefficients in the "zig-zag" order used in the JPEG standard.Increasing the number of missing coefficients U , we found that the speedup became more apparent and observed a sharp reduction in the computation time at a specific threshold.As shown in Fig. 2, the time reduction happens at T = 5 when U = 6.The corresponding recovery quality at various threshold values is shown in Fig. 3.It can be seen that the performance is not very sensitive to the change of threshold T .According to the experimental results, we set T = 5 for other experiments, which can provide speedup to some extent while preserve almost the same recovery performance.2) How the DC encoding scheme affects the recovery result: The DC differential encoding can impact the recovery performance because of the error propagation effect.Table I lists the recovery performance of the four different DC prediction modes with U = 2. Observing the results in Fig. 3, one can see that the best performance was obtained in DC prediction mode 0, i.e., when the differential encoding is absent.This is consistent with our expectation, since there is no error propagation effect in Mode 0. In other modes, if an error occurs for one block, then the following blocks depending on it for DC prediction will be affected.As the error propagates over the entire image in Mode 2, the recovery performance is worst.Mode 1 is slightly better because the error propagates only within each row.Although the error also propagates globally in Mode 3, the visual quality of recovered images is superior to that of Modes 1 and 2. This may be attributed to the fact that the DC prediction is dependent on the block above and to the left of the current block.So the error propagation effect might be alleviated by taking the average of the DC coefficients of the two blocks.To visualize the impact of the aforementioned error propagation effect, we display four recovered images under the four prediction modes in Fig. 4. 3) How the sign bit recovery method affects the recovery result: After solving the linear relaxation of the original problem, one needs to convert the obtained coefficients to sign bits.A simple way is to directly extract the sign bits of estimated coefficients.However, this leads to ambiguity when the obtained coefficients happen to be zero.In such cases, we adopt the following four methods: 1) setting the coefficient to zero by ignoring the known absolute value of the DCT coefficient; 2) setting the sign to 1; 3) setting the sign to −1; and 4) assigning the sign randomly.
The recovery performances of the above four methods are listed in Table II, when U = 2 and with the DC differential encoding disabled (i.e., DC prediction mode 0).As shown in Table II, the performances of the first four methods are very similar, suggesting that different mapping methods have little impact on the visual quality of the final recovered images.The results are similar for different values of U and prediction modes of DC, too.Based on the results, we adopted method 1 for other experiments.4) How the bound relaxation affects the recovery result: As mentioned before, due to the way how DCT coefficients of JPEG images are encoded, we may need to further relax the ranges of pixels and coefficients to cope with the quantization error.Let R x and R y denote whether relax x and y, respectively.Table III exhibits the recovery performance of four different combinations of bound relaxations, when U = 2 and the JPEG quality factor (QF) is set to be 95.Here, we only consider the DC prediction modes 1 and 2 used in the JPEG coding.As shown in Table III, adding extra relaxations does not improve the performance, and even degrades the performance in some cases.Therefore, the additional relaxation looks unnecessary.Without the above relaxation, the coefficient y and its quantized version Y are linearly correlated since y(k, l) = Y (k, l)•qt(k, l).However, this relationship does not exist when R y = 1 according to Eq. (3).Then the number of variables to be determined and time consumption increase considerably.Table IV shows the time consumption in the above relaxed optimization.One can see that relaxing x does not affect the computation efficiency much, but relaxing y increases the computation time by nearly five times.The comparison results of time consumption and performance for different U are quite similar.Based on the results, we set R x = 1 and R y = 0 for the subsequent experiments on JPEG images.5) How the JPEG quality factor affects the recovery result: When the JPEG quality factor decreases, DCT coefficients are divided by larger quantized values in quantization, which causes more known coefficients to be zero.Zero-value coefficients provide less useful information for optimization, so the recovery performance becomes worse.Tables V and VI display the recovery performance on JPEG images in DC prediction modes 1 and 2, respectively.In most cases, the recovery performance only drops slightly as the JPEG quality factor decreases.As the performance drop is relatively small, the optimization method can still handle JPEG images well.6) How the number of missing sign bits affects the recovery result: When more sign bits are missing, the optimization becomes more intractable and the recovery performance should degrade accordingly.We conducted experiments based on the assumption that in each DCT block, sign bits of the U most significant DCT coefficients are missing.We increase the number of missing sign bits in the "zig-zag" order used in JPEG.Since higher-frequency DCT coefficients have less energy statistically, we predicted that the reduction of recovery performance will get smaller as U increases.Figure 5 shows  7) Time consumption: Similar to [22], the time complexity of the resulting linear relaxation problem is O(n 4 m 4 U ), where n × m is the size of the input image.Figure 6 shows the average time consumption with respect to different numbers of missing coefficients, and the increase is roughly in line with the theoretical estimate.Besides, we found that the time consumption across different images can vary by several times.The optimization is particularly time-consuming for images with monotonous backgrounds.In these backgrounds, most of the coefficients are equal to zero and provide less useful information, which causes the optimization to be more intractable and therefore consume more time.As the quality factor descends, the computational time rises remarkably, which may be explained by the existence of more zero-value coefficients after quantization and less information to find the optimal solution.The time consumption on JPEG images is reported in Table VIII.For the hierarchical MILP and the hybrid MILP and LP methods, we mainly focused on the effect of the region size, the global alignment method, and the DC encoding scheme on the recovery performance, and also the run-time performance.
1) Time complexity: The time complexity of the regional MILP problems increases exponentially with the number of unknown coefficients.Similar to the relaxed LP method, we attempted to accelerate computation by ignoring small coefficients.However, We found that the acceleration effect vary dramatically across different images.Moreover, even for different regions of the same image, the time consumption can vary by a factor of tens.As a typical example, Table IX shows the time consumed when solving each of 16 regional MILP problems of a test image of size 256 × 256 (Fig. 8a)) with different thresholds, where region size is 64 × 64 and U = 5.The threshold T does not seem to have a straightforward effect on how the time complexity can be reduced, therefore, we decided to set T = 0 in subsequent experiments.Besides, we set a time-out value for each MILP problem depending on the problem scale, which was empirically set to be 600 seconds.This time-out value helps assure that the optimization will always finish within a finite time and exclude unrealistic settings that require an excessive amount of time for the optimization to finish.2) How the region size affects the recovery result: The region size is an essential parameter for the hierarchical MILP and the hybrid MILP and LP methods.In theory, the larger the region size, the better the recovery performance, at the cost of a higher time complexity.There should be an adequate large size to balance the recovery performance and the computational complexity.Specifically, keeping increasing the size can only provide slight improvement on recovery performance, but places an unacceptable burden on computation.
To explore the effect of the region size, we conducted experiments with various region sizes and different values of U .Without loss of generality, we divided images into square regions of equal size.The visual quality of recovered images in the first stage is shown in Table X.The performance gap between different sizes (from 16 × 16 to 64 × 64) is consistent across different U .We observed a big improvement when expanding the size from 16×16 to 32×32.Keeping increasing the size, we found that the improvement of the recovery performance was not significant, while the time consumed increased substantially.Although the time limit was already reached when solving the MILP problem for a region of size 64 × 64 and U = 7, we still tried a larger size of 128 × 128 with a quadrupled time limit to test the potential of the regional MILP problem based methods.It was observed that the visual quality instead declined when U = 7 due to the time restriction.As a whole, 32 × 32 seems a suitable region size, considering both the time complexity and the recovery performance.3) How the global brightness alignment strategy affects the recovery result: In the second stage, the DC coefficients are further optimized to align the global brightness.We adopted three global alignment strategies including the global MILP, the block LP, and the region LP.The global optimization can be effective for smaller region size used in the first stage.In contrast, the additional optimization may be unnecessary when the region size is large enough.
The mean visual quality results of recovered images are shown in Tables XI and XII.Note that we disabled the DC differential encoding here to focus on the alignment effect solely.For comparison, the results in the first stage are also given.There is a noticeable improvement when the region size is 16 × 16.Increasing the region size further, the global alignment has negligible effect and even causes a slight decrease in visual quality.Besides, we observed that the global MILP strategies can provide stable and better improvement compared to the other two LP-based alignment strategies.This indicates that the hierarchical MILP method is better than the hybrid MILP and LP method.4) How the DC encoding scheme and the dependency mode affect the recovery result: The DC differential encoding causes dependency between divided regions.We designed three DC dependency modes in Sec.III-C to solve the dependency.Here, the global alignment is necessary for dependency Mode 0, since the dependency is simply removed, which leads to severe brightness misalignment, as shown in Fig. 7.The global MILP alignment strategy was adopted in the second stage.Compared with the simple removal of dependency in Mode 0, more subtle strategies are developed in the other two modes to resolve the dependency, in the hope of achieving a better performance.However, the error propagation effect should also be considered when the DC differential encoding is considered.Our experimental results showed that the error propagation across different regions cannot be handled well in Mode 1, leading to inferior recovery performance compared to Mode 0 across different DC prediction modes.We also found that, for Mode 2, a slight improvement over Mode 1 can be achieved by considering pixel value differences between adjacent regions in the first stage.For Mode 0, there is no obvious difference in the recovery performance among the three DC prediction modes.Table XIII lists the recovery performance under different DC prediction modes and U = 5 for 22 images of size 256 × 256. 5 The region size 32×32 was adopted in most cases as this is the size balancing the computational complexity and the recovery performance well.For DC prediction mode 2, considering the special requirements on the region size, we chose three representative values for the region sizes: 8×128, 16×256, and 32×256 for comparison, where the number of pixels of the first one is the same as that of the size 32 × 32.It can be seen that the visual quality is considerably degraded when the DC prediction is introduced.For instance, compared with the case of U = 7 in Tables XI and XII, the recovery performance under DC prediction mode 2 with U = 5 is even worse than the case of DC prediction mode 0 with U = 7, although in the former case there are less sign bits missing per block.Such a phenomenon implies that encrypting the sign bits with the differential DC encoding may provide better security than encrypting more sign bits without the differential DC encoding.

C. Performance comparison
Two previous subsections show results about some key parameters and methods for identifying suitable settings for our recovery methods.For the hierarchical MILP method, the region size 32 × 32 was selected to balance the recovery performance and the computation complexity.To show the effectiveness of our methods, we present three naive methods as benchmarks, since there are no other known solutions based on more advanced techniques in the literature.The naive recovery methods just attempt to restore the unknown parts with simple error-concealment strategies.For instance,

Fig. 1 .
Fig. 1.Distribution of difference between neighboring pixel values of the standard test image "Lenna" of size 512 × 512.

Fact 1 .
Given N observations {Z i } N i=1 of a Laplacian distribution Laplace(µ, b) with zero means µ = 0, the maximum likelihood estimator (MLE) of the parameter b of the Laplacian distribution is 1 N N i=1 |Z i |.

Fig. 7 .
Fig. 7. Recovered images using dependency mode 0 in the first stage for different prediction modes: a) Mode 0; b) Mode 1.

TABLE I PERFORMANCE
COMPARISON USING DIFFERENT DC PREDICTION MODES.

TABLE IV SPEED
COMPARISON FOR BOUND RELAXATIONS.

TABLE V PERFORMANCE
COMPARISON UNDER DIFFERENT QUALITY FACTORS IN DC PREDICTION MODE 1.

TABLE VI PERFORMANCE
COMPARISON UNDER DIFFERENT QUALITY FACTORS IN DC PREDICTION MODE 2. DC prediction mode.The PSNR values of recovered images drop rapidly when U ≤ 10, and then decreases relatively slowly.In terms of SSIM, the general trend is that the performance keeps decreasing in a moderate speed, and the slope gets smaller with larger U .We also conducted experiments on JPEG images with QF of 95 and the corresponding results are shown in TableVII.Similarly, the visual quality drops rapidly when U ≤ 8, and then the trend gradually flattens.

TABLE VIII TIME
CONSUMPTION ON JPEG IMAGES.

TABLE IX TIME
CONSUMED (IN SECONDS) OF SOLVING EACH OF 16 REGIONAL MILP PROBLEMS OF THE TEST IMAGE FIG.8A).

TABLE X PERFORMANCE
COMPARISON FOR VARIOUS REGION SIZES.

TABLE XI THE
MEAN VISUAL QUALITY OF RECOVERED IMAGES UNDER VARIOUS CONFIGURATIONS (SSIM).

TABLE XIII PERFORMANCE
COMPARISON FOR DC DEPENDENCY MODES.