Image Statistics Preserving Encrypt-then-Compress Scheme Dedicated for JPEG Compression Standard

In this paper, the authors analyze in more detail an image encryption scheme, proposed in their earlier work, which preserves input image statistics and can be used in conjunction with the JPEG compression standard. The image encryption process takes advantage of fast linear transforms parametrized with private keys and is carried out prior to the compression stage in a way that does not alter those statistical characteristics of the input image that are crucial from the point of view of the subsequent compression. This feature makes the encryption process transparent to the compression stage and enables the JPEG algorithm to maintain its full compression capabilities even though it operates on encrypted image data. The main advantage of the considered approach is that the JPEG algorithm can be used without any modifications as a part of the encrypt-then-compress image processing framework. The paper includes a detailed mathematical model of the examined scheme, allowing for theoretical analysis of the impact of the image encryption step on the effectiveness of the compression process. A combinatorial and statistical analysis of the encryption process is also included; it makes it possible to evaluate the scheme's cryptographic strength. In addition, the paper considers several practical use-case scenarios with different characteristics of the compression and encryption stages. The final part of the paper contains additional results of experimental studies regarding the general effectiveness of the presented scheme. The results show that, for a wide range of compression ratios, the considered scheme performs comparably to the JPEG algorithm alone, that is, without the encryption stage, in terms of the quality measures of reconstructed images. Moreover, the results of the statistical analysis, as well as those obtained with generally approved quality measures of image cryptographic systems, confirm the high strength and efficiency of the scheme's encryption stage.


Introduction
The beginning of the twenty-first century has brought the dynamic development of telecommunication technologies, making the practical use of multimedia possible in almost all areas of our lives. The successively increased bandwidths of data transmission channels enable fast transmission of high-resolution images and video sequences. This allows for the wide spread of systems for remote data exchange: audio/video conferencing in real time, publishing multimedia content in computer networks through websites, or image data exchange between experts in various fields using dedicated database systems, for example, in medical sciences, engineering, or forensics.
The transmission of multimedia data over publicly available open communication channels makes the data vulnerable to interception and eavesdropping by unauthorized parties. This problem can be solved with cryptographic algorithms. The security of multimedia data transmission can be ensured by conventional cryptographic algorithms, that is, block ciphers such as the Data Encryption Standard (DES) [1], International Data

Review of Existing Solutions
The ETC schemes proposed in the literature are based on a variety of approaches, such as source coding with side information [5][6][7], iterative image reconstruction based on a reduced representation in the orthogonal transform domain [8], encryption and compression in the domain of the integer wavelet transform [9], compressive sampling [10], and elements of game theory [11]. Depending on the adopted approach, we can expect different values of the compression ratio CR obtained at different values of the Peak Signal-to-Noise Ratio (PSNR), as well as different levels of compliance with the considered scenarios of practical usage (see Figure 1). In particular, we have approaches that exploit:

• Source coding with side information: compression is based on known statistical relationships between the ciphertext and the private key. If the elements of the ciphertext and the private key are correlated to the extent that the Hamming distance between them can be bounded from above, that is, the distance is not greater than ξ, then this information can be effectively used by the compression algorithm to reduce the size of the data without knowing the private key. It is enough to divide the set of values of ciphertext elements into layers containing elements that are more than 2ξ bits apart, and to write to the output stream not the values of the elements themselves, but the identifiers of the layers to which they belong. In paper [5] it was shown that under certain conditions it is possible to compress encrypted data to the same level as in the case of compressing the original, unencrypted data. The authors also proposed a scheme for the encryption and lossless compression of binary images using LDPC (Low-Density Parity-Check) correction codes and the XOR operation at the compression and encryption stages, respectively. It made it possible to obtain practical compression ratios at the level of CR = 1.3.
In paper [6], the original approach was improved by taking into account spatial dependencies between the values of neighboring pixels in the image. This made it possible to increase the compression ratio to the level of CR = 2.3. Another improvement of the original method, for grayscale and color images, was proposed in paper [7]; the average compression ratios obtained were at the level of CR = 1.8. The essential drawbacks of approaches based on source coding are the relatively low compression ratios and the lack of symmetry of the entire scheme, which requires the decompression and decryption stages to be combined. In practice, this means that such approaches do not fully follow the scenarios depicted in Figure 1.

• Approximate representation of image pixels in the domain of a linear transform: this approach was proposed in paper [8]. At the encryption stage the image pixels are scrambled using a permutation determined on the basis of the private key, whereas the compression stage divides the elements of the ciphertext into two sets: (a) rigid pixels, which are not further modified, and (b) flexible pixels. The values of flexible pixels are represented in the domain of a linear orthogonal transform and then quantized, and the results are assigned to specific equivalence classes resulting from the division operation. This allows the value of each of the obtained coefficients to be described as a weighted sum of three components: (a) a coarse one, which can be estimated based on the values of the nearest rigid pixels, (b) an average one, which, next to the values of the rigid pixels, is written to the output stream, and (c) a detailed one, which is discarded. The values of the rigid pixels and the representations of the transform coefficients that describe the average components of the flexible pixels are written to the output stream. The compression itself is lossy. The reconstruction of the image is possible using the proposed iterative procedure.
The practical values of the compression ratio obtained with this method are at the level of CR = 3 with PSNR values around 35 dB.

• Image representation in the domain of the discrete integer wavelet transform (DIWT) (see [9]): at the encryption stage, the grayscale input image is transformed using the DIWT into one coarse band and nine bands containing detailed information. The data contained in the coarse band is encrypted by adding to it a sequence of pseudo-random numbers, while the range of resulting values is limited by the modulo operation. The data contained in the remaining bands is permuted. Both the pseudo-random sequence and the permutation are determined on the basis of the private key. The compression stage operates only on the encrypted data coming from the detailed bands. The data taken from those bands is quantized and then entropy coded using arithmetic coding. The compression and encryption stages are reversible, with the whole scheme being symmetrical. The practical results obtained with this method are very close to those obtained with the JPEG standard. It should be emphasized that the compression method used here is a lossy one. Hence it is possible to obtain high compression ratios, around CR = 20, at the expense of quality distortion, but still with PSNR above 30 dB.

• Compressive sensing (see [10]): at the encryption stage, the image is reshaped into a single vector and then encrypted by a linear method consisting of multiplying the input vector by a matrix, for example, a permutation matrix, generated on the basis of a private key. The compression stage is based on compressive sampling and relies on projecting the ciphertext vector onto a set of basis vectors from a subspace of reduced size. Such a basis is most often a set of linearly independent vectors whose elements are randomized. In addition, the projection coefficients are quantized. In this way a reduction in the size of the data can be obtained.
The image decoding process is based on the theory of compressive sensing, wherein the matrix of the discrete cosine transform (DCT) is taken as the matrix allowing for a sparse representation of the input image. During the experiments, the practical values of the compression ratios were at the level of CR = 2 with PSNR coefficients around 30 dB. However, the discussed scheme is asymmetrical, which manifests itself in the fact that the decompression and decryption stages must be combined.

• Game theory: proposed in paper [11], this is an improvement of the previously described approach from paper [8]. Here, at the encryption stage the image is divided into blocks of arbitrary sizes (e.g., 32 × 32 pixels), whose order, as well as the order of pixels within the blocks, is modified using permutations described by the private key. An additional action is to determine the block type, that is, whether it is a texture or a smooth part of the image; its aim is to increase the efficiency of the compression step. However, this must be done by the sender, which is surely a disadvantage of this approach. The compression step is based on the approach of [8], wherein the algorithm is applied to the subsequent blocks rather than to the whole image. Then, depending on the type of block, the value of a coefficient describing the share of rigid and flexible pixels, as well as the quantization step, can be selected individually for each block. The choice of parameter values is adaptive and controlled by an algorithm based on game theory, where image quality is maximized while keeping a limit on the size of the image after compression. Image reconstruction is based on the iterative technique proposed in [8]. During the experimental research, the quality of smooth images and textures was at the level of 36 and 27 dB, respectively, with compression ratios around CR = 2.32, which is an improvement of about 3 and 1 dB, respectively, when compared to the original approach from paper [8].

Mathematical Model of the Analyzed Scheme
In this section, we will present the mathematical model of the proposed scheme and show the main characteristics of the compression and encryption processes that take place within it.
Let us assume that the input of the considered scheme is a monochromatic image being a realization of some two-dimensional, stationary, zero-mean random field Ψ. Let w, h, n ∈ N and W = w·n, H = h·n, and let us suppose that M = w·h and N = n². In such a case the image will be represented by the H × W-element matrix U, with elements u_ij ∈ R for i = 1, . . . , H and j = 1, . . . , W. At first, an initial arrangement of the image's input data is performed, as shown in Figure 2.
In step (a), the input image U is divided into separate, square fragments X_k, k = 1, . . . , M, each of which is an n × n-element real matrix. The mapping of the components of the input image matrix U to the elements of the matrices X_k can be written compactly in the form of the following relationship, for all k = 1, . . . , M and all i, j = 1, . . . , n:

( X_k )_ij = u_( ⌊(k−1)/w⌋·n + i ), ( ((k−1) mod w)·n + j ),    (1)

where ⌊·⌋ stands for the floor function and mod denotes the integer modulo operation. In the next step, that is, step (b) in Figure 2, each of the X_k matrices is flattened, creating the respective N = n²-element vector x_k of the form:

x_k = vec( X_k ) = [ ( X_k )_1^T ( X_k )_2^T · · · ( X_k )_n^T ]^T,    (2)

where ( X_k )_l, k = 1, . . . , M, l = 1, . . . , n, is the l-th column of the matrix X_k and vec(·) is the matrix column vectorization operator, see for example [14]. In the last step, that is, step (c), the N-element vectors x_k are arranged into the successive columns of the N × M-element matrix X, that is:

( X )_k = x_k,  k = 1, . . . , M,    (3)

where ( X )_k is the k-th column of the input matrix X, which constitutes the final form of the input data arrangement in the considered coding scheme. After the input matrix X has been prepared, the coding process begins, whose course is schematically shown in Figure 3. In the first step, denoted by (i), the input matrix X is encrypted and scaled, that is, Y = (1/α) X C, where C is the M × M-element, real, orthogonal (that is, C C^T = I) encryption matrix and α ∈ R is the scaling factor, whose value will usually be greater than 1, ensuring that the range of the magnitudes of the elements of the ciphertext X C is acceptable from the point of view of the subsequent coding stages.
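The arrangement steps (a)-(c) can be sketched in a few lines of NumPy. This is an illustrative prototype, not part of the original scheme's specification: the function name `arrange_input` and the row-major ordering of the blocks are our own assumptions.

```python
import numpy as np

def arrange_input(U, n):
    """Split an H x W image into n x n blocks (steps (a)-(b)), vectorize each
    block column-wise, and stack the vectors as columns of an N x M matrix X
    (step (c))."""
    H, W = U.shape
    h, w = H // n, W // n                    # numbers of blocks vertically / horizontally
    M, N = w * h, n * n
    X = np.empty((N, M))
    for k in range(M):
        r, c = (k // w) * n, (k % w) * n     # top-left corner of block k (row-major order)
        X[:, k] = U[r:r + n, c:c + n].flatten(order='F')  # column vectorization vec(X_k)
    return X

U = np.arange(16.0).reshape(4, 4)            # toy 4 x 4 image, n = 2 -> M = 4 blocks
X = arrange_input(U, 2)
# each column of X now holds one vectorized 2 x 2 block of U
```

The inverse arrangement, needed at the receiver, simply reverses the slicing and the column-wise `reshape`.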
Step (ii) is optional and, for specific implementations of the analyzed scheme, might involve storing the scaled ciphertext Y in one of the preferred standard graphics file formats, for example, BMP or PNG (see e.g., [15]). This enforces truncation or rounding of the ciphertext data to the respective integer values, since most graphics file formats assume their input image data to be coded as integers. This is modeled by adding the N × M-element integer projection error matrix E to the ciphertext Y (see e.g., [16,17]), resulting in the integer-valued matrix Ŷ. Steps (iv), (v) and (vi) comprise the actual image compression process, which follows exactly the operation of the JPEG image compression algorithm (see [18]). In those steps the JPEG method can be utilized without any modifications. In step (iv) the ciphertext Ŷ is transformed by the orthogonal, N × N-element compression matrix A; it is then quantized and entropy coded. Quantization is modeled by adding the N × M-element rounding error matrix Ê to the matrix Z, resulting in the integer-valued matrix Ẑ. In step (vii) the compressed ciphertext is sent through the open communication channel to the destination device. Decompression, which is performed in step (viii), is the exact reverse of the compression process taking place in steps (iv), (v) and (vi). At this stage we take the decompression matrix A^T, the inverse of the orthogonal compression matrix A applied in step (iv). Here, the unmodified JPEG algorithm can also be fully utilized, resulting in the N × M-element real matrix W. In the last stage (x) of the considered scheme, the decryption and rescaling of the values of the matrix W is carried out, which results in the N × M-element matrix X̃, which approximates the input image matrix X.
The analyzed encryption-before-compression coding scheme, described above, can be stated in terms of the following model equation:

X̃ = α ( A^T ( A ( (1/α) X C + E ) + Ê ) ) C^T.    (4)

Equation (4) comprises the mathematical model of the examined scheme and is used as the basis for the derivation of its most significant efficiency characteristics.
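The whole course of the scheme, steps (i)-(x), can be prototyped in a few lines of NumPy. The sketch below is based on our own simplifying assumptions: random orthogonal matrices stand in for the key-parametrized encryption matrix C and for the compression transform A, and unit-step rounding stands in for both the integer projection and the JPEG quantizer.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, alpha = 8, 32, 2.0
X = rng.normal(0.0, 10.0, (N, M))            # arranged input data (N x M)

# orthogonal encryption matrix C (M x M) and compression matrix A (N x N),
# obtained here from QR factorizations of random matrices (illustrative only)
C, _ = np.linalg.qr(rng.normal(size=(M, M)))
A, _ = np.linalg.qr(rng.normal(size=(N, N)))

Y  = (X @ C) / alpha                         # (i)    encryption + scaling
Yh = np.round(Y)                             # (ii)   integer projection: Y + E
Z  = A @ Yh                                  # (iv)   compression transform
Zh = np.round(Z)                             # (v)    quantization with unit step: Z + E_hat
W  = A.T @ Zh                                # (viii) decompression
Xr = alpha * (W @ C.T)                       # (x)    rescaling + decryption

mse = np.mean((X - Xr) ** 2)                 # reconstruction error of the whole chain
```

Because the only irreversible operations are the two roundings, the measured `mse` stays close to alpha² times the sum of the two rounding-error variances, which is the behavior the model equation predicts.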

Compression Process Effectiveness Analysis
In this section we will present and analyze the equations describing the efficiency of the compression process present within the considered image coding scheme. For this purpose, on the basis of the model Equation (4), using high-resolution approximations known from Shannon's information theory, for example, [19][20][21][22][23], along with the results developed in our previous work [12], we will derive the distortion-rate characteristic of the analyzed scheme, that is, the D(R) function, whose explicit form comprises an exhaustive description of the effectiveness of the analyzed image compression process. Eventually, we will show that: (i) the obtained D(R) function does not depend on the choice of the encryption matrix C, and (ii) the encryption step preserves the second-order statistics of the input image data. Both of these features are the main characteristics of the image coding scheme examined in this work.
We will base our analysis on the results obtained in our previous work (please refer to [12] for all the detailed derivations), in which it is initially stated that the mean squared error of the reconstruction of the input signal at the output of the considered scheme takes the following form:

D = (1/(N M)) tr{ ( X − X̃ ) ( X − X̃ )^T },    (5)

where tr{·} is the matrix trace operator. Using the orthogonality of the compression and encryption matrices A and C, respectively, we can simplify the form of the image X̃ reconstructed at the output of the analyzed scheme, obtaining:

X̃ = X + α E C^T + α A^T Ê C^T.    (6)

Substituting Equation (6) into the relationship (5) and using proper sample approximations of the integer projection and quantization error matrices E and Ê, respectively (see [12]), present in steps (ii) and (v) of the examined scheme (cf. Figure 3), we conclude that:

D ≈ (α²/N) ∑_{i=1}^{N} ( σ²_{e,i} + σ²_{ê,i} ),    (7)

where α is the scaling factor used in steps (i) and (x) of the analyzed scheme, while σ²_{e,i} and σ²_{ê,i} are the sample estimators of the variances of the integer projection and quantization errors, respectively (see [12]). It is worth explaining here that, in the case of the optional scenario (cf. step (ii) in Figure 3), in which the user wishes to archive the encrypted image X C in one of the selected standard graphics file formats, e.g., BMP, the concrete value of the scaling factor α, proper for the particular input image X, has to be chosen in such a way that all integer-projected values (1/α) X C + E of the scaled ciphertext (1/α) X C fall into an interval contained within the appropriate input range accepted by the selected format; for example, in the case of 8-bit grayscale BMP images, (1/α) X C + E must fall (after a 128 level-shift) into the interval [0, . . . , 255] ⊂ Z. A detailed discussion of the values of the scaling factor α, proper for our model's assumptions, will be carried out in the next section.
Let us now examine the problem of approximating the sample estimators of the integer projection and quantization error variances, that is, the parameters σ²_{e,i} and σ²_{ê,i}, i = 1, . . . , N, present in Equation (7), which result from performing steps (ii) and (v) within the examined image coding scheme. Let us consider the uniform scalar quantization of a continuous, one-dimensional random variable X with a sufficiently smooth probability density function p_X, performed with two types of scalar quantizers, x̂^(1) and x̂^(2), whose reconstruction levels for an arbitrary value x ∈ R of the random variable X are given by the following relationships:

x̂^(1)(x) = ∆ ⌊ x/∆ + 1/2 ⌋,  x̂^(2)(x) = ∆ sgn(x) ⌊ |x|/∆ ⌋,    (8)

where ∆ ∈ R₊ denotes the quantization step. The operation of both of the considered quantizers is depicted schematically in Figure 4. The quantizers x̂^(1) and x̂^(2) perform rounding and truncation, respectively, of the values of the variable X to the nearest multiples of their quantization steps. With such assumptions we can write a common expression for the quantization error variance σ²_{e^(k)}, for both of the considered quantizers x̂^(k), k = 1, 2, in the following way:

σ²_{e^(k)} = ∑_{l=1}^{N_L} ∫_{R_l} ( x − x̂^(k)(x) )² p_X(x) dx,    (9)

where N_L ≫ 1 denotes the number of quantization levels of the selected quantizer and R_l is the l-th quantization region. Taking advantage of the well-known results of high-resolution quantization theory, see [20][21][22], we can infer that the expression (9) can be approximated by the following relationships:

σ²_{e^(1)} ≈ ∆²/12,  σ²_{e^(2)} ≈ ∆²/3,    (10)

appropriate for the considered rounding x̂^(1) and truncation x̂^(2) quantizers, respectively. Moreover, setting ∆ ≡ 1 in Equation (10) lets us approximate the error variances of the integer rounding and integer truncation-towards-zero operations, which eventually take the following forms:

σ²_e = 1/12 for rounding,  σ²_e = 1/3 for truncation.    (11)
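The two error-variance approximations for unit-step rounding and truncation are easy to confirm by a small Monte Carlo experiment; the snippet below is a sketch using a uniform test density (any sufficiently smooth, wide density would do):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-100.0, 100.0, 1_000_000)    # smooth, wide probability density
delta = 1.0                                  # quantization step (integer projection)

e_round = x - delta * np.round(x / delta)    # rounding quantizer error
e_trunc = x - delta * np.trunc(x / delta)    # truncation-towards-zero quantizer error

var_round = np.var(e_round)                  # ~ delta^2 / 12
var_trunc = np.var(e_trunc)                  # ~ delta^2 / 3
```

The rounding error is (approximately) uniform on [−∆/2, ∆/2), hence the variance ∆²/12; the truncation error of a sign-symmetric input is uniform on (−∆, ∆), hence ∆²/3.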
Returning to the main course of our considerations and using the results stated in Equations (10) and (11), we can rewrite the expression (7) for the mean squared error D of the reconstruction of the input image at the output of the analyzed scheme as follows:

D ≈ α² ( σ²_e + (1/(12N)) ∑_{i=1}^{N} ∆²_i ),    (12)

where ∆_i, i = 1, . . . , N, are the steps of the independent scalar quantizers used in stage (v) of the examined scheme and σ²_e takes one of the alternative values given in (11), depending on the integer projection operation chosen in step (ii) of the analyzed scheme.
Let us now concentrate on the evaluation of the minimum average bit rate R, in the sense of Shannon's rate-distortion theory [19], of the representation of a single sample coded at the output of the examined scheme, corresponding to the mean squared error D given in (12). On the basis of the detailed derivation presented in our earlier work [12], we can immediately state that the approximate value of the considered bit rate may be expressed as (13), where a_i^T, i = 1, . . . , N, is the i-th row of the orthogonal compression matrix A utilized in step (iv) of the considered scheme, and R_x = (1/M) X X^T is the sample autocovariance matrix of the input image X. Using the dependencies (12) and (13), after some mathematical manipulations (once again, please refer to [12] for the details), we obtain the explicit form (14) of the relationship between the analyzed mean squared error D and its corresponding minimum average bit rate R, characteristic of the analyzed coding scheme, where ‖∆‖ is the Euclidean norm of ∆ = [ ∆_1, ∆_2, . . . , ∆_N ], that is, of the vector of the quantization table coefficients. According to Shannon's information theory, the relationship between the measures D and R given by the approximate dependency (14), that is, the distortion-rate function, is an exhaustive description of the effectiveness of the image compression process that is a part of the coding scheme considered in this paper. By analyzing Equation (14), which describes the distortion-rate function of the examined coding scheme, one can conclude that the effectiveness of the compression process does not depend on the choice of the encryption matrix C. Moreover, we have:

R_y = (1/M) Y Y^T = (1/(α² M)) X C C^T X^T = (1/α²) R_x,    (15)

therefore, from the statistical point of view of the compression process, which follows the input image's encryption step, both signals, the input X and the encrypted Y, are equivalent up to a scaling factor, which is compensated in the last step of the analyzed scheme.
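The statistics-preservation property, that is, the fact that the sample autocovariance of the ciphertext equals that of the input up to the factor 1/α², can be verified numerically. In the sketch below a random orthogonal matrix stands in for the key-dependent encryption matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, alpha = 8, 64, 3.0

X = rng.normal(size=(N, M))                    # arranged input data
C, _ = np.linalg.qr(rng.normal(size=(M, M)))   # orthogonal encryption matrix
Y = (X @ C) / alpha                            # encrypted and scaled data

R_x = (X @ X.T) / M                            # sample autocovariance of X
R_y = (Y @ Y.T) / M                            # sample autocovariance of Y

# orthogonality of C (C @ C.T = I) gives R_y = R_x / alpha^2 exactly,
# so the compression stage sees statistically equivalent data
```

Note that the identity holds exactly, not just in expectation, because C C^T = I cancels inside Y Y^T.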
Both of these features are the main characteristics of the presented image coding scheme and originally formed the basis of its construction.

Examples of Compression Process Quality Characteristics
In this part of the work we will present examples of the theoretical quality characteristics of the image compression process achieved by the examined image coding scheme under selected operational scenarios, along with exemplary practical results allowing for a brief verification of the accuracy of the derived approximation (14) of the distortion-rate function characteristic of the considered scheme.
Let us assume a global image model (see [24,25]), in which the image is considered to be a realization of some two-dimensional, discrete-index, zero-mean, stationary, separable random field with the W² × W²-element autocovariance matrix of the form:

R_U = E{ vec(U) vec(U)^T } = σ²_x R(ρ_r) ⊗ R(ρ_c),    (16)

where U is the W × W-element stochastic matrix representing the image itself, E{·} is the expected value operator, ⊗ is the matrix Kronecker product (see e.g., [14]), σ²_x is the variance of a single random variable of the field U, ρ_r, ρ_c ∈ (−1, 1) are the row and column correlation coefficients, respectively, of adjacent elements of the matrix U, and the individual elements of the W × W-element matrix R(ρ) are defined as follows:

( R(ρ) )_{ij} = ρ^{|i−j|},  i, j = 1, . . . , W.

Let us assume further that the M × M-element, orthogonal encryption matrix C has the following form:

C = P_r diag( C_1, . . . , C_{M/K} ) P_c,    (17)

where diag( C_1, . . . , C_{M/K} ) is a block-diagonal, orthogonal, real matrix composed of K × K-element orthogonal matrices C_i, i = 1, . . . , M/K, while the additional M × M-element permutation matrices P_r and P_c apply the respective permutations to the rows and to the columns of the block-diagonal matrix diag( C_1, . . . , C_{M/K} ). Such a choice of the form of the encryption matrix C is very useful in practice, since it enables, for example, the calculation of the ciphertext image with the use of fast parametric transforms (see Section 5 or, e.g., [26]), and/or a simple balance adjustment between the efficiency of the compression process within the examined scheme and its cryptographic strength.
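An encryption matrix of this block-diagonal-plus-permutations form is straightforward to assemble and check for orthogonality. In the sketch below, random orthogonal blocks stand in for the key-parametrized fast transforms that would be used in practice:

```python
import numpy as np

rng = np.random.default_rng(3)
M, K = 16, 4                                   # M must be a multiple of K

# K x K orthogonal blocks C_i (random stand-ins for key-parametrized transforms)
blocks = [np.linalg.qr(rng.normal(size=(K, K)))[0] for _ in range(M // K)]

D = np.zeros((M, M))                           # diag(C_1, ..., C_{M/K})
for i, Ci in enumerate(blocks):
    D[i * K:(i + 1) * K, i * K:(i + 1) * K] = Ci

Pr = np.eye(M)[rng.permutation(M)]             # row permutation matrix P_r
Pc = np.eye(M)[rng.permutation(M)]             # column permutation matrix P_c
C = Pr @ D @ Pc                                # encryption matrix of the assumed form
```

Since permutation matrices and the block-diagonal factor are all orthogonal, their product C is orthogonal as well, which is exactly what the scheme requires.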
It is relatively easy to show that, for the form (17) of the encryption matrix C, the maximum possible absolute value of a single element of the ciphertext matrix Y is equal to √K x_max, where x_max is the maximum possible absolute value of an individual sample of the input image X. Based on the assumption that x_max is also the maximum possible value comprising the input range limit accepted by the selected standard graphics file format, to which the user wishes to archive the encrypted image Y in the optional step (ii) of our scheme (see the discussion in the first part of Section 4), we can infer that the maximum possible value of the scaling constant α used in step (i) of the analyzed scheme is √K. This inference can be summarized as:

α_max = √K.    (18)

As explained earlier in Section 4, choosing the value α_max of the scaling factor α ensures that all integer-projected samples (1/α_max) Y + E of the scaled ciphertext (1/α_max) Y will fall into an interval contained within the appropriate input range accepted by the mentioned graphics image file format.
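The √K bound follows from the Cauchy-Schwarz inequality: each ciphertext element is the dot product of K input samples with a unit-norm row of some C_i, so its magnitude cannot exceed √K x_max, with equality for a constant block and a row with all entries 1/√K. A minimal numerical check of the extremal case:

```python
import numpy as np

K, x_max = 8, 255.0
c = np.full(K, 1.0 / np.sqrt(K))      # unit-norm row of an orthogonal block C_i
x = np.full(K, x_max)                 # worst-case block of input samples
y = c @ x                             # extremal ciphertext element
# y equals sqrt(K) * x_max, so dividing by alpha_max = sqrt(K)
# brings the ciphertext back into the [-x_max, x_max] range
```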
Let us now choose the compression transform to be the 8 × 8-point two-dimensional discrete cosine transform of the second kind (2D-DCTII) with the matrix form (see [27]):

A_2D^(8×8) = A_1D^(8) ⊗ A_1D^(8),    (19)

where the elements of the 8 × 8-element matrix A_1D^(8) are:

( A_1D^(8) )_{ij} = c_i √(2/8) cos( (2j + 1) i π / 16 ),  i, j = 0, . . . , 7,  with c_0 = 1/√2 and c_i = 1 for i > 0.    (20)

This is an orthogonal transformation, optimal in the image compression problem for the probabilistic image model (16) with the row and column autocorrelation coefficients ρ_r, ρ_c of adjacent image elements tending in the limit to 1, see [27]. Let us finally choose the quantization table, which is the last parameter required to complete the simulation model of the theoretical quality characteristics of the compression process present in the examined coding scheme. Let us take the quantization table recommended in the JPEG standard (see [18], Table K.1, p. 143) with the following quantization steps:

∆ = [ 16 11 10 16 24 40 51 61; 12 12 14 19 26 58 60 55; 14 13 16 24 40 57 69 56; 14 17 22 29 51 87 80 62; 18 22 37 56 68 109 103 77; 24 35 55 64 81 104 113 92; 49 64 78 87 103 121 120 101; 72 92 95 98 112 100 103 99 ].    (21)

The parameters R_x, C, A_2D, ∆, and α_max for σ²_x = 1024 and ρ_r = ρ_c = 0.95 correspond to the simulation of the compression process, being a part of the considered scheme, which fully utilizes the unmodified standard JPEG algorithm, in this case operating on a W × W-pixel, 8-bit grayscale image. Figure 5 shows the theoretical quality characteristics of the compression process for the examined scheme in the form of the dependence of the PSNR measure of the image reconstruction error D at the output of the analyzed scheme on the minimum average bit rate R, resulting from Equation (14). For clarity, it is worth mentioning that in Figure 5 PSNR = 10 log₁₀( 255²/D ), where the mean squared error D is given by the relationship (14).
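The separable 2D-DCTII matrix can be built directly from its 1D counterpart via the Kronecker product; the zero-based index convention in the sketch below is our own choice:

```python
import numpy as np

n = 8
# 1D DCT-II basis: a_ij = c_i * sqrt(2/n) * cos((2j + 1) * i * pi / (2n)),
# with c_0 = 1/sqrt(2) and c_i = 1 otherwise (indices i, j = 0, ..., n-1)
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
A1d = np.sqrt(2.0 / n) * np.cos((2 * j + 1) * i * np.pi / (2 * n))
A1d[0, :] /= np.sqrt(2.0)                # apply c_0 to the DC row

A2d = np.kron(A1d, A1d)                  # 64 x 64 separable 2D-DCTII matrix
```

Both the 1D and the Kronecker-built 2D matrices are orthogonal, so the decompression stage can use the plain transpose, as assumed in step (viii) of the scheme.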
Additionally, to illustrate the trade-off between the scheme's cryptographic strength, which depends on the dimensions K of the encryption matrices C_i (see (17) and Section 6), and its compression capabilities, three different characteristics, for K = 8, 16 and 32, are shown in Figure 5. The last of the characteristics presented in Figure 5 applies to the case where σ²_e = 0, that is, when the user omits the optional step (ii) of the analyzed scheme, in which the integer projection is performed. In such a case Equation (14) does not depend on the scaling factor α, and the situation corresponds to the scheme's implementation variant in which it is not necessary to save the intermediate encrypted image, for example, to a file in the BMP format; that is, the ciphertext Y is fed directly to the compression subsystem as a matrix of real samples. Such a scenario requires only a slight modification of the standard JPEG method, so that it accepts real sample values instead of integers at its compressor input. Finally, it is interesting to verify, at least briefly, the accuracy of the derived high-resolution approximation (14) of the rate-distortion characteristics of the considered scheme on an exemplary input image. For this purpose we have chosen the 512 × 512-pixel, 8-bit grayscale 'Mandrill' image (see Figure 12 in Section 7), used later in our experiments, since it enables examining a proportionally wider range of higher bit rates, in comparison to the other images used in our experiments, due to its relatively noisy statistical characteristics. The image was processed exactly according to our scheme's model (4), with the use of the parameters C, A_2D^(8×8), ∆, and α_max defined in Equations (17)-(21), for the block sizes K = 8, 16 and 32 of the encryption matrix C.
To achieve variable sample bit rates for the experimental image, the quantization matrix (21) was multiplied by scaling factors s ∈ R₊, whose values were calculated according to the widely used JPEG quantization table scaling method, described, for example, in [28]. We then calculated the sample bit rates for the subsequent image compression qualities, determined by the consecutive values of the parameter s, using the histogram method of differential entropy approximation for the exemplary input image, along with their corresponding sample mean squared errors of the reconstruction of the original image at the output of the analyzed scheme. The sample bit rates R and the corresponding sample mean squared errors D computed in this way were then compared with their theoretical counterparts defined in Equations (13) and (14), calculated on the basis of the exemplary image's sample autocovariance matrix R_x, the respective quantization tables s·∆, and the integer projection error variance σ²_e = 1/3, corresponding to the integer truncation of the encrypted image data performed in step (ii) of our simulation. The resulting graphs, showing the comparison of the modeled and the sampled rate-distortion pairs, characteristic of the examined image coding scheme for the exemplary 'Mandrill' input image, are depicted in Figure 6.
The graphs reveal that, for higher bit rates, the approximate theoretical D(R) characteristics derived for the considered scheme match quite accurately their data-sampled counterparts. For lower bit rates, on the other hand, the derived dependencies become inaccurate, which results from the simplifications, proper to high-resolution quantization theory, used in our derivations.

Fast Parametric Orthogonal Transforms
Fast parametric orthogonal transforms (FPOTs) are an extension of the class of known transforms with strictly defined basis vectors (e.g., DCTII) onto the class of transforms described by the values of parameters. The parametrization makes it possible to determine the form of the transform on the basis of the private key, or enables its automatic adaptation to a given criterion (see e.g., [29]), while maintaining fast computational structures with O(n log₂ n) complexity, where n is the transform size. The fast computational structures of FPOTs can be determined in a way that yields required properties (e.g., involutory transforms [30,31]) or can follow the fast computational structures of known orthogonal transforms. The second approach is based on the following heuristic: if the known transform has good properties in solving a given class of problems, then the parametrization, and the ability to adapt the transform itself, can only serve to improve the results.
A good example of this heuristic is an FPOT with a two-stage structure that follows the well-known Benes interconnection network [32] (see Figure 7, for n = 8). It is well known that the Benes network is able to realize any permutation of a set of n elements. Hence, its computational structure can be effective from the point of view of data encryption. The parametrization of the Benes network involves the use of base operations (denoted symbolically by '•'), for example, defined as the rotations:

[ y_1 ; y_2 ] = [ cos α_i  sin α_i ; −sin α_i  cos α_i ] [ x_1 ; x_2 ],    (22)

where α_i is the operator's parameter and i is the index of the operator, with i = 1, . . . , L_P(n), while L_P(n) denotes the total number of transform parameters. For the two-stage structure we have L_P(n) = (n/2)(2 log₂ n − 1). Then {α_i : i = 1, 2, . . . , L_P(n)} is the set of parameters whose values fully define the form of the resulting transform.
In the task of encryption-then-compression of natural images, FPOTs can be used at the image encryption stage to implement the block elements of the encryption matrix C, and also to realize all the necessary permutations (see Section 3). In the first case we can use the structure shown in Figure 7 with base operations in the form of rotations defined as (22). Then the mapping of the bits of the private key (q bits resulting in ξ = 2^q different values) onto the values of parameters (from the range [0, 2π)) can be implemented according to the formula α_i = 2πd_i ξ^−1, where d_i ∈ {0, 1, . . . , ξ − 1} for i = 1, 2, . . . , L_P(n) (cf. [13,31]). The concatenation of the bit representations of the numbers d_i then forms the part of the private key K describing the transformation. In the case of permutations, the Benes network (see Figure 7) in its direct form can be used. Here, however, the base operations can be reduced to simple variants T_i that swap their pair of elements if the s_i parameter equals 1 and pass them unchanged otherwise, that is, (y1, y2) = (x2, x1) for s_i = 1 and (y1, y2) = (x1, x2) for s_i = 0 (23), where s_i ∈ {0, 1}. Then any permutation of the n-element set can be described by the sequence {s_i : i = 1, 2, . . . , L_P(n)} of binary numbers, whose concatenation complements the private key K.
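The key-to-parameter mapping described above can be sketched as follows (a minimal illustration under the stated q-bit quantization; the function names are ours):

```python
import math

def key_bits_to_angles(key_bits, q):
    """Map consecutive q-bit chunks d_i of the private key onto
    quantized rotation angles alpha_i = 2*pi*d_i / 2**q in [0, 2*pi)."""
    xi = 2 ** q
    return [2 * math.pi * int(key_bits[i:i + q], 2) / xi
            for i in range(0, len(key_bits), q)]

def key_bits_to_switches(key_bits):
    """Interpret key bits directly as the s_i in {0, 1} switch settings
    of the permutation (Benes) part of the structure."""
    return [int(b) for b in key_bits]
```

For example, with q = 4 the chunk '1000' (d_i = 8) maps to the angle 2π·8/16 = π.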

Analysis of the Encryption Efficiency of the Considered Scheme
An analysis of encryption efficiency is an essential step in the design of any cryptographic system. In the case of the analyzed approach, such an analysis was performed: (i) based on known measures used to evaluate the effectiveness of image cryptographic methods, that is, combinatorial analysis and quality indicators in the form of Histogram Analysis (HA), Maximum Deviation (MD), Correlation Coefficient (CC), and Irregular Deviation (ID) (see [33,34]); (ii) as a statistical analysis aimed at deriving dependencies that determine the probability of obtaining a reconstruction error at a level not greater than a given value in the case of an attempt to randomly guess the proper encryption key.

Combinatorial Analysis
The combinatorial analysis applies to the considered encryption scheme (see Section 3), where the encryption matrix C is a composite of permutation matrices P_r, P_c (M × M element matrices) and a block-diagonal matrix with K × K element blocks C_l for l = 1, 2, . . . , M/K. We assume that both the permutation matrices and the element-mixing matrices C_l are implemented with the use of fast parametric transforms modeled as the two-stage structure of the Benes network (see Section 4). In the case of the Benes network the number of free parameters equals L_P(n) = (n/2)(2 log2 n − 1), where n is the size of the transform. For the C_l block matrices with base operations of the form (22), it is assumed that each of the α_i parameters for i = 1, 2, . . . , L_P(n) is quantized to one of ξ = 2^q values α_i = 2πd_i ξ^−1, which are evenly distributed over the range [0, 2π), with d_i ∈ {0, 1, . . . , ξ − 1} being an integer that describes the value of the rotation angle in terms of q bits in a natural binary representation. The Benes network with base operations of the form (23) performs any permutation of the m-element set, where m is the transform size, and each s_i parameter is a one-bit integer, that is, s_i ∈ {0, 1}. Thus, the length of the private key expressed in bits, which is the concatenation of the binary representations of the d_i and s_i parameters, can be determined according to the formula L_K(K, M, q) = q(M/K)L_P(K) + 2L_P(M), where M/K is the number of C_l block matrices.
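For concreteness, the key-length formula can be evaluated as follows (a small sketch of ours; `L_P` and `key_length` are hypothetical helper names):

```python
import math

def L_P(n):
    """Number of free parameters of the two-stage Benes-like structure."""
    return (n // 2) * (2 * int(math.log2(n)) - 1)

def key_length(K, M, q):
    """L_K(K, M, q) = q*(M/K)*L_P(K) + 2*L_P(M) bits: q bits for each
    rotation angle of the M/K mixing blocks C_l, plus one bit per switch
    of the two permutation networks of size M (for P_r and P_c)."""
    return q * (M // K) * L_P(K) + 2 * L_P(M)
```

For K = 64, M = 4096 and q = 4 this reproduces the 184,320-bit key length used in the example that follows.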
In the case of an attack consisting of guessing a private key for the decryption step, the probability of drawing a key that differs by κ bits, in terms of the Hamming distance, from the key used at the encryption stage can be described using the Bernoulli distribution with the probability of success equal to p = 1/2. The expected value of the κ variable, that is, the average number of bits distinguishing both keys, will then be equal to half the key length, that is, E{κ} = (1/2)L_K(K, M, q). In turn, the probability of drawing a key that differs by κ_0 bits can be described as: P(κ = κ_0) = binom(L_K, κ_0)(1/2)^L_K, where binom(·, ·) denotes the binomial coefficient and L_K = L_K(K, M, q). For example, with an image of size 512 × 512 pixels divided into fragments of 8 × 8 pixels, and with the size of the C_l block matrices assumed to equal the size of the vectors X_k, we get K = 64 and M = 4096. Assuming the value of q = 4 bits, the resulting length of the private key will be equal to L_K(64, 4096, 4) = 256L_P(64) + 2L_P(4096) = 184,320 bits. Thus, the probability of guessing the encryption key will be approximately 10^−55,485. For a key that differs by one bit from the encryption key we have 0.38 × 10^−55,479, and for a key that differs by two bits, 0.41 × 10^−55,474. For κ_0 = 46,080 (i.e., for 25% of the key length) the probability will be 0.11 × 10^−10,473. The obtained exemplary values of probabilities are negligibly small, even for the relatively small size of the C_l block matrices, which is K = 64. On this basis, we can conclude that the examined encryption method in conjunction with the used scheme of construction of matrix C can be characterized by high combinatorial complexity. This complexity can be further increased by increasing the value of the K parameter.
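Since these probabilities are far below the smallest representable floating-point number, they are best evaluated in the log domain; a sketch (assuming the binomial model above; the function name is ours):

```python
import math

def log10_guess_probability(L_K, kappa0):
    """log10 of P(kappa = kappa0) = binom(L_K, kappa0) * (1/2)**L_K,
    computed via log-gamma to avoid underflow."""
    log_binom = (math.lgamma(L_K + 1) - math.lgamma(kappa0 + 1)
                 - math.lgamma(L_K - kappa0 + 1))
    return log_binom / math.log(10) - L_K * math.log10(2)
```

For L_K = 184,320 and κ_0 = 0 this gives about −55,485.9, matching the order of magnitude quoted above.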

Statistical Analysis of the Decryption Error
Let there be given a set O(m) (not necessarily understood in the sense of a consistent algebraic structure) of random orthogonal matrices of dimension m × m (i.e., if A ∈ O(m), then AA^T = A^T A = I, where I is the identity matrix). By a_ij for i, j = 1, 2, . . . , m we denote the elements of a matrix A ∈ O(m), which are also random variables. We further assume that the matrices belonging to the set O(m) have the following properties: (i) zero expected value: the expected values of the random variables a_ij for i, j = 1, 2, . . . , m are zero, that is, E{a_ij} = 0; (ii) stationarity: the variances of the a_ij random variables are constant and equal to m^−1, that is, E{a_ij^2} = m^−1 for i, j = 1, 2, . . . , m; (iii) lack of correlation between matrix elements: for any two different random variables, E{a_ij a_kl} = 0 holds, where i, j, k, l = 1, 2, . . . , m and (i, j) ≠ (k, l). In practice, matrices from the set O(m) that have the mentioned properties can be generated: as the result of Gram-Schmidt orthogonalization of random matrices, as a product of random Householder transformation or Givens rotation matrices, or with the use of fast orthogonal parametric transforms. Let X be an N × M element matrix representing input data arranged as a set of M image blocks expanded into N-element column vectors (see Section 3). The input data, known as the plaintext, is encrypted using the encryption matrix C_I ∈ O(M), and the encryption process can be described as: Y = XC_I, where Y is the N × M element resulting matrix, that is, the ciphertext. The column vectors of this matrix are the encrypted forms of the column plaintext vectors. The decryption process follows the formula: X̂ = YC_II, where C_II ∈ O(M) is the decryption matrix. If C_II = C_I^T, then X̂ = X, and the absolute error of the signal reconstruction can be defined using the Hilbert-Schmidt (Frobenius) norm as: D = tr{(X − X̂)(X − X̂)^T} (24), where tr{·} is the trace of a matrix and (·)^T denotes matrix transposition.
The relative value of the error D with respect to the energy of the input signal can be defined as: D* = D / tr{XX^T}. Next, substituting X̂ = XC_I C_II into formula (24), we obtain after elementary matrix transformations the formula: D = tr{X(I − C_I C_II)(I − C_I C_II)^T X^T} = 2 tr{XX^T} − 2 tr{XC_I C_II X^T} (26), where I is the M × M element identity matrix. In the rest of the section we calculate the expected value, variance, and statistical distribution of the reconstruction error for randomly selected decryption matrices C_II ∈ O(M). Taking into account the assumed statistical properties of matrices from the set O(M), that is, E{C_II} = O, where O is the null matrix, the expected value of the reconstruction error (24) calculated for randomly selected matrices C_II can be written as: D̄ = E{D} = 2 tr{XX^T} (28). The result in (28) immediately gives D̄* = 2, which means that the expected value of the absolute reconstruction error is twice the energy of the input signal. In terms of the known measure of signal reconstruction quality, the Signal-to-Noise Ratio (SNR), this corresponds approximately to −3 dB. The variance of the absolute error can be described using the relationship: σ_D^2 = E{(D − D̄)^2} (29), where the expected value is calculated with respect to the matrix C_II ∈ O(M). Next, taking into account expression (26) and the expected value D̄ (see (28)), and having in mind the previously formulated properties of stationarity and lack of correlation of the elements of matrices from the set O(m), we can rewrite (29) in the simpler form: σ_D^2 = (4/M) tr{X^T YY^T X} (30). We should note that formula (30) does not depend on the encryption matrix C_I, since tr{X^T YY^T X} = tr{X^T XX^T X}. It can also be proved that tr{X^T XX^T X} ≤ (tr{X^T X})^2. This allows us to upper-bound the variance of the absolute error as: σ_D^2 ≤ (4/M)(tr{X^T X})^2. In the case of the relative error we obtain the analogous estimate: σ_D*^2 ≤ 4/M. The statistical distribution of the absolute error D, and also of the relative error D*, is the normal distribution.
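The derived mean E{D*} = 2 and the bound σ_D* ≤ 2/√M can be checked numerically with a small Monte Carlo experiment (our own sketch; it draws the random orthogonal matrices by Gram-Schmidt orthogonalization of Gaussian matrices, one of the construction methods mentioned above, and uses a single random matrix for the combined product C_I C_II):

```python
import random

def random_orthogonal(m, rng):
    """Random m x m orthogonal matrix: Gram-Schmidt of a Gaussian matrix."""
    a = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(m)]
    q = []
    for row in a:
        v = row[:]
        for u in q:                      # modified Gram-Schmidt step
            c = sum(x * y for x, y in zip(v, u))
            v = [x - c * y for x, y in zip(v, u)]
        nrm = sum(x * x for x in v) ** 0.5
        q.append([x / nrm for x in v])
    return q

def relative_error(X, C):
    """D* = tr{(X - XC)(X - XC)^T} / tr{X X^T}."""
    num = den = 0.0
    m = len(C)
    for row in X:
        xc = [sum(row[k] * C[k][j] for k in range(m)) for j in range(m)]
        num += sum((x - y) ** 2 for x, y in zip(row, xc))
        den += sum(x * x for x in row)
    return num / den

rng = random.Random(7)
N, M, trials = 4, 16, 200
X = [[rng.gauss(0.0, 1.0) for _ in range(M)] for _ in range(N)]
samples = [relative_error(X, random_orthogonal(M, rng)) for _ in range(trials)]
mean_D_star = sum(samples) / trials   # should be close to 2
```

Even for a small M the sample mean of D* settles near the theoretical value of 2.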
This results directly from the fact that the value of D is formed as a weighted sum of random variables with zero expected value, that is, the elements c_ij for i, j = 1, 2, . . . , M of the random decryption matrix C_II, and the validity of this statement can be proved on the basis of the Central Limit Theorem. Hence, the probability density function of D can be fully characterized by its expected value D̄ and variance σ_D^2. Similarly, D̄* and σ_D*^2 fully define the distribution of the relative error. This allows for a simple estimation of the probability of obtaining an error value not greater than a given value. For example, in the case of D* the probability of getting an error value not greater than D_0 is described as: P(D* ≤ D_0) = Φ((D_0 − D̄*)/σ_D*), where Φ(·) is the cumulative distribution function of the standard normal distribution, and where we assume D̄* = 2 and σ_D* = 2/√M. Exemplary results for the 'Lena' image with resolution 512 × 512, and image blocks of size 8 × 8 pixels, which correspond to N = 64 and M = 4096, are presented in Figure 8 for the following values of D_0 = {0.1, 0.25, 0.5, 0.75, 1.0, 1.5}. The obtained results are plotted as the cumulative distribution function (CDF) of the D* random variable. It should be noted that for images, unlike text data, changing one or even several bits of the binary representation of pixel luminance does not mean that the image content will be unreadable. However, the SNR quality measure allows us to assess the legibility of the image after decryption. Figure 9 shows how the legibility of the 'Lena' image changes as a function of the SNR measure. An analysis of the results shows that SNR values close to −2 dB guarantee good hiding of the image content. On this basis, the probability of obtaining SNR values not lower than −2 dB (i.e., for D_0 ≈ 1.6) can be determined. This probability will be around 0.18 × 10^−40. The probabilities for the remaining values of SNR are shown in Figure 8.
In the case of encryption algorithms, an algorithm with good properties is one that produces an even distribution of the symbols representing the data after encryption [34]. An even distribution corresponds to the same frequency, or probability of occurrence, of particular symbols in the ciphertext (here, pixel luminance values). Since the considered method is based on matrix multiplication, that is, it is a linear technique, and the pixel values in the image obtained after encryption are formed as weighted sums (with the elements of the encryption matrix) of the luminance values of the input image pixels, then, according to the Central Limit Theorem, the resulting distribution will be a normal distribution (or will be close to one). This is an inherent feature of linear methods. Figure 10 shows histograms for three sample images calculated before and after encryption using the examined method. The luminance probability distribution of the pixels after encryption looks, in a statistical sense, like normally distributed noise.

Maximum Deviation Index
The Maximum Deviation (MD) index makes it possible to evaluate the efficiency of the encryption process in terms of the deviation calculated between the distributions of pixel luminance values in the input and encrypted images. To this end, the histograms of both images should first be calculated, that is, H_I(i) for the input image and H_C(i) for the encrypted image, where i = 0, 1, . . . , 255. Then the value of the deviation index can be determined on the basis of the following formula: η_MD = (H_D(0) + H_D(255))/2 + Σ_{i=1,...,254} H_D(i), where H_D(i) = |H_I(i) − H_C(i)|. The greater the value of η_MD, the more the encrypted image differs from the input one. Table 1 contains MD index values obtained for three sample images ('Boat', 'Lena' and 'Peppers') with four encryption algorithms, that is, the method from paper [9], the considered method, and two popular symmetric block ciphers, the Advanced Encryption Standard (AES) and the Data Encryption Standard (DES) (see [35]). In the case of the block ciphers, we used the Cipher Block Chaining (CBC) mode, which consists in sequential binding of encrypted blocks and allows one to avoid the repetition of patterns in the encrypted image. The obtained results are comparable in terms of order of magnitude. The results obtained with the method from [9] and the considered method are very close. For the images 'Boat' and 'Lena' the method from paper [9] gives better results, while for the 'Peppers' image the considered method generates the highest value of the η_MD index. The results obtained with the AES and DES ciphers are very similar. It is possible to indicate images for which the block ciphers give higher values of the η_MD index than the examined method ('Boat', 'Lena'). However, we can also indicate images for which the opposite relationship holds ('Peppers'). It should be noted that the η_MD index allows one to assess the quality of an encryption method based only on the pixel luminance distribution.

Table 1. The values of the η_MD index for sample images ('Boat', 'Lena', 'Peppers') and various encryption methods (the considered method, AES and DES block ciphers in CBC mode, and the method from paper [9]).

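The MD computation can be sketched as follows (our own illustration, assuming 256-bin histograms and the edge-weighted sum formulation commonly used in the literature):

```python
def maximum_deviation(hist_in, hist_enc):
    """eta_MD: sum of absolute histogram differences H_D(i), with the two
    edge bins i = 0 and i = 255 weighted by 1/2."""
    h_d = [abs(a - b) for a, b in zip(hist_in, hist_enc)]
    return (h_d[0] + h_d[-1]) / 2.0 + sum(h_d[1:-1])
```

Identical histograms give η_MD = 0; the more the encrypted histogram departs from the input one, the larger the value.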

Correlation Coefficient Index
By the Correlation Coefficient (CC) index η_CC, we understand the correlation calculated between the input and the encrypted image. Obviously, η_CC ∈ [−1, 1]. From the viewpoint of the encryption task the desired value of η_CC is zero, which corresponds to a lack of statistical similarity between both images. Let U with dimensions W × H represent the input image and V the encrypted image. Then the η_CC index can be determined on the basis of the following formula: η_CC = (1/(WH)) Σ_{i,j} (u_ij − µ(U))(v_ij − µ(V)) / (σ(U)σ(V)), where u_ij and v_ij for i = 1, 2, . . . , W, j = 1, 2, . . . , H are the pixel luminance values of the U and V images, while µ(Z) and σ(Z) are the mean value and the standard deviation of the pixel luminance values, respectively, that is, µ(Z) = (1/(WH)) Σ_{i,j} z_ij in the case of the mean value, and σ(Z) = ((1/(WH)) Σ_{i,j} (z_ij − µ(Z))^2)^{1/2} for the standard deviation. The results of the experimental measurements of the η_CC index for sample images ('Boat', 'Lena' and 'Peppers') obtained for the method from paper [9], the considered encryption method, and the AES and DES block ciphers, are presented in Table 2. Based on the analysis of the results, we can conclude that the η_CC index values for the considered method are close to zero. The method proposed in [9] allowed a better result to be obtained for the 'Boat' image. The results obtained for the block ciphers are on average one order of magnitude lower than the results obtained with the methods dedicated to image encryption.
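A direct implementation of the CC index (plain Pearson correlation over all pixels; the function name is ours):

```python
def correlation_coefficient(U, V):
    """eta_CC between two images given as flat, equal-length sequences
    of pixel luminances."""
    n = float(len(U))
    mu_u, mu_v = sum(U) / n, sum(V) / n
    cov = sum((u - mu_u) * (v - mu_v) for u, v in zip(U, V)) / n
    sd_u = (sum((u - mu_u) ** 2 for u in U) / n) ** 0.5
    sd_v = (sum((v - mu_v) ** 2 for v in V) / n) ** 0.5
    return cov / (sd_u * sd_v)
```

A well-encrypted image should yield a value close to 0 against its plaintext, whereas an unchanged image yields exactly 1.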

Irregular Deviation Index
The Irregular Deviation (ID) index is based on measuring the deviation of pixel values in the encrypted image relative to pixel values in the input image. Let the matrices U and V (W × H element matrices) describe the pixel luminances in the input and encrypted images, respectively. In order to determine the value of the ID index (µ_ID), we first calculate the matrix T, which holds the absolute values of the differences between the elements of matrices U and V, that is, t_ij = |u_ij − v_ij| for i = 1, 2, . . . , W and j = 1, 2, . . . , H. Then, it is required to determine the histogram H_T of occurrences of the individual values in matrix T. The next step is to compute the average number of pixels per difference value, h̄ = (1/256) Σ_{i=0,...,255} H_T(i), and then the deviation of the histogram from this average on the basis of the formula: µ_ID = Σ_{i=0,...,255} |H_T(i) − h̄|. The smaller the value of the µ_ID index, the greater the efficiency of the encryption method. Table 3 contains sample values of the µ_ID index calculated for exemplary natural images ('Boat', 'Lena', 'Peppers') and the method from paper [9], the considered encryption method, and the AES and DES block ciphers. The analysis of the experimental results shows that the considered encryption method is characterized by higher values of the µ_ID index, on average by 29%, compared to the results for the AES and DES block ciphers. In turn, the results obtained with the block ciphers are comparable. The method proposed in paper [9] gives results close to those obtained with the considered method, and better ones for the 'Lena' and 'Peppers' images.
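The ID computation can be sketched as follows (our own illustration, using the common formulation that measures how far the difference histogram departs from a uniform one):

```python
def irregular_deviation(U, V):
    """mu_ID: histogram H_T of absolute pixel differences |u - v|,
    compared bin-by-bin against its uniform average."""
    h_t = [0] * 256
    for u, v in zip(U, V):
        h_t[abs(u - v)] += 1
    avg = sum(h_t) / 256.0
    return sum(abs(c - avg) for c in h_t)
```

When the encrypted image equals the input, all differences pile up in the zero bin, giving a large (bad) value; an even spread of differences gives a small one.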

Experimental Results in Efficiency of the Compression Process
In order to verify the effectiveness of compression, a series of tests were performed on natural images with different statistical characteristics. Table 4 and the graphs in Figure 11 show the results for two standard, representative test images, that is, the 'Lena' image (1a) and the 'Mandrill' image (2a) in Figure 12. Both images are 8-bit grayscale with resolutions of 512 × 512 pixels. The images were compressed with the standard JPEG method and, for comparison, encrypted and compressed with the analyzed scheme. At the compression stage of the considered coding scheme the standard JPEG algorithm was applied in two variants. In the first one, it was utilized without any modification to its original form and operated on ciphertexts previously projected, by truncation towards zero, to their respective integers. Such a projection corresponds to the situation in which the user saves the encrypted image to one of the standard graphics formats, for example, the BMP format, before it is actually compressed. In the second variant the projection was not performed and the input of the JPEG compressor was supplied with the real values obtained after encryption of the input image. This required only a slight modification of the JPEG method, allowing the input of the compressor to be fed with real-valued samples instead of integers. Moreover, in order to examine the influence of the choice of dimensions K of the encryption matrices C_l, for each of the two described variants of the examined method the effectiveness of the compression process was evaluated for three different values of K, namely K = 8, 16 and 32. For all the described tests the compression levels were regulated by appropriate modification ([36], p. 122) of the standard quantization table given in (21). The PSNR was used as the image reconstruction quality measure, calculated as: PSNR = 10 log10(255^2 W^2 / Σ_{i,j}(U_ij − Û_ij)^2), where W = 512 stands for the vertical and horizontal resolution of the images, and U_ij and Û_ij, i, j = 1, . . . , W, stand for the original and reconstructed images, respectively, for each of the tested methods. The results of all the experiments are shown in Table 4.
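The PSNR computation above reduces to the standard peak-signal formula 10·log10(255²/MSE); a sketch (function name ours):

```python
import math

def psnr(U, U_hat, peak=255.0):
    """PSNR = 10*log10(peak^2 / MSE) in dB, for images given as flat,
    equal-length sequences of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(U, U_hat)) / float(len(U))
    return 10.0 * math.log10(peak * peak / mse)
```

For example, a reconstruction off by one luminance level at every pixel (MSE = 1) gives about 48.13 dB.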

Summary and Conclusions
The paper analyzes an image encryption scheme that preserves input data statistics and can be used in conjunction with the popular JPEG image compression standard. In this way, the considered encryption method together with the JPEG standard constitutes a highly efficient realization of the encrypt-then-compress image processing framework. Moreover, thanks to the use of fast parametric linear transforms, the presented scheme is computationally efficient and does not alter the statistical characteristics of input images, which preserves the effectiveness of the compression process. The obtained results indicate that, for a wide range of compression ratios, the effectiveness of the compression process when compressing encrypted images, understood in terms of the quality of reconstructed images evaluated with the PSNR measure, is comparable to the respective effectiveness of the standard JPEG method alone, that is, applied without the encryption stage. It should be noted that the maximum quality differences of the reconstructed images in relation to the standard JPEG algorithm (for the practical case with K = 8, a compression ratio of ≈5, and the use of the unmodified JPEG method with the encrypted image data saved to an intermediate graphics file, for example, a BMP file) were 1.31 dB and 0.53 dB for the test images 'Lena' and 'Mandrill', respectively, which can be considered a highly satisfactory result, consistent with the theoretical predictions presented in Section 4 and confirmed by the visual comparisons depicted in Figure 12.
This paper also presents a detailed analysis of the efficiency of the encryption step using common approaches, that is, histogram analysis, Maximum Deviation (MD), Correlation Coefficient (CC) and Irregular Deviation (ID), as well as combinatorial and statistical analysis. The analyzed method is also compared in terms of the mentioned encryption efficiency measures (i.e., MD, CC and ID) to the method from paper [9] and to the symmetric block ciphers DES and AES. The choice of the method from [9] is due to its comparable performance in compression of encrypted images, and to the fact that the method is dedicated to encryption and compression of natural images. The analysis of the experimental results shows that the efficiencies of the encryption steps of the analyzed method and the method from paper [9] are very close. Moreover, the results obtained for the considered method with the MD, CC and ID indexes are comparable to those for the symmetric block ciphers DES and AES, although in most cases worse, except for the MD index in the case of the 'Peppers' image. Furthermore, the histogram analysis shows that the pixel intensity distribution of encrypted images is approximately normal. The statistical analysis of the encryption stage indicates the high combinatorial complexity of the examined method, since the number of bits of the private key depends linear-logarithmically on the block size of the encryption transform matrix, which for the block size K = 64, the number of blocks M = 4096 and 4 bits per single key parameter gives a total key length of L_K(64, 4096, 4) = 184,320 bits. This also guarantees a negligible probability of guessing a private key that would make it possible to decrypt images with a PSNR value greater than a certain threshold allowing for recognition of the decrypted content. For example, for PSNR not lower than −1.76 dB the mentioned probability is equal to 0.64 × 10^−57.
The experimental results, along with the presented theoretical analysis, show that the considered scheme is highly efficient in terms of its encryption capabilities and compression quality, as well as in terms of the ease of its application in conjunction with the JPEG image compression standard. It should be noted that the analyzed encryption method preserves the image statistics in the form of the autocorrelation matrix. As such, it can be used in connection with any image compression method based on the form of the autocorrelation matrix.