Bit-Error Aware Lossless Image Compression with 2D-Layer-Block Coding

With the development of IoT, image data is increasingly transmitted over wireless communication systems. If bit errors occur during transmission, the recovered image may become useless. To address this problem, a bit-error aware lossless image compression method based on bi-level coding was previously proposed for gray image compression. However, bi-level coding does not consider the inherent statistical correlation in a 2D context region. To overcome this shortcoming, a novel variable-size 2D-block extraction and encoding method with built-in bi-level coding for color images is developed to decrease the entropy of the information and improve the compression ratio. A lossless color transformation from RGB to the YCrCb color space is used to decorrelate the color components. In particular, a layer-extraction method is proposed to preserve the Laplacian distribution of the data in 2D blocks, which is suitable for bi-level coding. In addition, optimization of the 2D-block start bits is used to improve performance. To evaluate the proposed method, several experiments are conducted, including a comparison with state-of-the-art methods and an evaluation of the effect of different color spaces. The comparison experiments under a bit-error environment show that the average compression ratio of our method is better than that of bi-level coding, JPEG 2000, WebP, FLIF, and L3C (a deep learning method) when these are protected with a Hamming code. Our method also achieves the same image quality as the bi-level method. Other experiments illustrate the positive effect of built-in bi-level encoding and of encoding with zero-mean values, which can maintain high image quality. Finally, the decrease in entropy and the procedure of our method are presented and discussed.


Introduction
With cloud computing and Internet of Things (IoT) development, the requirement for data transmission and storage is increasing. Fast and efficient compression of data plays a very important role in many applications. For instance, image data compression has been used in many areas such as medical, satellite remote sensing, and multimedia.
There are many methods to compress image data, including prediction-based methods, transformation-based methods, and other methods such as fractal image compression and deep learning with Auto Encoder (AE) [1,2], Recurrent Neural Network (RNN), Convolutional Neural Network (CNN) [3], and Residual Neural Network (ResNet) [4]. The transformation-based methods include the Discrete Cosine Transform (DCT), Karhunen-Loeve Transform (KLT), Hadamard transform, Slant transform, Haar transform, and singular value decomposition [5]. Usually, transformation-based or deep learning methods are used for lossy compression, while prediction-based methods are used for lossless compression.
In some cases, lossless compression must be applied because data acquisition is expensive. For example, lossless image compression must be applied to aerial, medical, and space images [6,7]. In industry, many engineered lossless compression methods are used, including Portable Network Graphics (PNG), WebP [8], and Free Lossless Image Format (FLIF) [9]. Some deep learning-based lossless compression methods [10][11][12] are also at an early stage of research. As one classical approach to lossless compression, the prediction-based method encodes the difference between pixel values and their predicted values, which are generally smaller numbers than the pixel values themselves. Thus, each difference value needs fewer bits to encode [13]. There are mainly three kinds of prediction-based methods: context-based, least-square (LS)-based, and spatial structure-based. Among these, the method based on spatial structure with a 2D context region is an effective way to improve the compression ratio (CR), because it considers the inherent statistical correlation using block-based methods such as quadtree-based block [14], reference block [15], template-matching [16], and hierarchical decomposition [17]. Quadtree-based block and hierarchical decomposition methods split the image into many subimages, and the reference block method exploits the fact that a physical object is constructed from a number of structure components. Inspired by these methods, this work splits the image into many blocks, each of which has similar color.
With the development of IoT, it has become more common for image data to be transmitted through wireless communication systems, and lossless image compression is used to improve transmission throughput. However, if bit errors occur in a noisy wireless channel during transmission, the recovered image will be damaged or become useless. Lossless image compression must therefore address this problem and keep the recovered image useful. Most methods, including engineered lossless compression methods and deep learning-based methods, are not suited to transmission over a noisy channel; to the best of our knowledge, few studies have addressed this case except our previous work [7]. By protecting the key information bits with error control coding, that work proposed a bit-error aware lossless image compression method based on bi-level coding, which treats a gray image as a one-dimensional signal. In this coding method, only the linear predictive bi-level block coding parameters are encoded using (7,4) Hamming codes, and the residue sequences are left as they are to improve the compression ratio (CR). One reason for the efficiency of bi-level coding is that it exploits the sparsity of the data, which requires fewer encoding bits.
In this work, we use bi-level coding [7] for natural images with red (R), green (G), and blue (B) components. As R, G, and B are highly correlated, a linear transformation is applied to map RGB to another color space and achieve a better CR [17,18]. As discussed above, the spatial structure-based method with a 2D context region is an effective way to improve CR. Therefore, the image is split into many 2D blocks that have the sparsity property and are suitable for encoding with bi-level coding. Finally, a novel variable-size 2D-block extraction and encoding method with built-in bi-level coding is proposed to improve CR for color images while remaining robust to bit errors. A key 2D-layer-block extraction method is used to split the image into many 2D blocks with similar color and to preserve the Laplacian or Gaussian distribution of the data in each 2D block, which gives the data its sparsity property.
The contributions of this paper are summarized as follows:
(1) For color image compression, a lossless color transformation from RGB to the YCrCb color space is used to decorrelate the color components. A prediction-based method is then used to remove data correlation and produce the residue sequence.
(2) To keep the data distribution sparse and suitable for bi-level coding, a novel 2D-layer-block extraction method is proposed that preserves the Laplacian or Gaussian distribution of the data in 2D blocks. Furthermore, by rearranging the order in which the data is encoded, the extraction method decreases the entropy of the data and improves CR.
(3) A novel variable-size 2D-block encoding method with built-in bi-level coding is proposed to improve CR while remaining robust to bit errors, just as the bi-level coding method is. The mean or min value of each 2D block and the key information bits of the built-in bi-level coding are protected with a Hamming code, so the image can be recovered and remains useful.
The rest of this paper is organized as follows. In Section 2, related works on lossless compression are discussed. In [...]

Journal of Sensors

[...] of contexts, like horizontal edge, vertical edge, or smooth area. Many static predictors can be found in [6,19]. The median edge detector (MED) used in LOCO-I uses only three causal pixels to determine the type of pixel area that is currently being predicted [20]. LOCO-I was further improved and standardized as the JPEG-LS lossless compression algorithm, which has eight different predictive schemes including three one-dimensional and four two-dimensional predictors [21]. To detect edges, the Gradient Adjusted Predictor (GAP) embedded in the CALIC algorithm uses local gradient estimation and three heuristically defined thresholds [22]. The gradient edge detection (GED) predictor combines the simplicity of MED with the efficiency of GAP [23]. In [19], the prediction errors are encoded using codes adaptively selected from the modified Golomb-Rice code family.
To enable processing of images with higher bit depths, a simple context-based entropy coder is presented [6].
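As an illustration of the static predictors discussed above, the following minimal sketch implements the median edge detector in the form commonly given for LOCO-I, where a is the left neighbor, b the upper neighbor, and c the upper-left neighbor. The function names and the convention that out-of-image neighbors are treated as 0 are assumptions of this sketch and are not taken from the cited works.

```python
def med_predict(a, b, c):
    """Median edge detector: a = left, b = above, c = above-left neighbor."""
    if c >= max(a, b):
        return min(a, b)   # edge detected: pick the smaller neighbor
    if c <= min(a, b):
        return max(a, b)   # edge detected: pick the larger neighbor
    return a + b - c       # smooth area: planar prediction


def med_residuals(img):
    """Prediction residuals for an image given as a list of pixel rows.

    Assumption of this sketch: neighbors outside the image count as 0.
    """
    h, w = len(img), len(img[0])
    res = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            a = img[y][x - 1] if x > 0 else 0
            b = img[y - 1][x] if y > 0 else 0
            c = img[y - 1][x - 1] if x > 0 and y > 0 else 0
            res[y][x] = img[y][x] - med_predict(a, b, c)
    return res
```

On a constant image, every residual except the first is zero, which is the kind of amplitude reduction that prediction-based coding relies on.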
LS-based optimization is proposed as an approach to accommodate the varying statistics of coding images. To reduce computational complexity, edge-directed prediction (EDP) initiates the LS optimization process only when the prediction error is beyond a preselected threshold [24]. In [25], the LS optimization is performed only when the coding pixel is around an edge or when the prediction error is large. A switching coding scheme is further proposed that combines the advantages of both run-length and adaptive linear predictive coding [26]. The Minimum Mean Square Error (MMSE) predictor uses the least mean square principle to adapt k-order linear predictor coefficients for optimal prediction of the current pixel from a fixed number of m causal neighbors [27]. The paper [28] presents a lossless coding method based on a blending approach with a set of 20 blended predictors, such as recursive least squares (RLS) predictors and Context-Based Adaptive Linear Prediction (CoBALP+).
Although individual prediction is favored, it destroys the morphology of the 2D context region, and the inherent statistical correlation within the correlated region is obscured. As an alternative, spatial structure has been considered to complement pixelwise prediction [29]. In [14], quadtree-based variable block-size partitioning is introduced into the adaptive prediction technique to remove spatial redundancy in a given image, and the resulting prediction errors are encoded using context-adaptive arithmetic coding. Inspired by the success of prediction by partial matching (PPM) in sequential compression, the paper [30] introduces the probabilistic modeling of the encoding symbol based on [...]

In [15], superspatial structure prediction is proposed to find an optimal prediction of the structure components, e.g., edges, patterns, and textures, within the previously encoded image regions instead of the spatial causal neighborhood. The paper [17] presents a lossless color image compression algorithm based on hierarchical prediction and context-adaptive arithmetic coding. By exploiting the decomposition and combinatorial structure of the local prediction task and making the conditional prediction with multiple max-margin estimation in a correlated region, a structured set prediction model with max-margin Markov networks is proposed [29]. In [16], the image data is treated as an interleaved sequence generated by multiple sources, and a new linear prediction technique combined with template-matching prediction and a predictor blending method is proposed. Our method uses a variable-size 2D-block extraction and encoding method with built-in bi-level coding to improve the compression ratio.

Engineered Lossless Compression Algorithms.
PNG removes redundancies from the RGB representation with autoregressive filters and then compresses the data with the deflate algorithm, which is based on the LZ77 algorithm and Huffman coding. Lossless WebP compression uses many types of transformation, including spatial transformation, color transformation, green subtraction transformation, color indexing transformation, and color cache coding, and then performs entropy coding with a variation of LZ77 Huffman coding [8]. FLIF uses Adam7 interlacing and YCoCg interleaving to traverse the image and performs entropy coding with "meta-adaptive near-zero integer arithmetic coding" (MANIAC), which is based on context-adaptive binary arithmetic coding (CABAC) [9].

Deep Learning-Based Lossless Compression.
Huffman coding, arithmetic coding, and asymmetric numeral systems are the standard algorithms for implementing lossless compression, but they do not cater for latent variable models, so bits back with asymmetric numeral systems (BB-ANS) is proposed to solve this issue [10]. However, BB-ANS becomes inefficient when the number of latent variables grows; to improve its performance on hierarchical latent variable models, Bit-Swap is proposed [11]. In contrast to these works, which focus on smaller datasets, a fully parallel hierarchical probabilistic model (termed L3C) is proposed to enable practical compression of full-resolution images [12].

Our Proposed Method
The proposed method is shown in Figure 1. In color image data, R, G, and B are highly correlated, so encoding them directly is not efficient. Therefore, a linear transformation from RGB to the YCrCb color space is used to decorrelate the color components; the transformation given by (1) and (2) in [32] is adopted in our algorithm. As mentioned in [7,19], the prediction residues have reduced amplitudes and are assumed to be statistically independent with an approximately Laplacian distribution. Therefore, the predictor in Figure 1 is employed to further remove data correlation in the Y, Cb, and Cr channels, respectively. The predicted value $X_p$ is obtained with equation (3), where $A_p$, $B_p$, and $C_p$ are pixel values whose locations are illustrated in Figure 2. After the prediction step, variable-size 2D blocks are extracted and the key information about the blocks is encoded with a Hamming code. Finally, these 2D blocks are separately encoded with built-in bi-level coding to exploit the sparsity of the Laplacian distribution, achieving better signal quality and robustness to bit errors [7].
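To illustrate what a lossless color transformation looks like in practice, the sketch below uses the reversible color transform of JPEG 2000 as a stand-in. This is a substitute technique chosen for illustration only: the transformation actually adopted in this work is the one given by (1) and (2), which is not reproduced here, and the function names are hypothetical.

```python
def rct_forward(r, g, b):
    """JPEG 2000 reversible color transform (stand-in, not equations (1)-(2))."""
    y = (r + 2 * g + b) >> 2   # floor((R + 2G + B) / 4)
    cb = b - g
    cr = r - g
    return y, cb, cr


def rct_inverse(y, cb, cr):
    """Exact integer inverse of rct_forward."""
    g = y - ((cb + cr) >> 2)   # >> floors toward -inf, matching the forward step
    r = cr + g
    b = cb + g
    return r, g, b


# Round trip on one pixel: the original RGB triple is recovered exactly.
pixel = (200, 35, 120)
assert rct_inverse(*rct_forward(*pixel)) == pixel
```

Because only integer additions and floor shifts are used, no rounding error is introduced, which is the property any color transformation must have to keep the overall scheme lossless.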

The procedure of 2D-block extraction and encoding is further shown in Figure 3. The 2D-layer-block extraction method preserves the Laplacian or Gaussian distribution of the data in Layer-1 to Layer-n and in the 2D blocks, which therefore have the sparsity property and are suitable for bi-level coding. The n in Layer-n denotes the number of bits required to encode each value in the extracted blocks, and the remaining data that does not belong to any block is left to the next layer for extraction. The 2D-block encoding method with built-in bi-level coding is used to improve CR while remaining robust to bit errors. The built-in bi-level procedure splits each 2D block into several one-dimensional signals, and each signal is encoded separately. This is because the bi-level method has a maximum encoding length, which is normally the same as the width of the image.

2D-Layer-Block Extraction Method
3.1.1. Principle of the Extraction Method. In the proposed algorithm, to keep the data distribution sparse and suitable for bi-level coding, a novel 2D-layer-block extraction method is proposed that preserves the Laplacian or Gaussian distribution of the data in 2D blocks. In addition, the extraction method rearranges the order in which the data is encoded, which decreases the entropy of the data, so CR can be improved. The principle of the method is introduced as follows.
For encoding residues, if a two-dimensional block, called a 2D block, can be encoded with n bits per residue, every datum x in the block must satisfy the condition shown in (4). Therefore, it is feasible to find these blocks first for n = 1 bit, then for n = 2, ..., 8.
Let us consider all the data x in the block to be governed by a probability density $f(x)$; the entropy is then calculated by (6) [33].
By inserting (5) into (6), the entropy for a Gaussian distribution is obtained as (7). Since a residue sequence with a Gaussian distribution has the maximum entropy for a given standard deviation, the inequality (8) holds in general.
According to (4), the standard deviation σ is less than $(2^n - 1)/2$, and (9) can be deduced when we assume μ = 0; L is the number of data samples x in one block. By substituting (9) into (8), equation (10) is obtained. When blocks are searched for n from 1 to 8, the entropy of the data in the blocks increases with n according to (10). The entropy is therefore governed by n, and it is possible to improve the compression ratio with this method.
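The dependence of the entropy bound on n can be checked numerically. The sketch below uses the standard differential entropy of a Gaussian, $\frac{1}{2}\log_2(2\pi e\sigma^2)$, and evaluates it at the largest standard deviation allowed for each layer, $\sigma = (2^n - 1)/2$. The base-2 logarithm and the function names are assumptions of this sketch; the exact form of (10), which also involves the sample size L, is not reproduced here.

```python
import math


def gaussian_entropy_bits(sigma):
    """Differential entropy of a Gaussian with standard deviation sigma, in bits."""
    return 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)


def layer_entropy_bound(n):
    """Entropy bound for layer n, assuming sigma is at most (2**n - 1) / 2."""
    sigma_max = (2 ** n - 1) / 2
    return gaussian_entropy_bits(sigma_max)


# The bound grows monotonically with n, so early layers hold low-entropy data.
for n in range(1, 9):
    print(n, round(layer_entropy_bound(n), 3))
```

Under these assumptions, blocks extracted in early layers are bounded by a smaller entropy than blocks extracted later, which is the intuition behind extracting layer by layer from n = 1 upward.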
According to the discussion above, we assume μ = 0. After performing prediction and making the residues zero-mean by removing the average, many residue values are close to zero and the residues follow a Laplacian distribution, as shown in Figure 4(a). That is, all the data in one of these encoding 2D blocks will satisfy (11). Note that the sample size L in the block is above a threshold value $th_n$ and that the data in the block approximately follows a Laplacian or Gaussian distribution.
To proceed, once all the 2D blocks with n = 1 bit are found, they are extracted from the residue data. The rest of the residue data consists of three portions: the first has values larger than $(2^n - 1)/2$, the second has values smaller than $-(2^n - 1)/2$, and the third contains data whose size is smaller than $th_n$. Note that after the residues in $[-(2^n - 1)/2, (2^n - 1)/2]$ shown in Figure 4(a) are extracted, the rest of the residue data nearly keeps the Laplacian distribution. When the extraction is repeated from n = 1 to 8, the Laplacian distribution of the remaining residue data changes, with the probability density around zero decreasing, as shown in Figure 4(b). In addition, the Laplacian or Gaussian distribution in these 2D blocks is flattened, as depicted in Figure 4(c), because of the increasing value of $2^n - 1$. In this paper, this procedure is called layer extraction.

Procedure of the Extraction Method.
According to (11), 2D-layer blocks, each having a sample size above the threshold $th_n$, are extracted repeatedly. For example, in Figure 5, the image residue data is given as a 40 × 40 matrix and many 2D blocks belonging to A are extracted. The data that is not included in any block is reshaped into a matrix M with the same height as the original residue image, while the other remaining data is collected as an array B. After the first layer is finished, matrix M is processed in the same way in the next layer. With the extraction and matrix reshape operations, many edge values are merged with other data and have less effect on compression [24]. The pseudocode of the block extraction procedure is shown in Figure 6.
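A heavily simplified sketch of one extraction layer is given below. It is not the procedure of Figure 6: it scans fixed-size tiles instead of searching for variable-size blocks, it reads the n-bit condition as "the range of values in the block is at most $2^n - 1$", and it returns the leftover data as a flat list rather than reshaping it into M and B. All names and these simplifications are assumptions made for illustration.

```python
def fits_layer(values, n, th_n):
    """True if a block has enough samples and its value range fits in n bits.

    Assumption: the n-bit condition is read as max - min <= 2**n - 1.
    """
    if len(values) < th_n:
        return False
    return max(values) - min(values) <= 2 ** n - 1


def extract_tiles(res, n, th_n, tile_w, tile_h):
    """Extract qualifying fixed-size tiles from a residue matrix (list of rows).

    Returns (blocks, remaining): blocks as (x, y, w, h) tuples, and the values
    not covered by any block, in row-major order.
    """
    h, w = len(res), len(res[0])
    taken = [[False] * w for _ in range(h)]
    blocks = []
    for y in range(0, h - tile_h + 1, tile_h):
        for x in range(0, w - tile_w + 1, tile_w):
            vals = [res[y + dy][x + dx]
                    for dy in range(tile_h) for dx in range(tile_w)]
            if fits_layer(vals, n, th_n):
                blocks.append((x, y, tile_w, tile_h))
                for dy in range(tile_h):
                    for dx in range(tile_w):
                        taken[y + dy][x + dx] = True
    remaining = [res[y][x] for y in range(h) for x in range(w)
                 if not taken[y][x]]
    return blocks, remaining
```

In a layered scheme, `remaining` would be passed to the next layer with n increased by one, so that data needing more bits is handled later.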

Built-In Bilevel Coding
3.2.1. Bilevel Coding. As most of the data in each 2D block has a sparse distribution, as discussed above, the bi-level coding scheme proposed in our previous works [7,34] and shown in Figure 7 can be applied.
Let $p_0$ be the probability that a data sample requires more than $N_1$ bits and less than or equal to $N_0$ bits to encode. Assuming that $n_b p_0 \le 0.3$ [34], the average total length is expressed in (12). For a given 2D block for n bits, $N_0 = n$ and the original total length is $N_0 n_s$. When bi-level block coding is applied, the compression ratio is improved according to (12).

In (12), for 8-bit gray image data, $n_s$ is a constant and $N_0 = 8$. Given $N_1$ and $p_0$, which can be estimated, the optimal $n_b$ can be determined to achieve the minimum length $L_{ave}$. By taking the derivative of (12) and setting it equal to zero, the optimized block size x can be calculated by equation (13). The minimum $N_1$ satisfying $n_b p_0 \le 0.3$ can then be found through Figure 8. Finally, the "bits" value in Figure 7 can start from the minimum $N_1$, which improves the efficiency of 2D-block extraction.
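The trade-off behind the optimal block size can be sketched with a simple model. The model below is an assumption of this sketch and may differ from (12) and (13): each block of $n_b$ samples carries one flag bit, a block is coded with $N_1$ bits per sample when every sample fits in $N_1$ bits and with $N_0$ bits per sample otherwise, and samples exceed $N_1$ bits independently with probability $p_0$. For small $n_b p_0$ this model gives the approximate minimizer $n_b \approx 1/\sqrt{p_0 (N_0 - N_1)}$.

```python
import math


def expected_length(n_s, n_b, p0, N0, N1):
    """Expected total bits under the assumed one-flag-bit-per-block model."""
    q = (1.0 - p0) ** n_b                         # block needs only N1 bits
    bits_per_block = 1 + n_b * (q * N1 + (1.0 - q) * N0)
    return (n_s / n_b) * bits_per_block


def approx_optimal_block_size(p0, N0, N1):
    """Closed-form minimizer of the small n_b * p0 approximation."""
    return 1.0 / math.sqrt(p0 * (N0 - N1))


def best_block_size(n_s, p0, N0, N1, max_nb=64):
    """Integer block size minimizing the expected length, by direct search."""
    return min(range(1, max_nb + 1),
               key=lambda nb: expected_length(n_s, nb, p0, N0, N1))


n_s, p0, N0, N1 = 512, 0.01, 8, 4
nb = best_block_size(n_s, p0, N0, N1)
print(nb, approx_optimal_block_size(p0, N0, N1))
print(expected_length(n_s, nb, p0, N0, N1), "bits versus", n_s * N0, "uncoded")
```

Small blocks pay too many flag bits, while large blocks are more likely to contain a sample that forces the whole block to $N_0$ bits; the optimum balances the two.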

3.3. 2D-Block Encoding. Figures 9(a)-9(c) show the details of the encoding scheme. When a color image is given, the three channels are encoded separately, together with the head information, which includes the color space, the predictor information, and their Hamming coding, as shown in (a). In each channel, 2D blocks are extracted layer by layer, so encoding is also carried out recursively layer by layer, and each layer is encoded separately. In each layer, shown in (b), head data and image data are encoded separately. (c) shows the encoding scheme of the head data. The size of M is the width of the parent matrix times the height of the image, and the length of B is the length of the remaining data in Figures 5 and 6. Every block has a start position $(x, y)$, a size $(w, h)$, the mean value of the data in the block, the maximum number of bits used by each datum in the block, and the key information of the built-in bi-level coding, including $N_0$, $N_1$, $n_b$, the number of blocks $n_s/n_b$, and the bitstream of block types. In particular, the mean value of the data in a block serves two functions. First, it improves robustness to bit errors, because the mean value carries the key information of the block. Second, it ensures that the block data is zero-mean, which is the property that bi-level coding relies on.
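The protection of key information can be illustrated with a textbook (7,4) Hamming code, which corrects any single bit error in a 7-bit codeword. The sketch below uses the common layout with parity bits at positions 1, 2, and 4; the bit ordering and function names are assumptions of this sketch and need not match the layout used in the implementation evaluated here.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into 7 bits [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # checks positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # checks positions 4, 5, 6, 7
    pos = s1 + 2 * s2 + 4 * s3       # 1-based position of the error, 0 if none
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


# A header nibble survives a single bit error in its codeword.
word = hamming74_encode([1, 0, 1, 1])
word[5] ^= 1                          # simulate one channel bit error
assert hamming74_decode(word) == [1, 0, 1, 1]
```

Each protected nibble costs 7 bits instead of 4, which is why only the head data and key parameters are protected while the residue bits are sent unprotected.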
In the extraction of 2D blocks, the optimal threshold of sample size $th_n$ is given in Table 1. All the data in one layer is split into pieces of 512 samples, which is the same as the width of the image. To evaluate the effects of 2D-block encoding, built-in bi-level coding, color space, and so on, experiments with different combinations are carried out. All results are average values over 10 runs. The bit-error rate (BER) is set to 0.001 by default. The bi-level method [7] is applied to RGB color images.
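The bit-error setting can be reproduced in spirit with a simple channel model. The sketch below flips each transmitted bit independently with probability equal to the BER and computes PSNR between an original and a recovered pixel sequence. The independent-error (binary symmetric channel) model, the seed handling, and the function names are assumptions of this sketch rather than a description of the experimental code.

```python
import math
import random


def flip_bits(bits, ber, rng):
    """Flip each bit independently with probability ber (assumed channel model)."""
    return [b ^ 1 if rng.random() < ber else b for b in bits]


def psnr(orig, recon, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    if mse == 0:
        return float("inf")
    return 10 * math.log10(peak ** 2 / mse)


rng = random.Random(0)                 # fixed seed so a run is repeatable
sent = [0] * 100000
received = flip_bits(sent, 0.001, rng)
print("bit errors:", sum(received))    # about 100 expected at BER = 0.001
```

Averaging over several seeds, as with the 10 runs used here, reduces the variance that comes from where the errors happen to land in the bitstream.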

4.1. Comparison. In this experiment, our proposed method is compared with state-of-the-art methods from Refs. [16,17], engineered lossless compression algorithms including PNG, JPEG 2000, WebP, and FLIF, and the deep learning-based lossless compression algorithm L3C. As none of these methods is designed for a bit-error environment, each is combined with a (7,4) Hamming code as a solution robust to bit errors. The results are given in Tables 2 and 3 and Figure 10.
In Table 2, the results are taken from Refs. [15,16], and the best CR results with the Hamming code are listed in the second-to-last column. The average CR of our method is 1.31296, which is better than 1.116592. In Table 3, the results of JPEG 2000, WebP, and FLIF are obtained with the compression tools OpenJPEG, WebP from Google, and FLIF from Cloudinary. For the deep learning method L3C, the result is obtained by compressing the images with a neural model trained on the Open Images dataset. The average CR of our method is 1.682933, which is better than the others, such as bi-level (1.66655), FLIF (1.429898), and L3C (1.396234). In addition, L3C obtains its worst CR results on images (5) and (9) from the CLIC dataset, namely 1.098277 and 0.918406. One reason is that the neural model is trained on the Open Images dataset and L3C does not perform well on images from a different dataset, CLIC. Figure 10 shows the image quality assessment results. PSNR, SSIM, and MSSSIM_Y better reflect the situation of a bit-error channel. Therefore, only these three assessment results are discussed in the later sections.
According to the comparison results in Tables 2 and 3 and Figure 10, the compression ratio of our proposed method is higher than that of bi-level coding, although 2D-block encoding requires more header bits to encode the position and size of each block. Image quality similar to that of bi-level coding is maintained, as shown in Figure 10. The reason is that the 2D-layer-block extraction method rearranges the data order to decrease the entropy, and the data distribution of the blocks in each layer stays close to a Laplacian distribution, which is suitable for bi-level coding, as discussed before. This analysis is discussed further in Sections 4.6 and 4.7.

The Effect of Built-In Bilevel Encoding.
To investigate the advantage of the bi-level coding method, two experiments, "built-in bi-level" and "no bi-level", are carried out. The results are shown in Figures 11 and 12.
When the center of the Laplacian distribution is located at zero, bi-level coding requires fewer bits to encode. The built-in bi-level encoding method therefore achieves the best compression ratio, as shown in Figure 11. As bi-level coding was proposed for noisy channels [7], it obtains higher PSNR, SSIM, and MSSSIM_Y and maintains better image quality, as shown in Figure 12.

Comparison between Zero-Mean and Positive Integers.
As we know, the min value can also serve as the key information to improve image quality, just like the mean value. In this experiment, positive integer values obtained by removing the min value are encoded. The results are shown in Figures 13 and 14.
As discussed in Section 4.2, the compression ratio with built-in bi-level coding is higher when zero-mean values are used. Without bi-level coding, however, the compression ratio with positive integer values obtained by removing the min value is higher than with zero-mean values. In Figures 14(a) and 14(b), encoding with zero-mean values achieves better PSNR and SSIM than encoding with positive integers, which is consistent with Section 3.3. Figure 15 shows the reconstructed images for the different methods. According to the results for image (9) in the last row, encoding with "positive integer" values gives worse image quality than the others. In Figure 14(c), however, the MSSSIM value of "positive integer" is higher than that of "zero-mean." Therefore, the MSSSIM result is not always consistent with the real image quality.

Evaluation with Optimization of 2D-Block Start Bits.
In the 2D-block coding method, an excessive number of blocks can decrease the compression ratio. An experiment based on the optimization is therefore conducted, and the results are given in Figure 16. It is observed that the optimization works and improves the compression ratio.

Evaluation with Different Color Spaces and Predictors.
In Figure 17, "RGB Direct" denotes that the RGB image is encoded directly without a predictor, while "RGB Predictor" denotes that the RGB residue produced by a predictor is used. The figure shows that the YCrCb color space in "2D block" performs better than the RGB color space and that "RGB Direct" is the worst, which confirms that the differences between pixel values and their predictions are generally smaller numbers than the pixel values themselves [13].
4.6. The Decrease of Entropy. In Figure 18, the original entropy of the gray image of (4) is 3.8453, while the average entropy with the 2D-block encoding method is 3.772696. This indicates that the entropy falls below the original value, so the compression ratio is improved according to information theory, which agrees with the principle in Section 3.1. Figure 19 shows all the images with 2D blocks and the distribution of the remaining data and of all the data in blocks for the gray image of (4), without the optimization of 2D-block start bits. The images indicate that the reshape operation changes the distribution of the edges that many predictors focus on [24]. The distribution of the remaining data is close to the Laplacian distribution shown in the left histogram, and the data distributions in the blocks are close to the Gaussian or Laplacian distribution shown in the right histogram. All of these observations agree with the analysis in Section 3.1.
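The effect of grouping similar values on entropy can be seen on a toy example. The sketch below computes the empirical Shannon entropy of a sequence and the size-weighted average entropy over groups of values. How the average entropy reported in Figure 18 is computed is not specified in the text, so the weighting used here, the toy data, and the function names are assumptions of this sketch; the side information needed to describe the groups is also ignored.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Empirical Shannon entropy of a sequence, in bits per symbol."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def weighted_average_entropy(groups):
    """Average of per-group entropies, weighted by group size."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * shannon_entropy(g) for g in groups)


data = [0, 0, 1, 1, 8, 8, 9, 9]
print(shannon_entropy(data))                               # whole sequence
print(weighted_average_entropy([data[:4], data[4:]]))      # two similar-valued groups
```

In this toy case the whole sequence needs 2 bits per symbol, while the two groups need 1 bit per symbol on average, which mirrors how extracting blocks of similar values lowers the average entropy.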

Discussion.
The comparison results of these experiments show that our method performs better than state-of-the-art methods, engineered lossless compression algorithms, and deep learning methods in a bit-error environment. There are four main reasons.
First, the 2D-block extraction method extracts the data that can be encoded with fewer bits layer by layer; thus, the entropy is decreased, as Section 4.6 shows.
Second, edge data usually causes a poor compression ratio, but the 2D-block extraction method changes the distribution of the edge data. In addition, the data distribution of the blocks in each layer stays close to a Laplacian distribution, which is suitable for bi-level coding, as shown in Figure 19.
Third, built-in bi-level coding with zero-mean values preserves high image quality in a bit-error environment, as discussed in Sections 4.2 and 4.3.
Finally, the optimization of 2D-block start bits and the color space used in "2D block" are important mechanisms for improving the compression ratio, as discussed in Sections 4.4 and 4.5.

Conclusions
When image data is transferred through wireless communication systems, bit errors may occur and corrupt the image data. To reduce the effect of bit errors, a bit-error aware lossless image compression algorithm based on bi-level coding can be applied. However, bi-level coding is a one-dimensional coding method and does not consider the inherent statistical correlation in a 2D context region. To overcome this shortcoming, a novel 2D-layer-block extraction and encoding method with built-in bi-level coding is proposed to improve the compression ratio. With the layer extraction method, the data distribution stays close to the Laplacian distribution after each layer extraction, which is suitable for bi-level coding. For color images, a lossless color transformation from RGB to the YCrCb color space is used to decorrelate the color components. The experiments demonstrate that our proposed method obtains a better lossless compression ratio than the bi-level method and keeps the same image quality over a noisy transmission channel. Although it is sometimes not as efficient as state-of-the-art methods in terms of lossless compression ratio, it is more robust to bit errors caused by a noisy channel. Furthermore, after applying the feed-forward error control scheme, a different predictor, and a different coding method, we can achieve better compression efficiency, since the bi-level block coder with the bit-error protection algorithm requires fewer bits than the entropy coder. It is also noted that the deep learning methods are trained on the Open Images dataset but perform poorly on images from a different dataset. Therefore, the generalization ability of deep learning methods needs to be improved in the future.

Data Availability
The [CLIC mobile dataset], [Open Images] and other classic images used to support the findings of this study are included within the article.

Ethical Approval
If the images or other third-party material in this article are not included in the article's Creative Commons license, it is required to obtain permission directly from the copyright holder.