VLSI Design Based on Block Truncation Coding for Real-Time Color Image Compression for IoT

It has always been a major issue for a hospital to acquire real-time information about a patient in emergency situations. Because of this, this research presents a novel high-compression-ratio, real-time image compression very-large-scale integration (VLSI) design for image sensors in the Internet of Things (IoT). The design consists of a YEF transform, color sampling, block truncation coding (BTC), threshold optimization, sub-sampling, prediction, quantization, and Golomb-Rice coding. Machine learning is used to train different BTC parameters to obtain the optimal parameter solution, yielding two optimal reconstruction values and a bitmap for each 4 × 4 block. An image is divided into 4 × 4 blocks by BTC for numerical conversion and removal of inter-pixel redundancy. The sub-sampling, prediction, and quantization steps are performed to reduce redundant information. Finally, values with high probability are coded using Golomb-Rice coding. The proposed algorithm has a higher compression ratio than traditional BTC-based image compression algorithms. Moreover, this research also proposes a real-time image compression chip design based on a low-complexity, pipelined architecture using TSMC 0.18 μm CMOS technology. The operating frequency of the chip can reach 100 MHz. The core area and the number of logic gates are 598,880 μm² and 56.3 K, respectively. In addition, this design achieves 50 frames per second, which is suitable for real-time CMOS image sensor compression.


Introduction
In recent years, people have obtained various kinds of information from digital images and videos in fields such as entertainment, education, medicine, and traffic monitoring. For example, in the medical field, the high patient-to-staff ratio is a serious problem. Overcrowding in emergency departments is a serious global healthcare issue [1]. It has always been difficult for hospitals to obtain real-time information on their patients' critical circumstances. Emerging Internet of Things (IoT) frameworks enable us to create tiny devices capable of processing, sensing, and communicating [2,3]. Signal processing and signal collection frequently use image sensors. Image sensor arrays, including CMOS image sensors for visible light and thermal imaging sensors, have been used more and more in a variety of applications due to their constant performance improvement and falling cost.

In the BTC formulation, the low reconstruction value is represented by x_l, the high reconstruction value is represented by x_h, and the number of pixels above the average is represented by q. The average value and standard deviation of each 4 × 4 block are represented by x̄ and σ, respectively. In variable-length coding, the number of data occurrences determines the code length. Data with a high probability of occurrence are encoded into a shorter length; by contrast, data with a low probability of occurrence are encoded into a longer length. The resulting average code length is shorter than that of fixed-length encoding, which improves the compression rate. One of the most famous schemes is Huffman coding [18], which is widely cited in image compression technology. Huffman coding creates the shortest code based on a binary tree. The Golomb code [19], introduced by Solomon W. Golomb in 1966, is another variable-length encoding; it uses an adjustable parameter M to divide the input data into a quotient and a remainder.
Although Huffman coding has a higher compression rate, it consumes a lot of time in calculation because it needs to scan the entire image and count symbol probabilities before encoding. In addition, Huffman coding needs to store the code information for comparison during encoding and decoding. As a result, the hardware design requires additional memory for a code comparison table, which increases the area and cost. Chen et al. [20] proposed a chip design for lossless image compression for wireless capsule endoscopy; that design selects Golomb-Rice coding to reduce area and enable real-time processing in hardware. The present study ultimately aims to improve image compression technology and contribute to IoT devices, allowing on-chip integration with image sensors to fulfill the requirements of high-speed applications. For example, WSNs tend to transmit data information in real time [21]. The innovations of this methodology are as follows:

1.
An adaptive threshold is a new feature that this research adds to image compression technology. Different BTC parameters are trained using machine learning to acquire the best possible parameter solution.

2.
This study suggests a cutting-edge BTC with 4 × 4 block construction for numerical conversion, which effectively eliminates inter-pixel redundancy and reduces duplicate information by utilizing quantization, prediction, and sub-sampling. It demonstrates the simplicity and efficiency of the suggested approach. Additionally, the suggested approach outperforms conventional 4 × 4 block BTC-based image compression techniques in terms of compression ratio. The figure of merit (FOM) has increased by roughly 33.79%, reaching 334.6391.

3.
The innovation of this work is to realize a real-time image compression chip design based on a low-complexity, pipelined architecture using TSMC 0.18 µm CMOS technology. It can perform three stages of compression work at the same time. The throughput of the circuit architecture proposed in this work is as high as 50 frames per second. Experiments prove that it can be applied to real-time CMOS image sensor compression.
The layout of this study is as follows: the materials and methods for the compression method are presented in Section 2. The assessment techniques and experimental findings of the compression technology and hardware are described and examined in Section 3. The discussion of the study is presented in Section 4. In Section 5, conclusions and outlooks are presented.

The Proposed Lossy Image Compression Algorithm
This proposal aims to address the issue of large-scale transmission of image sensors in IoT by using BTC and Golomb-Rice coding to conduct image compression with a high compression rate and low complexity, and to achieve high performance through pipelined circuit design. Figure 1 depicts the flowchart of the compression method developed in this study. First, the original image in RGB color space is converted to YEF color space, and the E and F color planes are then sampled in 4:2:0 format. Second, BTC training is carried out to determine the two optimal reconstruction values and the bitmap for each 4 × 4 block. Third, the E and F reconstruction values are sampled in 4:2:2 format. Fourth, adjacent reconstruction values are subtracted to obtain the prediction residual. Finally, the residual is quantized based on the quantization table and encoded with Golomb-Rice coding.

YEF Color Space Sampling
According to the literature review, changes in luminance are easier for the human visual system to observe than changes in chrominance. Image processing therefore benefits from converting an image from its original RGB color space into separate luminance and chrominance planes. The conversion formula from RGB to YEF color space is shown in Equation (3).
The YEF conversion formula divides the image into luminance and chrominance in a relatively simple way: it only needs some adders and shifters, which makes the hardware design easier to implement and also reduces the circuit area and timing cost. For this reason, the YEF color space is well suited to the proposed image compression.
The distribution of pixels in the RGB color space is shown in Figure 2, and Figure 3 shows the pixel distribution in the YEF color space. The range of pixel variation in the RGB color space is considerably wider than in the YEF color space, because all three RGB color planes store both chrominance and luminance information. By contrast, the values of adjacent pixels in the E and F planes are concentrated and smooth. This indicates that color sampling can be employed on the E and F planes to save data storage. In addition, during decompression, the values lost to color sampling can readily be computed from nearby pixels. The Y plane stores the luminance strength, which is important information about an image, so it is not sampled, to avoid degrading image quality.
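The shift-and-add style of luminance/chrominance conversion described above can be sketched as follows. Note that the coefficients below are a generic, hypothetical example of a shifter-friendly transform; the paper's exact Equation (3) is not reproduced in the text, so this is an illustration of the implementation idea rather than the authors' formula.

```python
def rgb_to_luma_chroma(r, g, b):
    """Split a pixel into one luma and two chroma values using only
    shifts and adds (hypothetical coefficients, NOT the paper's Eq. (3))."""
    y = (r + 2 * g + b) >> 2      # weighted average: divide-by-4 as a shift
    e = ((r - g) >> 1) + 128      # chroma difference, offset to stay non-negative
    f = ((b - g) >> 1) + 128
    return y, e, f
```

Because every operation is a shift or an add, such a transform needs no multipliers, which is what makes this family of color conversions cheap in VLSI.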

The proposed study selects the 4:2:0 sampling format to sample the E and F color planes, which can achieve a higher compression rate. The 4:2:0 sampling discards half of the vertical and horizontal pixels of the image. Hence, there are many different 4:2:0 sampling positions, and different sampling positions give different image quality results. One consideration is whether E and F are sampled at the same position; the other is whether E and F are sampled in the same vertical direction. This study uses the 4:2:0 sampling positions shown in Figure 4.

BTC Algorithm and Threshold Optimization
The BTC algorithm [21] has a lower complexity than the traditional BTC or AMBTC algorithm, which can save hardware costs. In view of this, this research proposes a more advanced threshold optimization for the BTC algorithm. This method not only maintains the image compression rate, but also improves the image quality. A bitmap is binary information. The threshold in the traditional BTC algorithm is the average of the block. The conversion formulas are shown in Equations (4) and (5), where the current pixel value is X(i,j) and BM(i,j) is the binarized result of X(i,j).
where the decoded pixel value is R(i,j), and the low and high reconstruction values are a and b, respectively. In the traditional scheme, when the current pixel value is greater than the average, it is always decoded as the high reconstruction value b, even when the original pixel is actually closer to the low reconstruction value a. Because the decoded value of a pixel depends on the threshold, selecting an appropriate threshold for decompression is important. Assigning each pixel to the nearer reconstruction value effectively improves the PSNR of the image without affecting the compression rate. Therefore, the improved formula based on the research in [22] is shown in Equation (6):
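The nearest-value assignment described above can be sketched as follows; this is an illustrative reading of the threshold optimization, not a transcription of Equation (6), which is not reproduced in the text.

```python
def optimized_bitmap(block, a, b):
    """Assign each pixel to whichever reconstruction value (a low, b high)
    is nearer, instead of thresholding on the block average. This can
    only lower the per-pixel reconstruction error."""
    return [[1 if abs(x - b) < abs(x - a) else 0 for x in row] for row in block]
```

A pixel just above the block average but far from b is now decoded as a, which is exactly the case where the traditional average threshold loses quality.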

BTC Training
The two reconstruction-value procedures of the BTC employed in this research are shown in Equations (7) and (8). Threshold optimization reduces the computational complexity, but without the standard deviation it cannot maintain high image quality on its own. To address this problem, the variable parameters α1 and α2 enable the algorithm to calculate the best reconstruction values and compensate for the image quality. Machine learning adjusts and trains the parameters α1 and α2, which take values from 0 to 1. Each iteration increases a parameter by 0.25, and the two current reconstruction values are calculated for each combination.
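The training loop above amounts to a grid search over the 5 × 5 = 25 combinations of α1 and α2 (values 0, 0.25, 0.5, 0.75, 1), keeping the pair with the smallest mean absolute error. The candidate formulas below are illustrative stand-ins for the paper's Equations (7) and (8), which are not reproduced in the text; they only assume the block minimum, maximum, and average that the parameter calculator provides.

```python
import numpy as np

def btc_train(block, alphas=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Grid-search (alpha1, alpha2) and keep the reconstruction pair
    (a, b) and bitmap with the smallest MAE over the 4x4 block."""
    lo, hi, avg = block.min(), block.max(), block.mean()
    best = None
    for a1 in alphas:
        for a2 in alphas:
            a = lo + a1 * (avg - lo)       # hypothetical low-value candidate
            b = avg + a2 * (hi - avg)      # hypothetical high-value candidate
            bitmap = np.abs(block - b) < np.abs(block - a)  # nearer value wins
            recon = np.where(bitmap, b, a)
            mae = np.abs(block - recon).mean()
            if best is None or mae < best[0]:
                best = (mae, a, b, bitmap)
    return best[1], best[2], best[3]
```

Because the 25 candidates are independent, the hardware evaluates them in parallel, as described in the BTC training module later.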

Subsampling
Because adjacent pixel changes in the E and F color planes are concentrated, the reconstruction values of neighboring 4 × 4 blocks after BTC training will be very similar. Therefore, sampling the reconstruction values in the E and F planes can effectively increase the compression rate. The sampling adopts the 4:2:2 format as shown in Figure 5, so only half of the original reconstruction data are kept for the E and F planes. Considering the real-time implementation of the hardware design, the bitmap of a 4 × 4 block discarded during up-sampling is taken to be equal to the bitmap of the previous 4 × 4 block, and a discarded reconstruction value is replaced by the average of the adjacent values.
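The keep-half / average-replacement rule above can be sketched as follows; the edge handling (repeating the last kept value when there is no right neighbour) is an assumption for illustration.

```python
def subsample_422(values):
    """Keep every other reconstruction value along a row (4:2:2-style)."""
    return values[::2]

def upsample_422(kept):
    """Restore each discarded value as the average of its kept neighbours;
    the last value is repeated when no right neighbour exists (assumed)."""
    out = []
    for i, v in enumerate(kept):
        out.append(v)
        nxt = kept[i + 1] if i + 1 < len(kept) else v
        out.append((v + nxt) // 2)
    return out
```

Since the E and F reconstruction values of adjacent blocks are similar, the interpolated values stay close to the discarded originals, which is why this step costs little quality.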

Prediction and Quantization
The prediction method used in this subsection is shown in Equation (9), which uses the difference between two adjacent values as the prediction. Compared with the prediction method in the research in [23], it achieves better image quality performance, although predicting from adjacent values can cause the edge pixels of the image to be discontinuous during decompression. Moreover, this prediction needs only the previous and current values; it does not need to store the entire image for prediction.
In order to make the encoding process beneficial, the range of the difference value −255~255 is quantized into 0~31, thereby shortening the code length during encoding and further improving the compression rate. The difference is usually small because of the characteristics of the similar distribution of adjacent pixels. This means that the probability of a small difference value is higher than a large difference value. Therefore, the smaller difference value will be quantized into a smaller range of encoding and decoding as shown in Table 1.
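The prediction and quantization steps can be sketched together as follows. The uniform step of 16, mapping the residual range −255..255 into a 5-bit index 0..31, is an illustrative stand-in for the paper's Table 1, which is not reproduced in the text.

```python
def predict_and_quantize(values, step=16):
    """DPCM-style prediction (difference against the previous value)
    followed by uniform quantization of the residual into 0..31
    (hypothetical mapping, NOT the paper's Table 1)."""
    out, prev = [], 0
    for v in values:
        diff = v - prev               # residual in -255..255
        out.append((diff + 255) // step)  # shift to 0..510, bucket into 0..31
        prev = v
    return out
```

A real codec would predict from the dequantized value rather than the original to avoid decoder drift; this sketch omits that for brevity. Because adjacent reconstruction values are similar, most residuals fall into the small buckets, which is what makes the subsequent Golomb-Rice stage effective.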

Golomb-Rice Coding
Golomb-Rice coding does not need to scan the entire image, nor does it need an encoding table. The numbers of bits for the quotient and remainder are determined by the value of M. To simplify hardware circuits, Robert F. Rice proposed limiting the value of M to a power of 2. The detailed encoding steps are as follows: Step 1: set the parameter M, which determines the bit number of the remainder as shown in Equation (10).
Step 2: calculate the quotient q and the remainder r as shown in Equations (11) and (12).
Step 3: add the q numbers of 0 s.
Step 4: add the truncation code 1.
Step 5: add the remainder r, using b = log2(M) bits, where b is the number of bits required for the remainder, and q and r are the quotient and remainder, respectively. If the M value is smaller, the number of bits in the remainder decreases but the quotient increases; conversely, if M is larger, the number of remainder bits increases but the quotient decreases. Three different values of M were tested, as shown in Table 2. Table 3 shows the encoding result of Golomb-Rice coding with M = 4.
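The five steps above can be sketched as a short encoder (a sketch assuming non-negative inputs and M a power of 2, as in Rice's restriction):

```python
def rice_encode(value, m=4):
    """Golomb-Rice code: q zeros, a '1' truncation bit, then
    b = log2(M) remainder bits. With M = 4, b = 2."""
    q, r = divmod(value, m)           # Step 2: quotient and remainder
    b = m.bit_length() - 1            # Step 1: remainder width, M a power of 2
    return "0" * q + "1" + format(r, f"0{b}b")  # Steps 3-5
```

With M = 4, the value 5 gives q = 1, r = 1 and encodes as "0101", matching the quotient-zeros / truncation-one / two-bit-remainder layout described above.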

The Proposed Lossy Image Compression Hardware Design
Real-time processing is the primary goal of the lossy image compression hardware design proposed in this research. The VLSI design for the proposed image compression technique is presented in Figure 6. The RGB to YEF conversion module, RAM, parameter calculator, BTC training, prediction, Golomb-Rice coder, and packer are all included in the hardware design.

The parameter calculator architecture shown in Figure 7 is used to calculate the minimum, maximum, and average values for each block, because the BTC algorithm used in this proposal is based on a 4 × 4 block size. In order to calculate the MAE in the BTC training module, the input values must be stored in the line buffer and shifted with each clock. The BTC training module generates the two best reconstruction values and the bitmap of each block. The prediction module determines the difference between the actual and predicted values and quantizes it to conserve bits. Finally, the Golomb-Rice coder module encodes the data and the packer module packs the bit streams. The packer module consists of three line buffers and a multiplexer: first it outputs the bitmap, second the encoded reconstruction value a, and finally the encoded reconstruction value b.

BTC and Golomb-Rice Coding
The BTC training is the most important part of this proposal and of the overall hardware architecture. This module finds the best reconstruction values and bitmap for each block. Block diagrams for BTC and Golomb-Rice encoding are shown in Figure 8; all training calculations are performed in parallel, replacing the original sequential training, to improve throughput and achieve a real-time hardware design.

According to Equations (7) and (8), the reconstruction calculator module implements the hardware circuit. Floating-point parameter calculations use shifters and adders to reduce the circuit area. In addition, a line buffer stores the previous pixel values. The bitmap generator calculates the MAE and simultaneously generates the threshold-optimized bitmap. The bitmap generator circuit architecture calculates the 25 candidate reconstruction-value pairs in parallel, which reduces the processing time. Reconstruction values a0 and b0 are used as an example, as shown in Figure 9. The "Compare" circuit finds the smallest MAE value, which determines the best reconstruction values and bitmap.

Figure 9. The architecture of the BTC training module. State 1 calculates the differences between the current value and a0 and b0. State 2 compares the differences from State 1. State 3 calculates the MAE value and decides the best reconstruction value and bitmap.

Prediction
The prediction module is composed of an adder, a subtractor, four registers, and a quantization table, as shown in Figure 10. The register "pre" stores the previous reconstruction value, "diff" stores the difference between the current reconstruction value and the previous predicted reconstruction value, "temp" stores the prediction result of the previous reconstruction value, and "re" stores the prediction error after quantization. The quantized result is stored in the "out" register and is output every 16 clocks.

Golomb-Rice Coder
The circuit architecture and FSM of the Golomb-Rice coder are illustrated in Figures 11 and 12. The coding parameter M is set to 4 in this study. The concept is to divide the input signal by four and add a one-bit truncation code and a two-bit remainder, where the lowest two bits of the input signal become the output representing the remainder in the Golomb-Rice code.


Results
The quality of this approach and other algorithms may be measured and compared using the byte per pixel (BPP), peak signal-to-noise ratio (PSNR), and compression ratio (CR). The compression ratio (CR) is the ratio of the data before compression to the data after compression. The higher the compression ratio, the less pixel redundancy in the image. This is computed using the formula in Equation (13).

Compression Ratio = (size of image before compression) / (size of image after compression) (13)

In an uncompressed RGB image, each pixel channel is represented by 8 bits. After image compression, the total number of bits in the entire image is reduced and each pixel can be represented by fewer bits. The smaller the BPP value, the higher the compression rate. BPP is another index of the compression rate, expressed in Equation (14):

BPP = (total number of bits after compression) / (number of pixels) (14)

The PSNR is the ratio of the maximum possible signal power to the mean square error (MSE). The higher the PSNR, the harder it is to discern distortion between the original and decompressed images. It is a measure of image quality and follows Equations (15) and (16):

PSNR = 10 · log10(MAX² / MSE) (15)

MSE = (1 / (m · n)) Σ_i Σ_j [I(i,j) − K(i,j)]² (16)

where the original pixel and the decompressed pixel are denoted by I(i,j) and K(i,j), respectively, and MAX denotes the highest pixel value in the m × n image. Figures 13 and 14 show the test images used in this study, which are widely used for evaluating image compression techniques. Tables 4 and 5 provide the results for CR, BPP, and PSNR: Table 4 displays the results obtained using the four test images in Figure 13, which come from the Kodak photo collection [24], and Table 5 presents the results for the test images in Figure 14. Table 6 lists the image compression performance of this study compared with other BTC algorithms. In terms of CR and BPP, the algorithm outperforms the existing 4 × 4 BTC methods [25]. The FOM is defined as an index of compression performance to objectively judge image quality; the concept combines the compression rate and the PSNR, as shown in Formula (17).

The designed chip performs very well in terms of throughput: it can compress up to 50 frames per second for full-high-definition (FHD) images. Table 7 lists the chip information simulated using the EDA tool. This study employs a pipelined chip architecture to achieve high-throughput compression, and it also processes the three color planes in parallel. The revised hardware architecture is shown in Figure 15.

Figure 15. Block diagram of the revised hardware architecture. State 1 converts the RGB color space to the YEF color space. State 2 calculates the max and min of Y. State 3 calculates the reconstruction values and bitmap. State 4 calculates the MAE, which helps decide the best reconstruction value. State 5 quantizes the reconstruction values. State 6 codes the reconstruction values with Golomb-Rice.
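Equations (13)-(16) can be computed directly from the bit counts and the reconstruction error; the helper below takes the MSE as an input rather than recomputing it from images.

```python
import math

def compression_metrics(orig_bits, comp_bits, n_pixels, mse, max_val=255):
    """CR (Eq. 13), BPP (Eq. 14), and PSNR in dB (Eqs. 15-16, with the
    MSE supplied by the caller)."""
    cr = orig_bits / comp_bits                  # ratio of sizes before/after
    bpp = comp_bits / n_pixels                  # bits spent per pixel
    psnr = 10 * math.log10(max_val ** 2 / mse)  # peak signal-to-noise ratio
    return cr, bpp, psnr
```

For example, an image compressed from 800 to 100 bits over 100 pixels gives CR = 8 and BPP = 1, and an MSE equal to MAX² gives a PSNR of 0 dB.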

Conclusions
Image transmission has become a significant burden because of the increasing volume of data. Compression techniques must be used to achieve faster data transmission while maintaining image quality. This research proposes a low-complexity, high-compression-ratio image compression method and develops a real-time color image compression VLSI design. Color sampling technology is employed in the new BTC algorithm to reduce computational complexity and enhance the compression ratio. The experimental findings demonstrate that using machine learning to train the BTC parameters and acquire the best parameter solution yields good results. According to the comparison results, the suggested technique has the best compression performance and the highest throughput; as a result, the FOM was successfully improved by 33%. In addition, the BTC algorithm now includes a threshold optimization mechanism to prevent image distortion. To lessen the cost of sampling redundant data and to address the difficult issue of enormous data transfer in the Internet of Things, further improvements can be expected in the future. The readout circuit can boost the frame rate of high-speed image sensors or reduce power consumption. Large-scale image sensors will be demonstrated in the future, along with improved sampling algorithms and further reductions in noise and power consumption.