Low-Complexity Rate-Distortion Optimization of Sampling Rate and Bit-Depth for Compressed Sensing of Images

Compressed sensing (CS) offers a framework for image acquisition that has excellent potential in image sampling and compression applications because of its sub-Nyquist sampling rate and low complexity. In engineering practice, the resulting CS samples are quantized with a finite number of bits for transmission. When the bit budget for image transmission is constrained, choosing the sampling rate and the number of bits per measurement (bit-depth) is essential for the quality of CS reconstruction. In this paper, we first present a bit-rate model that considers the compression performance of CS, quantization, and the entropy coder. The bit-rate model reveals the relationship between bit rate, sampling rate, and bit-depth. Then, we propose a relative peak signal-to-noise ratio (PSNR) model for evaluating distortion, which reveals the relationship between relative PSNR, sampling rate, and bit-depth. Finally, the optimal sampling rate and bit-depth are determined based on the rate-distortion (RD) criteria with the bit-rate model and the relative PSNR model. The experimental results show that the actual bit rate obtained with the optimized sampling rate and bit-depth is very close to the target bit rate. Compared with the traditional CS coding method with a fixed sampling rate, the proposed method provides better rate-distortion performance, and the additional computation is less than 1%.


Introduction
Compressed sensing (CS), also known as compressive sensing or compressive sampling, shows that a small group of linear, non-adaptive measurements can reconstruct finite-dimensional signals with sparse or compressible representations [1][2][3][4][5][6]. By simultaneous sampling and data compression, a CS-based imaging system abandons the traditional architecture so that the encoder does not require too much time and hardware [7][8][9][10].
Many compressive sensing studies describe the constraints of the measurement budget, such as allocating sensing resources for regions of interest [11,12] and adaptive sampling for block compressive sensing [13,14]. However, real-valued CS measurements must be quantized in CS-based imaging systems, and the given constraint is a bit budget rather than a measurement budget.
In a classical imaging system, the quantizer determines the bit rate and the distortion [15]. However, compressive sensing is different from that situation. The bit rate and distortion are determined by the sampling rate and bit-depth in the CS-based imaging system. Therefore, there is a tradeoff between the sampling rate and the bit-depth in CS-based imaging systems with bit-budget constraints [16].
In order to obtain a high-quality image, rate-distortion optimization (RDO) must be performed on the encoder to allocate the optimal sampling rate and bit-depth to minimize the distortion of the

Problem Formulation
Let $x \in \mathbb{R}^{N\times 1}$ represent the vector form of the image after raster scanning. Assume $\theta \in \mathbb{R}^{N\times 1}$ is the coefficient vector of $x$ under the orthogonal transform $\Psi \in \mathbb{R}^{N\times N}$, that is, $x = \Psi\theta$. When $x$ can be approximately represented by only $K$ non-zero coefficients out of $N$, $x$ is called a $K$-sparse signal. Natural images are usually sparse in the discrete cosine transform (DCT) and discrete wavelet transform domains [19,20]. The CS theory states that the sparse signal $x$ can be accurately reconstructed from $M$ ($M < N$) linear and non-adaptive measurements with overwhelming probability. The measurement vector $y \in \mathbb{R}^{M\times 1}$ is obtained by the following:

$$y = \Phi x, \tag{1}$$

where $\Phi \in \mathbb{R}^{M\times N}$ is the measurement matrix, which should satisfy the restricted isometry property (RIP) and the incoherence property [1,6,21]; a Gaussian random matrix is often used [5]. When reconstructing the image signal $x$ from the measurement vector $y$, $y = \Phi x$ is an ill-posed problem with infinitely many solutions. In order to obtain a unique solution, the sparsity [1,5] of images is usually used as the prior condition to constrain the solution space of $y = \Phi x$. Some other prior conditions exist, such as total variation (TV) minimization [22] and non-local similarity [23], which are also considered as sparsity of an image in a particular transform domain. Moreover, a usual requirement is that the number of measurements is at least $M = O(K\log(N))$ to ensure high-quality reconstructed images. The ratio $m = M/N$ is often called the measurement rate or sampling rate.
In practice, real-valued CS measurements must be mapped to discrete bits by a quantizer. Therefore, the CS acquisition model with quantization [16] is as follows:

$$y_Q = Q_b(y) = Q_b(\Phi x), \tag{2}$$

where $Q_b : \mathbb{R} \rightarrow \mathcal{Q}$ is a $b$-bit scalar quantization function that maps real-valued measurements to the discrete set $\mathcal{Q}$ with $|\mathcal{Q}| = 2^b$. Considering the low complexity requirements of CS encoders, the uniform scalar quantization method is often used [17,18]. In order to improve the compression performance, entropy coding is performed after the quantizer [18]. The encoded data is as follows:

$$c_{enc} = f_{enc}(y_Q), \tag{3}$$

where $f_{enc} : \mathcal{Q} \rightarrow \mathcal{C}$ is the encoding function that maps the quantized measurements to the binary codeword $c_{enc}$; arithmetic coding is used in this paper. After the image is compressed by CS measurement, quantizer, and entropy coder, the average number of bits per pixel (bpp) in the image can be expressed as follows:

$$R = \frac{M L_{y_Q}}{N} = m L_{y_Q}, \tag{4}$$

where $R$ is called the bit rate and $L_{y_Q}$ represents the average codeword length after entropy coding of $y_Q$. In practice, we are often constrained by a bit budget when transmitting or storing the compressed data. At this point, the sampling rate and bit-depth must be balanced [16]. On the one hand, we can increase the quantization bit-depth by reducing the sampling rate, thereby improving the reconstruction quality. On the other hand, when we reduce the sampling rate, the reconstruction quality will decrease. How to allocate the sampling rate $m$ and the quantization bit-depth $b$ is expressed as an optimization problem based on the rate-distortion criterion:

$$\min_{m,\,b} D(x, m, b) \quad \text{s.t.} \quad R(x, m, b) \le R_{max}, \tag{5}$$

where $D(x, m, b)$ represents the distortion for the image $x$ with sampling rate $m$ and bit-depth $b$, $R(x, m, b)$ represents the bit rate for the image $x$ with sampling rate $m$ and bit-depth $b$, and $R_{max}$ represents the budget for the bit rate of the image compression data. Compressed sensing can significantly reduce the complexity of the encoder.
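As a quick numerical illustration of the bits-per-pixel bookkeeping, the following worked example takes the bit rate to be the total number of coded bits divided by the number of pixels. The image size and the two average codeword lengths are assumed values chosen for illustration only; they are not results from this paper.

```latex
\[
N = 256 \times 256 = 65536, \qquad m = 0.1 \;\Rightarrow\; M = \mathrm{round}(Nm) = 6554 .
\]
\[
L_{y_Q} = 5 \text{ bits (assumed)} \;\Rightarrow\;
R = \frac{M\,L_{y_Q}}{N} = \frac{6554 \times 5}{65536} \approx 0.50 \text{ bpp}.
\]
\[
L_{y_Q} = 2.5 \text{ bits (assumed)},\; m = 0.2 \;\Rightarrow\; M = 13107, \quad
R = \frac{13107 \times 2.5}{65536} \approx 0.50 \text{ bpp}.
\]
```

Both settings spend roughly the same budget, which is the trade-off between sampling rate and bit-depth that the optimization problem has to resolve.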
When solving model (5), the computational complexity is very important for CS-based imaging systems. If the complexity of rate-distortion optimization is too high, this will run counter to our original intention of using CS coding. Because the calculation of the bit rate and distortion far exceeds the calculation of CS acquisition, we proposed a bit-rate model estimating R(x, m, b) and a relative PSNR model estimating distortion. Based on the sampling method in the adaptive compression video sampling framework of [17], we designed a CS-based image coding framework, as shown in Figure 1. The proposed CS framework contains two measurement processes. The first one is a partial measurement whose purpose is to extract the image features for the bit-rate model and relative PSNR model by using a small number of measurements. The second one is complementary CS measurement, which completes CS measurement according to the sampling rate obtained by the rate-distortion optimization.

Bit-Rate Model
According to Equation (4), the average codeword length $L_{y_Q}$ of the entropy coder is the key to calculating the bit rate. The average codeword length can be approximated by the information entropy [24]. However, calculating the information entropy requires all of the measurements, which are not available before the sampling rate is determined. In order to calculate the information entropy of the measurements sampled at a sampling rate $m$, we proposed estimating the information entropy from a small number of measurements obtained in the first sampling. However, the information entropy is only a lower bound on the average codeword length for entropy coding, and there is some error between the real information entropy and the information entropy estimated from a few measurements. Therefore, we used a second-order Taylor expansion to approximate the estimation model of the information entropy, so that the model can be expressed as an additive model of the characteristic variables. Then, we modified the coefficients of the additive model by fitting off-line data, which improves the estimation accuracy of the average codeword length $L_{y_Q}$.

Estimation of Information Entropy
When sampling with the Gaussian random matrix, the CS measurements obey the Gaussian distribution [25]. Moreover, the density function of the quantized CS measurements follows the distribution of the corresponding real-valued CS measurements [26], that is, the quantized measurements also obey the Gaussian distribution, so the information entropy [27] of the quantized CS measurements can be estimated as follows:

$$H \approx \frac{1}{2}\log_2\left(2\pi e V_0\right), \tag{6}$$

where $V_0$ is the variance of the quantized CS measurements. In order to facilitate the uniform quantization, the measurements are first scaled to the integer interval corresponding to the quantization bit-depth, and then the rounding operation is used. The uniform quantization function can be expressed as follows:

$$Q(y_i) = \mathrm{round}\left(\frac{y_i - y_{min}}{y_{max} - y_{min}}\left(2^b - 1\right)\right), \tag{7}$$

where $y_i$ is a measurement, $y_{max}$ is the maximum element of the measurement vector $y$, and $y_{min}$ is the minimum element of the measurement vector $y$. We used an independent random variable $\varepsilon$ that is uniformly distributed on $[-0.5, 0.5]$ to represent the rounding error in the uniform quantization function [28], so that $Q(y_i) \approx \frac{y_i - y_{min}}{y_{max} - y_{min}}\left(2^b - 1\right) + \varepsilon$. Let $\sigma_m^2$ denote the variance of the measurements when the sampling rate is $m$ and $\sigma_\varepsilon^2$ denote the variance of the random variable $\varepsilon$; it follows that $\sigma_\varepsilon^2 = \frac{1}{12}$. Assuming that the variable $y_i$ obeys a Gaussian distribution with a variance of $\sigma_m^2$, and the rounding error variable $\varepsilon$ is independent of the variable $y_i$, the variance of $Q(y_i)$ can be expressed as follows:

$$V_0 = \frac{\sigma_m^2\left(2^b - 1\right)^2}{\left(y_{max} - y_{min}\right)^2} + \frac{1}{12}. \tag{8}$$

Combining Equations (6) and (8), the information entropy $H_{y_Q}$ of the $b$-bit quantized measurements with sampling rate $m$ is the following:

$$H_{y_Q} \approx \frac{1}{2}\log_2\left[2\pi e\left(\frac{\sigma_m^2\left(2^b - 1\right)^2}{\left(y_{max} - y_{min}\right)^2} + \frac{1}{12}\right)\right]. \tag{9}$$

According to Equation (9), in addition to the bit-depth $b$, the variance $\sigma_m^2$, the maximum measurement $y_{max}$, and the minimum measurement $y_{min}$ are the keys to calculating $H_{y_Q}$.
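The quantizer and the Gaussian entropy estimate described above can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: it draws synthetic Gaussian "measurements" (the standard deviation of 10 and the sample size are arbitrary choices), applies scale-and-round uniform quantization, and compares the variance-based entropy estimate with the empirical entropy of the quantized symbols. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def uniform_quantize(y, b):
    """Scale measurements to [0, 2**b - 1] and round to the nearest integer."""
    y_min, y_max = min(y), max(y)
    scale = (2 ** b - 1) / (y_max - y_min)
    return [int(round((v - y_min) * scale)) for v in y]


def gaussian_entropy_estimate(y, b):
    """Entropy estimate in bits: Gaussian entropy of the scaled variance plus 1/12."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    y_min, y_max = min(y), max(y)
    var_q = var * (2 ** b - 1) ** 2 / (y_max - y_min) ** 2 + 1.0 / 12.0
    return 0.5 * math.log2(2.0 * math.pi * math.e * var_q)


def empirical_entropy(symbols):
    """Shannon entropy in bits of the observed symbol frequencies."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())


# Synthetic stand-in for CS measurements (assumption: zero mean, std 10).
rng = random.Random(0)
y = [rng.gauss(0.0, 10.0) for _ in range(20000)]

for b in (4, 6, 8):
    q = uniform_quantize(y, b)
    print(b, round(gaussian_entropy_estimate(y, b), 3), round(empirical_entropy(q), 3))
```

On such synthetic data the two values should track each other closely; on real block-CS measurements the gap depends on how well the Gaussian assumption holds.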
The measurements must be sampled according to the known sampling rate $m$, which means that the variance $\sigma_m^2$, maximum measurement $y_{max}$, and minimum measurement $y_{min}$ of the measurements cannot be obtained before the sampling rate $m$ is determined. Therefore, we proposed to perform the first sampling to obtain a small number of measurements before the rate-distortion optimization, and then to use this part of the measurements to extract the features that we needed. In statistical theory, the characteristics of a sample are often used to estimate the characteristics of the population. Since measurements of different sampling rates can be considered as different samples in the population of measurements, there is a close relationship between the characteristics of the different samples. In this paper, the characteristics of the first sampled measurements were used to estimate the characteristics of the measurements with the sampling rate $m$. We used the maximum measurement $y_{max}$ and the minimum measurement $y_{min}$ obtained by the first sampling to estimate the maximum and minimum measurements at the sampling rate $m$. When using the sample to estimate the population variance, there is an unbiased estimate of the variance [29]:

$$s^2 = \frac{1}{Num - 1}\sum_{i=1}^{Num}\left(p_i - \bar{p}\right)^2, \tag{10}$$

where $p_i$ $(i = 1, \ldots, Num)$ is a sample, $\bar{p}$ is the mean of the samples, and $Num$ is the number of samples.
According to the definition of variance, we know that $s^2 = \frac{Num}{Num - 1}\sigma^2$. Suppose $\sigma_0^2$ is the variance of the measurements with the sampling rate $m_0$ for the first sampling and $\sigma_m^2$ is the variance of the measurements with the sampling rate $m$. We used $s^2$ of the first sampled measurements to estimate the $s^2$ of the measurements with the sampling rate $m$, that is:

$$\frac{M_0}{M_0 - 1}\sigma_0^2 = \frac{M_m}{M_m - 1}\sigma_m^2, \tag{11}$$

where $M_m = \mathrm{round}(Nm)$ is the number of measurements with sampling rate $m$ and $M_0 = \mathrm{round}(Nm_0)$ is the number of measurements with sampling rate $m_0$. Let $\Delta y_0 = y_{max} - y_{min}$; combined with $H_{y_Q} \approx L_{y_Q}$ and Equations (6)-(11), we then obtain the following:

$$L_{y_Q} \approx \frac{1}{2}\log_2\left[2\pi e\left(\frac{M_0\left(M_m - 1\right)}{M_m\left(M_0 - 1\right)}\cdot\frac{\sigma_0^2\left(2^b - 1\right)^2}{\Delta y_0^2} + \frac{1}{12}\right)\right]. \tag{12}$$

Simplified Model of Average Codeword Length $L_{y_Q}$
Model (12) takes the variance as well as the maximum and minimum of the measurements obtained by the first sampling as the main features for estimating the average codeword length. However, there is a certain error between the information entropy and the average codeword length of the entropy coder. In this paper, we constructed an additive model of the average codeword length according to model (12) and solved the coefficients of the additive model by the least-squares method. The obtained additive model minimizes the mean squared error (MSE) between the estimated and actual values of the average codeword length, which improves the accuracy of the estimated average codeword length.
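The entropy-based estimate that the additive model starts from can be sketched in a few lines. The code below is a minimal illustration, assuming the variance at rate $m$ is obtained by equating the two unbiased variance estimates and that the codeword length is approximated by the Gaussian entropy; the function names and the feature values in the usage lines ($\sigma_0^2 = 100$, $\Delta y_0 = 80$) are hypothetical.

```python
import math


def unbiased_variance(samples):
    """Sample variance with the (Num - 1) denominator."""
    num = len(samples)
    mean = sum(samples) / num
    return sum((p - mean) ** 2 for p in samples) / (num - 1)


def extrapolated_variance(sigma0_sq, m0, m, n_pixels):
    """Variance at sampling rate m implied by equal unbiased estimates at m0 and m."""
    M0 = round(n_pixels * m0)
    Mm = round(n_pixels * m)
    return sigma0_sq * (M0 / (M0 - 1)) * ((Mm - 1) / Mm)


def codeword_length_estimate(sigma0_sq, delta_y0, m0, m, b, n_pixels):
    """Entropy-based estimate (bits per measurement) of the average codeword length."""
    sigma_m_sq = extrapolated_variance(sigma0_sq, m0, m, n_pixels)
    var_q = sigma_m_sq * (2 ** b - 1) ** 2 / delta_y0 ** 2 + 1.0 / 12.0
    return 0.5 * math.log2(2.0 * math.pi * math.e * var_q)


# Hypothetical features from a first sampling of a 256 x 256 image at m0 = 0.013.
N_PIXELS = 256 * 256
M0_RATE = 0.013
for b in (4, 6, 8):
    L = codeword_length_estimate(100.0, 80.0, M0_RATE, 0.2, b, N_PIXELS)
    print(b, round(L, 3), round(0.2 * L, 3))  # bit-depth, bits/measurement, bpp
```

Because the estimate depends on $m$ only through the factor $(M_m - 1)/M_m$, it is nearly flat in the sampling rate and grows by about one bit for each additional bit of depth.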
In approximating model (12), we performed a second-order Taylor expansion of the functional form $\log_2\left(a\prod_{i=1}^{n} x_i + C\right)$ with respect to the variables $x_i$, and obtained the additive form of Equation (13), where $c_i$ $(i = 1, \ldots, n + 1)$ are the second-order Taylor coefficients (see Appendix A). Taking the factors of model (12) as variables, according to Equation (13), $L_{y_Q}$ is approximated by Equation (14), where $c_i$ $(i = 1, \ldots, 5)$ are the Taylor coefficients and can be obtained according to the appendix. The first item in Equation (14) can be expanded as in Equation (15). Due to the limited range of the parameters, we approximated the logarithmic function in Equation (15) by the square root function, and approximated 1 2 … Combining Equations (12), (14), and (15), we constructed the additive model of the average codeword length in Equation (16), where $c_1 \sim c_7$ are model coefficients obtained by fitting model (16) to the training dataset. Combining Equation (4) with model (16), we obtained the bit-rate model of Equation (17). Model (17) has no logarithmic operation, and the maximum $y_{max}$, the minimum $y_{min}$, and the variance $\sigma_0^2$ of the first sampling can be used to estimate the bit rate at the sampling rate $m$ and bit-depth $b$, which significantly reduces the computational complexity of estimating the bit rate.

Relative PSNR Model
As the objective function of the optimization problem (5), distortion is often measured by the error between the original image and the reconstructed image, such as the sum of absolute differences (SAD), mean squared error (MSE), and peak signal-to-noise ratio (PSNR). Because the complexity of the CS reconstruction algorithm is much higher than the complexity of CS sampling, directly calculating the distortion loses the advantage of low complexity. Therefore, estimating the distortion ensures low complexity for the CS-based encoder. However, in addition to the measurements, factors affecting the reconstructed image include the reconstruction algorithm and the degree to which the original image matches the prior constraints. The latter two factors cannot be described objectively, which makes it difficult to directly estimate the error between the original image and the reconstructed image. Since distortion is used only to rank CS coding parameters, the best CS coding parameters can also be found from the relative level of distortion. Therefore, we proposed the relative peak signal-to-noise ratio (relative PSNR) instead of distortion as the objective function. Relative PSNR measures the difference in PSNR between reconstructions of the same image obtained with different parameters. Although the relative PSNR cannot represent the error between the original image and the reconstructed image, it can be used to evaluate the quality level of the reconstructed image under different parameters. Because the relative PSNR comes from a PSNR comparison under the same reconstruction algorithm, the impact of the reconstruction algorithm can be disregarded. Thus, the factors that determine the relative PSNR come mainly from the sampling rate $m$, the bit-depth $b$, and the image.

Relative PSNR
The relative PSNR reflects the level of the peak signal-to-noise ratio, and can achieve the same effect as the PSNR for the optimization of the sampling rate $m$ and the bit-depth $b$. The peak signal-to-noise ratio is often used to evaluate the visual quality of the reconstructed image. The relative PSNR therefore reflects not only the level of distortion but also the visual quality of the decoded image. Let $f(b, m, x) = 10 \times \log_{10}\frac{255^2}{\frac{1}{N}\left\|x - \hat{x}\right\|_2^2}$ denote the PSNR between the original image $x$ and the image $\hat{x}$ reconstructed from the measurements obtained with the parameters $(b, m)$. Several relative peak signal-to-noise ratios can be constructed according to $f(b, m, x)$, denoted as $F(b, m, x)$, as in Equations (18)-(21),
where $b_1, m_1$ and $b_2, m_2$ are reference parameters for the relative PSNR and are known fixed values. It is easy to prove that the optimization result using $F_1$, $F_2$, $F_3$, or $F_4$ is consistent with the optimization result using $f$ for the parameters $(b, m)$.
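The claim that a relative measure ranks parameters the same way as the PSNR itself can be illustrated with a small sketch. The code below computes the standard PSNR for 8-bit data (peak value 255 is an assumption) and then uses the simplest possible relative form, the PSNR minus a fixed reference value. That form is only an illustrative stand-in and is not one of Equations (18)-(21); the candidate PSNR values are likewise made up.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB between two equally sized pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical PSNR values f(b, m, x) for three candidate (b, m) pairs.
f = {(4, 0.20): 27.1, (6, 0.14): 29.4, (8, 0.11): 28.8}
f_ref = 21.0  # hypothetical PSNR at fixed reference parameters

# Illustrative relative form: subtracting a constant preserves the ordering.
relative = {params: value - f_ref for params, value in f.items()}

best_by_psnr = max(f, key=f.get)
best_by_relative = max(relative, key=relative.get)
print(best_by_psnr, best_by_relative)
```

Any relative form that is strictly increasing in $f$ for a fixed image selects the same $(b, m)$, which is why the network can be trained on a relative target without changing the optimization result.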

Relative PSNR Model with Feedforward Neural Network Learning
In order to accurately reveal the mapping between relative PSNR, sampling rate $m$, and bit-depth $b$, we used a four-layered feedforward neural network to learn the map between relative PSNR and its factors. The four-layered feedforward neural network does not require the mapping relationship between variables to be specified in advance; the backpropagation algorithm is used to learn the mapping between input and output [30,31]. The feedforward neural network can minimize the loss function between the estimated value and the real value, and is widely used in regression prediction [32,33].
The input of the four-layered feedforward neural network is significant for the accuracy of the estimated relative PSNR. The CS image reconstruction model typically consists of a measurement data fidelity term and the sparsity of an image in a particular transform domain. When the sampling rate and the bit-depth are fixed, the reconstruction quality of the image is closely related to the sparsity. According to the large-scale random matrix spectrum analysis theory, the literature [34] infers that the sparsity of the signal can be estimated based on the average energy of the measurements, and the average energy of the measurements can be calculated from the variance and the mean (Equation (22)). To increase the diversity of the input variables, we used the variance and the mean as an alternative to sparsity as follows:

$$\frac{1}{M_0}\left\|y_0\right\|_2^2 = \sigma_0^2 + \bar{y}_0^2. \tag{22}$$

Therefore, we proposed the sampling rate $m$, the bit-depth $b$, the variance $\sigma_0^2$ of the first sampled measurements, and the mean $\bar{y}_0$ of the first sampled measurements as input variables of the relative PSNR model. In this paper, we designed four neurons in the input layer, one neuron in the output layer, and two hidden layers for the relative PSNR network, as shown in Figure 2. The mathematical form of the relative PSNR model can be expressed as follows:

$$u_{j+1} = g\left(W_j u_j + d_j\right),\ j = 1, 2, \qquad F = W_3 u_3 + d_3, \tag{23}$$

where $g(v) = \frac{2}{1 + e^{-2v}} - 1$ is the activation function, $u_1$ is the input variable vector, $F$ is the relative PSNR as the output, $j$ is the index of the network layer, and $W_j$, $d_j$ are the model parameters. When training the network, the loss function is the mean squared error (MSE) between the actual value and the estimated value.
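The forward pass of such a network is small enough to write out directly. The sketch below assumes a 4-6-6-1 layout (six neurons per hidden layer, as stated later in the complexity analysis) with a linear output neuron; the weights are random placeholders rather than trained parameters, and the input vector is a hypothetical, already normalised $(m, b, \sigma_0^2, \bar{y}_0)$.

```python
import math
import random


def g(v):
    """Activation 2 / (1 + exp(-2v)) - 1, which is algebraically tanh(v)."""
    return 2.0 / (1.0 + math.exp(-2.0 * v)) - 1.0


def dense(W, d, u):
    """Affine map W u + d for a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, u)) + bias for row, bias in zip(W, d)]


def forward(params, u1):
    """Hidden layers apply g; the single output neuron is kept linear."""
    u = u1
    for W, d in params[:-1]:
        u = [g(v) for v in dense(W, d, u)]
    W, d = params[-1]
    return dense(W, d, u)[0]


def random_params(sizes, seed=0):
    """Placeholder weights in [-1, 1]; a real model would load trained values."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        d = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
        params.append((W, d))
    return params


sizes = [4, 6, 6, 1]
params = random_params(sizes)
n_params = sum(len(W) * len(W[0]) + len(d) for W, d in params)

u1 = [0.2, 0.6, 0.5, -0.1]  # hypothetical normalised (m, b, variance, mean)
F = forward(params, u1)
print(n_params, F)
```

With this layout the parameter count is 79, matching the figure used in the complexity analysis.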

Rate-Distortion Optimization for Sampling Rate and Bit-Depth
In this part, we use the designed bit-rate model and relative PSNR model to optimize the sampling rate and bit-depth jointly.

Rate-Distortion Optimization Algorithm
We substituted the relative PSNR for the distortion in problem (5). The optimization problem of the sampling rate $m$ and bit-depth $b$ can then be expressed as follows:

$$\left(b^*, m^*\right) = \arg\max_{b,\,m} F(b, m, x) \quad \text{s.t.} \quad R(x, m, b) \le R_{max}. \tag{24}$$

From bit-rate model (17), letting $R = R_{max}$ gives a correspondence between the sampling rate and the quantization depth, stated as Equation (25). The number of candidate bit-depths $b$ is smaller than the number of candidate sampling rates $m$, and is much smaller than the number of combinations of bit-depth and sampling rate. According to Equation (25), the number of candidate parameters of problem (5) can be reduced to the number of bit-depths.
Therefore, the proposed adaptive CS image coding framework with rate-distortion optimization follows the main steps below:
(1) Input: $R_{max}$.
(2) First sampling. With sampling rate $m_0$, the original image is measured to obtain the partial measurements $y_0 \in \mathbb{R}^{\mathrm{round}(Nm_0)\times 1}$.
(3) Extracting features. Calculate the mean $\bar{y}_0$, the variance $\sigma_0^2$, the maximum $y_{max}$, and the minimum $y_{min}$ of $y_0$.
(4) Reducing the candidate set. Calculate the sampling rate $m$ corresponding to each bit-depth $b$ based on Equation (25), obtaining a candidate parameter set $\left\{(b_1, m_1), \ldots, (b_\lambda, m_\lambda)\right\}$, where $\lambda$ represents the number of quantization depths.
(5) Selecting the parameters. Estimate the relative PSNR of all candidate parameters with the four-layered feedforward neural network, and select the parameter $(b^*, m^*)$ for which the relative PSNR is best. $m^*$ is the optimized sampling rate and $b^*$ is the optimized bit-depth.
(6) Second sampling. With sampling rate $m = m^* - m_0$, the original image is measured to obtain the remaining measurements.
(7) Quantization and coding. The measurements of the two samplings are quantized using the bit-depth $b^*$, and then are entropy encoded.
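The candidate reduction and selection steps above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the closed-form correspondence of Equation (25) is replaced by a generic bisection on any rate model that increases with $m$, and both `toy_rate` and `toy_quality` are made-up stand-ins for the fitted bit-rate model and the trained relative PSNR network.

```python
import math


def solve_sampling_rate(rate_model, b, r_max, m_lo, m_hi, iters=60):
    """Largest m in [m_lo, m_hi] with rate_model(m, b) <= r_max (bisection).

    Assumes rate_model increases with m. Returns None when even m_lo is over budget.
    """
    if rate_model(m_lo, b) > r_max:
        return None
    if rate_model(m_hi, b) <= r_max:
        return m_hi
    lo, hi = m_lo, m_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rate_model(mid, b) <= r_max:
            lo = mid
        else:
            hi = mid
    return lo


def optimize(rate_model, quality_model, r_max, bit_depths, m0, m_max=1.0):
    """One candidate m per bit-depth, then pick the pair with the best predicted quality."""
    candidates = []
    for b in bit_depths:
        m = solve_sampling_rate(rate_model, b, r_max, m0, m_max)
        if m is not None:
            candidates.append((b, m))
    if not candidates:
        raise ValueError("bit budget too small for the first sampling")
    return max(candidates, key=lambda bm: quality_model(bm[1], bm[0]))


def toy_rate(m, b):
    """Hypothetical rate model: about (b - 1.5) bits per measurement."""
    return m * (b - 1.5)


def toy_quality(m, b):
    """Hypothetical quality score: rises with m, penalises coarse quantization."""
    return 10.0 * math.log10(m) - 10.0 * math.log10(1.0 + 4096.0 * 4.0 ** (-b))


R_MAX, M0 = 0.5, 0.013
b_star, m_star = optimize(toy_rate, toy_quality, R_MAX, range(3, 10), M0)
m_second = m_star - M0  # sampling rate left for the second, complementary sampling
print(b_star, round(m_star, 4), round(m_second, 4))
```

Evaluating the quality model once per bit-depth, instead of once per $(b, m)$ pair, is what keeps the search cost proportional to the number of bit-depths.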

Model Parameter Estimation for the Bit-Rate Model and the Relative PSNR Model
In order to estimate the model parameters of the proposed average codeword length model and the relative PSNR model, 100 images in the BSDS500 dataset [35] were randomly selected for training, and the BSD68 dataset [36] was used for testing, each image being cropped to a 256 × 256 size. At the encoder, the same orthogonal Gaussian measurement matrix was first used for block CS sampling, in which the image block size was 32 × 32 (the measurements still approximately obey the Gaussian distribution [26]), and then uniform quantization and arithmetic coding were performed. At the decoder, arithmetic decoding and inverse quantization were first performed, and then CS reconstruction was performed using a non-local low-rank algorithm (NLR-CS) [23], in which the initial image was reconstructed by the total variation iterative threshold regularization image reconstruction algorithm (BCS-TVIT) [37].
The initial sampling rate $m_0$ determines the accuracy of the image features estimated by $\sigma_0^2$ and $\bar{y}_0$. The larger it is, the more accurately the bit rate and PSNR can be estimated. However, if $m_0$ is too large, there may be unnecessary measurements and calculations. When a Gaussian random matrix is used, the number of measurements for reconstructing a high-quality signal is at least $M = O(K\log(N))$ [21], so the best choice of the initial sampling rate $m_0$ should be $O(K\log(N))/N$, which is difficult to estimate accurately. We analyzed the sample data of the training set and found that when the sampling rate was lower than 0.013, the visual quality of all reconstructed images was bad, and the PSNR value did not exceed 15 dB. Therefore, we used $m_0 = 0.013$.
As shown in Table 1, the parameters of our model (16) were obtained by least-squares fitting to the $L_{y_Q}$ values in the training set. To quantify the accuracy of the fitting, we also measured the mean squared error (MSE), the Pearson correlation coefficient (PCC), and R-squared ($R^2$) [38] between the actual $L_{y_Q}$ and the predicted $L_{y_Q}$ in the test set. The closer $R^2$ and PCC are to 1, the better the fit of the model. As can be seen from Table 1, all parameters are non-zero except for the value of $c_6$, which verifies the mapping relationship between the sampling rate $m$, bit-depth $b$, variance $\sigma_0^2$, interval $\Delta y_0$, and average codeword length $L_{y_Q}$. The term with coefficient $c_6$ can therefore be ignored in model (16).
In Table 2, the R-squared of model (12) reaches 0.9809 and the PCC reaches 0.9904. The R-squared of model (16) reaches 0.9903 and the PCC reaches 0.9952, which is better than the estimation of model (12). The results show that both model (12) and model (16) can describe well the relationship between sampling rate $m$, bit-depth $b$, variance $\sigma_0^2$, mean $\bar{y}_0$, and the average codeword length $L_{y_Q}$, and that model (16) is better than model (12). Moreover, bit-rate model (17) based on model (16) has no logarithmic operation, and can quickly calculate the sampling rate from the bit-depth $b$ and $R_{max}$ to narrow the parameter candidate set, which is more conducive to practical application. When collecting data about the relative PSNR, we took $b_1 = 3$, $m_1 = 0.013$, $b_2 = 8$, $m_2 = 0.4$ for $F_1$, $F_2$, $F_3$, and $F_4$. We used the "newff" function in MATLAB 2018b software to train networks for PSNR, $F_1$, $F_2$, $F_3$, and $F_4$ separately, with the same inputs and the same four-layered feedforward network structure. The training and testing performances are shown in Table 3. Table 3 shows that fitting the PSNR with the same input variables and network structure performs worst, because PSNR is calculated from the difference between the original image and the reconstructed image. In addition to being related to the sampling rate $m$, quantized bit-depth $b$, and the variance $\sigma_0^2$ and mean $\bar{y}_0$ of some measurements, PSNR is also closely related to other factors. Compared with the estimated PSNR, the performance of the estimated $F_1$, $F_2$, $F_3$, and $F_4$ is improved. Among them, estimating $F_4$ performs best, which shows that the mapping between sampling rate $m$, bit-depth $b$, variance $\sigma_0^2$, mean $\bar{y}_0$, and $F_4$ is closer than that with $F_1$, $F_2$, and $F_3$. Therefore, we chose $F_4$ to evaluate distortion.

Computational Complexity of the Rate-Distortion Optimization Algorithm
The additional computational complexity of the rate-distortion optimization for sampling rate and the bit-depth is mainly derived from feature extraction, rate estimation, and relative PSNR estimation.
The computation of feature extraction comes mainly from the $\sigma_0^2$, $\bar{y}_0$, $y_{max}$, and $y_{min}$ values. Assuming the image size is $I \times I$ and the block size is 32 × 32, the number of measurements obtained by the first sampling is $0.013 \times I^2$. The calculation of $\bar{y}_0$ requires $0.013 \times I^2 - 1$ additions and one multiplication. The calculation of $\sigma_0^2$ requires $0.013 \times I^2 \times 2 - 1$ additions and $0.013 \times I^2 + 1$ multiplications. The $y_{max}$ and $y_{min}$ require a total of up to $\left(0.013 \times I^2 - 1\right) \times 2$ comparisons. Assuming that a comparison requires two subtractions, a total of $\left(0.013 \times I^2 - 1\right) \times 4$ subtractions are required. The first sampling requires $0.013 \times I^2 \times 1023$ additions and $0.013 \times I^2 \times 1024$ multiplications. Assuming the same computational complexity for subtraction and addition, extracting features requires a total of $0.078 \times I^2 - 6$ additions and $0.013 \times I^2 + 2$ multiplications. Feature extraction therefore adds 0.11% more multiplications and 0.59% more additions compared to the first sampling. The computation of the rate estimation process mainly comes from the calculation of Equation (25).
Since the bit-depth is a finite discrete value, $\left(2^b - 1\right)^2$ in the equation can be obtained from a lookup table. In this case, calculating Equation (25) requires seven additions and seven multiplications. We chose seven bit-depths as candidate values, so evaluating Equation (25) for all of them requires a total of 49 additions and 49 multiplications. The computation of the relative PSNR estimation process mainly comes from the calculation of the neural network model (23). The network input layer has four neurons, and the output layer has one neuron. The network has two hidden layers, each with six neurons. The number of network parameters is 4 × 6 + 6 + 6 × 6 + 6 + 6 × 1 + 1 = 79. Without the activation functions, the network requires 4 × 6 + 6 × 6 + 6 × 1 = 66 multiplications and 3 × 6 + 6 + 5 × 6 + 6 + 5 + 1 = 66 additions. The hidden layers use the sigmoid activation function. It is assumed that the exponential is calculated by a series approximation. When the precision is $10^{-7}$, it takes about 60 multiplications and 10 additions to calculate an activation function. Calculating 12 activation functions requires 720 multiplications and 120 additions. One evaluation of the network model takes about 782 multiplications and 182 additions. If we select seven bit-depths as candidate values, we must calculate the relative PSNR of seven candidate parameters. In this case, a total of 5474 multiplications and 1274 additions are required.
A measurement requires 1024 multiplications and 1023 additions. The computation of the estimated bit rate and relative PSNR does not exceed the multiplications of six measurements and the additions of two measurements. When compressing an image of size 256 × 256, the first sampling can obtain 852 measurements. The computation of the estimated bit rate and relative PSNR increases the multiplications by 6/852 ≈ 0.7% and the additions by 2/852 ≈ 0.23%. Compared with the computation of the first sampling, the additional computation of the entire rate-distortion optimization process increases by 0.81% multiplication and 0.82% addition.
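The bookkeeping in this section can be replayed with a few lines of arithmetic. The sketch below takes the per-evaluation costs quoted in the text (782 multiplications and 182 additions per network evaluation, 49 of each for the rate estimation) as given inputs, and checks the layer-wise counts and the comparison against the cost of whole measurements; the variable names are hypothetical.

```python
# Layer-wise counts for a 4-6-6-1 network, as itemised in the text.
n_params = 4 * 6 + 6 + 6 * 6 + 6 + 6 * 1 + 1
mults_linear = 4 * 6 + 6 * 6 + 6 * 1
adds_linear = 3 * 6 + 6 + 5 * 6 + 6 + 5 + 1

# Costs quoted in the text, used here as inputs rather than re-derived.
NET_MULTS, NET_ADDS = 782, 182      # one network evaluation
RATE_MULTS, RATE_ADDS = 49, 49      # rate estimation for seven bit-depths
CANDIDATES = 7
MEAS_MULTS, MEAS_ADDS = 1024, 1023  # one measurement of a 32 x 32 block

total_mults = RATE_MULTS + CANDIDATES * NET_MULTS
total_adds = RATE_ADDS + CANDIDATES * NET_ADDS

# Upper bounds stated in the text: six measurements' multiplications, two measurements' additions.
within_bounds = total_mults <= 6 * MEAS_MULTS and total_adds <= 2 * MEAS_ADDS

first_measurements = round(256 * 256 * 0.013)
pct_mults = 100.0 * 6 / first_measurements
pct_adds = 100.0 * 2 / first_measurements

print(n_params, mults_linear, adds_linear)
print(total_mults, total_adds, within_bounds)
print(first_measurements, round(pct_mults, 2), round(pct_adds, 2))
```

Adding these two percentages to the feature-extraction overheads of 0.11% and 0.59% gives the overall figures of 0.81% and 0.82% reported above.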

Numerical Results and Analysis
We performed some numerical tests to check the performance of the proposed algorithm. In our simulation, we tested Monarch, Cameraman, Peppers, and Lena (as shown in Figure 3), as well as 68 images from the BSD68 dataset, which were cut to a size of 256 × 256. All simulations were run on MATLAB 2018b software on a Core i5 machine with 8 GB of RAM. multiplications. The extracted feature additionally adds 0.11% multiplication and 0.59% addition compared to the first sampling.

multiplications. The
The calculation of the rate estimation process mainly comes from the calculation of Equation (25). Since the bit-depth is a finite discrete value, The calculation of the relative PSNR estimation process mainly comes from the calculation of the neural network model (23). The network input layer has four neurons, and the output layer has one neuron. The network has two hidden layers, each with six neurons. The number of network parameters is 4 × 6 + 6 + 6 × 6 + 6+6 × 1 + 1 = 79. Networks without activation functions include 4 × 6 + 6 × 6 + 6 × 1 = 66 multiplications and 3 × 6 + 6 + 5 × 6 + 6 + 5 + 1 = 66 additions. The hidden layer uses the sigmoid activation function. It is assumed that the series approximation calculates the exponential power. When the precision is 7 10  , it takes about 60 multiplications and 10 additions to calculate an activation function. Calculating 12 activation functions requires 720 multiplications and 120 additions. The calculation of the network model once is about 782 multiplications and 182 additions. If we select seven bit-depths as candidate values, we must calculate the relative PSNR of seven candidate parameters. In this case, we had to calculate 5474 multiplications and 1274 additions in total.
A measurement requires 1024 multiplications and 1023 additions. The computation of the estimated bit rate and relative PSNR does not exceed the multiplications of six measurements and the additions of two measurements. When compressing an image of size 256 × 256, the first sampling can obtain 852 measurements. The computation of the estimated bit rate and relative PSNR increases the multiplications by 6/852 ≈ 0.7% and the additions by 2/852 ≈ 0.23%. Compared with the computation of the first sampling, the additional computation of the entire rate-distortion optimization process increases by 0.81% multiplication and 0.82% addition.
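The operation counts above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming only the figures quoted in the text (layer sizes, per-sigmoid cost, seven candidates, per-measurement cost); the function names are illustrative and do not come from the paper. The per-pass totals are simply the sums of the per-layer and per-activation counts.

```python
# Sketch: operation counts for the 4-6-6-1 feedforward network described above.
# All constants are figures quoted in the text; the function names are
# illustrative assumptions, not the authors' code.

LAYERS = [4, 6, 6, 1]          # input layer, two hidden layers, output layer
MULS_PER_SIGMOID = 60          # quoted cost of one sigmoid at 1e-7 precision
ADDS_PER_SIGMOID = 10
NUM_CANDIDATES = 7             # candidate bit-depths {3, ..., 9}
MULS_PER_MEASUREMENT = 1024    # quoted cost of one CS measurement
ADDS_PER_MEASUREMENT = 1023


def count_network_ops(layers):
    """Return (parameters, multiplications, additions, sigmoid calls) for one
    forward pass, not counting the cost of the sigmoids themselves."""
    pairs = list(zip(layers[:-1], layers[1:]))
    params = sum(n_in * n_out + n_out for n_in, n_out in pairs)
    muls = sum(n_in * n_out for n_in, n_out in pairs)
    # Each neuron needs (n_in - 1) additions for its dot product plus one for the bias.
    adds = sum((n_in - 1) * n_out + n_out for n_in, n_out in pairs)
    sigmoids = sum(layers[1:-1])   # only hidden neurons apply the sigmoid
    return params, muls, adds, sigmoids


def total_ops(layers, num_candidates):
    """Return (multiplications, additions) for evaluating every candidate."""
    _, muls, adds, sigmoids = count_network_ops(layers)
    muls_per_pass = muls + sigmoids * MULS_PER_SIGMOID
    adds_per_pass = adds + sigmoids * ADDS_PER_SIGMOID
    return num_candidates * muls_per_pass, num_candidates * adds_per_pass


if __name__ == "__main__":
    print(count_network_ops(LAYERS))
    total_muls, total_adds = total_ops(LAYERS, NUM_CANDIDATES)
    # Express the totals in units of one CS measurement.
    print(total_muls / MULS_PER_MEASUREMENT, total_adds / ADDS_PER_MEASUREMENT)
```

Expressed in measurement units, the totals stay below six measurements' worth of multiplications and two measurements' worth of additions, which is consistent with the bound stated above.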

Numerical Results and Analysis
We performed numerical tests to check the performance of the proposed algorithm. In our simulation, we tested Monarch, Cameraman, Peppers, and Lena (as shown in Figure 3), as well as 68 images from the BSD68 dataset, which were cropped to a size of 256 × 256. All simulations were run in MATLAB 2018b on a Core i5 machine with 8 GB of RAM.
To verify the accuracy of the bit-rate model, we set the target bit rate to 0.1, 0.2, ..., 1 bit per pixel (bpp), where the bit-depth set was {3, 4, ..., 9}. The actual bit rates obtained with the parameters optimized by the proposed algorithm are shown in Tables 4 and 5.
In Table 4, the error is the actual bit rate minus the target bit rate, the error percentage is the error expressed as a percentage of the target bit rate, and the absolute error percentage is the absolute value of the error percentage. Table 4 shows that the actual bit rate is very close to the target bit rate for Monarch, Cameraman, Peppers, and Lena coded by the proposed method. When the target bit rate is 0.1 bpp, the bit-rate error percentage is the largest, but the error itself is only between 0.0017 bpp and 0.0027 bpp.
Table 5 shows that the average actual bit rate is very close to the target bit rate for the BSD68 test set coded by the proposed method. The average absolute error percentage on the BSD68 test set is between 1.81% and 2.33%, which is slightly higher than the results in Table 4. The per-image data show that the bit-rate error of image "test20" is the largest in the BSD68 test set. This is because "test20" contains large white background areas, so the compression ratio achieved by the entropy coder on its quantized measurements far exceeds that of the other images. Even so, the compression performance for "test20" is still better than that of the CS encoding method without the entropy coder.
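The three error measures defined for Table 4 can be written out directly. The sketch below is illustrative only: the function names are hypothetical, and the example rate of 0.1017 bpp is just the 0.1 bpp target plus the smallest error quoted above, not a value taken from the tables.

```python
# Sketch of the bit-rate error measures defined for Table 4.
# Function names are hypothetical; they are not taken from the paper.

def bit_rate_errors(actual_bpp, target_bpp):
    """Return (error, error percentage, absolute error percentage)."""
    error = actual_bpp - target_bpp            # actual minus target, in bpp
    error_pct = 100.0 * error / target_bpp     # error relative to the target
    return error, error_pct, abs(error_pct)


def mean_abs_error_pct(pairs):
    """Average absolute error percentage over (actual, target) pairs,
    as one would compute it over a test set."""
    values = [bit_rate_errors(actual, target)[2] for actual, target in pairs]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Illustrative input: target 0.1 bpp plus the smallest quoted error.
    print(bit_rate_errors(0.1017, 0.1))
```

An absolute error of 0.0017 bpp at a 0.1 bpp target corresponds to an error percentage of about 1.7%, which illustrates why the percentage is largest at the lowest target bit rate even though the error in bpp is small.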
To verify the validity of the relative PSNR model, we first calculated the parameter candidate set $(b_1, m_1), \ldots, (b_{num}, m_{num})$ based on Equation (25) for each image in the test set (BSD68). We then encoded and decoded the image with each candidate parameter and calculated the PSNR of the decoded image. Finally, we compared the actual PSNR with the PSNR level predicted by the relative PSNR model. The results are shown in Table 6.
In Table 6, the optimal percentage is the percentage of images for which the relative PSNR model selects the optimal parameters from the candidate set, and the suboptimal percentage is the percentage of images for which it selects the suboptimal parameters. The average PSNR error is the average of the PSNR errors over all test images. To calculate the PSNR error of an image, we first calculated the candidate parameters $(b_1, m_1), \ldots, (b_{num}, m_{num})$ based on the target bit rate and Equation (25). Second, we calculated the PSNR of the decoded images for all candidate parameters. Third, we estimated the optimal parameters based on the relative PSNR model and found the corresponding PSNR. Finally, we took the absolute difference between the PSNR of the estimated parameters and the maximum PSNR as the PSNR error.
Table 6 shows that the combined percentage of optimal and suboptimal selections is between 92.65% and 100% for target bit rates below 1 bpp. When the target bit rate is 1 bpp, this combined percentage is the lowest, at 88.24%. This occurs because, as the target bit rate increases, the PSNR differences between candidate parameters become small, which leads to estimation errors. Although the optimal percentage is not very high, the average PSNR error is between 0.128 dB and 0.299 dB, which is small. The relative PSNR model thus introduces some error into the optimization result, but this is acceptable given the computational cost of evaluating the actual distortion for every candidate parameter.
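The per-image evaluation described above can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes that the model picks the candidate with the highest predicted relative PSNR, and that "suboptimal" means the candidate with the second-highest actual PSNR. The function name and the example numbers are hypothetical.

```python
# Sketch of the PSNR-error evaluation for one image.
# Assumptions (not confirmed by the paper): the model selects the candidate
# with the highest predicted score, and "suboptimal" is the second-best
# candidate by actual PSNR.

def evaluate_selection(actual_psnr, predicted_score):
    """Return (chosen index, rank of the choice by actual PSNR, PSNR error).

    actual_psnr[i]     : PSNR (dB) of the decoded image for candidate i
    predicted_score[i] : relative PSNR predicted by the model for candidate i
    """
    indices = range(len(predicted_score))
    chosen = max(indices, key=lambda i: predicted_score[i])
    # Candidates ordered from best to worst actual PSNR.
    ranking = sorted(range(len(actual_psnr)),
                     key=lambda i: actual_psnr[i], reverse=True)
    rank = ranking.index(chosen)               # 0 = optimal, 1 = suboptimal
    psnr_error = abs(actual_psnr[ranking[0]] - actual_psnr[chosen])
    return chosen, rank, psnr_error


if __name__ == "__main__":
    # Hypothetical values for four candidate (bit-depth, sampling-rate) pairs.
    actual = [27.1, 28.4, 28.2, 26.9]
    predicted = [0.2, 0.7, 0.9, 0.1]
    print(evaluate_selection(actual, predicted))
```

Averaging the returned PSNR error over all test images gives the average PSNR error, and counting how often the rank is 0 or 1 gives the optimal and suboptimal percentages.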
To verify the rate-distortion (RD) performance of the proposed method, we performed a comparative experiment with the conventional CS coding method. For the conventional method, we first fixed m = 0.1 and varied b to obtain different bit rates; we then obtained further RD curves by using m = 0.2, 0.3, and 0.4, respectively. Taking the images Cameraman and Lena as examples, we obtained the five RD curves shown in Figure 4. As can be seen from Figure 4, the proposed method achieves the best rate-distortion performance. The main reason is that the fixed-sampling-rate method cannot adjust the sampling rate, whereas the proposed method adaptively selects the sampling rate according to the bit-rate model and the bit-depth and uses the relative PSNR model for parameter optimization.

Conclusions
Both quantization and CS sampling cause distortion in a CS-based imaging scheme. Given a bit budget, it is essential to choose the quantization bit-depth and the sampling rate appropriately. Rate-distortion optimization plays a crucial role in image/video encoders. In this work, we proposed a low-complexity rate-distortion optimization method that jointly optimizes the sampling rate and the quantization bit-depth through the proposed bit-rate model and distortion model. First, we proposed a simple bit-rate model based on the information entropy and the second-order Taylor expansion. The bit-rate model can estimate the sampling rate for each quantization bit-depth at a given target bit rate, thereby reducing the size of the parameter candidate set. Second, we introduced the relative PSNR as an equivalent measure of distortion and proposed a four-layer feedforward neural network to learn the relative PSNR model, which improves the accuracy of estimating the level of distortion. The experimental results show that the actual bit rate obtained with the proposed method is very close to the target bit rate. Compared with the traditional CS coding method, the proposed method provides better rate-distortion performance with very little extra computation.

Acknowledgments:
The authors would like to thank the anonymous reviewers for their valuable comments, and thank the authors of [23,37] for providing their respective codes.

Conflicts of Interest:
The authors declare no conflict of interest.
It is easy to find two points $x_1^{(1)}$ and $x_1^{(2)}$ [...]. Using Taylor's second-order expansion formula, we obtain the following: [...]
We can also find two points $(x_1^{(1)}, x_2^{(1)})$ and $(x_1^{(2)}, x_2^{(2)})$, which satisfy $a x_1^{(1)} x_2^{(1)} + C = a x_1^{(2)} x_2^{(2)}$. Using Taylor's second-order expansion formula, we obtain the following: [...]
The $k$-order partial derivatives of $g_1(x_1, x_2)$ at point $(x_1^{(1)}, x_2^{(1)})$ are equal to the $k$-order partial derivatives of $g_2(x_1, x_2)$ at point $(x_1^{(2)}, x_2^{(2)})$ based on $a x_1^{(1)} x_2^{(1)} + C = a x_1^{(2)} x_2^{(2)}$ and $g_1(x_1^{(1)}, x_2^{(1)}) = g_2(x_1^{(2)}, x_2^{(2)})$, where $k = 1, \ldots, n$. So [...] Therefore,
$$\log_2(a x_1 x_2 + C) \approx \log_2(a x_1 x_2) + \sum_{i=1}^{2} c_i x_i + c_3,$$
where [...]
In the same way, we only need to find two sets of points that satisfy $a \prod_{i=1}^{n} x_i^{(1)} + C = a \prod_{i=1}^{n} x_i^{(2)}$, and use the second-order Taylor series expansion to obtain the following:
$$\log_2\Big(a \prod_{i=1}^{n} x_i + C\Big) \approx \log_2\Big(a \prod_{i=1}^{n} x_i\Big) + \sum_{i=1}^{n} c_i x_i + c_{n+1}.$$
According to $\log_2\big(a \prod_{i=1}^{n} x_i\big) = \sum_{i=1}^{n} \log_2(x_i) + \log_2(a)$, we can obtain the following:
$$\log_2\Big(a \prod_{i=1}^{n} x_i + C\Big) \approx \sum_{i=1}^{n} \log_2(x_i) + \sum_{i=1}^{n} c_i x_i + c_{n+2},$$
where $c_{n+2} = c_{n+1} + \log_2(a)$.
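To illustrate why the second-order expansions leave only a linear correction term, the single-variable case can be worked out explicitly. This is a sketch under assumed definitions, $g_1(x) = \log_2(ax + C)$ and $g_2(x) = \log_2(ax)$ with $a > 0$ and $x > 0$; the symbols $d_k$ and $\Delta$ are introduced here for illustration and may differ from the notation of the original derivation.

```latex
% Single-variable sketch. Assumed: g_1(x) = \log_2(ax + C), g_2(x) = \log_2(ax).
% Choose x^{(1)}, x^{(2)} with a x^{(1)} + C = a x^{(2)}, so that
% \Delta = x^{(2)} - x^{(1)} = C/a. The values and derivatives then coincide:
% g_1^{(k)}(x^{(1)}) = g_2^{(k)}(x^{(2)}) =: d_k for k = 0, 1, 2.
\begin{align*}
g_1(x) &\approx d_0 + d_1 \big(x - x^{(1)}\big) + \tfrac{1}{2} d_2 \big(x - x^{(1)}\big)^2, \\
g_2(x) &\approx d_0 + d_1 \big(x - x^{(2)}\big) + \tfrac{1}{2} d_2 \big(x - x^{(2)}\big)^2, \\
g_1(x) - g_2(x) &\approx d_1 \Delta + \tfrac{1}{2} d_2 \Delta \big(2x - x^{(1)} - x^{(2)}\big) \\
&= \underbrace{d_2 \Delta}_{c_1} \, x
 + \underbrace{d_1 \Delta - \tfrac{1}{2} d_2 \Delta \big(x^{(1)} + x^{(2)}\big)}_{c_2}.
\end{align*}
```

The quadratic terms in $x$ cancel because both expansions share the same second derivative $d_2$, so under these assumptions $\log_2(ax + C) \approx \log_2(ax) + c_1 x + c_2$, which has the same form as the multivariate result above.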