VLSI Implementation of a High-Performance Nonlinear Image Scaling Algorithm

This study implements a VLSI architecture for nonlinear image scaling that is low in complexity and memory efficient. Image scaling increases or decreases the size of an image to match the resolution of different devices, particularly cameras and printers. Producing high-resolution images also requires more memory and power. The goal of this work is therefore a memory-efficient, low-power image scaling method based on effective weighted median interpolation. Linear interpolation scaling methods employ prefiltering to improve the visual quality of the scaled image in noisy environments: the prefilter performs smoothing and sharpening to reduce blurring and produce high-quality scaled images. Prefiltering, however, requires additional processing resources. The proposed method instead scales by effective weighted median interpolation, which reduces noise intrinsically, so a low-cost VLSI architecture can be realized. Simulation results show that effective weighted median interpolation outperforms existing approaches.


Introduction
Digital image interpolation, or scaling, has recently received considerable attention. Image scaling resizes a digital image; it is a nontrivial process that involves a tradeoff between efficiency, smoothness, and sharpness. Image scalers are now widely adopted in portable healthcare devices, digital electronic equipment, digital cameras, digital photo frames, mobile phones, touch panel computers, etc. Designing a low-cost, high-quality, and high-performance image scaler using VLSI techniques has become a significant trend for multimedia products. As the graphic and video applications of mobile handset devices grow, the demand for and significance of image scaling keep increasing.
The image scaling algorithms based on interpolation are basically of two types: linear and nonlinear interpolation methods. The simplest linear interpolation method is the nearest neighbour algorithm, which has low complexity but produces scaled images with blocking and aliasing artifacts. The most widely used scaling method is the bilinear interpolation algorithm, in which the target pixel is obtained by applying the linear interpolation model in both the horizontal and vertical directions. Another popular polynomial-based method is the bicubic interpolation algorithm, which uses an extended cubic model to acquire the target pixel from a 2D regular grid.
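The difference between nearest neighbour and bilinear interpolation can be illustrated with a minimal Python sketch. The function names, the list-of-rows image representation, and the edge-clamping rule are assumptions made for illustration; they are not taken from any of the cited hardware designs.

```python
def nearest_sample(img, y, x):
    """Pick the pixel closest to the fractional position (y, x)."""
    h, w = len(img), len(img[0])
    return img[min(int(y + 0.5), h - 1)][min(int(x + 0.5), w - 1)]


def bilinear_sample(img, y, x):
    """Linear interpolation along x, then along y, at position (y, x)."""
    h, w = len(img), len(img[0])
    y0, x0 = int(y), int(x)
    # Clamp the second sample to the image border (assumed edge rule).
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def upscale(img, factor, sampler):
    """Scale img by an integer factor using the given sampling function."""
    h, w = len(img), len(img[0])
    return [[sampler(img, r / factor, c / factor) for c in range(w * factor)]
            for r in range(h * factor)]


if __name__ == "__main__":
    small = [[0, 100], [100, 200]]
    for row in upscale(small, 2, bilinear_sample):
        print(row)
```

Nearest neighbour copies an existing pixel, which explains the blocking artifacts, whereas bilinear interpolation blends the four surrounding pixels and therefore smooths (and slightly blurs) edges.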
The nonlinear interpolation methods, such as weighted median interpolation, curvature interpolation, the bilateral filter, and the autoregressive model, greatly improve image quality by reducing blocking, aliasing, and blurring effects compared to linear methods.
Many image scaling algorithms have been developed, mostly based on interpolation and edge-oriented. In this section, the major scaling algorithms are reviewed. An edge-oriented area-pixel scaling processor was implemented with a low-complexity VLSI architecture [1]; a simple edge-catching technique is adopted to preserve the image edge features. A JPEG edge-oriented area-pixel scaling processor performs scale-up/scale-down transformation by using the area-pixel model instead of the common point model, with a simple edge-catching technique to preserve edge features effectively and so achieve better image quality [2]. The direct implementation of area-pixel scaling requires extensive floating-point computations, so a suitable approximate low-cost VLSI implementation technique has been used. A novel image zooming algorithm using curvature interpolation was developed. It produces clear images with sharp edges that are already denoised and are superior to those obtained from linear methods and PDE-based methods [3]. A real-time FPGA architecture of the extended linear convolution for the image scaling method [4] provides a simple hardware architecture design with low computation cost. Compared to the latest bicubic hardware design [5], the architecture saves about 60% of hardware cost.
Unstructured big data (BD) contains a large amount of irrelevant and redundant image detail, which is usually difficult to handle and access. Researchers have therefore proposed several new BD approaches [39,40] to acquire the relevant details from the web. Using these data, an effective image scaling approach is designed and its performance is evaluated in a VLSI architecture.
In the proposed work, instead of using linear interpolation, a nonlinear method is adopted to enhance the performance of image scaling with reduced hardware complexity. For analyzing the performance of the proposed work, two recently developed image scaling techniques [7,41] are explained in the following section.

Existing Techniques
A low-cost high-quality image scaling processor has been recently proposed [41]. It consists of a sharpening spatial filter, a clamp filter, and bilinear interpolation. Figure 1 shows the block diagram of the bilinear interpolation-based image scaling processor. The combined sharpening and clamp filters serve as the prefilter to reduce blurring and aliasing artifacts in the scaled image. Hence, the computing resources and memory buffers are reduced by using this technique [41]. The clamp filter, a low-pass filter, is combined with the sharpening spatial filter as the prefilter to reduce the blurring effect. For efficient hardware implementation, a 3 × 3 clamp filter and a 3 × 3 sharpening filter are combined into a single 5 × 5 filter whose kernel is given in equation (1), where C is the clamp parameter used to enhance the differences along the direction of edges to reduce the unwanted discontinuous edges and aliasing effects and S is the sharp parameter used to vary the degree of sharpening. These parameters are set according to the characteristics of the image. The kernel of the combined filter is given in equation (2).

Figure 2 shows the low-cost VLSI implementation of the prefilter for the local window of size 5 × 5 [6,41]. It consists of 25 shift registers that store the 25 pixels of the 5 × 5 local window, which is convolved with the coefficients of the prefilter. The convolution operation needs 8 shifters (SH), 5 shifter-adders (SA), 8 calculation units (CU), 1 multiplier-adder (MA), and 24 adders. The calculation unit is designed with a reconfigurable feature for computing the clamp and sharp parameters.
The main limitation of this technique is high complexity due to the combined filter design, the logic for implementing hardware sharing, and the reconfigurable techniques.
Another recent technique, a nonadaptive image scaling algorithm using high-boost filtering, has been proposed [7]. The image scaling is performed by linear interpolation, and then enhancement is done using high-boost filtering.
This technique results in high-quality image scaling, but the VLSI implementation needs complex hardware. Though many efficient image scaling algorithms have been developed, additional processing is required to enhance the scaling performance. Hence, the main focus of this work is to develop an efficient image scaling algorithm with less hardware complexity. Figure 3 shows the block diagram of nonadaptive linear interpolation-based image scaling.
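As a rough illustration of the enhancement stage, the following Python sketch applies high-boost filtering in its common textbook form, g = f + k(f − blur(f)). The 3 × 3 box blur, the boost factor k = 1.5, the border replication, and the function names are all assumptions made for illustration; the exact filter used in [7] is not described here.

```python
def box_blur3(img):
    """3 x 3 mean filter with border replication (assumed low-pass stage)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    total += img[rr][cc]
            out[r][c] = total / 9
    return out


def high_boost(img, k=1.5):
    """Add k times the detail (image minus blur) back to the image."""
    blurred = box_blur3(img)
    return [[min(255, max(0, round(p + k * (p - b))))
             for p, b in zip(row, blur_row)]
            for row, blur_row in zip(img, blurred)]


if __name__ == "__main__":
    step_edge = [[0, 0, 100, 100] for _ in range(3)]
    print(high_boost(step_edge)[1])
```

Flat regions pass through unchanged, while pixels next to an edge are pushed further apart, which sharpens the edge. The separate blur, subtract, and add stages also indicate why this enhancement adds hardware on top of the interpolator.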
For real-time applications, another VLSI architecture is implemented by using the anisotropic probabilistic neural network (APNN) [42]. APNN is an interpolation technique employed within the VLSI architecture that sharpens edge regions and greatly reduces the blurring of an image. In this implementation, the processing speed is typically four times faster than that of a personal computer at 3.4 GHz. However, heavy resource utilization is one of the shortcomings of the hardware APNN. To reduce hardware resource utilization, a new VLSI system is proposed based on unified textual and dynamic compressive features (UTDCF) [43]. It combines several memory-centric levels, multiple pipelines, and processing circuits to attain a high frame rate for object tracking. This approach not only consumes fewer resources and achieves high speed but also attains reasonable memory consumption. These massively parallel circuits clearly offer advantages in terms of real-time performance. However, they are not competitive for the majority of embedded applications.

Proposed Work
3.1. Motivation. Image scaling enlarges or reduces the size of an image (its spatial resolution) in terms of pixels. Image resolution refers to the amount of information an image can hold and is controlled by the number of pixels or the bit depth per pixel. As the resolution of an image changes from the capturing device to the display or print device, image scaling is normally required, especially to match low-resolution display devices to high-resolution devices, and vice versa. When the resolution is larger, scaling (enlarging) is possible without loss of sharpness and image details. Figure 4 shows enlarged versions of images captured at different resolutions and the squaring effects found at the edges of the low-resolution image. Also, the scaled image needs larger memory and longer processing time. Hence, the main focus of this work is to develop a memory-efficient, high-performance VLSI architecture for the image scaling algorithm.

3.2. Effective Weighted Median Interpolation-Based Image Scaling. In this work, an effective weighted median interpolation- (EWMI-) based image scaling algorithm is developed. It is a nonlinear method that performs interpolation as well as denoising. Hence, a low-cost hardware architecture is implemented, and it produces scaled images of high visual quality without the prefiltering needed by linear interpolation methods. Figure 5 shows the block diagram of the proposed EWMI image scaling architecture. The major blocks are the register set that holds four neighbours, the sorting block, and the impulse noise detector and remover blocks. For computing the effective weighted median value for interpolation and denoising, a 3 × 3 local window is considered. An efficient sorter architecture is designed with two features: precomputation logic and low computation complexity. Precomputation logic is added for power saving, and the sorted array size is made odd so that the median value can be selected without addition and division operations.

For image scaling, first, an empty array of size 2N × 2N, where N × N is the input image size, is constructed and stored in memory. The array elements are labelled as shown in Figure 6. The elements with label $a^{00}_{i,j}$ are replaced by the original pixel values. The remaining elements are interpolated using the proposed EWMI image scaling algorithm. Next, the elements with label $x^{11}_{i,j}$ are taken for interpolation. The four diagonal neighbours of each such element (original pixel values) in the local window are multiplied by a weight value of 1 and given to the sorter unit. The median is computed from the sorted array, and then the effective median value is computed to replace $x^{11}_{i,j}$. Next, the elements with labels $x^{10}_{i,j}$ and $x^{01}_{i,j}$ are interpolated by considering two horizontal neighbours (original pixels) and two vertical neighbours (previously interpolated pixels). The neighbours with interpolated values are assigned a suitable weight value in the range of 0.2 to 0.9.
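The construction of the 2N × 2N array can be sketched in Python as follows. The helper names are hypothetical, and the reading of the label superscript as (row parity, column parity) is an assumption, since Figure 6 is not reproduced here.

```python
def build_scaled_array(img):
    """Place each original pixel at the even (row, column) positions of a
    2N x 2N array; the remaining positions stay empty (None)."""
    n_rows, n_cols = len(img), len(img[0])
    out = [[None] * (2 * n_cols) for _ in range(2 * n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            out[2 * i][2 * j] = img[i][j]
    return out


def label(r, c):
    """Label of position (r, c): 'a00' for original pixels, 'x01', 'x10',
    or 'x11' for positions still to be interpolated.
    Assumption: the two digits are the row and column parities."""
    kind = "a" if (r % 2 == 0 and c % 2 == 0) else "x"
    return f"{kind}{r % 2}{c % 2}"


if __name__ == "__main__":
    arr = build_scaled_array([[1, 2], [3, 4]])
    for r, row in enumerate(arr):
        print([(label(r, c), v) for c, v in enumerate(row)])
```

Under this reading, each `x11` position sits between four original pixels on its diagonals, which is why it can be interpolated first from original data only.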
Pixels in the scaled array are interpolated by taking the diagonal, or the vertical and horizontal, neighbours in a local window of size 3 × 3, as shown in Figure 7. To speed up processing, interpolation using diagonal neighbours can be overlapped with interpolation using vertical and horizontal neighbours by pipelining. The sorter unit contains binary comparator and swap units. Bitwise comparisons are performed, and the precomputation logic is used to avoid unnecessary switching, so significant power reduction is achieved with negligible area overhead. The sorted array X has four elements, so computing the median would require adding the two centre values and dividing by 2. To reduce the hardware complexity, the maximum or minimum value of the sorted array is duplicated to make the array size 5, and X(3) is then chosen as the median value $X_{med}$ without extra computations. Next, $X_{med}$ is tested to determine whether it is a noisy pixel (equal to 0 or 255). If it is not, $X_{eff} = X_{med}$; otherwise, a suitable representative value $X_{eff}$ is computed as per the proposed algorithm. The effective median value is used for interpolation; in addition, the relative distance between the pixels in the local window is computed, and the elements that deviate strongly from this distance are identified and replaced by $X_{eff}$. Hence, the proposed EWM computation not only performs scaling but also detects and removes impulse noise. Figure 5 represents the modified image scaling algorithm.
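The sorting and median-selection idea can be sketched in Python as below. The five compare-and-swap steps form a standard 4-input sorting network; whether the proposed sorter uses this exact arrangement is an assumption, and the precomputation logic is not modelled. The function names are hypothetical.

```python
def compare_swap(x, i, j):
    """Comparator plus swap unit: order the pair so that x[i] <= x[j]."""
    if x[i] > x[j]:
        x[i], x[j] = x[j], x[i]


def sort4(values):
    """Sort four values with a fixed 5-comparator sorting network."""
    x = list(values)
    for i, j in ((0, 1), (2, 3), (0, 2), (1, 3), (1, 2)):
        compare_swap(x, i, j)
    return x


def median_by_duplication(values, duplicate="max"):
    """Pad the sorted 4-element array to 5 elements and take the centre.

    X(3) in the text (1-based) is x[2] here (0-based)."""
    x = sort4(values)
    if duplicate == "max":
        x.append(x[-1])      # [x1, x2, x3, x4, x4]
    else:
        x.insert(0, x[0])    # [x1, x1, x2, x3, x4]
    return x[2]


if __name__ == "__main__":
    print(median_by_duplication([10, 40, 20, 30], "max"))
    print(median_by_duplication([10, 40, 20, 30], "min"))
```

Duplicating the maximum selects the upper of the two centre values and duplicating the minimum selects the lower one, so the adder and divider of the conventional four-element median are replaced by a plain selection at the cost of a small bias.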

EWMI Algorithm
Step 1: Given an input image of size N × N, construct an array of size 2N × 2N.
Step 2: Label the array elements in pixel positions as $a^{00}_{i,j}$, $x^{01}_{i,j}$, $x^{10}_{i,j}$, $x^{11}_{i,j}$, where i, j = 1 to 3, as shown in Figure 7.
Step 3: Replace the elements labelled $a^{00}_{i,j}$ by the respective original pixel values.
Step 4: Define a 3 × 3 local window for each array element $x^{11}_{i,j}$, and perform the following:
  (a) Read and sort the four diagonal neighbours in ascending order.
  (b) Duplicate the maximum value into the sorted array X to make the array size 5, X[5].
  (c) Select the median value as the centre value, $X_{med} = X[3]$.
  (d) Define the impulse noise values I1 = 0 and I2 = 255.
  (e) If $X_{med} \neq I1$ and $X_{med} \neq I2$, then $X_{eff} = X_{med}$, and go to (g).
  (f) Else, compute $X_{eff} = (I1 + I2)/4$.
  (g) Find the relative difference between adjacent pixels ($d_i = X_{i+2} - X_{i+1}$) and choose $D_{max} = \max(d_i)$.
  (h) Replace $x^{11}_{i,j}$, and its diagonal neighbours whose $D_i > D_{max}$, by $X_{eff}$.
Step 5: Define a 3 × 3 local window for each array element $x^{10}_{i,j}$, and perform the following:
  (a) Apply the scale value of 0.6 to the two horizontal neighbours.
  (b) Read and sort the two horizontal and two vertical neighbours in ascending order.
  (c) Repeat steps 4(b) to 4(h).
Step 6: Define a 3 × 3 window for each array element $x^{10}_{i,j}$, $x^{01}_{i,j}$, and perform the following:
  (a) Apply the scale value of 0.6 to the two horizontal and two vertical neighbours.
  (b) Read and sort the eight neighbours in ascending order.
  (c) Repeat steps 4(b) to 4(h).
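Steps 1, 3, and 4(a) to 4(f) can be sketched in Python as follows, as one possible reading of the algorithm rather than the implemented architecture. The noise test is taken as "neither 0 nor 255", following the description of the noisy-pixel check; the representative value (I1 + I2)/4 is used exactly as printed in step 4(f); border replication is an assumption; and steps 4(g), 4(h), 5, and 6 are omitted. All function names are hypothetical.

```python
I1, I2 = 0, 255  # impulse noise values from step 4(d)


def effective_median(neighbours):
    """Steps 4(a)-(f) for one window of four neighbours."""
    x = sorted(neighbours)         # 4(a): ascending sort
    x.append(x[-1])                # 4(b): duplicate the maximum, size 5
    x_med = x[2]                   # 4(c): centre value X[3] (1-based)
    if x_med != I1 and x_med != I2:
        return x_med               # 4(e): median is not an impulse
    return (I1 + I2) / 4           # 4(f): value as printed in the text


def interpolate_x11(img):
    """Steps 1 and 3, then step 4 for every x11 position of an N x N image."""
    n = len(img)
    out = [[None] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            out[2 * i][2 * j] = img[i][j]          # step 3: original pixels
    for i in range(n):
        for j in range(n):
            # Assumed border rule: reuse the last row/column at the edge.
            i2, j2 = min(i + 1, n - 1), min(j + 1, n - 1)
            diagonal = [img[i][j], img[i][j2], img[i2][j], img[i2][j2]]
            out[2 * i + 1][2 * j + 1] = effective_median(diagonal)
    return out


if __name__ == "__main__":
    for row in interpolate_x11([[10, 20], [30, 40]]):
        print(row)
```

The positions left as `None` correspond to the `x01` and `x10` elements, which steps 5 and 6 would fill from a mix of original and previously interpolated neighbours.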

Simulation Results and Performance Analysis
The Xilinx ISE Design Suite 13.2 tool has been used for implementing the VLSI architecture of the proposed effective WMI image scaling algorithm in Verilog HDL, and the MATLAB R2010b Image Processing Toolbox is used to verify the visual quality of the scaled images. The performance of the images scaled by the proposed and existing techniques [7,41-44] is analyzed in terms of PSNR. Image samples from the LIVE image quality database and the real-time blur image database [45,46] are used for the performance analysis. The real-time blur image database [46] contains 585 images with resolutions ranging from 1280 × 960 to 2272 × 1704 pixels. In this work, based on the resolution, two types of image samples with high- and low-resolution ranges of different sizes are taken, and scaling is performed. A high-resolution image of size 256 × 256 is scaled to 512 × 512. Figures 8 and 9 confirm that the edge details are better preserved by the proposed algorithm than by the existing algorithms for both digital natural images and a magnetic resonance (MR) image (medical data). Table 1 compares the PSNR values of the images scaled by the existing and proposed image scaling techniques with a scale factor of 2. Table 2 shows some of the test sample images from the LIVE image quality database [45] and the real-time blur database [46]. Table 3 compares the computing resources and memory requirements. Table 1 shows that the proposed technique yields better quality (PSNR) than the existing algorithms for various resolutions and that the artifacts are effectively removed by the proposed algorithm. The performance is even better for noisy images. Figure 10 shows an image of size 128 × 128 with 0.4 impulse noise and the images scaled by the existing and proposed scaling algorithms.
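For reference, PSNR for 8-bit images is commonly computed from the mean squared error as $\mathrm{PSNR} = 10 \log_{10}(255^2 / \mathrm{MSE})$. The Python sketch below follows this standard definition; the function name and the list-of-rows image representation are assumptions, and the evaluation reported here was carried out in MATLAB.

```python
import math


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    total, count = 0, 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            total += (r - t) ** 2
            count += 1
    mse = total / count
    if mse == 0:
        return math.inf          # identical images
    return 10 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    original = [[100, 120], [140, 160]]
    scaled = [[101, 118], [141, 160]]
    print(round(psnr(original, scaled), 2))
```

A higher PSNR means the scaled image is closer to the reference, so the comparison depends on which reference image of the target size is used.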
Table 3 gives details of computing resources used for the implementation of existing [41] and proposed architectures for image scaling. It is found that the proposed EWMI image scaling architecture is of low cost compared to the existing one [41]. It inherently removes the noise and is used for scaling operations, whereas in the existing techniques, separate filters are needed to preserve edge details. Hence, the proposed architecture is a low-cost, memory-efficient, and high-quality image scaling algorithm.

Conclusion
In this work, a VLSI implementation of an effective WMI image scaling algorithm is proposed. The main contribution of this work is an effective weighted median technique capable of performing interpolation as well as denoising. As the degree of scaling increases, the proposed technique removes blurring and preserves edge details better than other existing techniques. The proposed work yields better performance with reasonable hardware complexity. In future, techniques to minimize the hardware complexity of effective WMI image scaling will be investigated. The major limitation of the proposed image scaling algorithm is that it is used only for zooming; for scaling down, suitable modifications will be made.

Data Availability
The data that support the findings of this study are available within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest.