Homogeneous Image Compression Techniques with the Shannon-Fano Algorithm

Compression is increasingly needed now that advances in digital and computing technology make it possible to process large data such as multimedia. Compression is needed to limit the storage consumed by data and information kept on computer media. The Shannon-Fano algorithm is one of the well-known compression algorithms and is useful for saving data storage space. It can be applied to text and to digital images. This study applies the Shannon-Fano method to homogeneous digital images using an application created in MATLAB. The product of this research is tested using several images taken with a webcam, to find out whether the coding is correct or contains errors. The product is considered successful if the compressed image is smaller than the original image. The homogeneous images tested with the Shannon-Fano image compression application are reduced in size by about 52% on average, so the compressed results are roughly half the size of the original images. The Shannon-Fano algorithm counts how many times each character (symbol) appears in each experiment, then encodes the symbols as binary numbers according to their frequency of appearance.


Introduction
Along with the development of large-capacity storage media, people no longer encounter problems when they have large files, even when those files are images. However, a large file size can still be a nuisance when the available storage media must be managed for various data [1]. File size becomes a particular problem when the file is to be sent electronically. Images are vital and have become an integral part of daily life. In specific fields, images are used as a tool for reasoning, interpretation, illustration, representation, memorization, education, communication, evaluation, navigation, surveying, and entertainment [2].
The development of camera technology has helped humans in many ways. Cameras are currently used not only to capture moments but also as safety tools [3]. This type of camera is commonly referred to as a security camera. Security cameras are used for surveillance in public places that are considered vulnerable or dangerous, such as factories, schools, supermarkets, offices, and warehouses, in order to minimize the risk of crime and threats [4]. The output of a security camera is not only still images in JPG format, as with ordinary cameras, but also video in AVI format. Besides serving as a security system, cameras and video can also be used for documentation and as legal evidence [5].
Large file size is a drawback of the sophistication of current cameras. As a result, more storage is needed to store each image/frame, and more storage implies higher costs. To reduce these costs, the file size of an image/frame should be smaller [6].
The size of image/frame or video files can be reduced in three ways: changing the number of frames per second, saving images/frames or video only when an object is detected, and compressing the images/video frames. In the first method, the number of frames per second is changed to alter the size of the video file; the resulting CCTV security camera footage looks like a video with disjointed (not smooth) object motion. In the second method, recording of the picture/frame begins only after a moving object causes a change in image intensity at some location. The third method uses image or video compression. A compression algorithm is a data compression technique used to obtain a file smaller than the original, or the process of converting a set of data into a coded form to save storage space and data transmission time [7]-[9].
The large image size is addressed here by building an image compression application based on data compression methods. Compression methods are divided into two groups. Static methods use code maps that are always the same. They require two phases (two-pass): the first phase calculates the probability of appearance of each symbol/character and determines the code map, and the second phase converts the message into a set of codes to be transmitted. Examples of these approaches include static Huffman coding, arithmetic coding, and Shannon-Fano coding [10], [11], static/dynamic/adaptive Huffman coding [12]-[15], and the LZW and DMC algorithms and arithmetic codes [16], [17]. A further technique replaces each character/fragment in the input file with the index of its location in a dictionary, as in the Lempel-Ziv-Welch (LZW) data compression technique [18]-[20].
Data compression is usually achieved by replacing the original symbols in the data with symbols that are smaller in size. A symbol can be any character, word, phrase, or other unit that can be stored in a symbol dictionary [21]. Image compression using the Shannon-Fano method has been widely studied and implemented, for example by Shanmugasundaram and Lourdusamy [22]; Kodituwakku and Amarasinghe [23]; and Vaidya, Walia, and Gupta [24]. However, the application of the Shannon-Fano method to images with a predetermined color depth level has not yet been studied.

Literature Review
Digital image processing refers to processing 2-dimensional images using a computer. In a broader context, digital image processing refers to the processing of any 2-dimensional data. Digital images are arrays that contain real or complex values represented by a particular sequence. An image can be defined as a function ƒ(x, y) of size M rows by N columns, where x and y are spatial coordinates and the amplitude of ƒ at the coordinate point (x, y) is called the intensity or gray level of the image at that point. If the values of x, y, and the amplitude ƒ are finite and discrete, the image is said to be a digital image. The values at the intersection of a row and a column (at position x, y) are called picture elements, image elements, or pixels. The last term (pixel) is the one most often used for digital images. A digital (discrete) image is formed through several stages, namely image acquisition, sampling, and quantization.
Image resolution is the level of detail of an image. The higher the image resolution, the higher the level of detail of the image. Image resolution can be expressed in physical units (number of lines per mm or number of lines per inch) or relative to the overall image size (number of lines per image height). The resolution of an image can be measured in various ways, as follows:

a. Pixel Resolution
Pixel resolution is a count of the number of pixels in a digital image. It can also be expressed as the number of pixels in the width multiplied by the number of pixels in the height, divided by 1 million (megapixels). This type of pixel resolution is often quoted for digital cameras. An image that has a width of 2,048 pixels and a height of 1,536 pixels has a total of 2,048 × 1,536 = 3,145,728 pixels, or 3.1 megapixels.

b. Spatial Resolution
Spatial resolution shows how closely spaced the lines in an image can be while still being resolved. This distance depends on the system that created the image. Spatial resolution is expressed as the number of pixels per unit length.

c. Spectral Resolution
A digital image separates intensity into several spectral bands. Multispectral images provide a finer division of the spectrum (wavelengths) used to display colors.

d. Temporal Resolution
Temporal resolution is related to video. A video is a collection of static frames in the form of sequential images displayed in quick succession. Temporal resolution gives the number of frames that can be displayed every second, in units of frames per second (fps).

e. Radiometric Resolution
This resolution gives the number of distinct intensity levels that an image can display and is usually expressed in bits, for example an 8-bit image, which can represent 256 intensity levels. The higher the radiometric resolution, the finer the differences in intensity that can be displayed.
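The pixel-resolution arithmetic and the link between bit depth and intensity levels can be checked with a few lines of code. This is a minimal sketch in Python, used here only for illustration (the application in this paper is written in MATLAB); the function name `intensity_levels` is hypothetical.

```python
# Pixel resolution: total pixel count and its size in megapixels.
width, height = 2048, 1536            # example dimensions from the text
total_pixels = width * height
megapixels = total_pixels / 1_000_000

def intensity_levels(bits: int) -> int:
    """Number of distinct intensity levels for a given bit depth (hypothetical helper)."""
    return 2 ** bits

print(total_pixels)             # 3145728
print(round(megapixels, 1))     # 3.1
print(intensity_levels(8))      # 256
```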

Image
A pixel value lies in a specific range, from a minimum value to a maximum value. The range used varies depending on the type of color; in general, however, the range is 0-255. The types of images based on pixel values are as follows:

a. Binary images are digital images that have only two possible pixel values, namely black and white. A binary image is also called a B&W (black and white) or monochrome image. Only 1 bit is needed to represent the value of each pixel of a binary image, as seen in Figure 1.

In a color (RGB) image, each pixel is represented by a 24-bit number, making a total of 16,777,216 color variations. This variation is more than enough to visualize all the colors that human vision can see.
Human vision is believed to be able to distinguish only up to about 10 million colors. Each channel of an RGB pixel is stored in 1 byte of data: the first 8 bits store the blue value, the second 8 bits store the green value, and the last 8 bits store the red value.
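The figures above follow directly from the bit depth. The sketch below, in Python for illustration only, counts the 24-bit color variations, packs one pixel in the blue-green-red byte order described in the text, and computes the uncompressed size of a 160×120 RGB image such as the test images used later in this paper. The helper name `pack_bgr` is hypothetical.

```python
def pack_bgr(blue: int, green: int, red: int) -> bytes:
    """Pack three 8-bit channel values into 3 bytes, blue first (hypothetical helper)."""
    return bytes([blue, green, red])

color_variations = 2 ** 24            # three channels of 8 bits each
pixel = pack_bgr(0, 128, 255)         # one pixel occupies 3 bytes
raw_size = 160 * 120 * len(pixel)     # uncompressed bytes of a 160x120 RGB image

print(color_variations)   # 16777216
print(raw_size)           # 57600
```

The 57,600-byte result agrees with the byte length of the original images reported in the product testing section.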

Research Approach
This research is experimental. Experimental research is the most appropriate approach here because, compared with other research methods, it can genuinely test hypotheses about cause-and-effect relationships. The process of building an image compression application using the Shannon-Fano algorithm, from the determination of components to the stages of completion, is explained in this design section.

Designing Tools
The MATLAB programming language is used in this study to display image data, because image data are basically a color matrix with a certain number of gray levels (grayscale). MATLAB is a program for analysis and numerical computation and a high-level programming language designed for technical computing. It provides an interactive system whose standard variable element is the array/matrix, with no need to declare arrays as in other languages.

Product Trial
The product of this research is tested using several images taken with a webcam, to find out whether the coding is correct or contains errors. Several images are captured with the webcam and compressed using the application; the size of each original image is then compared with the size of its compressed image. The product is considered successful if the compressed image is smaller than the original image.

Shannon-Fano Encoding Procedure
The Shannon-Fano method is based on variable-length codewords, which means that some symbols in the data are represented by codes that are shorter than the original symbols. The higher the probability of a symbol, the shorter its code. The Shannon-Fano method produces symbol codes of unequal length, constructed so that the codes are unique and can be decoded.
The procedure in Shannon-Fano encoding is:
• Arrange the source symbols in order of probability, from the highest to the lowest.
• Divide the list into two parts whose total probabilities are as nearly equal as possible, and assign 0 to the upper part and 1 to the lower part.
• Repeat step two on each part, dividing it into parts of nearly equal probability, until no part can be divided further.
• Encode each original symbol from the source as the binary sequence generated by these successive divisions.
The flow chart of the Shannon-Fano algorithm used in this research is shown in Figure 3.
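The four steps above can be sketched as follows. This is a minimal illustration in Python, not the MATLAB implementation used in this research; the function names `shannon_fano` and `encode`, the tie-breaking rule, and the sample pixel data are assumptions made for the example.

```python
from collections import Counter

def shannon_fano(freqs):
    """Return {symbol: bitstring} for a {symbol: count} mapping (illustrative sketch)."""
    # Step 1: order symbols from most to least frequent (ties broken by symbol value).
    symbols = sorted(freqs, key=lambda s: (-freqs[s], s))
    codes = {s: "" for s in symbols}

    def split(group):
        if len(group) < 2:
            return
        total = sum(freqs[s] for s in group)
        # Step 2: find the cut that makes the two parts as nearly equal as possible.
        running, best_cut, best_diff = 0, 1, None
        for i in range(1, len(group)):
            running += freqs[group[i - 1]]
            diff = abs(total - 2 * running)
            if best_diff is None or diff < best_diff:
                best_cut, best_diff = i, diff
        top, bottom = group[:best_cut], group[best_cut:]
        for s in top:
            codes[s] += "0"      # upper part gets 0
        for s in bottom:
            codes[s] += "1"      # lower part gets 1
        # Step 3: repeat on each part until it cannot be divided again.
        split(top)
        split(bottom)

    split(symbols)
    if len(symbols) == 1:        # a single distinct symbol still needs one bit
        codes[symbols[0]] = "0"
    return codes

def encode(data, codes):
    # Step 4: replace every symbol with its binary codeword.
    return "".join(codes[s] for s in data)

# Hypothetical pixel data: five gray levels with counts 15, 7, 6, 6, 5.
pixels = [0] * 15 + [1] * 7 + [2] * 6 + [3] * 6 + [4] * 5
codes = shannon_fano(Counter(pixels))
bits = encode(pixels, codes)
print(codes)        # {0: '00', 1: '01', 2: '10', 3: '110', 4: '111'}
print(len(bits))    # 89
```

With a fixed 3-bit code, the same 39 pixels would need 117 bits, so the variable-length code saves space because the frequent gray levels receive the shorter codewords.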

Image Compression Application Testing
The result of this research is an application, developed in MATLAB from its pseudocode, that compresses images using the Shannon-Fano algorithm. Figures 6 to 10 show the GUI (Graphical User Interface) of the image compression application after it has been run on several homogeneous images. The image in Figure 6 is categorized as homogeneous because it has low color intensity variation, being plain black. As the histogram shows, image 1 is too dark because its intensities are gathered at a low gray level, namely 0. The histogram of an image that is too dark usually tends to gather at low gray levels (towards the value 0).
The captured image is an RGB image with a size of 160×120. The image is then compressed using the Shannon-Fano algorithm, producing a compression ratio of 6.13827 with a data redundancy of about 0.837 (84%). The lower the color intensity variation of the compressed image, the higher the amount of redundant data that can be found. This also affects the size of the compressed image, which will be smaller than the original image.
The captured image in Figure 7 is an RGB image with a size of 160×120. The image is then compressed using the Shannon-Fano algorithm, producing a compression ratio of 1.31027 with a data redundancy of around 0.236 (24%). The redundant data in Figure 7 is lower than in Figure 6 because the captured image has more color intensity variation than the previous image in Figure 6. The difference can be seen in the histograms: the histogram in Figure 6 converges to a single value, 0, whereas the histogram in Figure 7 is gathered around 55-155. The difference in byte length between the original image and the compressed image in Figure 7 is 13634 bytes.
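How a histogram "gathers" at one gray level can be illustrated with a small count of pixel values. The sketch below is in Python for illustration only; the function name `histogram` and the 16-pixel patch are hypothetical and are not data from this research.

```python
from collections import Counter

def histogram(gray_pixels, levels=256):
    """Count how many pixels fall on each gray level 0..levels-1 (hypothetical helper)."""
    counts = Counter(gray_pixels)
    return [counts.get(v, 0) for v in range(levels)]

# Hypothetical near-black patch: almost every pixel sits at gray level 0.
dark_patch = [0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0]
hist = histogram(dark_patch)
dominant = max(range(256), key=lambda v: hist[v])   # gray level with the most pixels
share = hist[dominant] / len(dark_patch)            # fraction of pixels at that level

print(dominant, share)   # 0 0.875
```

When one gray level holds most of the pixels, as here, that symbol receives a very short code, which is why such images compress well.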

Figure 8. Results Display Image 3
The image in Figure 8 is also categorized as homogeneous because it has low color intensity variation, being plain white. As the histogram shows, image 3 is too bright because its intensities are gathered at a high gray level, namely 255. The histogram of an image that is too bright usually tends to gather at high gray levels (towards the value 255).
The captured image is an RGB image with a size of 160×120. The image is then compressed using the Shannon-Fano algorithm, producing a compression ratio of 7.96777 with a data redundancy of around 0.874 (87%). The lower the color intensity variation of the compressed image, the higher the amount of redundant data that can be found. This also affects the size of the compressed image, which will be smaller than the original image. The difference in byte length between the original image and the compressed image in Figure 8 is very large, namely 50367 bytes.
The image in Figure 9 is also categorized as homogeneous because it has low color intensity variation. As the histogram shows, the intensities of image 4 lean towards the value 0, which has a higher frequency.
The captured image is an RGB image with a size of 160×120. The image is then compressed using the Shannon-Fano algorithm, producing a compression ratio of 1.60778 with a data redundancy of around 0.378 (38%). The redundant data in Figure 9 is lower than in Figures 6 and 8 because the captured image has more color intensity variation (brown, white, and dark areas). The difference in byte length between the original image and the compressed image in Figure 9 is 21771 bytes.
The image in Figure 10 is still categorized as homogeneous because it has low color intensity variation, being predominantly red. As the histogram shows, the intensities of image 5 lean towards the value 0, which has a higher frequency.
The captured image is an RGB image with a size of 160×120. The image is then compressed using the Shannon-Fano algorithm, producing a compression ratio of 1.41106 with a data redundancy of around 0.291 (29%). The lower the color intensity variation of the compressed image, the higher the amount of redundant data that can be found. This also affects the size of the compressed image, which will be smaller than the original image. The difference in byte length between the original image and the compressed image in Figure 10 is 16776 bytes.
Complete data for homogeneous images 1 to 5 (Figures 6 to 10) taken using a webcam can be seen in more detail in Table 1. Based on the complete data in Table 1, it is concluded that the five homogeneous images tested with the Shannon-Fano image compression application have an average data redundancy of about 52%; that is, the compressed results are on average roughly half the size of the original images.
Any data is a series of the bits 0 and 1. What distinguishes one piece of data from another is the length of the series of bits and how the 0s and 1s are arranged in it. For example, in text data a specific series of bits represents one character, whereas in image data a series of bits represents the color of one pixel. The more complex the data, the longer the required series of bits, and thus the larger the overall size of the data [25], [26].
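The reported compression ratios and redundancy values can be cross-checked numerically. The sketch below is in Python for illustration only and assumes the common definitions CR = original size / compressed size and RD = 1 - 1/CR; the paper does not state these formulas explicitly, but the reported pairs agree with them to about three decimal places. The function name `redundancy` is hypothetical.

```python
def redundancy(cr: float) -> float:
    """Relative data redundancy for a compression ratio (assumed relation RD = 1 - 1/CR)."""
    return 1 - 1 / cr

# Compression ratios reported in the text for Figures 6 to 10.
reported_cr = {
    "Figure 6": 6.13827,
    "Figure 7": 1.31027,
    "Figure 8": 7.96777,
    "Figure 9": 1.60778,
    "Figure 10": 1.41106,
}

rd = {name: redundancy(cr) for name, cr in reported_cr.items()}
mean_rd = sum(rd.values()) / len(rd)

for name, value in rd.items():
    print(f"{name}: RD = {value:.3f}")
print(f"mean RD = {mean_rd:.3f}")   # about 0.52
```

Under these assumptions, the 52% figure corresponds to the mean redundancy of the five images rather than to the mean of the compression ratios themselves.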

Product Testing
Testing is the process of running a program to find errors [27]. The image compression application using the Shannon-Fano algorithm is tested using the Black-Box testing method and by measuring the accuracy of the compression results with objective fidelity criteria: mean square error (MSE), square error, absolute difference, signal-to-noise ratio (SNR), peak signal-to-noise ratio (PSNR), and average absolute difference. However, in the image compression application, only the MSE objective fidelity criterion is displayed in the text box.
Black-Box testing focuses on the functional requirements of the software and allows engineers to derive a set of input conditions that fully exercise the functional requirements of a program [28]. The Black-Box testing method is carried out in this research to ascertain whether each function runs according to the input provided. The result of this test is the suitability of the compressed image. In this test, several images are captured using a webcam and compressed to obtain a smaller size; they are then decompressed to produce an output image, and the decompressed image should have the same size as the original image. This stage generates suitability test data for taking pictures, compression, decompression, saving, redundancy (RD), and compression ratio (CR). Table 2 is the homogeneous image suitability testing table. The original images and the decompression results for images 1 to 5 have the same byte length, 57600 bytes, and are stored in the file extension format. This agrees with what is expected in the test data.
The next test measures the correctness of the compression process using objective fidelity criteria. The criteria are calculated with the following formula:
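The compress-then-decompress check described above can be sketched as a round trip through a prefix code. This is an illustration in Python, not the authors' MATLAB test procedure; the functions `encode` and `decode`, the code table, and the pixel values are all hypothetical.

```python
def encode(pixels, codes):
    """Concatenate the codeword of every pixel value."""
    return "".join(codes[p] for p in pixels)

def decode(bits, codes):
    """Read bits left to right, emitting a symbol whenever a codeword is completed."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return out

# Hypothetical prefix code table: the most frequent value gets the shortest code.
codes = {0: "0", 255: "10", 128: "11"}
original = [0, 0, 0, 255, 0, 128, 0, 0]

bits = encode(original, codes)
restored = decode(bits, codes)

print(bits)                    # 0001001100
print(restored == original)    # True
```

Because no codeword is a prefix of another, decoding is unambiguous and the restored data has the same length and content as the original, which is the property the suitability test verifies.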
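The formula itself is not reproduced in this text. As a hedged illustration, the sketch below uses the standard definitions of MSE (the mean of the squared pixel differences) and PSNR (10·log10 of the squared peak value over the MSE), which may differ in notation from the formula the authors used. It is written in Python for illustration only; the function names and sample values are hypothetical.

```python
import math

def mse(original, restored):
    """Mean square error between two equally long pixel sequences (standard definition)."""
    if len(original) != len(restored):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)

def psnr(original, restored, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    error = mse(original, restored)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)

# Hypothetical pixel values: a lossless round trip reproduces the input exactly.
image = [0, 255, 128, 64]
print(mse(image, image))     # 0.0
print(psnr(image, image))    # inf
print(mse([0, 0], [3, 4]))   # 12.5
```

An MSE of 0 means every decompressed pixel equals the original pixel, which is what a lossless method such as Shannon-Fano coding should deliver.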

Figure 6. Results Display Image 1
Shannon-Fano Tree by Dividing by Frequency Approaching Recursively
Forming the Shannon-Fano Code, the Left Binary Tree Side Labeled 0 (Zero) and the Right Side 1 (One)
The difference in byte length between the original image and the compressed image in Figure 6 is very large, at 48213 bytes. An image is said to be successfully compressed if the measured criterion is 0 (zero), because the Shannon-Fano algorithm is a lossless compression technique. The image in Figure 6 is said to be compressed successfully because its Mean Square Error value is 0 (zero).

Table 1. Summary of Homogeneous Image Data with the Shannon-Fano Compression Technique

Table 2. Homogeneous Image Suitability Test Results