Implementation of new secured data compression technique using Huffman code and symmetric key algorithm

Data compression is a common requirement for most computerized applications. There are many unsecured data compression algorithms, each dedicated to compressing particular data formats. Even for a single data type there are a number of different compression algorithms, which use different approaches. In this research, we propose a simple and efficient data compression algorithm that is suited to commercial use and operates in a secured manner. Our intention is to transmit text data in a form that is both secured and compressed over an open environment. The technique uses double compression, based on the Huffman coding algorithm and a simple symmetric key algorithm. Experiments compare the performance of the new secured data compression algorithm with that of other data compression algorithms.

© 2015 Elixir. All rights reserved. Elixir Comp. Sci. & Engg. 79 (2015) 30528-30531. Computer Science and Engineering. Available online at www.elixirpublishers.com (Elixir International Journal).


Introduction
Compression for the purpose of saving storage space remains important even with today's huge storage volumes [8]. Data compression has also been adopted in hardware designs to improve performance and power consumption: cache compression increases the cache capacity by compressing block data and accommodating more blocks in a fixed space [5], [6]. Data compression is the art of representing information in a compact form rather than in its original, uncompressed form [9]. In other words, data compression reduces the size of a particular file. This is very useful when processing, storing or transferring a huge file, which needs a lot of resources. If the compression algorithm works properly, there should be a significant difference in size between the original file and the compressed file. When data compression is used in a data transmission application, speed is the primary goal. The speed of transmission depends on the number of bits sent, the time required for the encoder to generate the coded message and the time required for the decoder to recover the original message. In a data storage application, the degree of compression is the primary concern [1]. Various lossless data compression algorithms have been proposed and used. Some of the main techniques in use are Huffman Coding, Run Length Encoding, Arithmetic Encoding and Dictionary Based Encoding [3].
In symmetric, or secret key, cryptography a single key is used for both encryption and decryption. The sender applies the key, together with some set of rules, to encrypt the plaintext and sends the ciphertext to the receiver. The receiver applies the same key and rule set to decrypt the message and recover the plaintext. Because a single key is used for both functions, secret key cryptography is also called a symmetric key algorithm. The biggest difficulty with this approach is the distribution of the key [2].
The algorithm described in [7] is reported to be the best solution currently available in all situations, including archivers, distribution, and on-line compression such as disk compression or network datagram compression [7].

Related Works
David A. Huffman proposed in 1952 an encoding algorithm that uses the probability distribution of the source alphabet to develop the code words for symbols. The frequency distribution of all the characters of the source is calculated in order to obtain the probability distribution. The code words are then assigned according to the probabilities: shorter code words for higher probabilities and longer code words for smaller probabilities. For this task a binary tree is created with the symbols as leaves, arranged according to their probabilities, and the paths to the leaves are taken as the code words. Two families of Huffman encoding have been proposed: Static Huffman Algorithms and Adaptive Huffman Algorithms. Static Huffman Algorithms calculate the frequencies first and then generate a common tree for both the compression and decompression processes [2]. Details of this tree must be saved or transferred with the compressed file. Adaptive Huffman Algorithms develop the tree while calculating the frequencies, so a separate tree is maintained in each of the two processes. In this approach, a tree is generated with the flag symbol at the beginning and is updated as each next symbol is read.
Glen G. Langdon, Jr., in "An Introduction to Arithmetic Coding" (1984), presents the key notions of arithmetic compression coding by means of simple examples. Arithmetic coding is a data compression technique that encodes data (the data string) by creating a code string which represents a fractional value on the number line between 0 and 1. The coding algorithm is symbol-wise recursive; i.e., it operates upon and encodes (decodes) one data symbol per iteration or recursion.
On each recursion, the algorithm successively partitions an interval of the number line between 0 and 1, and retains one of the partitions as the new interval. Thus, the algorithm successively deals with smaller intervals, and the code string, viewed as a magnitude, lies in each of the nested
intervals. The data string is recovered by using magnitude comparisons on the code string to recreate how the encoder must have successively partitioned and retained each nested subinterval. Arithmetic coding differs considerably from the more familiar compression coding techniques, such as prefix (Huffman) codes. Also, it should not be confused with error control coding, whose object is to detect and correct errors in computer operations [2].
S. R. Kodituwakku and U. S. Amarasinghe carried out an experimental comparison of a number of different lossless compression algorithms for text data. Several existing lossless compression methods are compared for their effectiveness. Although they are tested on different types of files, the main interest is in different test patterns. Considering the compression times, decompression times and saving percentages of all the algorithms, the Shannon-Fano algorithm can be considered the most efficient among those selected. Its values are in an acceptable range and it shows better results for large files [1].
Prakash Kuppuswamy and Saeed Q. Y. Al-Khalidi, in "Implementation of security through simple symmetric key algorithm based on modulo 37" (October 2012), proposed a new symmetric key algorithm. Encryption and key generation have become vital tools for preventing threats to data sharing and for preserving data integrity, so that work focuses on enhancing security by raising the level of encryption in the network. Its main goal is to reflect the importance of network security and to provide a simple and powerful alternative to currently implemented encryption techniques. The method works modulo 37: an integer is selected and its inverse modulo 37 is calculated. The symmetric key must be distributed in a secured manner.
The performance of the new SSK algorithm is also examined against other existing symmetric key algorithms [10].

Proposed Technique
Secured encryption is one of the most effective tools for ensuring the safety of compressed data transactions. The proposed technique combines the Huffman encoding technique with a simple symmetric key algorithm, and focuses on the data confidentiality issue. Although it adds a security mechanism, the method is easy to adopt for coding bulk data in a compressed and secured form. At the same time, it is intended to be safe. The tools used in designing the method are as follows.

a. Huffman Code
Huffman code assigns shorter encodings to elements, e, with a high frequency, F. It differs from block encoding in that it is able to assign codes of different bit lengths to different elements. The elements with the highest frequency are assigned the shortest codes. The key to decompressing a Huffman code is the Huffman tree.

b. Huffman tree
A Huffman tree is a special binary tree called a trie. A binary trie is a binary tree in which a 0 represents a left branch and a 1 represents a right branch. The number on each node of the binary trie represents the total frequency, F, of the subtree below it. The leaves of the trie represent the elements, e, to be encoded. Each element is assigned the encoding that corresponds to its place in the binary trie, that is, the sequence of 0s and 1s on the path from the root to its leaf.
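
The construction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' C# implementation: the function names `build_huffman_codes`, `encode` and `decode` are hypothetical, ties between equal frequencies are broken by an arbitrary counter, and a message with a single distinct symbol is given the one-bit code "0".

```python
import heapq
from collections import Counter


def build_huffman_codes(message):
    """Return a dict mapping each symbol of `message` to its Huffman bit string."""
    freq = Counter(message)
    if not freq:
        return {}
    # Heap entries are (total frequency, tie-breaker, tree). A tree is either
    # a symbol (leaf) or a (left, right) pair (internal node).
    heap = [(f, i, sym) for i, (sym, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Assumption: a lone symbol still gets a one-bit code.
        return {heap[0][2]: "0"}
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees under a new internal node.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")  # 0 = left branch
            walk(node[1], prefix + "1")  # 1 = right branch
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


def encode(message, codes):
    """Concatenate the code words of all symbols in `message`."""
    return "".join(codes[sym] for sym in message)


def decode(bits, codes):
    """Recover the message; this works because Huffman codes are prefix-free."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

For example, `decode(encode(msg, codes), codes)` returns `msg` when `codes = build_huffman_codes(msg)`. The exact bit strings depend on how ties are broken, so two correct implementations may produce different code tables with the same total length.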

c. Inverse function
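The inverse of a function f, usually written as f⁻¹(x), is the reflection of the original function f(x) in the line y = x: every x value becomes a y value and every y value becomes an x value. In this technique the inverse that matters is the multiplicative inverse modulo n, that is, the integer x⁻¹ for which (x * x⁻¹) mod n = 1.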
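
As a small illustration of the inverse needed for decryption, the following Python sketch computes a multiplicative inverse modulo 37. The helper name `mod_inverse` is hypothetical; it relies on the built-in `pow` with a negative exponent, which requires Python 3.8 or later.

```python
def mod_inverse(x, n=37):
    """Return y in 1..n-1 with (x * y) % n == 1.

    Raises ValueError when x has no inverse modulo n
    (for prime n this happens only when x is a multiple of n).
    """
    return pow(x % n, -1, n)
```

For instance, `mod_inverse(5)` is 15, because 5 * 15 = 75 = 2 * 37 + 1.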

d. Modular Arithmetic
Modular arithmetic over a number n involves arithmetic operations on the integers between 0 and n - 1, where n is called the modulus. If the result r of any operation falls outside this range, it is wrapped around into the range 0 to n - 1 by repeatedly subtracting the modulus n from r. This is equivalent to taking the remainder of the division r/n.
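
The equivalence between repeated subtraction and taking the remainder can be checked with a short Python sketch. The function name `wrap` is hypothetical, and the handling of negative values by repeated addition is an assumption added here so that negative multipliers are also covered.

```python
def wrap(r, n):
    """Bring r into the range 0..n-1 by repeated subtraction or addition of n."""
    while r >= n:
        r -= n
    while r < 0:  # assumption: negative results are wrapped by adding n
        r += n
    return r
```

In Python the same result is obtained with `r % n` for a positive modulus n, including for negative r.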

e. Selecting random positive and negative integer
A random positive integer and a random negative integer are selected so that the data can be sent both compressed and secured. The random integer x should satisfy 1 ≤ x < 37 (37 itself is congruent to 0 modulo 37 and has no inverse), because the inverse of the selected integer is needed for decryption.

Encoding sequence
Step 1: Find the frequency of each element in the given message
Step 2: Assign the Huffman codes
Step 3: Assign a decimal value to each Huffman code
Step 4: Set n = 37 (a prime number)
Step 5: Take a random positive integer x that has an inverse modulo 37, i.e. (x * x⁻¹) mod 37 = 1
Step 6: Take a random negative integer as well, for additional security
Step 7: Multiply the decimal value by the selected positive and negative integers
Step 8: Reduce the product modulo 37
Step 9: Apply the Huffman frequency code again
Step 10: The derived code is the secured encoded message
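
Steps 4 to 8 can be sketched in Python as below, together with the matching decryption. This is only an illustration under assumptions that the text does not confirm: each "decimal value" is taken to be an integer in the range 0 to 36, the positive and negative integers are combined into a single multiplier, and the names `make_key`, `encrypt` and `decrypt` are hypothetical. How the authors map Huffman codes to decimal values is not specified here, so that step is left out.

```python
N = 37  # Step 4: prime modulus


def make_key(x, y, n=N):
    """Combine a positive integer x and a negative integer y into one multiplier."""
    if x <= 0 or y >= 0:
        raise ValueError("x must be positive and y must be negative")
    k = (x * y) % n
    if k == 0:
        raise ValueError("x * y must not be a multiple of n")
    return k


def encrypt(values, x, y, n=N):
    """Steps 7 and 8: multiply each value by x and y, then reduce modulo n."""
    k = make_key(x, y, n)
    return [(v * k) % n for v in values]


def decrypt(cipher, x, y, n=N):
    """Undo the multiplication using the inverse of the combined key modulo n."""
    k_inv = pow(make_key(x, y, n), -1, n)  # Python 3.8+
    return [(c * k_inv) % n for c in cipher]
```

With x = 5 and y = -3, for example, `decrypt(encrypt(values, 5, -3), 5, -3)` returns the original list as long as every value lies between 0 and 36. Values outside that range would be reduced modulo 37 and could not be recovered.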

Implementation
In order to provide quick and simple data compression and decompression, the bit size of the secret key has to be chosen effectively. When compressing a small amount of data, the encryption should add no overhead to the system and there should be no compromise on the security level. Thus a short sample message, "DAD BAD CAB CAFE", is chosen for the experiment.
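
The first step of the encoding sequence, counting element frequencies, can be illustrated on this sample message with a few lines of Python. Whether the authors counted the space character as an element is not stated, so including it here is an assumption.

```python
from collections import Counter

message = "DAD BAD CAB CAFE"
frequencies = Counter(message)  # assumption: spaces are counted as elements

# List the elements from most to least frequent.
for symbol, count in frequencies.most_common():
    print(repr(symbol), count)
```

Under this assumption the message has 16 elements in total: A occurs 4 times, D and the space 3 times each, B and C twice each, and E and F once each.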

Result Analysis
The proposed data compression technique is a combination of Huffman coding and a symmetric key algorithm. A large amount of data is transferred across the world every day, and not all of these data transactions are secured. Some users look for secured transactions using various cryptography and data security algorithms. Others look for new compression techniques for bulk data transactions. The proposed method of secured data compression is meant to satisfy both types of user.
The algorithm was executed on a PC with an Intel Pentium 4 dual core CPU at 2.2 GHz. The programs were implemented in C# using Microsoft Visual Studio 2008. The technique was tested with three messages of different lengths (1000, 2000 and 3000 characters).

Conclusion
The results show that the proposed technique produces better results than normal coding and Huffman coding. It is a new technique for compressing data in a secured manner. It is essential to achieve goals such as confidentiality and integrity in data transactions across the medium. The proposed compression technique is very simple in nature, and it contains two compressing methods, which makes it more secured. For large data transactions and commercial communication, this algorithm is expected to work smoothly. However, the proposed compression technique may not be cost effective, since its component methods are not designed for large amounts of data at minimal cost.