A Novel Compressing a Sparse Matrix using Folding Technique

Many application problems in engineering simulation, scientific computing, information retrieval and economics use matrixes in which the non-zero elements are a small minority, at less than 10% of all entries. Such sparse matrixes are ubiquitous in mathematical and scientific applications. They allow storage and computational requirements to be reduced by storing, and carrying out arithmetic with, only the non-zero elements, so the sparse matrix must be compressed for these applications. A sparse matrix compression scheme represents only the non-zero matrix entries. This study presents a novel algorithm for compressing a sparse matrix, which involves three steps: firstly, dividing the sparse matrix into sub-matrixes; secondly, applying several transformations to them; and finally, coding them. The novel algorithm is called folding. The compressed matrix requires considerably less memory than the original sparse matrix. The algorithm is implemented in C++.


INTRODUCTION
Sparse matrixes are normally of considerable size with comparatively few non-zero elements, unlike dense matrixes, which have non-zero elements in almost all positions (Kourtis, 2010). Sparse matrixes arise in many applications, such as structural analysis, computational fluid dynamics, economic modeling, numerical analysis, numerical optimization, statistical modeling, power network analysis, electromagnetics, meteorology, medical imaging, data mining, finite-element simulation, management decision-support systems, circuit simulation, information retrieval and many others (Farzaneh et al., 2009). In most applications involving sparse matrixes, the matrix is very large. Storing such large matrixes in full is impossible, even on supercomputers. Besides, most of the storage and calculations would be wasted on zeros (Stanimirović and Tasić, 2009). The storage and computational requirements of these matrixes can be minimized by different approaches (Stanimirović and Tasić, 2009; Kourtis, 2010; Neelima and Raghavendra, 2012; Farzaneh et al., 2009). Compression can be considered a trade between data and computation: it reduces the amount of data stored at the cost of higher computational overhead. There have been several significant developments in sparse matrix computations in recent years, and some of them compress the sparse matrix by minimizing the number of zero elements stored, for example Compressed Row Storage, the Jagged Diagonal Format and the Compressed Diagonal Storage format (Kourtis, 2010; Farzaneh et al., 2009). Because each method exploits the characteristics of a particular kind of sparse matrix, the resulting space efficiency varies. Sparse matrixes are operated on directly in their storage formats, which should therefore be economical in both storage and computation. A sparse representation has the advantage over a dense one that it can handle substantial problems that a dense matrix cannot.
Sparse matrixes comprise structured and unstructured types. In a structured matrix, the non-zero entries form a regular pattern along a few diagonals. Alternatively, the non-zero elements may occur in blocks, such as equally sized dense sub-matrixes, which form a regular pattern along a few block diagonals. In comparison, an unstructured matrix has irregularly placed entries (Kourtis, 2010). In this study, a novel proposal for compressing a large sparse matrix into a dense matrix is introduced. The proposal works on both structured and unstructured sparse matrixes. Early results indicate that the proposed algorithm substantially reduces the memory requirement.

LITERATURE REVIEW
The related works fall into two groups. Firstly, the initial works relate to improving the performance of sparse matrix applications through different storage representations, of which there are more than thirteen (Farzaneh et al., 2009).
Compressed Sparse Row (CSR), a proposal of A. Brameller and D.J. Rose, is a widely used method for storing very large sparse matrixes. CSR stores the sparse matrix as a series of sparse vectors (one for each row) and permits random access to whole rows. Specifically, the matrix is stored in three arrays: values, row_ptr and col_ind. The values array stores the non-zero elements of the matrix in row-major order, whereas the other two arrays store indexing information: row_ptr contains the location of the first non-zero element of each row within the values array, and col_ind contains the column number of every non-zero element. An example of the CSR format is shown in Fig. 1 (Farzaneh et al., 2009; Kourtis, 2010; Stanimirović and Tasić, 2009).
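As an illustration of the three CSR arrays, the following is a minimal C++ sketch and not the code used in this study. The struct `Csr` and the helpers `to_csr` and `csr_at` are hypothetical names; zero-based indexing and the extra final entry in row_ptr (one past the last non-zero) are assumptions that follow common practice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the three CSR arrays named in the text.
struct Csr {
    std::vector<int> values;           // non-zero elements in row-major order
    std::vector<std::size_t> col_ind;  // column number of every non-zero element
    std::vector<std::size_t> row_ptr;  // position in `values` where each row starts
};

// Build the CSR arrays from a dense matrix (illustrative helper).
Csr to_csr(const std::vector<std::vector<int>>& dense) {
    Csr m;
    for (const auto& row : dense) {
        m.row_ptr.push_back(m.values.size());
        for (std::size_t j = 0; j < row.size(); ++j) {
            if (row[j] != 0) {
                m.values.push_back(row[j]);
                m.col_ind.push_back(j);
            }
        }
    }
    // Assumed convention: a final entry marking one past the last non-zero.
    m.row_ptr.push_back(m.values.size());
    return m;
}

// Look up element (i, j) by scanning only the stored non-zeros of row i.
int csr_at(const Csr& m, std::size_t i, std::size_t j) {
    assert(i + 1 < m.row_ptr.size());  // row index must be in range
    for (std::size_t k = m.row_ptr[i]; k < m.row_ptr[i + 1]; ++k) {
        if (m.col_ind[k] == j) return m.values[k];
    }
    return 0;  // not stored, so the element is zero
}
```

For the 4 x 4 matrix with rows (5, 0, 0, 0), (0, 8, 0, 6), (0, 0, 0, 0) and (0, 3, 0, 0), this sketch gives values = (5, 8, 6, 3), col_ind = (0, 1, 3, 1) and row_ptr = (0, 1, 3, 3, 4).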
Compressed Sparse Column (CSC) is comparable to CSR, except that the non-zero elements are stored consecutively by column (Stanimirović and Tasić, 2009; González-Domínguez et al., 2013).
Coordinate Format (COO) is a very simple sparse storage format. In COO, the compressed sparse matrix is obtained directly from the dense format by retaining the non-zero elements together with the indices of their locations in the matrix. For example, the COO format for a vector is referred to as a compressed sparse vector, or just a sparse vector, in which the non-zero elements are kept contiguously in an array val and the indices of these elements are kept in another array ind. This means that val[i] holds the element in position ind[i]. An example of the COO format is shown in Fig. 2 (Vazquez et al., 2009; Kourtis, 2010).
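The compressed sparse vector described above can be sketched in C++ as follows. This is only an illustration under stated assumptions: the struct `SparseVector` and the functions `compress` and `expand` are hypothetical names, positions are zero-based, and the original vector length is kept in an extra field `size` so that the dense vector can be rebuilt.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for a sparse vector in COO form.
struct SparseVector {
    std::size_t size = 0;          // length of the original dense vector (assumed field)
    std::vector<int> val;          // non-zero elements, stored contiguously
    std::vector<std::size_t> ind;  // ind[i] is the position of val[i]
};

// Keep only the non-zero elements together with their positions.
SparseVector compress(const std::vector<int>& dense) {
    SparseVector v;
    v.size = dense.size();
    for (std::size_t p = 0; p < dense.size(); ++p) {
        if (dense[p] != 0) {
            v.val.push_back(dense[p]);
            v.ind.push_back(p);
        }
    }
    return v;
}

// Rebuild the dense vector: val[i] goes back to position ind[i].
std::vector<int> expand(const SparseVector& v) {
    assert(v.val.size() == v.ind.size());
    std::vector<int> dense(v.size, 0);
    for (std::size_t i = 0; i < v.val.size(); ++i) {
        assert(v.ind[i] < v.size);  // every stored position must be valid
        dense[v.ind[i]] = v.val[i];
    }
    return dense;
}
```

For a matrix, the same idea would need two index arrays (row and column) in place of the single array ind.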

Secondly, other related works try to enhance sparse matrix handling in different directions; some of them are as follows.
A new format for representing the sparse matrix was proposed. This format targets the graphics processor architecture and gives 2x to 5x better performance than Compressed Row Format (CSR) and Coordinate Format (COO). It is also 3x to 10x better in performance in comparison with the CSR vector format. Furthermore, it provides a 10% to 133% improvement in memory transfer (of only the access information of the sparse matrix) between CPU and GPU (Neelima and Raghavendra, 2012).
A new method for storing the coefficients of a large sparse matrix was presented. This method is simple and inexpensive. The aim of the technique is to decrease the storage of substantial nonsymmetric sparse matrixes. It is shown that the suggested technique is significantly less expensive than other available techniques, including Coordinate format, Compressed Sparse Row (CSR) format and Modified Sparse Row (MSR) format (Farzaneh et al., 2009).
A comprehensive comparison and evaluation of the storage efficiency of various sparse matrix storage formats, such as CSR, Compressed Sparse Column (CSC) and COO, was presented. The performance results of matrix-vector multiplication using these storage formats were also presented (Stanimirović and Tasić, 2009).

PROPOSED FOLDING TECHNIQUE
This study was conducted in 2017 at Kerbala University, Iraq.
The proposed folding technique consists of three steps:
• Dividing the sparse matrix into four quarters of equal size (the sub-matrixes)
• Applying transformations, based on the permutation concept, to the sub-matrixes
• Coding the sub-matrixes
Before the complete technique is described, the three steps are explained in detail in the following sections.
Dividing sparse matrix into four quarters equivalent to sub-matrixes: In this step the sparse matrix is divided into four quarters named A, B, C and D, all of which have equal dimensions. The following cases are taken into consideration:
• If the sparse matrix is square and has odd dimensions, one zero column and one zero row are added. Mathematically, an $n \times n$ matrix with odd $n$ becomes an $(n+1) \times (n+1)$ matrix whose added row and column are zero.
• If the sparse matrix is not square, the following action is performed:
o Suppose the sparse matrix has dimensions 30 x 50; 50 minus 30 is computed, which is equal to 20.
o Add 20 zero rows to the matrix, then divide it into four quarters as described above. Mathematically, an $m \times n$ matrix with $m < n$ becomes an $n \times n$ matrix whose added $n - m$ rows are zero.
Applying transformations based on the permutation with the sub-matrixes: A permutation is a simple exchange of the positions of elements within a message, vector or matrix. Mathematically, a permutation process generates a permutation of the input data; that is, the data is simply rearranged. For example, the group of all permutations of $n$ elements is referred to as the symmetric group $S_n$, and it is not difficult to verify that there are $n!$ permutations in $S_n$ (Davis, 2003; Mulholland, 2013). In this proposal, permutation is conducted on the elements of the four sub-matrixes as a transformation. Briefly, we will explain these transformations on one of the sub-matrixes, called A.
In mathematical notation, let $A = (a_{ij})$ be a square matrix of dimension $n$ and let $P = (p_{ij})$ be the matrix that results from conducting a transformation on the elements of $A$. Four transformations were conducted as follows:
• The first transformation replaces all the rows of a given matrix with columns and vice versa: $p_{ij} = a_{ji}$.
• The second transformation reverses the order of the columns, so that the first column becomes the last column and vice versa: $p_{ij} = a_{i,\,n+1-j}$.
• The third transformation reverses the order of the rows, so that the first row becomes the last row and vice versa: $p_{ij} = a_{n+1-i,\,j}$.
• The fourth transformation reverses the order of both the rows and the columns, so that the first row becomes the last row and the first column becomes the last column: $p_{ij} = a_{n+1-i,\,n+1-j}$.
Coding the sub-matrixes: Coding involves the organization and categorization of data. Codes offer an option for labeling, compiling and organizing data, which can be carried out in several ways but normally involves systematically assigning words, phrases, numbers or symbols to appropriate coding categories (Baralt, 2012). In this proposal coding is defined as follows: a code value is given to every group of four symmetric elements $a_{ij}$, $b_{ij}$, $c_{ij}$, $d_{ij}$ (the elements at the same position in the four sub-matrixes) in which at least three elements are zero and at most one is non-zero, for $i, j = 1, 2, \ldots, n$.
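The four transformations described above can be sketched in C++ as follows. This is a minimal illustration and not the implementation used in this study; the alias `Matrix` and the function names are hypothetical, and indices are zero-based, so the reversal index $n+1-j$ of the text corresponds to $n-1-j$ in code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;  // square sub-matrix, row by row

// First transformation: rows become columns, p[j][i] = a[i][j].
Matrix transpose(const Matrix& a) {
    const std::size_t n = a.size();
    Matrix p(n, std::vector<int>(n, 0));
    for (std::size_t i = 0; i < n; ++i) {
        assert(a[i].size() == n);  // the sub-matrix must be square
        for (std::size_t j = 0; j < n; ++j) p[j][i] = a[i][j];
    }
    return p;
}

// Second transformation: the first column becomes the last and vice versa.
Matrix reverse_columns(const Matrix& a) {
    Matrix p = a;
    for (auto& row : p) std::reverse(row.begin(), row.end());
    return p;
}

// Third transformation: the first row becomes the last and vice versa.
Matrix reverse_rows(const Matrix& a) {
    Matrix p = a;
    std::reverse(p.begin(), p.end());
    return p;
}

// Fourth transformation: both the rows and the columns are reversed.
Matrix reverse_rows_and_columns(const Matrix& a) {
    return reverse_rows(reverse_columns(a));
}
```

Each of these functions only moves elements, so the number of non-zero elements in the sub-matrix is unchanged; only their positions relative to the other three sub-matrixes differ.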

Describing the folding technique:
In this technique the sparse matrix is divided into four sub-matrixes as described in the dividing step above. The coding procedure is then conducted on the elements of the sub-matrixes, provided that at least three of every four symmetric elements are equal to zero, as the coding step requires; otherwise, transformations are used to change the positions of the elements in each sub-matrix, as described in the transformation step. The transformation sought is one that satisfies the condition that at least three of the symmetric elements $a_{ij}$, $b_{ij}$, $c_{ij}$, $d_{ij}$ are zero for $i, j = 1, 2, \ldots, n$. Table 1 describes part of the transformations that are applied to the four sub-matrixes. The coding procedure is then conducted to obtain the new coded matrix. The symbol "---" in Table 1 means that other transformations exist between row 19 and row 51, as well as between row 57 and row 81. The folding algorithm is described in Algorithm 1. The unfolding algorithm, which retrieves the original sparse matrix from the compressed matrix, is described in Algorithm 2.
Algorithm 1: Folding algorithm
1. Input: X is the sparse array to be compressed.
2. Output: Y is the coded array of X, T is a list of transformation numbers and D1 is a list of the dimensions of the array at each division.
3. Divide array X into 4 quarters; each quarter is a square array and all of them have the same size, as described in the dividing step above.
4. Store the first quarter in array A, the second quarter in array B, the third quarter in array C and the last quarter in array D.
5. Find a transformation that can be applied to all elements in arrays A, B, C and D.
6. If the transformation exists, encode the elements in arrays A, B, C and D using the encoding procedure to obtain Y, set X = Y, store the transformation number in list T and store the dimension of Y in list D1.
7. If the transformation does not exist, end the algorithm.
8. Go to 3.
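Steps 3 to 5 of the folding algorithm can be sketched in C++ as follows. This is a simplified illustration under several assumptions, not the authors' code: the names `pad_to_even_square`, `split_quarters` and `can_fold` are hypothetical; the quarters are assumed to be laid out as A top-left, B top-right, C bottom-left and D bottom-right; a matrix with more rows than columns is assumed to be padded with zero columns; and only the case where no transformation is needed is checked, whereas the technique itself searches the transformations listed in Table 1.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Pad with zero rows and columns until the matrix is square with an even dimension.
Matrix pad_to_even_square(const Matrix& x) {
    std::size_t n = x.size();
    for (const auto& row : x) n = std::max(n, row.size());
    if (n % 2 != 0) ++n;  // odd dimension: add one zero row and one zero column
    Matrix out(n, std::vector<int>(n, 0));
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < x[i].size(); ++j) out[i][j] = x[i][j];
    return out;
}

// Split an even square matrix into the quarters {A, B, C, D} (assumed layout).
std::array<Matrix, 4> split_quarters(const Matrix& x) {
    const std::size_t n = x.size();
    assert(n % 2 == 0);
    const std::size_t h = n / 2;
    std::array<Matrix, 4> q;
    q.fill(Matrix(h, std::vector<int>(h, 0)));
    for (std::size_t i = 0; i < h; ++i) {
        assert(x[i].size() == n && x[i + h].size() == n);
        for (std::size_t j = 0; j < h; ++j) {
            q[0][i][j] = x[i][j];          // A: top-left
            q[1][i][j] = x[i][j + h];      // B: top-right
            q[2][i][j] = x[i + h][j];      // C: bottom-left
            q[3][i][j] = x[i + h][j + h];  // D: bottom-right
        }
    }
    return q;
}

// True when at least three of the four symmetric elements are zero at every position.
bool can_fold(const std::array<Matrix, 4>& q) {
    const std::size_t h = q[0].size();
    for (std::size_t i = 0; i < h; ++i) {
        for (std::size_t j = 0; j < h; ++j) {
            int non_zero = 0;
            for (const auto& quarter : q)
                if (quarter[i][j] != 0) ++non_zero;
            if (non_zero > 1) return false;
        }
    }
    return true;
}
```

When `can_fold` returns false for the untransformed quarters, the technique would apply transformations to the quarters and test the condition again before coding.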

EXPERIMENTAL RESULTS
In this section several experiments are carried out to compress sparse matrixes with different dimensions, as follows: Figure 3 shows part of a sparse matrix with dimension 599x599, whereas Fig. 4 to 7 show the first to fourth divisions with dimensions 300x300, 150x150, 75x75 and 38x38 respectively. In Fig. 8 the dimensions of the fifth and sixth divisions were 19x19 and 10x10 respectively. Figure 9 shows the sixth and seventh divisions with dimensions 10x10 and 5x5 respectively. The number of the transformation applied was 1 in all divisional stages except the seventh division, in which it was 5. The compressed matrix was the matrix in the seventh division with dimension 5x5.
Figure 10 shows part of a sparse matrix with dimension 1000x1000, while Fig. 11 to 15 show the first to fifth divisions with dimensions 500x500, 250x250, 125x125, 63x63 and 32x32 respectively. Figure 16 illustrates the sixth and seventh divisions, which have dimensions 16x16 and 8x8 respectively. The number of the transformation applied was 1 in all divisional stages. The compressed matrix was the matrix in the seventh division with dimension 8x8. Briefly, for the sparse matrix with dimension 3000 x 3000, only the last two divisions are presented; the compressed matrix resulted from the ninth division with dimension 6x6, as shown in Fig. 17.
From the above results it can be seen that the technique compressed the sparse matrixes of every dimension tested.

CONCLUSION AND RECOMMENDATIONS
The research addressed the limitation of the sparse matrix, which uses a large amount of memory to store numerous zero elements and is therefore unsuitable for small devices with limited memory. The novel algorithm satisfies the important requirement of reducing the memory requirement. The sparse matrix with dimension 1000 x 1000 requires 4.76 MB of memory, whereas after compression it requires only 400 bytes. Similarly, the sparse matrix with dimension 3000 x 3000 requires 6 MB, but after compression it requires only 264 bytes. These results show that the memory requirement decreases substantially when the sparse matrix is compressed.
Future research could investigate the reduction of memory requirements and overhead in computation by compressing the sparse matrix through one division only.

Fig. 1: CSR format
The following procedure illustrates the coding operation.
Procedure: coding elements of matrixes
Input: a, b, c and d as integer elements
Output: x as integer value
If a, b, c and d are equal to 0 then set x = 0
If a, b and c are equal to 0 and d is greater than 0 then set x = 4*(d-1)+1
If a, b and d are equal to 0 and c is greater than 0 then set x = 4*(c-1)+2
If a, c and d are equal to 0 and b is greater than 0 then set x = 4*(b-1)+3
If b, c and d are equal to 0 and a is greater than 0 then set x = 4*(a-1)+4
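The coding procedure above can be sketched in C++ as follows. The function `encode` follows the five rules of the procedure directly. The function `decode` and the struct `Quad` are hypothetical additions: the inverse mapping is inferred from the coding formulas and is not taken from Algorithm 2. Elements are assumed to be non-negative integers small enough that four times their value fits in an int.

```cpp
#include <cassert>

// The four symmetric elements at one position of the quarters A, B, C and D.
struct Quad {
    int a, b, c, d;
};

// Encode four symmetric elements into x; returns false when more than one
// element is non-zero, in which case the elements cannot be coded as they are.
bool encode(int a, int b, int c, int d, int& x) {
    assert(a >= 0 && b >= 0 && c >= 0 && d >= 0);  // assumed value range
    if (a == 0 && b == 0 && c == 0 && d == 0) { x = 0; return true; }
    if (a == 0 && b == 0 && c == 0) { x = 4 * (d - 1) + 1; return true; }
    if (a == 0 && b == 0 && d == 0) { x = 4 * (c - 1) + 2; return true; }
    if (a == 0 && c == 0 && d == 0) { x = 4 * (b - 1) + 3; return true; }
    if (b == 0 && c == 0 && d == 0) { x = 4 * (a - 1) + 4; return true; }
    return false;
}

// Inferred inverse: the remainder of (x - 1) by 4 selects the quarter and the
// quotient recovers the value.
Quad decode(int x) {
    assert(x >= 0);
    Quad q{0, 0, 0, 0};
    if (x == 0) return q;  // all four elements were zero
    const int value = (x - 1) / 4 + 1;
    switch ((x - 1) % 4) {
        case 0: q.d = value; break;  // x = 4*(d-1)+1
        case 1: q.c = value; break;  // x = 4*(c-1)+2
        case 2: q.b = value; break;  // x = 4*(b-1)+3
        default: q.a = value; break; // x = 4*(a-1)+4
    }
    return q;
}
```

For example, the elements a = 0, b = 0, c = 7, d = 0 give x = 4*(7-1)+2 = 26, and decoding 26 gives back c = 7 with the other three elements zero.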

Fig. 17: Eighth and ninth divisions

Table 1: Part of the transformations on the sub-matrixes (columns: No., Transformations)