AN EFFICIENT STORAGE FORMAT FOR LARGE SPARSE MATRICES

In this paper we consider the linear system Ax = b, where A is a large sparse matrix. A new efficient, simple and inexpensive method for storing the coefficient matrix A is presented. The purpose of this method is to reduce the storage volume of large non-symmetric sparse matrices. The results show that the proposed method is very inexpensive in comparison with current methods such as the Coordinate format, the Compressed Sparse Row (CSR) format and the Modified Sparse Row (MSR) format.


Consider a linear system
Ax = b    (1.1)
where A is a large random nonsingular sparse matrix of order n × n and b is a given column vector of order n. Such systems of linear equations are frequently encountered in almost all scientific and engineering applications. For instance, sparse matrices appear in structural analysis, computational fluid dynamics, economic modeling, financial analysis, numerical optimization, statistical modeling, power network analysis, electromagnetics, meteorology, medical imaging, data mining, finite-element simulations, decision support systems in management science, circuit simulations, information retrieval and many more. A number of significant advancements in sparse matrix computations have been made in recent years [1,8].
The irregular nature of sparse matrix-vector multiplication, which arises when solving Ax = b, has led to the development of a variety of compressed storage formats, which are widely used because they do not store any unnecessary elements. In this paper we introduce a new method for storing the matrix A, which we call the Compressed Sparse Vector (CSV) format. We show that the storage volume and computational cost of the CSV format are lower than those of existing methods such as the Coordinate, CSR and MSR storage formats. The CSV format can be used for arbitrary sparse matrices, including non-square matrices.
The outline of the paper is as follows. First, three popular storage formats (the Coordinate, CSR and MSR formats) are illustrated in the next section. In section 3, the CSV format for storing the matrix A is given. In section 4, the advantages of the CSV storage method are described briefly, and finally, in section 5, numerical examples are given to illustrate the performance and effectiveness of the new method. We present a case study of a 5 × 5 sparse matrix to show the data structures and the algorithm for storing the coefficient matrix of Ax = b using the CSV format.

Storage Schemes
In many scientific computations the manipulation of sparse matrices is considered the crux of the design. Generally the non-zero elements of a sparse matrix constitute a very small percentage of the data. The irregular nature of sparse matrix problems has led to the development of a variety of compressed storage formats [1, 3-6, 9-11]. More than thirteen different storage formats for the coefficient matrix A are in wide use [2]. The Coordinate format, the Compressed Sparse Row (CSR) format and the Modified Sparse Row (MSR) format are three important storage methods that appear in most sources [2-11].

The Coordinate Storage Format.
The simplest storage scheme for sparse matrices is the so-called Coordinate format. The data structure consists of three arrays: (1) AA, a real array containing all the real or complex values of the non-zero elements of A in any order; (2) JR, an integer array containing their row indices; and (3) JC, a second integer array containing their column indices. All three arrays are of length Nz, the number of non-zero elements [3].
For example, let A be a square matrix of order 5 × 5. In this example, the elements are listed in an arbitrary order, although they are usually listed by row or by column.
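As a minimal sketch of the Coordinate format described above, the following Python function builds the three arrays AA, JR and JC from a dense matrix given as a list of rows. The function name to_coordinate, the use of 1-based indices and the small 3 × 4 matrix are assumptions made for illustration; the matrix is not the one used in the paper's example.

```python
def to_coordinate(dense):
    """Return (AA, JR, JC) for a dense matrix given as a list of rows.

    Indices are 1-based to match the notation of the text.
    """
    AA, JR, JC = [], [], []
    for i, row in enumerate(dense, start=1):
        for j, value in enumerate(row, start=1):
            if value != 0:
                AA.append(value)  # non-zero value
                JR.append(i)      # its row index
                JC.append(j)      # its column index
    return AA, JR, JC


# Hypothetical 3 x 4 matrix, not taken from the paper.
A = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 3.0, 0.0],
    [0.0, 4.0, 0.0, 5.0],
]
AA, JR, JC = to_coordinate(A)
```

Here the elements happen to be listed by row, but any permutation applied consistently to all three arrays describes the same matrix.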

The CSR Storage Format.
The CSR format was originally suggested by A. Brameller [3] and D. J. Rose [4]. This format is the most popular scheme for storing large sparse matrices. In the above example (Coordinate format), if the elements were listed by row, the array of row indices, which then contains redundant information, could be replaced by an array that points to the beginning of each row instead. This yields a non-negligible saving in storage. Storing a given matrix A in the CSR scheme requires three one-dimensional arrays AA, JR and JC of length Nz, Nz and n + 1 respectively, where n is the number of rows and Nz is the total number of non-zero elements in the matrix A.
The array AA contains the non-zero elements of A stored row-by-row, JR contains the column indices which correspond to the non-zero elements in the array AA, and JC contains n + 1 pointers which delimit the rows of non-zero elements in the array AA, as illustrated below.
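The CSR layout just described can be sketched in Python as follows. The array names AA, JR and JC follow the text; the function names to_csr and csr_matvec, the 1-based pointer convention (with the last pointer equal to Nz + 1) and the 3 × 4 test matrix are assumptions made for this illustration.

```python
def to_csr(dense):
    """Return (AA, JR, JC) in the CSR layout of the text, with 1-based indices."""
    AA, JR, JC = [], [], []
    for row in dense:
        JC.append(len(AA) + 1)       # position in AA where this row starts
        for j, value in enumerate(row, start=1):
            if value != 0:
                AA.append(value)     # non-zero value, stored row by row
                JR.append(j)         # its column index
    JC.append(len(AA) + 1)           # one past the last non-zero
    return AA, JR, JC


def csr_matvec(AA, JR, JC, x):
    """Compute y = A x directly from the three CSR arrays."""
    n_rows = len(JC) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # JC[i] and JC[i + 1] delimit row i inside AA and JR.
        for k in range(JC[i] - 1, JC[i + 1] - 1):
            y[i] += AA[k] * x[JR[k] - 1]
    return y


# Hypothetical 3 x 4 matrix, not taken from the paper.
A = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 3.0, 0.0],
    [0.0, 4.0, 0.0, 5.0],
]
AA, JR, JC = to_csr(A)
y = csr_matvec(AA, JR, JC, [1.0, 1.0, 1.0, 1.0])
```

The pointer array has one entry per row plus one, so the number of non-zero elements in row i is the difference of two consecutive pointers.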
Consider matrix (2); this matrix will be represented by:

The MSR Storage Format.
The Modified Sparse Row (MSR) format has only two parallel arrays of equal length (Nz + 1): a real array AA and an integer array JA. The first n positions of AA contain the diagonal elements of the matrix in order. Position n + 1 of the array AA is not used, but may sometimes be used to carry other information concerning the matrix. Starting at position n + 2, the non-zero elements of A, excluding its diagonal elements, are stored by row. For each such element AA(k), the integer JA(k) represents its column index in the matrix. The first n + 1 positions of JA contain the pointers to the beginning of each row in AA and JA. Thus for matrix (2), the two arrays will be as follows:
The star denotes an unused location. Notice that JA(n) = JA(n + 1) = 14, indicating that the last row is a zero row once the diagonal element has been removed [2]. The restriction of the MSR method is that the principal diagonal elements of the coefficient matrix must be non-zero [2].
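The MSR description above can be sketched in Python as follows. The function name to_msr, the use of None for the unused slot and the 3 × 3 matrix are assumptions made for illustration; the sketch assumes a square matrix whose diagonal elements are all non-zero, so that both arrays have length Nz + 1.

```python
def to_msr(dense):
    """Return (AA, JA) in the MSR layout of the text, with 1-based indices.

    Assumes a square matrix with a non-zero diagonal.
    """
    n = len(dense)
    # First n positions: the diagonal. Position n + 1: unused (None here).
    AA = [dense[i][i] for i in range(n)] + [None]
    # First n + 1 positions of JA: row pointers, filled in below.
    JA = [0] * (n + 1)
    for i in range(n):
        JA[i] = len(AA) + 1              # where row i's off-diagonal part starts
        for j in range(n):
            if j != i and dense[i][j] != 0:
                AA.append(dense[i][j])   # off-diagonal non-zero, stored by row
                JA.append(j + 1)         # its column index
    JA[n] = len(AA) + 1                  # one past the last stored element
    return AA, JA


# Hypothetical 3 x 3 matrix with a non-zero diagonal, not taken from the paper.
A = [
    [4.0, 0.0, 1.0],
    [0.0, 5.0, 0.0],
    [2.0, 3.0, 6.0],
]
AA, JA = to_msr(A)
```

In this small case the second row has no off-diagonal entries, which shows up as two equal consecutive pointers in JA.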

The Compressed Sparse Vector (CSV) Storage Format
Now we introduce a new efficient method which, like the MSR format, has two arrays but can be used for storing sparse matrices with arbitrary sparsity patterns. First we consider the following method, which is the main idea behind the Compressed Sparse Vector (CSV) format. Suppose A is a large non-square m × n sparse matrix. We consider two arrays AA and IA of length Nz + 1, where Nz is the number of non-zero elements in A. The first Nz positions of AA contain the non-zero elements of A stored row by row, and the first Nz positions of IA contain their indices under row counting indexing, which assigns a number to every element of A in row order and stores only the indices of the non-zero elements in IA and the non-zero elements themselves in AA. The last elements of AA and IA are m and n, respectively.
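The row counting indexing just described can be sketched as follows. The function name to_row_count_index and the 3 × 4 matrix are hypothetical, and the sketch reflects one reading of the text: every element receives a running number in row order, and only the numbers of the non-zero elements are kept.

```python
def to_row_count_index(dense):
    """Return (AA, IA) using row counting indexing over all elements."""
    m, n = len(dense), len(dense[0])
    AA, IA = [], []
    position = 0
    for row in dense:
        for value in row:
            position += 1            # every element is numbered, zero or not
            if value != 0:
                AA.append(value)
                IA.append(position)
    AA.append(m)                     # last element of AA: number of rows
    IA.append(n)                     # last element of IA: number of columns
    return AA, IA


# Hypothetical 3 x 4 matrix, not taken from the paper.
A = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 3.0, 0.0],
    [0.0, 4.0, 0.0, 5.0],
]
AA, IA = to_row_count_index(A)
```

Note that the stored indices can become as large as m × n, the number of elements in the matrix.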
With row counting indexing the index values grow with the size of the matrix. In order to overcome this problem, we implement a new version of indexing, called the Compressed Sparse Vector (CSV) format, in which we start indexing from the first element of the matrix, a11, by considering two cases as follows:
Case 1. If a11 = 0, it takes number 1 as its index and we go to the next element; counting continues until the first non-zero element, at which point we store the non-zero element coupled with its associated index. Indexing then restarts from number 1 at the next element and continues in this way until the last non-zero element is reached.
Case 2. If a11 ≠ 0, it takes number 1 as its index, and we store this non-zero element and its index as the first pair. Indexing then restarts from number 1 at the next element and continues as described in Case 1.
The new data structure has two arrays with the following functions:
AA: a real array of length Nz + 1, whose first Nz elements store the non-zero elements of a matrix A of any order and whose element Nz + 1 holds the number of rows.
IA: an integer array of length Nz + 1, whose first Nz elements contain the indices corresponding to the non-zero elements in AA and whose element Nz + 1 holds the number of columns.
For example, the compressed form of matrix (2) in CSV format is as below:
The pseudocode of the CSV storage method is given as follows:
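The following Python sketch is one possible reading of Cases 1 and 2 above and should not be taken as the authors' pseudocode. The function names csv_encode and csv_decode and the 3 × 4 matrix are hypothetical. The counter is restarted after every non-zero element, so each stored index is the distance, in row order, from the previous non-zero element.

```python
def csv_encode(dense):
    """Return (AA, IA) with the index counter restarted after each non-zero."""
    m, n = len(dense), len(dense[0])
    AA, IA = [], []
    count = 0
    for row in dense:
        for value in row:
            count += 1               # next element takes the next number
            if value != 0:
                AA.append(value)     # store the non-zero element ...
                IA.append(count)     # ... coupled with its index
                count = 0            # indexing restarts from 1 at the next element
    AA.append(m)                     # position Nz + 1 of AA: number of rows
    IA.append(n)                     # position Nz + 1 of IA: number of columns
    return AA, IA


def csv_decode(AA, IA):
    """Rebuild the dense matrix by accumulating the stored gaps in row order."""
    m, n = int(AA[-1]), IA[-1]
    dense = [[0.0] * n for _ in range(m)]
    position = 0
    for value, gap in zip(AA[:-1], IA[:-1]):
        position += gap                      # 1-based running position
        i, j = divmod(position - 1, n)       # row and column, 0-based
        dense[i][j] = value
    return dense


# Hypothetical 3 x 4 matrix, not taken from the paper.
A = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 3.0, 0.0],
    [0.0, 4.0, 0.0, 5.0],
]
AA, IA = csv_encode(A)
restored = csv_decode(AA, IA)
```

Under this reading no stored index exceeds the longest run of consecutive zeros plus one, regardless of the size of the matrix.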

Advantages of CSV format
The CSV storage method, with its very simple algorithm, has several advantages. Here we illustrate some of them.

Less Storage Volume.
Considering how the arrays AA and IA are constructed, the storage volume of this method is reduced considerably in comparison with other methods. Restarting the index values in CSV after each non-zero element has a great effect on reducing the storage volume of the array IA. For instance, in Example 2, the results of Table 2 show that a hepta-diagonal matrix of order 2000 × 2000 needs 7.62 MB of space if it is stored with all its zero and non-zero elements. With the Coordinate, CSR and MSR storage methods the file size reduces to 148.0 KB, 98.1 KB and 89.1 KB respectively, whereas the CSV storage format gives a considerably smaller storage volume of only 60.5 KB.
More illustrative examples are given in the numerical results.
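As a rough companion to these figures, the sketch below counts the array entries each format stores, using the array lengths stated in the format descriptions. The function name stored_entries and the tridiagonal test case are assumptions. The counts are entries, not bytes: the file sizes reported in the tables also depend on how each entry is written out, which this sketch does not model.

```python
def stored_entries(n, nz):
    """Number of array entries per format for an n x n matrix with nz non-zeros."""
    return {
        "Coordinate": 3 * nz,        # AA, JR and JC, each of length Nz
        "CSR": 2 * nz + n + 1,       # AA and JR of length Nz, JC of length n + 1
        "MSR": 2 * (nz + 1),         # AA and JA of length Nz + 1 (non-zero diagonal)
        "CSV": 2 * (nz + 1),         # AA and IA of length Nz + 1
    }


# Assumed test case: a tridiagonal matrix with all three diagonals non-zero.
n = 1000
nz = 3 * n - 2
counts = stored_entries(n, nz)
```

By this count MSR and CSV store the same number of entries, so any difference between them has to come from the size of the individual index values, which CSV keeps small by restarting the counter.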

Ease of Transpose Matrix Calculation.
Calculating the transpose of a matrix A in current methods is difficult and needs more computation. For example, the Coordinate format needs 3Nz operations to calculate the transpose from the compressed format. In the CSV format, however, we only exchange the elements at position Nz + 1 of the arrays AA and IA with each other and then change the counting method. That is, we swap the last elements of AA and IA and count in the opposite way to the one used to create the compressed matrix: if row counting was used for the compressed matrix, column counting should be used for the transpose.
For example, consider matrix (2); A^T can be calculated as follows:
The indexing algorithm of the CSV method shows that this method is very simple and needs fewer operations in the computing and retrieving processes, which makes it a fast method.
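The transpose rule can be checked with a small Python sketch. The function names csv_transpose and csv_decode_by_columns are hypothetical, and the arrays below are the CSV form (row counting, restarted after each non-zero element) of an assumed 3 × 4 matrix, not of the paper's matrix (2). Only the last elements of AA and IA are exchanged; the result is then read with column counting.

```python
def csv_transpose(AA, IA):
    """Swap the trailing dimension entries; all other entries are unchanged."""
    return AA[:-1] + [IA[-1]], IA[:-1] + [AA[-1]]


def csv_decode_by_columns(AA, IA):
    """Rebuild a dense matrix from CSV arrays, counting down the columns."""
    rows, cols = int(AA[-1]), int(IA[-1])
    dense = [[0.0] * cols for _ in range(rows)]
    position = 0
    for value, gap in zip(AA[:-1], IA[:-1]):
        position += gap                       # 1-based running position
        j, i = divmod(position - 1, rows)     # column first, then row
        dense[i][j] = value
    return dense


# CSV arrays of the assumed matrix
#   [[1, 0, 0, 2],
#    [0, 0, 3, 0],
#    [0, 4, 0, 5]]
AA = [1.0, 2.0, 3.0, 4.0, 5.0, 3]
IA = [1, 3, 3, 3, 2, 4]

AA_t, IA_t = csv_transpose(AA, IA)
A_t = csv_decode_by_columns(AA_t, IA_t)
```

This works because listing an m × n matrix row by row gives the same sequence as listing its n × m transpose column by column.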

Broad Range for Storing Sparse Matrices.
Considering the size of the compressed matrix in the CSV storage method, a broader range of matrices can be represented by this method. The results in Table 4 show that the CSV method performs better than the others on dense sparse matrices. For instance, consider matrix No. 3 of order 200 × 200 in Table 4: for every method used here except CSV, the storage volume of the compressed matrix is larger than the storage volume of the original matrix.

Numerical Examples
In this section we test general tridiagonal, hepta-diagonal, random and dense sparse matrices of different dimensions. The CSV method is compared with the Coordinate, CSR and MSR methods. Tables 1 and 2 contain the results obtained using these methods for tridiagonal and hepta-diagonal matrices. In Table 3 we examine these methods on random matrices, and Table 4 is for dense matrices created by MATLAB.
In Tables 1-4 the second column gives the dimensions of the matrices, the next column gives the number of non-zero elements, and the "primary" column gives the size of the matrix stored with all its zero and non-zero elements. The columns "Coordinate", "CSR", "MSR" and "CSV" give the storage volume of the compressed matrices for the respective storage methods.
Example 1. In this example three tridiagonal matrices of orders 1000, 2000 and 3000 were compressed with the Coordinate, CSR, MSR and CSV storage formats.
Here, again, we used the Coordinate, CSR, MSR and CSV storage methods for compressing sparse matrices, tested on three randomly selected matrices. Systems of linear equations of size 5000, 6000 and 7000 were considered. The results are reported in Table 3.
Table 3: Results for random matrices
Example 4. We also tested the Coordinate, CSR, MSR and CSV storage methods for compressing dense sparse matrices on three dense random matrices with approximately 50 percent non-zero elements. Systems of linear equations of size 100, 150 and 200 were considered. The results of this example show that the CSV storage format remains effective for the storage of dense matrices in comparison with the above formats. The results are reported in Table 4.

Conclusion
A new storage method for large sparse matrices was presented in this paper. This new method, which we call the Compressed Sparse Vector (CSV) format, stores the coefficient matrix A of the linear system (1.1) and is based on row counting indexing. In the CSV method, the growth of the index values is controlled by restarting the indices after each non-zero element. In this work we considered the case where the matrix A is multi-diagonal (tridiagonal and hepta-diagonal) and also the case of random matrices. The results show that storage compaction in this new method is better than in the other methods. We also showed that calculating the transpose of the matrix A is very simple and incurs no computational cost. Furthermore, we conclude that using the CSV method to represent sparse matrices not only reduces the storage volume of the compressed matrix but can also speed up computation in practice. The method is also suitable for dense sparse matrices, so a broad range of sparse matrices can be compressed. Thus, since memory is an issue, the method's low storage requirements provide a means to tackle very large problems which would otherwise be out of reach.