Abstract
The Block Wiedemann (BW) and Block Lanczos (BL) algorithms are frequently used to solve sparse linear systems over GF(2). Iterative sparse matrix-vector multiplication is the most time-consuming operation in both approaches. This step must be accelerated because these algorithms are applied to the very large matrices that arise in the linear algebra step of the Number Field Sieve (NFS) for integer factorization. In this paper we derive an efficient CUDA implementation of this operation using a newly designed hybrid sparse matrix format. For a number of tested NFS matrices, this yields speedups between 4 and 8 on a single GPU compared to an optimized multi-core implementation.
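The kernel operation described above can be illustrated with a minimal CPU sketch in C. It assumes a plain CSR layout, not the hybrid format proposed in the paper. It also packs a block of 64 vectors into one uint64_t word per row, so that addition over GF(2) becomes a single XOR. The type gf2_csr and the functions gf2_spmv_block64 and gf2_spmv_power are hypothetical names chosen for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CSR container for a sparse binary matrix. Every nonzero
   over GF(2) equals 1, so only the column indices are stored. */
typedef struct {
    size_t nrows;
    const size_t *row_ptr;   /* length nrows + 1 */
    const uint32_t *col_idx; /* length row_ptr[nrows] */
} gf2_csr;

/* y = A * x over GF(2) for a block of 64 vectors.
   x[j] holds bit k of vector k for row j of the input; y gets one word per row.
   GF(2) addition is XOR, so each output word is the XOR of the input words
   selected by the row's column indices. */
void gf2_spmv_block64(const gf2_csr *A, const uint64_t *x, uint64_t *y)
{
    for (size_t i = 0; i < A->nrows; ++i) {
        uint64_t acc = 0;
        for (size_t k = A->row_ptr[i]; k < A->row_ptr[i + 1]; ++k)
            acc ^= x[A->col_idx[k]];
        y[i] = acc;
    }
}

/* Iterated product for a square A: on return, out holds A^iters * x.
   tmp is caller-provided scratch space of nrows words. */
void gf2_spmv_power(const gf2_csr *A, const uint64_t *x, int iters,
                    uint64_t *out, uint64_t *tmp)
{
    for (size_t i = 0; i < A->nrows; ++i)
        out[i] = x[i];
    for (int t = 0; t < iters; ++t) {
        gf2_spmv_block64(A, out, tmp);
        for (size_t i = 0; i < A->nrows; ++i)
            out[i] = tmp[i];
    }
}
```

For example, the 3×3 matrix with rows {0,1}, {2}, {0,2} (row_ptr = {0,2,3,5}, col_idx = {0,1,2,0,2}) maps x = {0x1, 0x2, 0x4} to {0x3, 0x4, 0x5}. Repeated application of this kind resembles the sequence of products computed in Block Wiedemann. On a GPU, the storage format and the mapping of rows to threads are where an optimized implementation would differ from this sequential loop.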
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schmidt, B., Aribowo, H., Dang, HV. (2011). Iterative Sparse Matrix-Vector Multiplication for Integer Factorization on GPUs. In: Jeannot, E., Namyst, R., Roman, J. (eds) Euro-Par 2011 Parallel Processing. Euro-Par 2011. Lecture Notes in Computer Science, vol 6853. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23397-5_41
DOI: https://doi.org/10.1007/978-3-642-23397-5_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23396-8
Online ISBN: 978-3-642-23397-5
eBook Packages: Computer Science, Computer Science (R0)