ABSTRACT
Fast and efficient computation of web rank scores is essential for today's search engines. Because of the enormous size of the data and the dynamic nature of the World Wide Web, this computation is generally executed on large web graphs (up to billions of web pages) and must be refreshed frequently, which makes it a challenging task. In this paper, we propose an efficient method for computing the PageRank score, Google's ranking method based on analyzing the link structure of the Web, on graphics processing units (GPUs). We employ a slight modification of a storage format called the binary 'link structure file', inspired by [2], for storing the web graph data. We then divide the PageRank calculation phases into parallel operations to exploit the computing power of the graphics cards. Our program was written in CUDA and tested on a system equipped with two dual-GPU NVIDIA GeForce GTX 295 graphics cards, using two real datasets crawled from Vietnamese sites, containing 7 million pages with 132 million links and 15 million pages with 200 million links, respectively. The experimental results show that the computation is 10 to 20 times faster than a CPU-based version running on an Intel Q8400 at 2.67 GHz, on both datasets. Our method also scales well to larger web graphs.
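As a point of orientation, the iteration that such a method parallelizes can be sketched sequentially. The following is a minimal CPU-side illustration in Python, not the authors' CUDA implementation: the function name `pagerank`, the adjacency-list input `out_links`, the damping value 0.85, and the tolerance are all assumptions made for this example. The convergence test uses the Chebyshev (maximum absolute) distance, which is assumed here only because it appears in the reference list. The inner loop over source pages is the kind of independent per-page work that could be mapped to GPU threads.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Sequential PageRank sketch; out_links[i] lists the pages that page i links to."""
    n = len(out_links)
    rank = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[i] for i in range(n) if not out_links[i])
        new_rank = [(1.0 - damping) / n + damping * dangling / n] * n
        # Each page passes an equal share of its damped rank to every target.
        for src, targets in enumerate(out_links):
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        # Chebyshev distance between successive rank vectors.
        delta = max(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical 3-page graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
    print(pagerank([[1, 2], [2], [0]]))
```

In this sketch the scores always sum to 1, because the rank of pages without out-links is redistributed rather than dropped.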
- S. Brin and L. Page. 1998. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the 7th WWW Conference.
- A. Rungsawang and B. Manaskasemsak. 2004. Parallel PageRank Computation on a Gigabit PC Cluster. In Proceedings of the 18th International Conference on Advance Information Networking and Application.
- A. Rungsawang and B. Manaskasemsak. 2003. PageRank computation using PC cluster. In Proceedings of the 10th European PVM/MPI User's Group Meeting.
- A. Rungsawang and B. Manaskasemsak. 2004. An Efficient Partition-Based Parallel PageRank Algorithm. In Proceedings of the 11th International Conference Parallel and Distributed Computing.
- K. Sankaralingam, S. Sethumadhavan and J. C. Browne. 2003. Distributed PageRank for P2P system. In Proceedings of the 11th IEEE HPD'03 Conference.
- Amy N. Langville and Carl D. Meyer. 2006. Google's PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press, 41 William Street, Princeton, New Jersey, 2006, p. 31--46.
- Nathan Bell and Michael Garland. 2008. Efficient Sparse Matrix-Vector Multiplication on CUDA. NVIDIA Technical Report.
- Xintian Yang, Srinivasan Parthasarathy, P. Sadayappan. 2011. Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining. Proceedings of the VLDB Endowment, Vol. 4, No. 4. Seattle, Washington.
- Praveen K., Vamshi Krishna K., Anil Sri Harsha B., S. Balasubramanian, P. K. Baruah. 2011. Cost Efficient PageRank Computation using GPU. IEEE International Conference on High Performance Computing (HiPC), Student Research Symposium.
- Tianji WU, Bo WANG, Yi SHAN, Feng YAN, Yu WANG and Ningyi XU. 2010. Efficient PageRank and SpMV Computation on AMD GPUs. 39th International Conference on Parallel Processing, DOI 10.1109, p. 81--89.
- Ali Cevahir, Cevdet Aykanat, Ata Turk, B. Barla Cambazoglu, Akira Nukada and Satoshi Matsuoka. 2010. Efficient PageRank on GPU Clusters. IPSJ SIG Technical Report, Vol. 2010-HPC-128.
- Chebyshev distance. http://en.wikipedia.org/wiki/Chebyshev_distance
- M. Harris. 2007. Parallel Prefix Sum (Scan) with CUDA. NVIDIA Corporation.
- CUDA zone, http://www.NVIDIA.com/object/cuda_home_new.html
- NVIDIA. 2009. "NVIDIA CUDA Programming Guide 3.0".
Index Terms
- Parallel PageRank computation using GPUs