ABSTRACT
The diversity of hardware components within a single system calls for strategies for efficient cross-device data processing. For example, existing approaches to CPU/GPU co-processing distribute individual relational operators to the "most appropriate" device. While pleasantly simple, this strategy has several problems: it may leave the "inappropriate" devices idle while overloading the "appropriate" device and putting high pressure on the PCI bus. To address these issues, we distribute data among the devices by partially decomposing relations at the granularity of individual bits. Each of the resulting bit-partitions is stored and processed on one of the available devices. Using this strategy, we implemented a processor for spatial range queries that makes efficient use of all available devices. The performance gains achieved indicate that bitwise distribution is a good cross-device processing strategy.
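The idea of decomposing values at bit granularity can be illustrated with a minimal sketch. The abstract does not say how the split point is chosen or how queries are evaluated across partitions, so everything below is an assumption made for illustration: the function names `decompose` and `range_query`, the single split into a "high" and a "low" bit-partition, and the restriction to a one-dimensional range over non-negative integers. The high bits stand in for a partition held on one device and the low bits for a partition held on another; a range predicate is first answered approximately from the high bits, and only candidates on the boundary need the low bits.

```python
def decompose(values, low_bits):
    """Split each non-negative integer into a high-bit and a low-bit partition.

    Hypothetical helper: in a real system each partition would live on a
    different device; here both are plain Python lists.
    """
    mask = (1 << low_bits) - 1
    high = [v >> low_bits for v in values]
    low = [v & mask for v in values]
    return high, low


def range_query(high, low, low_bits, lo, hi):
    """Return the row indices whose value v satisfies lo <= v <= hi."""
    lo_h, hi_h = lo >> low_bits, hi >> low_bits

    # Phase 1: approximate filter that touches only the high-bit partition.
    candidates = [i for i, h in enumerate(high) if lo_h <= h <= hi_h]

    # Phase 2: refinement. Rows strictly inside the high-bit range qualify
    # regardless of their low bits; only boundary rows read the low partition.
    result = []
    for i in candidates:
        h = high[i]
        if lo_h < h < hi_h:
            result.append(i)
        else:
            v = (h << low_bits) | low[i]
            if lo <= v <= hi:
                result.append(i)
    return result


values = [3, 17, 100, 255, 64, 65, 128]
high, low = decompose(values, low_bits=4)
matches = range_query(high, low, low_bits=4, lo=17, hi=128)
```

In this sketch the first phase reads only one partition and the second phase reads the other partition for boundary rows alone, which is one way both partitions could contribute work without shipping whole columns between devices. Whether the paper's processor follows this two-phase scheme is not stated in the abstract.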
- X-device query processing by bitwise distribution