research-article

Cache-Efficient Aggregation: Hashing Is Sorting

Authors:

Ingo Müller
Karlsruhe Institute of Technology / SAP SE, Karlsruhe / Walldorf, Germany

Peter Sanders
Karlsruhe Institute of Technology, Karlsruhe, Germany

Arnaud Lacurie
SAP SE, Walldorf, Germany

Wolfgang Lehner
Dresden University of Technology, Dresden, Germany

Franz Färber
SAP SE, Walldorf, Germany

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, May 2015, Pages 1123–1136, https://doi.org/10.1145/2723372.2747644

Published: 27 May 2015

Editorial Notes

Computationally Reproducible. The experimental results of this paper were reproduced by a SIGMOD Review Committee and were found to support the central results reported in the paper. Details of the review process are found here:

http://db-reproducibility.seas.harvard.edu/#process

ABSTRACT

For decades, researchers have studied the duality of hashing and sorting for the implementation of the relational operators, especially for efficient aggregation. Depending on the underlying hardware and software architecture, the specific algorithms implemented, and the data sets used in the experiments, different authors came to different conclusions about which is the better approach. In this paper we argue that in terms of cache efficiency, the two paradigms are actually the same. We support our claim by showing that the complexity of hashing is the same as the complexity of sorting in the external memory model. Furthermore, we make the similarity of the two approaches obvious by designing an algorithmic framework that allows switching seamlessly between hashing and sorting during execution. Mixing hashing and sorting routines in the same framework also lets us leverage the advantages of both approaches. On a more practical note, we show how to achieve very low constant factors by tuning both the hashing and the sorting routines to modern hardware. Since we observe that the constant factors of the two routines depend in complementary ways on the locality of the input, we exploit our framework to switch to the faster routine where appropriate. The result is a novel relational aggregation algorithm that is cache-efficient (independently and without prior knowledge of input skew and output cardinality), is highly parallelizable on modern multi-core systems, and operates at a speed close to the memory bandwidth, thus outperforming the state of the art by up to 3.7x.
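The idea of switching between hashing and sorting inside one framework can be illustrated with a small sketch. This is not the authors' implementation, only a toy model under stated assumptions: a Python dict capped at `table_capacity` groups stands in for a cache-sized hash table, and when a run has too many groups the rows are partitioned by a few bits of the hash value, which is one pass of a radix sort on the hash values. The function name `aggregate` and the parameters `table_capacity`, `bits`, and `level` are hypothetical, and the sketch restarts a run from scratch on overflow rather than reusing the partially filled table.

```python
def aggregate(rows, table_capacity=4, bits=2, level=0):
    """Sum values per key; rows is an iterable of (key, value) pairs.

    Toy model: `table_capacity` plays the role of the cache size
    (an assumption of this sketch, not a value from the paper).
    """
    rows = list(rows)

    # Hashing routine: try to aggregate the whole run in a bounded table.
    table = {}
    for key, value in rows:
        if key not in table and len(table) >= table_capacity:
            break  # too many groups for the bounded table
        table[key] = table.get(key, 0) + value
    else:
        return table  # all groups fit: the run is fully aggregated

    # Safety net: once the 64 hash bits are used up, aggregate unbounded.
    if bits * level >= 64:
        out = {}
        for key, value in rows:
            out[key] = out.get(key, 0) + value
        return out

    # Sorting routine: partition by the next `bits` bits of the hash,
    # i.e. one radix-sort pass on hash values, then recurse per bucket.
    fanout = 1 << bits
    buckets = [[] for _ in range(fanout)]
    for key, value in rows:
        h = hash(key) & 0xFFFFFFFFFFFFFFFF  # force a non-negative 64-bit value
        buckets[(h >> (bits * level)) & (fanout - 1)].append((key, value))

    result = {}
    for bucket in buckets:
        # Equal keys share a hash, so buckets hold disjoint groups.
        result.update(aggregate(bucket, table_capacity, bits, level + 1))
    return result
```

For example, `aggregate([(i % 10, 1) for i in range(1000)])` sees ten groups, overflows the four-entry table, partitions once, and then finishes each bucket by hashing. With few groups the sketch never partitions; with many groups it behaves like a recursive radix sort on hash values that ends in small hash tables.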

Index Terms

Cache-Efficient Aggregation: Hashing Is Sorting
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
      2. Parallel and distributed DBMSs
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Published in
SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
May 2015
2110 pages
ISBN: 9781450327589
DOI: 10.1145/2723372
General Chair: Timos Sellis, RMIT University, Australia
Program Chairs: Susan B. Davidson, University of Pennsylvania, USA; Zack Ives, University of Pennsylvania, USA
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Results Reproduced / v1.1
Author Tags
adaptive algorithm
aggregation
cache-efficient
group by
grouping
hashing
robust performance
shared-memory
sorting
Acceptance Rates
SIGMOD '15 Paper Acceptance Rate: 106 of 415 submissions, 26%
Overall Acceptance Rate: 785 of 4,003 submissions, 20%