Article

Iceberg-cube computation with PC clusters

Authors:
Raymond T. Ng, Univ British Columbia, 2366 Main Mall, UBC, Vancouver, BC
Alan Wagner, Univ British Columbia, 2366 Main Mall, UBC, Vancouver, BC
Yu Yin, Univ British Columbia, 2366 Main Mall, UBC, Vancouver, BC

SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on Management of data, May 2001, Pages 25–36, https://doi.org/10.1145/375663.375666

Published: 01 May 2001

ABSTRACT

In this paper, we investigate the approach of using low-cost PC clusters to parallelize the computation of iceberg-cube queries. We concentrate on techniques directed towards online querying of large, high-dimensional datasets where it is assumed that the total cube has not been precomputed. The algorithmic space we explore considers trade-offs between parallelism, computation and I/O. Our main contribution is the development and comprehensive evaluation of various novel parallel algorithms. Specifically: (1) Algorithm RP is a straightforward parallel version of BUC [BR99]; (2) Algorithm BPP attempts to reduce I/O by outputting results in a more efficient way; (3) Algorithm ASL, which maintains the cells of a cuboid in a skiplist, is designed to put the utmost priority on load balancing; and (4) alternatively, Algorithm PT load-balances by using binary partitioning to divide the cube lattice as evenly as possible.

We present a thorough performance evaluation of all these algorithms over a variety of parameters, including the dimensionality of the cube, the sparseness of the cube, the selectivity of the constraints, the number of processors, and the size of the dataset. A key finding is that it is not a one-algorithm-fits-all situation. We recommend a “recipe” which uses PT as the default algorithm, but may also deploy ASL under specific circumstances.
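To make the query class concrete, the following is a minimal sequential sketch of an iceberg-cube computation in the bottom-up, pruning style of BUC [BR99]. It is not one of the parallel algorithms (RP, BPP, ASL, PT) evaluated in the paper, and it assumes COUNT as the aggregate. The function name `iceberg_cube`, its parameters, and the sample rows are hypothetical and chosen only for illustration.

```python
def iceberg_cube(rows, dims, min_support):
    """Return {group-by dims: {cell values: count}} for cells with count >= min_support.

    rows        -- list of tuples, one value per dimension (illustrative input format)
    dims        -- tuple of dimension names, aligned with the tuple positions
    min_support -- minimum COUNT a cell needs to be reported
    """
    result = {}

    def expand(partition, start, cell_dims, cell_vals):
        # Refine the current cell by each remaining dimension in turn.
        for d in range(start, len(dims)):
            groups = {}
            for row in partition:
                groups.setdefault(row[d], []).append(row)
            for val, group in groups.items():
                # Pruning step: a cell below the threshold cannot have a
                # qualifying descendant, because COUNT only shrinks as
                # more dimensions are fixed.
                if len(group) < min_support:
                    continue
                new_dims = cell_dims + (dims[d],)
                new_vals = cell_vals + (val,)
                result.setdefault(new_dims, {})[new_vals] = len(group)
                expand(group, d + 1, new_dims, new_vals)

    if len(rows) >= min_support:
        result[()] = {(): len(rows)}  # the "ALL" cell
        expand(rows, 0, (), ())
    return result


sample_rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x")]
cube = iceberg_cube(sample_rows, ("A", "B"), min_support=2)
for group_by, cells in cube.items():
    print(group_by, cells)
```

With a threshold of 2, the cell `A = b` is pruned after a single count, so none of its refinements are ever examined. How such work is divided among processors is the question the parallel algorithms address, and this sketch does not attempt it.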

References

1. R. Agrawal, S. Agrawal, P. Deshpande, A. Gupta, J. Naughton, R. Ramakrishnan and S. Sarawagi. On the computation of multidimensional aggregates. In Proc. 1996 VLDB, pp. 506-521.
2. E. Baralis, S. Paraboschi and E. Teniente. Materialized view selection in a multidimensional database. In Proc. 1997 VLDB, pp. 98-112.
3. K. Beyer and R. Ramakrishnan. Bottom-Up Computation of Sparse and Iceberg CUBEs. In Proc. 1999 ACM SIGMOD, pp. 359-370.
4. M. Eberl, W. Karl, C. Trinitis, and A. Blaszczyk. Parallel Computing on PC Clusters - An Alternative to Supercomputers for Industrial Applications. In Proc. 6th European Parallel Virtual Machine/Message Passing Interface Conference, LNCS vol. 1697, pp. 493-498, 1999.
5. M. Fang, N. Shivakumar, H. Garcia-Molina, R. Motwani and J. Ullman. Computing iceberg queries efficiently. In Proc. 1998 VLDB, pp. 299-310.
6. S. Goil and A. Choudhary. High Performance OLAP and Data Mining on Parallel Computers. In The Journal of Data Mining and Knowledge Discovery, 1, 4, pp. 391-418, 1997.
7. J. Gray, A. Bosworth, A. Layman and H. Pirahesh. Datacube: A relational aggregation operator generalizing group-by, cross-tab and sub-totals. In Proc. 1996 ICDE, pp. 152-159.
8. H. Gupta, V. Harinarayan, A. Rajaraman and J. Ullman. Index selection for OLAP. In Proc. 1997 ICDE, pp. 208-219.
9. V. Harinarayan, A. Rajaraman and J. Ullman. Implementing data cubes efficiently. In Proc. 1996 ACM SIGMOD, pp. 205-216.
10. J. Hellerstein, J. Haas and H. Wang. Online Aggregation. In Proc. 1997 SIGMOD, pp. 171-182.
11. M. Kamber, J. Han and J. Chiang. Metarule-guided mining of multi-dimensional association rules using data cubes. In Proc. 1997 KDD, pp. 207-210.
12. R.T. Ng, L.V.S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained associations rules. In Proc. 1998 SIGMOD, pp. 13-24.
13. K. Ross and D. Srivastava. Fast Computation of Sparse Datacubes. In Proc. 1997 VLDB, pp. 116-125.
14. S. Sarawagi. Explaining differences in multidimensional aggregates. In Proc. 1999 VLDB, pp. 42-53.
15. A. Shukla, P. Deshpande and J. Naughton. Materialized view selection for multidimensional datasets. In Proc. 1998 VLDB, pp. 488-499.
16. A. Srivastava, E. Han, V. Kumar and V. Singh. Parallel formulations of decision-tree classification algorithm. In The Journal of Data Mining and Knowledge Discovery, 3, 3, pp. 237-262, 1999.
17. M. Tamura and M. Kitsuregawa. Dynamic Load Balance for Parallel Association Rule Mining on Heterogeneous PC Cluster System. In Proc. 1999 VLDB, pp. 162-173.
18. M. Zaki. Parallel and distributed association mining: a survey. In IEEE Concurrency, 7, 4, pp. 14-25, 1999.
19. YiHong Zhao, Prasad Deshpande, and Jeffrey F. Naughton. An Array-based algorithm for simultaneous Multidimensional aggregates. SIGMOD Conference 1997, pp. 159-170.

Published in

SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on Management of data
May 2001, 630 pages
ISBN: 1581133324
DOI: 10.1145/375663
Editors: Timos Sellis, Sharad Mehrotra

ACM SIGMOD Record Volume 30, Issue 2
June 2001, 625 pages
ISSN: 0163-5808
DOI: 10.1145/376284
Editors: Timos Sellis (National Technical Univ. of Athens), Sharad Mehrotra (Univ. of California at Irvine)

Copyright © 2001 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
OLAP
parallel computation
Acceptance Rates
SIGMOD '01 Paper Acceptance Rate: 44 of 293 submissions, 15%
Overall Acceptance Rate: 785 of 4,003 submissions, 20%