DOI: 10.1145/375663.375666

Iceberg-cube computation with PC clusters

Published: 01 May 2001

ABSTRACT

In this paper, we investigate the use of low-cost PC clusters to parallelize the computation of iceberg-cube queries. We concentrate on techniques directed towards online querying of large, high-dimensional datasets where it is assumed that the total cube has not been precomputed. The algorithmic space we explore considers trade-offs between parallelism, computation and I/O. Our main contribution is the development and comprehensive evaluation of various novel parallel algorithms. Specifically: (1) Algorithm RP is a straightforward parallel version of BUC [BR99]; (2) Algorithm BPP attempts to reduce I/O by outputting results in a more efficient way; (3) Algorithm ASL, which maintains the cells of a cuboid in a skiplist, is designed to put the utmost priority on load balancing; and (4) alternatively, Algorithm PT load-balances by using binary partitioning to divide the cube lattice as evenly as possible.
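
As a rough illustration of what an iceberg-cube query computes, the following is a minimal single-process sketch in the spirit of the bottom-up pruning used by BUC [BR99]. It is not RP, BPP, ASL, or PT as described in this paper. The function name `iceberg_cube`, the use of `None` to mark an aggregated dimension, and the choice of COUNT with a `min_support` threshold as the iceberg constraint are all assumptions made for the example.

```python
from collections import defaultdict


def iceberg_cube(rows, num_dims, min_support):
    """Return {cell: count} for every group-by cell whose count >= min_support.

    A cell is a tuple of length num_dims; None marks an aggregated dimension.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    result = {}

    def expand(partition, cell, start_dim):
        # The caller guarantees this partition already satisfies the threshold.
        result[cell] = len(partition)
        # Refine the cell on each remaining dimension, in a fixed order,
        # so that every cell is reached exactly once.
        for d in range(start_dim, num_dims):
            groups = defaultdict(list)
            for row in partition:
                groups[row[d]].append(row)
            for value, group in groups.items():
                # Pruning step: a group below the threshold cannot contain
                # any qualifying, more specific cell, so it is skipped.
                if len(group) >= min_support:
                    child = cell[:d] + (value,) + cell[d + 1:]
                    expand(group, child, d + 1)

    if len(rows) >= min_support:
        expand(rows, (None,) * num_dims, 0)
    return result


# Tiny two-dimensional example with an assumed threshold of 2.
rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x")]
cube = iceberg_cube(rows, num_dims=2, min_support=2)
```

With these inputs, `cube` holds four cells: the grand total, `("a", None)`, `(None, "x")`, and `("a", "x")`. Cells involving `"b"` or `"y"` are pruned because each covers only one row. A parallel variant would need to decide how to split these recursive subtrees across processors, which is the trade-off the four algorithms above address in different ways.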

We present a thorough performance evaluation of all these algorithms across a variety of parameters, including the dimensionality of the cube, the sparseness of the cube, the selectivity of the constraints, the number of processors, and the size of the dataset. A key finding is that it is not a one-algorithm-fits-all situation. We recommend a “recipe” which uses PT as the default algorithm, but may also deploy ASL under specific circumstances.

References

  1. R. Agrawal, S. Agrawal, P. Deshpande, A. Gupta, J. Naughton, R. Ramakrishnan and S. Sarawagi. On the computation of multidimensional aggregates. In Proc. 1996 VLDB, pp. 506-521.
  2. E. Baralis, S. Paraboschi and E. Teniente. Materialized view selection in a multidimensional database. In Proc. 1997 VLDB, pp. 98-112.
  3. K. Beyer and R. Ramakrishnan. Bottom-Up Computation of Sparse and Iceberg CUBEs. In Proc. 1999 ACM SIGMOD, pp. 359-370.
  4. M. Eberl, W. Karl, C. Trinitis, and A. Blaszczyk. Parallel Computing on PC Clusters - An Alternative to Supercomputers for Industrial Applications. In Proc. 6th European Parallel Virtual Machine/Message Passing Interface Conference, LNCS vol. 1697, pp. 493-498, 1999.
  5. M. Fang, N. Shivakumar, H. Garcia-Molina, R. Motwani and J. Ullman. Computing iceberg queries efficiently. In Proc. 1998 VLDB, pp. 299-310.
  6. S. Goil and A. Choudhary. High Performance OLAP and Data Mining on Parallel Computers. In The Journal of Data Mining and Knowledge Discovery, 1, 4, pp. 391-418, 1997.
  7. J. Gray, A. Bosworth, A. Layman and H. Pirahesh. Datacube: A relational aggregation operator generalizing group-by, cross-tab and sub-totals. In Proc. 1996 ICDE, pp. 152-159.
  8. H. Gupta, V. Harinarayan, A. Rajaraman and J. Ullman. Index selection for OLAP. In Proc. 1997 ICDE, pp. 208-219.
  9. V. Harinarayan, A. Rajaraman and J. Ullman. Implementing data cubes efficiently. In Proc. 1996 ACM SIGMOD, pp. 205-216.
  10. J. Hellerstein, J. Haas and H. Wang. Online Aggregation. In Proc. 1997 SIGMOD, pp. 171-182.
  11. M. Kamber, J. Han and J. Chiang. Metarule-guided mining of multi-dimensional association rules using data cubes. In Proc. 1997 KDD, pp. 207-210.
  12. R.T. Ng, L.V.S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained associations rules. In Proc. 1998 SIGMOD, pp. 13-24.
  13. K. Ross and D. Srivastava. Fast Computation of Sparse Datacubes. In Proc. 1997 VLDB, pp. 116-125.
  14. S. Sarawagi. Explaining differences in multidimensional aggregates. In Proc. 1999 VLDB, pp. 42-53.
  15. A. Shukla, P. Deshpande and J. Naughton. Materialized view selection for multidimensional datasets. In Proc. 1998 VLDB, pp. 488-499.
  16. A. Srivastava, E. Han, V. Kumar and V. Singh. Parallel formulations of decision-tree classification algorithm. In The Journal of Data Mining and Knowledge Discovery, 3, 3, pp. 237-262, 1999.
  17. M. Tamura and M. Kitsuregawa. Dynamic Load Balance for Parallel Association Rule Mining on Heterogeneous PC Cluster System. In Proc. 1999 VLDB, pp. 162-173.
  18. M. Zaki. Parallel and distributed association mining: a survey. In IEEE Concurrency, 7, 4, pp. 14-25, 1999.
  19. YiHong Zhao, Prasad Deshpande, and Jeffrey F. Naughton. An Array-based algorithm for simultaneous Multidimensional aggregates. SIGMOD Conference 1997, pp. 159-170.

Published in

SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on Management of data
May 2001
630 pages
ISBN: 1581133324
DOI: 10.1145/375663

Copyright © 2001 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

SIGMOD '01 paper acceptance rate: 44 of 293 submissions, 15%. Overall acceptance rate: 785 of 4,003 submissions, 20%.
