
Permuting data on random-access block storage

Published: 01 July 2013

Abstract

Permutation is a fundamental operator for array data, with applications in, for example, changing matrix layouts and reorganizing data cubes. We consider the problem of permuting large quantities of data stored on secondary storage that supports fast random block accesses, such as solid state drives and distributed key-value stores. Faster random accesses open up interesting new opportunities for permutation. While external merge sort has often been used for permutation, it is overkill: it fails to fully exploit the properties of permutation and carries unnecessary overhead in storing and comparing keys. We propose faster algorithms with lower memory requirements for a large, useful class of permutations. We also tackle practical challenges that traditional permutation algorithms have not dealt with, such as exploiting random block accesses more aggressively, accounting for the cost asymmetry between reads and writes, and handling arbitrary data dimension sizes (as opposed to the perfect powers often assumed by previous work). As a result, our algorithms are faster and more broadly applicable.
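The abstract does not describe the algorithms themselves, so the following is only a minimal in-memory sketch, under our own assumptions, of the kind of permutation it refers to: reordering the axes of a row-major array (a matrix transpose is the two-dimensional case). It shows why sorting is unnecessary. Each element's source offset is computed directly from its index, so no keys are stored or compared, and the dimension sizes need not be powers of two. The helper names `permute_dims` and `source_blocks_per_output_block` are hypothetical, and `block_elems` is an assumed block size in elements. This is a naive baseline for illustration, not the authors' method.

```python
from itertools import product
from math import prod


def permute_dims(data, shape, perm):
    """Permute the axes of a row-major array stored as a flat list.

    Output axis i corresponds to input axis perm[i]; dimension sizes may be
    arbitrary (no power-of-two assumption).
    """
    n = len(shape)
    # Row-major strides of the input layout: the last axis varies fastest.
    in_strides = [1] * n
    for d in range(n - 2, -1, -1):
        in_strides[d] = in_strides[d + 1] * shape[d + 1]
    out_shape = [shape[p] for p in perm]
    out = [None] * len(data)
    # product() enumerates output indices in row-major order, so out_pos is
    # the flat output offset. The source offset is computed arithmetically
    # from the index, with no keys stored or compared.
    for out_pos, out_idx in enumerate(product(*(range(s) for s in out_shape))):
        src = sum(out_idx[i] * in_strides[perm[i]] for i in range(n))
        out[out_pos] = data[src]
    return out, out_shape


def source_blocks_per_output_block(shape, perm, block_elems):
    """For each output block of block_elems elements, count the distinct
    input blocks it draws from (a rough proxy for random block reads)."""
    total = prod(shape)
    # Permuting the offsets themselves yields each output slot's source offset.
    src_offsets, _ = permute_dims(list(range(total)), shape, perm)
    return [
        len({off // block_elems for off in src_offsets[start:start + block_elems]})
        for start in range(0, total, block_elems)
    ]


if __name__ == "__main__":
    # Transpose a 2 x 3 matrix stored row-major.
    flat, new_shape = permute_dims([0, 1, 2, 3, 4, 5], [2, 3], [1, 0])
    print(flat, new_shape)  # [0, 3, 1, 4, 2, 5] [3, 2]
    print(source_blocks_per_output_block([2, 3], [1, 0], 2))  # [2, 2, 2]
```

In the transpose example, every two-element output block gathers from two different input blocks. On block storage, this scattered access pattern is what makes fast random block reads, and the read/write cost asymmetry the abstract mentions, relevant to permutation performance.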


  • Published in

    Proceedings of the VLDB Endowment, Volume 6, Issue 9
    July 2013
    180 pages

    Publisher

    VLDB Endowment
