Distribution-Insensitive Parallel External Sorting on PC Clusters

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2858)

Abstract.

Many parallel external sorting algorithms have been reported, such as NOW-Sort, SPsort, and hill sort. All of them sort large-scale data stored on disk, but they differ in speed, throughput, and cost-effectiveness. Most of them assume data that are uniformly distributed over their value range; few results have been reported for parallel external sorting of data with arbitrary distributions. In this paper, we present two distribution-insensitive parallel external sorting algorithms that use sampling and histogram counts to distribute data evenly among processors, which in turn leads to high performance. Experimental results on a cluster of Linux workstations show up to a 63% reduction in execution time compared with the earlier NOW-Sort.
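The abstract does not spell out the two algorithms, so the following is only a generic illustration of why sampling helps with skewed keys, not the authors' method. It is a minimal Python sketch under stated assumptions: the function names (`choose_splitters`, `bucket_counts`, `demo`), the exponential key distribution, the sample size, and the processor count are all hypothetical choices made for the example. It compares equal-width key ranges, which implicitly assume uniform data, against splitters taken from a sorted random sample, and uses a histogram of bucket sizes to check the balance.

```python
import bisect
import random
from collections import Counter


def choose_splitters(samples, num_procs):
    """Pick num_procs - 1 splitters at evenly spaced ranks of the sorted sample."""
    ordered = sorted(samples)
    return [ordered[(i * len(ordered)) // num_procs] for i in range(1, num_procs)]


def bucket_counts(keys, splitters):
    """Histogram: number of keys that fall into each processor's key range."""
    counts = Counter(bisect.bisect_right(splitters, k) for k in keys)
    return [counts.get(p, 0) for p in range(len(splitters) + 1)]


def demo(num_keys=100_000, num_procs=8, sample_size=2_000, seed=0):
    """Return (equal-width bucket sizes, sample-based bucket sizes)."""
    rng = random.Random(seed)
    # Assumption: exponentially distributed keys stand in for skewed input.
    keys = [rng.expovariate(1.0) for _ in range(num_keys)]

    # Baseline: equal-width ranges, appropriate only for uniform keys.
    lo, hi = min(keys), max(keys)
    uniform_splitters = [lo + (hi - lo) * i / num_procs for i in range(1, num_procs)]

    # Sampling: splitters follow the empirical distribution of the keys.
    sampled_splitters = choose_splitters(rng.sample(keys, sample_size), num_procs)

    return bucket_counts(keys, uniform_splitters), bucket_counts(keys, sampled_splitters)


if __name__ == "__main__":
    uniform, sampled = demo()
    print("equal-width buckets: ", uniform)
    print("sample-based buckets:", sampled)
```

With skewed input, the equal-width split sends most keys to the first processor, while the sample-based split keeps every bucket close to `num_keys / num_procs`. In an external sort, each bucket would then be written to and sorted on its own node's disk; that step is omitted here.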

This research was supported by KOSEF Grant (no. R01-2001-000341-0).

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jeon, M., Kim, D. (2003). Distribution-Insensitive Parallel External Sorting on PC Clusters. In: Veidenbaum, A., Joe, K., Amano, H., Aiso, H. (eds) High Performance Computing. ISHPC 2003. Lecture Notes in Computer Science, vol 2858. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39707-6_15

  • DOI: https://doi.org/10.1007/978-3-540-39707-6_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20359-9

  • Online ISBN: 978-3-540-39707-6

  • eBook Packages: Springer Book Archive
