Abstract.
Many parallel external sorting algorithms have been reported, such as NOW-Sort, SPsort, and hill sort. All of them sort large-scale data stored on disk, but they differ in speed, throughput, and cost-effectiveness. Most of them handle data that are uniformly distributed over their value range, and few results have yet been reported on parallel external sorting of data with arbitrary distributions. In this paper, we present two distribution-insensitive parallel external sorting algorithms that use a sampling technique and histogram counts to distribute data evenly among processors, which in turn yields high performance. Experimental results on a cluster of Linux workstations show up to a 63% reduction in execution time compared with NOW-Sort.
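The abstract names two ways of choosing partition boundaries (splitters) so that each processor receives a similar share of the keys regardless of how the keys are distributed. The sketch below is a generic illustration of both ideas, not the authors' algorithms: sampling-based splitters in the style of sample sort, and splitters placed at histogram-bin edges where the cumulative count reaches multiples of n/p. The function names, the oversampling factor, the bin count, the use of integer keys, and the skewed test input are all assumptions made for illustration. The equal-width `range_splitters` baseline is a hypothetical example of a split that is balanced for uniform keys but not for skewed ones.

```python
import bisect
import random
from typing import List, Sequence


def range_splitters(lo: int, hi: int, p: int) -> List[int]:
    """Hypothetical baseline: equal-width value-range splitters (balanced only for uniform keys)."""
    return [lo + (hi - lo + 1) * i // p for i in range(1, p)]


def sample_splitters(keys: Sequence[int], p: int, oversample: int = 128,
                     seed: int = 1) -> List[int]:
    """Pick p-1 splitters from a sorted random sample of p*oversample keys (assumed knob)."""
    rng = random.Random(seed)
    sample = sorted(rng.choice(keys) for _ in range(p * oversample))
    # Every oversample-th element of the sorted sample marks a bucket boundary.
    return [sample[i * oversample] for i in range(1, p)]


def histogram_splitters(keys: Sequence[int], p: int, lo: int, hi: int,
                        bins: int = 256) -> List[int]:
    """Place p-1 splitters at bin edges where cumulative counts reach k*n/p."""
    width = max(1, -(-(hi - lo + 1) // bins))  # integer bin width (ceiling division)
    nbins = (hi - lo) // width + 1
    counts = [0] * nbins
    for k in keys:
        counts[(k - lo) // width] += 1  # bin b covers [lo + b*width, lo + (b+1)*width)
    target = len(keys) / p
    splitters: List[int] = []
    running = 0
    for b, c in enumerate(counts):
        running += c
        # Close bucket j at the upper edge of the first bin whose cumulative count
        # reaches (j+1)*n/p; a very heavy bin can close several buckets at once.
        while len(splitters) < p - 1 and running >= target * (len(splitters) + 1):
            splitters.append(lo + (b + 1) * width)
    return splitters


def bucket_sizes(keys: Sequence[int], splitters: List[int]) -> List[int]:
    """Count how many keys each of the len(splitters)+1 processors would receive."""
    sizes = [0] * (len(splitters) + 1)
    for k in keys:
        # Keys below splitters[0] go to processor 0; keys >= splitters[-1] go last.
        sizes[bisect.bisect_right(splitters, k)] += 1
    return sizes


if __name__ == "__main__":
    # Illustrative skewed input (assumption): squares of uniform values crowd toward 0.
    rng = random.Random(42)
    keys = [rng.randrange(10_000) ** 2 for _ in range(100_000)]
    p, lo, hi = 4, min(keys), max(keys)
    for name, s in [("range", range_splitters(lo, hi, p)),
                    ("sample", sample_splitters(keys, p)),
                    ("histogram", histogram_splitters(keys, p, lo, hi))]:
        print(name, bucket_sizes(keys, s))
```

On this skewed input the equal-width split sends roughly half of the keys to the first processor, while both the sampled and the histogram-based splitters keep each share close to a quarter. The histogram variant's imbalance is bounded by roughly one bin's count as long as no single bin holds more than n/p keys; the sampled variant's imbalance shrinks as the oversampling factor grows.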
This research was supported by KOSEF Grant (no. R01-2001-000341-0).
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jeon, M., Kim, D. (2003). Distribution-Insensitive Parallel External Sorting on PC Clusters. In: Veidenbaum, A., Joe, K., Amano, H., Aiso, H. (eds) High Performance Computing. ISHPC 2003. Lecture Notes in Computer Science, vol 2858. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39707-6_15
DOI: https://doi.org/10.1007/978-3-540-39707-6_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20359-9
Online ISBN: 978-3-540-39707-6
eBook Packages: Springer Book Archive