Practical parallel algorithms for personalized communication and integer sorting

ACM Journal of Experimental Algorithmics, Volume 1, pp 3–es. https://doi.org/10.1145/235141.235148

Published: 01 January 1996

Abstract

A fundamental challenge for parallel computing is to obtain high-level, architecture-independent algorithms that execute efficiently on general-purpose parallel machines. With the emergence of message-passing standards such as MPI, it has become easier to design efficient and portable parallel algorithms by building on these communication primitives. While existing primitives provide an assortment of collective communication routines, they do not handle an important communication event: the case where most or all processors have non-uniformly sized personalized messages to exchange with each other. In this paper we focus on the h-relation personalized communication, whose efficient implementation enables high-performance implementations of a large class of algorithms. While most previous h-relation algorithms use randomization, this paper presents a new deterministic approach for h-relation personalized communication with asymptotically optimal complexity for h>p². As an application, we present an efficient algorithm for stable integer sorting.

The algorithms presented in this paper have been coded in Split-C and run on a variety of platforms, including the Thinking Machines CM-5, IBM SP-1 and SP-2, Cray Research T3D, Meiko Scientific CS-2, and the Intel Paragon. Our experimental results are consistent with the theoretical analysis and illustrate the scalability and efficiency of our algorithms across different platforms. In fact, they seem to outperform all similar algorithms known to the authors on these platforms.
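To make the two notions in the abstract concrete, the following is a minimal sequential sketch in Python. It is not the paper's Split-C code and does not reproduce its deterministic routing algorithm. It only illustrates, under the usual definition, how the value h of an h-relation can be read off a matrix of message sizes (the largest number of elements any single processor sends or receives), and what "stable" means for an integer sort, here shown with a plain counting sort. The names `h_relation_bound`, `stable_counting_sort`, `sizes`, and `num_keys` are hypothetical and chosen for illustration.

```python
def h_relation_bound(sizes):
    """Return h for a personalized communication pattern.

    sizes[i][j] is the number of elements processor i sends to
    processor j (an assumed input format). h is the maximum, over all
    processors, of the elements sent or received by that processor.
    """
    p = len(sizes)
    sent = [sum(sizes[i]) for i in range(p)]
    received = [sum(sizes[i][j] for i in range(p)) for j in range(p)]
    return max(sent + received)


def stable_counting_sort(items, key, num_keys):
    """Sort items by integer key in [0, num_keys), preserving the
    input order of items with equal keys (stability)."""
    # offsets[k + 1] first counts the items with key k ...
    offsets = [0] * (num_keys + 1)
    for item in items:
        offsets[key(item) + 1] += 1
    # ... then a prefix sum turns counts into starting positions.
    for k in range(num_keys):
        offsets[k + 1] += offsets[k]
    out = [None] * len(items)
    # Scanning the input left to right keeps equal keys in order.
    for item in items:
        k = key(item)
        out[offsets[k]] = item
        offsets[k] += 1
    return out
```

For example, with three processors and `sizes = [[0, 3, 1], [2, 0, 0], [5, 1, 0]]`, processor 0 receives 7 elements in total, which is the largest send or receive load, so the pattern is a 7-relation.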

Supplemental Material

Available for Download

p3-bader.tar (3.2 MB)

The software suite accompanying the article; this is a large (3.3 MB) Unix tar file.

vol1nbr3.ps (1.2 MB)

vol1nbr3.tex.tar (1.1 MB)

Index Terms

Practical parallel algorithms for personalized communication and integer sorting
1. Theory of computation
  1. Design and analysis of algorithms
    1. Data structures design and analysis
      1. Sorting and searching
  2. Models of computation
    1. Concurrency
      1. Parallel computing models

Published in
ACM Journal of Experimental Algorithmics Volume 1, Issue
1996
104 pages
ISSN:1084-6654
EISSN:1084-6654
DOI:10.1145/235141
Editor:
Bernard M. E. Moret
Univ. of New Mexico, Albuquerque
Issue’s Table of Contents
Copyright © 1996 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States