
More Efficient Reduction Algorithms for Non-Power-of-Two Number of Processors in Message-Passing Parallel Systems

  • Conference paper
Recent Advances in Parallel Virtual Machine and Message Passing Interface (EuroPVM/MPI 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3241)

Abstract

We present improved algorithms for global reduction operations for message-passing systems. Each of p processors has a vector of m data items, and we want to compute the element-wise “sum” under a given, associative function of the p vectors. The result, which is also a vector of m items, is to be stored at either a given root processor (MPI_Reduce) or at all p processors (MPI_Allreduce). A further constraint is that for each data item and each processor the result must be computed in the same order, and with the same bracketing. Both problems can be solved in O(m + log₂ p) communication and computation time. Such reduction operations are part of MPI (the Message Passing Interface), and the algorithms presented here achieve significant improvements over currently implemented algorithms for the important case where p is not a power of 2. Our algorithm requires ⌈log₂ p⌉ + 1 rounds – one round off from optimal – for small vectors. For large vectors twice the number of rounds is needed, but the communication and computation time is less than 3βm and (3/2)γm, respectively, an improvement over the 4βm and 2γm achieved by previous algorithms (with the message transfer time modeled as α + βm, and the reduction-operation execution time as γm). For p = 3 × 2ⁿ and p = 9 × 2ⁿ with small m ≤ b for some threshold b, and for p = q × 2ⁿ with small q, our algorithm achieves the optimal ⌈log₂ p⌉ number of rounds.
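
For orientation, here is a minimal C/MPI sketch (not from the paper) of the two collectives the abstract discusses, applied to a vector of m doubles, together with the large-vector cost factors quoted above evaluated under the α + βm / γm model. The vector length and the numeric values of α, β, and γ are illustrative assumptions, and the reduction algorithm actually executed is whatever the installed MPI library implements, not the algorithm proposed here.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    const int m = 1 << 20;                  /* vector length per process (assumed) */
    double *v   = malloc(m * sizeof *v);    /* local input vector                  */
    double *sum = malloc(m * sizeof *sum);  /* element-wise "sum" of all p vectors */
    for (int i = 0; i < m; i++)
        v[i] = (double)rank;

    /* Result stored at all p processes (MPI_Allreduce). */
    MPI_Allreduce(v, sum, m, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    /* Result stored only at the root process 0 (MPI_Reduce). */
    MPI_Reduce(v, sum, m, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        /* Illustrative machine parameters (assumed, not measured):
           message transfer time alpha + beta*m, reduction time gamma*m. */
        double alpha = 1.0e-6, beta = 1.0e-9, gamma = 0.5e-9;
        (void)alpha; /* the latency terms are omitted in the large-vector bounds below */

        /* Large-vector factors quoted in the abstract (communication plus
           computation), with the O(alpha * log2 p) latency terms ignored. */
        double t_prev = 4.0 * beta * m + 2.0 * gamma * m;  /* previous algorithms */
        double t_new  = 3.0 * beta * m + 1.5 * gamma * m;  /* bound claimed here  */
        printf("p=%d, m=%d: previous ~%.3e s, improved < %.3e s (model only)\n",
               p, m, t_prev, t_new);
    }

    free(v);
    free(sum);
    MPI_Finalize();
    return 0;
}

Compiled with mpicc and launched with mpirun -np p, the example runs for any process count p, including the non-power-of-two counts the paper targets.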

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rabenseifner, R., Träff, J.L. (2004). More Efficient Reduction Algorithms for Non-Power-of-Two Number of Processors in Message-Passing Parallel Systems. In: Kranzlmüller, D., Kacsuk, P., Dongarra, J. (eds.) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2004. Lecture Notes in Computer Science, vol. 3241. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30218-6_13

  • DOI: https://doi.org/10.1007/978-3-540-30218-6_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23163-9

  • Online ISBN: 978-3-540-30218-6

  • eBook Packages: Springer Book Archive
