ABSTRACT
We consider the problem of high-speed I/O for a single application running on multiple nodes of a distributed-memory parallel computer. Our model is that the parallel system is connected to an I/O system that provides the interface between the internal connections of the parallel system and one or more external connections, such as HIPPI links. We identify two primary operations for this I/O system: scattering data from a high-speed link across several lower-speed links, and gathering data from multiple links onto a single high-speed link. We show that these two operations form the core of the I/O system, independent of the relative speeds of the internal and external connections.
We identify several architectural features that are critical for supporting high-speed scatter and gather operations. These include flexible routing methods in the parallel system, low-overhead communication, and the ability to support multiple data streams in and out of the memory on the I/O node.
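The scatter and gather operations described above can be sketched in miniature. The sketch below is illustrative only, not the paper's design: it assumes a simple round-robin striping policy and a fixed chunk size, and it models the lower-speed links as in-memory queues.

```python
# Illustrative sketch (assumption, not from the paper): stripe a byte stream
# from one fast link across several slower links (scatter), then reassemble
# the original stream from those links (gather). Round-robin policy and the
# chunk size are arbitrary choices for the example.

def scatter(stream: bytes, num_links: int, chunk: int = 4):
    """Split the stream into per-link queues, round-robin by chunk."""
    links = [[] for _ in range(num_links)]
    for i in range(0, len(stream), chunk):
        links[(i // chunk) % num_links].append(stream[i:i + chunk])
    return links

def gather(links):
    """Reassemble the stream by draining the links in the same round-robin
    order used by scatter, so chunk order is preserved."""
    out = []
    i = 0
    while any(links):          # stop once every per-link queue is empty
        queue = links[i % len(links)]
        if queue:
            out.append(queue.pop(0))
        i += 1
    return b"".join(out)
```

For example, a 20-byte stream scattered across three links yields queues of 2, 2, and 1 chunks, and `gather` restores the original byte order. A real I/O node would overlap these transfers rather than run them sequentially, which is why the paper's requirement for multiple concurrent data streams through I/O-node memory matters.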
Index Terms
- Architecture implications of high-speed I/O for distributed-memory computers