Abstract
With the increasing miniaturization of transistors, wire delays are becoming a dominant factor in microprocessor performance. To address this issue, a number of emerging architectures contain replicated processing units with software-exposed communication between one unit and another (e.g., Raw, SmartMemories, TRIPS). However, for their use to be widespread, it will be necessary to develop compiler technology that enables a portable, high-level language to execute efficiently across a range of wire-exposed architectures.In this paper, we describe our compiler for StreamIt: a high-level, architecture-independent language for streaming applications. We focus on our backend for the Raw processor. Though StreamIt exposes the parallelism and communication patterns of stream programs, some analysis is needed to adapt a stream program to a software-exposed processor. We describe a partitioning algorithm that employs fission and fusion transformations to adjust the granularity of a stream graph, a layout algorithm that maps a stream graph to a given network topology, and a scheduling strategy that generates a fine-grained static communication pattern for each computational element.We have implemented a fully functional compiler that parallelizes StreamIt applications for Raw, including several load-balancing transformations. Using the cycle-accurate Raw simulator, we demonstrate that the StreamIt compiler can automatically map a high-level stream abstraction to Raw without losing performance. We consider this work to be a first step towards a portable programming model for communication-exposed architectures.
- Streamit homepage. http://compiler.lcs.mit.edu/streamit.]]Google Scholar
- The Transputer Databook. Inmos Corporation, 1988.]]Google Scholar
- G. Berry and G. Gonthier. The Esterel Synchronous Programming Language: Design, Semantics, Implementation. Science of Computer Programming, 19(2), 1992.]] Google ScholarDigital Library
- S. S. Bhattacharyya, P. K. Murthy, and E. A. Lee. Software Synthesis from Dataflow Graphs. Kluwer Academic Publishers, 1996.]] Google ScholarDigital Library
- G. Bilsen, M. Engels, R. Lauwereins, and J. Peperstraete. Cyclo-static dataflow. IEEE Trans. on Signal Processing, 1996.]]Google ScholarDigital Library
- S. Borkar, R. Cohn, G. Cox, S. Gleason, T. Gross, H. T. Kung, M. Lam, B. Moore, C. Peterson, J. Pieper, L. Rankin, P. S. Tseng, J. Sutton, J. Urbanski, and J. Webb. iWarp: An integrated solution to high-speed parallel computing. In Supercomputing, 1988.]] Google ScholarDigital Library
- E. Caspi, M. Chu, R. Huang, J. Yeh, J. Wawrzynek, and A. DeHon. Stream Computations Organized for Reconfigurable Execution (SCORE): Extended Abstract. In Proceedings of the Conference on Field Programmable Logic and Applications, 2000.]] Google ScholarDigital Library
- M. B. T. et. al. The Raw Microprocessor: A Computational Fabric for Software Circuits and General Purpose Programs. IEEE Micro vol 22, Issue 2, 2002.]] Google ScholarDigital Library
- J. Gaudiot, W. Bohm, T. DeBoni, J. Feo, and P. Mille". The Sisal Model of Functional Programming and its Implementation. In Proceedings of the Second Aizu International Symposium on Parallel Algorithms/Architectures Synthesis, 1997.]] Google ScholarDigital Library
- T. Gautier, P. L. Guernic, and L. Besnard. Signal: A declarative language for synchronous programming of real-time systems. Springer Verlag Lecture Notes in Computer Science, 274, 1987.]] Google ScholarDigital Library
- V. Gay-Para, T. Graf, A.-G. Lemonnier, and E. Wais. Kopi Reference manual. http://www.dms.at/kopi/docs/kopi.html, 2001.]]Google Scholar
- T. Gross and D. R. O'Halloron. iWarp, Anatomy of a Parallel Computing System. MIT Press, 1998.]] Google ScholarDigital Library
- N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud. The synchronous data-flow programming language LUSTRE. Proc. of the IEEE, 79(9), 1991.]]Google ScholarCross Ref
- R. Ho, K. Mai, and M. Horowitz. The Future of Wires. In Proc. of the IEEE, 2001.]]Google Scholar
- C. A. R. Hoare. Communicating sequential processes. Communications of the ACM, 21(8), 1978.]] Google ScholarDigital Library
- Inmos Corporation. Occam 2 Reference Manual. Prentice Hall, 1988.]]Google Scholar
- U. J. Kapasi, P. Mattson, W. J. Dally, J. D. Owens, and B. Towles. Stream scheduling. In Proc. of the 3rd Workshop on Media and Streaming Processors, 2001.]]Google Scholar
- S. Kirkpatrick, J. C. D. Gelatt, and M. Vecchi. Optimization by Simulated Annealing. Science, 220(4598), May 1983.]]Google Scholar
- J. Lebak. Polymorphous Computing Architecture (PCA) Example Applications and Description. External Report, Lincoln Laboratory, Mass. Inst. of Technology, 2001.]]Google Scholar
- E. A. Lee. Overview of the Ptolemy Project. UCB/ERL Technical Memorandum UCB/ERL M01/11, Dept. EECS, University of California, Berkeley, CA, March 2001.]]Google Scholar
- W. Lee, R. Barua, M. Frank, D. Srikrishna, J. Babb, V. Sarkar, and S. P. Amarasinghe. Space-Time Scheduling of Instruction-Level Parallelism on a Raw Machine. In Architectural Support for Programming Languages and Operating Systems, 1998.]] Google ScholarDigital Library
- K. Mai, T. Paaske, N. Jayasena, R. Ho, W. Dally, and M. Horowitz. Smart memories: A modular recongurable architecture. In ISCA 2000, Vancouver, BC, Canada.]] Google ScholarDigital Library
- D. May, R. Shepherd, and C. Keane. Communicating Process Architecture: Transputers and Occam. Future Parallel Computers: An Advanced Course, Pisa, Lecture Notes in Computer Science, 272, June 1987.]] Google ScholarDigital Library
- A. Mitschele-Thiel. Automatic Configuration and Optimization of Parallel Transputer Applications. Transputer Applications and Systems '93, 1993.]]Google Scholar
- D. R. O'Hallaron. The ASSIGN Parallel Program Generator. Carnegie Mellon Technical Report CMU-CS-91-141, 1991.]]Google Scholar
- T. A. Proebsting and S. A. Watterson. Filter Fusion. In POPL, 1996.]] Google ScholarDigital Library
- S. Rixner, W. J. Dally, U. J. Kapasi, B. Khailany, A. Lopez-Lagunas, P. R. Mattson, and J. D. Owens. A bandwidth-efficient architecture for media processing. In International Symposium on Microarchitecture, 1998.]] Google ScholarDigital Library
- K. Sankaralingam, R. Nagarajan, S. Keckler, and D. Burger. A Technology-Scalable Architecture for Fast Clocks and High ILP. University of Texas at Austin, Dept. of Computer Sciences Technical Report TR-01-02, 2001.]] Google ScholarDigital Library
- R. Stephens. A Survey of Stream Processing. Acta Informatica, 34(7), 1997.]]Google ScholarCross Ref
- M. B. Taylor, W. Lee, S. Amarasinghe, and A. Agarwal. Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures. Technical Report MIT-LCS-TR-859, Mass. Inst. of Technology, July 2002.]]Google Scholar
- W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt: A Language for Streaming Applications. In Proceedings of the International Conference on Compiler Construction, to appear, Grenoble, France, 2002.]] Google ScholarDigital Library
- W. Thies, M. Karczmarek, M. Gordon, D. Maze, J. Wong, H. Hoffmann, M. Brown, and S. Amarasinghe. StreamIt: A Compiler for Streaming Applications. MIT-LCS Technical Memo LCS-TM-622, Cambridge, MA, 2001.]]Google Scholar
- E. Waingold, M. Taylor, D. Srikrishna, V. Sarkar, W. Lee, V. Lee, J. Kim, M. Frank, P. Finch, R. Barua, J. Babb, S. Amarasinghe, and A. Agarwal. Baring it all to software: Raw machines. IEEE Computer, 30(9), 1997.]] Google ScholarDigital Library
Index Terms
- A stream compiler for communication-exposed architectures
Recommendations
A stream compiler for communication-exposed architectures
With the increasing miniaturization of transistors, wire delays are becoming a dominant factor in microprocessor performance. To address this issue, a number of emerging architectures contain replicated processing units with software-exposed ...
A stream compiler for communication-exposed architectures
With the increasing miniaturization of transistors, wire delays are becoming a dominant factor in microprocessor performance. To address this issue, a number of emerging architectures contain replicated processing units with software-exposed ...
A stream compiler for communication-exposed architectures
ASPLOS X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systemsWith the increasing miniaturization of transistors, wire delays are becoming a dominant factor in microprocessor performance. To address this issue, a number of emerging architectures contain replicated processing units with software-exposed ...
Comments