ABSTRACT
We consider the balance between compute density and interconnect in Coarse-Grained Reconfigurable Architectures (CGRAs) intended for acceleration of HPC applications. We model a baseline CGRA architecture [2] in the open-source CGRA-ME framework [11] and describe the modelling as a case study. Then, holding the interconnect fabric constant, we create several variants of the baseline CGRA: 1) one having reduced (sparser) compute capability where not all ALUs are fully capable, 2) one having increased (denser) compute capability, where the amount of compute is roughly doubled relative to the baseline, and 3) one with increased I/O bandwidth. In an experimental study, we evaluate all architectures to assess application mappability and resource usage for a set of benchmark applications. We also evaluate silicon area consumption using a standard-cell ASIC flow. Results show the baseline CGRA to be overprovisioned in both compute and interconnect, with the proposed variants offering superior area efficiency.
REFERENCES
- 2023. The NanGate FreePDK45 Open Cell Library.
- Boma Adhi, Carlos Cortes, Yiyu Tan, Takuya Kojima, Artur Podobas, and Kentaro Sano. 2022. The Cost of Flexibility: Embedded versus Discrete Routers in CGRAs for HPC. In IEEE CLUSTER.
- Boma Adhi, Carlos Cortes, Yiyu Tan, Takuya Kojima, Artur Podobas, and Kentaro Sano. 2022. Exploration Framework for Synthesizable CGRAs Targeting HPC: Initial Design and Evaluations. In The First International Workshop on Coarse-Grained Reconfigurable Architectures for High-Performance Computing (CGRA4HPC).
- Boma Adhi, Carlos Cortes, Tomohiro Ueno, Yiyu Tan, Takuya Kojima, Artur Podobas, and Kentaro Sano. 2022. Exploring Inter-tile Connectivity for HPC-oriented CGRA with Lower Resource Usage. In IEEE FPT.
- Giovanni Ansaloni, Paolo Bonzini, and Laura Pozzi. 2011. EGRA: A Coarse Grained Reconfigurable Architectural Template. IEEE TVLSI 19, 6 (2011), 1062–1074.
- Oguzhan Atak and Abdullah Atalar. 2012. BilRC: An execution triggered coarse grained reconfigurable architecture. IEEE TVLSI 21, 7 (2012), 1285–1298.
- Thilini Kaushalya Bandara, Dhananjaya Wijerathne, Tulika Mitra, and Li-Shiuan Peh. 2022. REVAMP: A systematic framework for heterogeneous CGRA realization. In ACM ASPLOS. 918–932.
- Volker Baumgarte, Gerd Ehlers, Frank May, Armin Nückel, Martin Vorbach, and Markus Weinhardt. 2003. PACT XPP – A Self-Reconfigurable Data Processing Architecture. Journal of Supercomputing 26, 2 (2003), 167–184.
- Frank Bouwens, Mladen Berekovic, Andreas Kanstein, and Georgi Gaydadjiev. 2007. Architectural exploration of the ADRES coarse-grained reconfigurable array. In International Workshop on Applied Reconfigurable Computing. Springer, 1–13.
- S. Alexander Chin and Jason H. Anderson. 2018. An Architecture-Agnostic Integer Linear Programming Approach to CGRA Mapping. In IEEE/ACM DAC.
- S. Alexander Chin, Noriaki Sakamoto, Allan Rui, Jim Zhao, Jin Hee Kim, Yuko Hara-Azumi, and Jason Anderson. 2017. CGRA-ME: A unified framework for CGRA modelling and exploration. In IEEE ASAP. 184–189.
- Florent de Dinechin and Bogdan Pasca. 2011. Designing Custom Arithmetic Data Paths with FloPoCo. IEEE Design & Test of Computers 28, 4 (July 2011), 18–27.
- Jens Domke, Kazuaki Matsumura, Mohamed Wahib, Haoyu Zhang, Keita Yashima, Toshiki Tsuchikawa, Yohei Tsuji, Artur Podobas, and Satoshi Matsuoka. 2019. Double-precision FPUs in high-performance computing: an embarrassment of riches?. In IEEE IPDPS. 78–88.
- Carl Ebeling, Darren C Cronquist, and Paul Franklin. 1996. RaPiD – Reconfigurable pipelined datapath. In FPL. 126–135.
- Graham Gobieski, Ahmet Oguz Atli, Kenneth Mai, Brandon Lucia, and Nathan Beckmann. 2021. Snafu: an ultra-low-power, energy-minimal CGRA-generation framework and architecture. In ACM/IEEE ISCA. 1027–1040.
- Seth Copen Goldstein, Herman Schmit, Mihai Budiu, Srihari Cadambi, Matthew Moe, and R Reed Taylor. 2000. PipeRench: A reconfigurable architecture and compiler. Computer 33, 4 (2000), 70–77.
- Ujval J Kapasi, William J Dally, Scott Rixner, John D Owens, and Brucek Khailany. 2002. The Imagine stream processor. In IEEE ICCD. 282–288.
- Manupa Karunaratne, Aditi Kulkarni Mohite, Tulika Mitra, and Li-Shiuan Peh. 2017. HyCUBE: A CGRA with reconfigurable single-cycle multi-hop interconnect. In IEEE/ACM DAC.
- Scott Kirkpatrick, C Daniel Gelatt Jr, and Mario P Vecchi. 1983. Optimization by simulated annealing. Science 220, 4598 (1983), 671–680.
- Guangming Lu, Hartej Singh, Ming-Hau Lee, Nader Bagherzadeh, Fadi Kurdahi, 1999. The MorphoSys parallel reconfigurable system. In European Conference on Parallel Processing. Springer, 727–734.
- L. McMurchie and C. Ebeling. 1995. PathFinder: A Negotiation-Based Performance-Driven Router for FPGAs. In ACM FPGA. 111–7. https://doi.org/10.1109/FPGA.1995.242049
- Bingfeng Mei, Serge Vernalde, Diederik Verkest, Hugo De Man, and Rudy Lauwereins. 2003. ADRES: An architecture with tightly coupled VLIW processor and coarse-grained reconfigurable matrix. In FPL. 61–70.
- Tony Nowatzki, Vinay Gangadhar, Newsha Ardalani, and Karthikeyan Sankaralingam. 2017. Stream-dataflow acceleration. In IEEE/ACM ISCA. 416–429.
- Raghu Prabhakar, Yaqi Zhang, David Koeplinger, Matt Feldman, Tian Zhao, Stefan Hadjis, Ardavan Pedram, Christos Kozyrakis, and Kunle Olukotun. 2017. Plasticine: A reconfigurable architecture for parallel patterns. In ACM/IEEE ISCA. 389–402.
- Rohit Prasad, Satyajit Das, Kevin JM Martin, Giuseppe Tagliavini, Philippe Coussy, Luca Benini, and Davide Rossi. 2020. TRANSPIRE: An energy-efficient TRANSprecision floating-point Programmable archItectuRE. In IEEE/ACM DATE. IEEE, 1067–1072.
- Omar Ragheb, Rami Beidas, and Jason Anderson. 2023. Statically Scheduled vs. Elastic CGRA Architectures: Impact on Mapping Feasibility. In Second International Workshop on CGRAs for HPC (CGRA4HPC).
- Omar Ragheb, Tianyi Yu, and Jason Anderson. 2022. Modelling and exploration of elastic CGRAs. In FPL.
- Matthew J. P. Walker and Jason H. Anderson. 2019. Generic Connectivity-Based CGRA Mapping via Integer Linear Programming. In IEEE FCCM. 65–73. https://doi.org/10.1109/FCCM.2019.00019