An LLVM-Based Binary Translator for a Heterogeneous System Simulator

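General-purpose computing on graphics processing units (GPGPU) can speed up highly parallel programs in a more efficient way. However, the programming model is not programmer friendly. The memory model is heterogeneous, so such programming requires explicit control of data transfers between system main memory and GPU device memory. In addition, supporting infrastructure such as debugging and code distribution is lacking. The Heterogeneous System Architecture (HSA) from AMD aims to reduce the complexity of software development for GPGPU programming. Its features include a shared memory model, an intermediate representation (IR) that can be used on hardware from different vendors, and more specific operation controls, such as controlling memory accesses across work-groups in the GPGPU environment. In this paper, we present an LLVM-based HSA translator that provides fast HSAIL translation on an HSA simulator. Hand-written HSAIL benchmarks and an HSAIL binary assembler help verify functional correctness.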

Keywords

LLVM; simulator; heterogeneous system; binary translator

Parallel Abstract

General-purpose graphics processing unit (GPGPU) computation can speed up programs with a high degree of parallelism in a more power-efficient way. However, the programming model is not programmer friendly. The memory model is heterogeneous, so programmers must explicitly control data transfers between system main memory and GPU device memory. In addition, supporting infrastructure such as debugging and code distribution is poorly supported. The Heterogeneous System Architecture (HSA) from AMD was proposed to address these issues and ease software development for GPGPU programming. Its features include a shared memory model and a retargetable intermediate representation (IR) with more specific operation controls, such as cross-work-group control. In this paper, we present the HSA Translator, which enables fast simulation of HSAIL in the HSA Simulator, a functional-level, system-mode simulator of the HSA environment. The HSA Simulator includes a PQEMU-based simulator for the processing units in the GPGPU environment. The HSA Translator is implemented inside the simulator to perform native code translation: it leverages the LLVM infrastructure to translate kernel code from the Heterogeneous System Architecture Intermediate Language (HSAIL) into native relocatable code. The native binary is linked by the HSA Link-Loader, a custom link-loader implemented in the simulator. The kernel processing device is simulated with host threads to speed up the simulation. We evaluate the simulation with HSAIL benchmarks that we translated ourselves from the Rodinia benchmark suite and the AMD OpenCL samples.
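The idea of simulating the kernel processing device with host threads can be illustrated with a minimal sketch. This is not the HSA Simulator's implementation: the names `vector_add_kernel`, `run_work_group`, and `dispatch`, the pool size, and the choice of one task per work-group are all assumptions made for illustration. Each work-group is handed to a host thread, which runs that group's work-items one after another.

```python
from concurrent.futures import ThreadPoolExecutor


def vector_add_kernel(global_id, a, b, out):
    """Hypothetical kernel body: one work-item adds one element."""
    if global_id < len(out):  # guard work-items that fall past the end of the data
        out[global_id] = a[global_id] + b[global_id]


def run_work_group(kernel, group_id, group_size, args):
    """Run every work-item of one work-group sequentially on the calling host thread."""
    base = group_id * group_size
    for local_id in range(group_size):
        kernel(base + local_id, *args)


def dispatch(kernel, grid_size, group_size, args, host_threads=4):
    """Map work-groups onto a pool of host threads and wait for all of them."""
    num_groups = (grid_size + group_size - 1) // group_size  # round up
    with ThreadPoolExecutor(max_workers=host_threads) as pool:
        futures = [
            pool.submit(run_work_group, kernel, group, group_size, args)
            for group in range(num_groups)
        ]
        for future in futures:
            future.result()  # re-raise any exception raised in a worker


a = list(range(10))
b = [10 * x for x in a]
out = [0] * len(a)
dispatch(vector_add_kernel, grid_size=len(a), group_size=4, args=(a, b, out))
```

In this sketch the kernel is an ordinary Python function; in a translator-based simulator the equivalent role would be played by the native code produced from HSAIL. Python threads only show the mapping of work-groups to host threads, since the interpreter lock prevents a real speedup here.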

Parallel Keywords

LLVM; Simulator; HSA; Binary Translator

