Article

Helper function inlining in dynamic binary translation

Author:
Wenwen Wang

University of Georgia, USA


CC 2021: Proceedings of the 30th ACM SIGPLAN International Conference on Compiler Construction, March 2021, Pages 107–118, https://doi.org/10.1145/3446804.3446851

Published: 27 February 2021


ABSTRACT

Dynamic binary translation (DBT) is the cornerstone of many important applications, yet developing and maintaining a real-world DBT system takes tremendous effort. To reduce this engineering effort, developers of DBT systems frequently rely on helper functions. Although helper functions greatly ease DBT development, the calls to them incur substantial performance overhead. To solve this problem, this paper presents a novel approach to inlining helper functions in DBT systems. The proposed approach addresses several unique technical challenges. As a result, it reduces the performance overhead introduced by helper function calls while preserving the benefits of helper functions for DBT development. We have implemented a prototype of the approach in a popular DBT system, QEMU. Experimental results on programs from the SPEC CPU 2017 benchmark suite show an average speedup of 1.2x. Moreover, the translation overhead introduced by inlining helper functions is negligible.
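
The trade-off described in the abstract can be illustrated with a small toy model. The sketch below is not the paper's method and does not use any QEMU API: the names `translate`, `inline_helpers`, `helper_rotl32`, and `run`, the two-instruction guest language, and the `CALL_OVERHEAD` value are all assumptions made up for illustration. A translator emits host code in which a complex guest instruction becomes a call to a helper; a separate pass then splices the helper's body into the host code, so the same work is done without paying the assumed per-call cost.

```python
MASK32 = 0xFFFFFFFF
# Assumed cost, in host ops, of one helper call (spill, call, return, reload).
# This number is invented for the toy model, not measured on any real system.
CALL_OVERHEAD = 10


def helper_rotl32(dst, src, amount):
    """Hypothetical helper: a 32-bit rotate-left expressed as three host ops."""
    def shl(s): s["t0"] = (s[src] << amount) & MASK32
    def shr(s): s["t1"] = (s[src] & MASK32) >> (32 - amount)
    def orr(s): s[dst] = s["t0"] | s["t1"]
    return [("op", shl), ("op", shr), ("op", orr)]


def translate(guest_block):
    """Translate toy guest instructions into host code that calls helpers."""
    host = []
    for name, *args in guest_block:
        if name == "rotl":
            # Complex instruction: emit a call to the helper.
            host.append(("call", helper_rotl32(*args)))
        elif name == "addi":
            # Simple instruction: emit a single host op directly.
            dst, imm = args
            def addi(s, dst=dst, imm=imm):
                s[dst] = (s[dst] + imm) & MASK32
            host.append(("op", addi))
        else:
            raise ValueError(f"unknown guest instruction: {name}")
    return host


def inline_helpers(host):
    """Replace every helper call with the ops of the helper's body."""
    inlined = []
    for kind, payload in host:
        if kind == "call":
            inlined.extend(payload)
        else:
            inlined.append((kind, payload))
    return inlined


def run(host, state):
    """Execute host code on a register dict and return the modeled cost."""
    cost = 0
    for kind, payload in host:
        if kind == "call":
            cost += CALL_OVERHEAD
            for _, fn in payload:
                fn(state)
                cost += 1
        else:
            payload(state)
            cost += 1
    return cost


if __name__ == "__main__":
    block = [("addi", "r1", 1), ("rotl", "r2", "r1", 8), ("rotl", "r1", "r2", 4)]
    called, inlined = {"r1": 0x12345677, "r2": 0}, {"r1": 0x12345677, "r2": 0}
    host = translate(block)
    print("cost with calls:", run(host, called))
    print("cost inlined:   ", run(inline_helpers(host), inlined))
    print("same guest state:", called == inlined)
```

In this toy model both versions leave the guest registers in the same state, and the only difference is the assumed call cost, so the modeled savings say nothing about the 1.2x speedup the paper measures on real hardware. The helper is still written once as an ordinary function, which is the development benefit the abstract says inlining should preserve.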


Index Terms

1. Software and its engineering


Published in
CC 2021: Proceedings of the 30th ACM SIGPLAN International Conference on Compiler Construction
March 2021
164 pages
ISBN: 9781450383257
DOI: 10.1145/3446804
General Chair: Aaron Smith (Microsoft, USA / University of Edinburgh, UK)
Program Chairs: Delphine Demange (University of Rennes, France / Inria, France / CNRS, France / IRISA, France); Rajiv Gupta (University of California at Riverside, USA)
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Binary translation
Compiler optimization
Function inlining
QEMU
Qualifiers
- Article

