DOI: 10.1145/3642963.3652204

Hawkeyes: Addressing Weak Memory Order in Program Migration Based on Instruction Windows

Published: 14 May 2024

ABSTRACT

Migrating memory systems from x86 to ARM can result in weak memory order issues due to memory model differences. Preventing these issues requires adding memory barriers. However, current automatic barrier-insertion approaches fail to identify every location where weak memory model (WMM) bugs may occur, and they often insert unnecessary barriers. To address these problems, we propose Hawkeyes, an approach that combines dynamic memory access conflict detection with instruction windows to locate out-of-order memory access issues in multi-threaded programs. Hawkeyes performs compile-time instrumentation to locate all memory conflicts at run time and analyzes the micro-instructions together with the instruction window to identify out-of-order instruction intervals. By comparing such intervals among different threads, Hawkeyes determines the locations that need to maintain order. We validate the correctness of Hawkeyes on open-source libraries and evaluate Hawkeyes on public benchmarks. We demonstrate that Hawkeyes not only pinpoints all the locations where WMM bugs may appear but also achieves high accuracy in barrier insertion.
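
The kind of ordering problem the abstract describes can be illustrated with a minimal message-passing sketch. It is not taken from the paper and does not show how Hawkeyes works; the names `Mailbox`, `producer`, `consumer`, and `count_stale_reads` are hypothetical. A writer stores a payload and then sets a flag. On x86 the hardware keeps the two stores in program order, so code that omits explicit ordering often appears to work there, whereas ARM may make the flag visible before the payload. The release/acquire pair below is one standard C++ way to state the required order, and on ARM the compiler lowers it to the appropriate barrier or ordered instructions.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical illustration of a message-passing pattern; not code from the paper.
struct Mailbox {
    int payload = 0;                 // plain (non-atomic) data
    std::atomic<bool> ready{false};  // flag that publishes the data
};

// Writer: store the data, then publish the flag.
void producer(Mailbox& box, int value) {
    box.payload = value;
    // release: the payload store may not be reordered after this flag store.
    // With memory_order_relaxed here, a weak memory model is allowed to make
    // the flag visible before the payload.
    box.ready.store(true, std::memory_order_release);
}

// Reader: wait for the flag, then read the data.
int consumer(Mailbox& box) {
    // acquire: the payload load may not be reordered before this flag load.
    while (!box.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return box.payload;
}

// Runs the exchange `rounds` times and counts rounds in which the reader
// saw a payload other than the one written in that round.
int count_stale_reads(int rounds) {
    assert(rounds >= 0);
    int stale = 0;
    for (int i = 1; i <= rounds; ++i) {
        Mailbox box;
        int seen = 0;
        std::thread reader([&] { seen = consumer(box); });
        std::thread writer([&] { producer(box, i); });
        writer.join();
        reader.join();
        if (seen != i) {
            ++stale;
        }
    }
    return stale;
}
```

With the release/acquire pair in place, `count_stale_reads` reports no stale reads on either architecture. Choosing the weakest ordering that is still sufficient, rather than a full barrier at every access, reflects the same concern about unnecessary barriers that the abstract raises.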

  • Published in

    CHEOPS '24: Proceedings of the 4th Workshop on Challenges and Opportunities of Efficient and Performant Storage Systems
    April 2024
    38 pages
    ISBN: 9798400705380
    DOI: 10.1145/3642963

    Copyright © 2024 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall acceptance rate: 6 of 8 submissions, 75%