DOI: 10.1145/2611462.2611482
research-article

Software-improved hardware lock elision

Published: 15 July 2014

ABSTRACT

With hardware transactional memory (HTM) becoming available in mainstream processors, lock-based critical sections may now initiate a hardware transaction instead of taking the lock, enabling their concurrent execution unless a real data conflict occurs. However, just a few transactional aborts can cause the lock to be acquired non-transactionally, which serializes all the threads and severely degrades the speedup obtained. In this paper we provide two software extension mechanisms that considerably improve the concurrency and speedup levels attained by lock-based programs using HTM-based lock elision. The first sacrifices opacity to achieve higher levels of concurrency, and the second retains opacity while reaching slightly lower levels of concurrency.

Evaluation on STAMP and on data structure benchmarks on an Intel Haswell processor shows that these techniques improve the speedup by up to 3.5 times and 10 times, respectively, compared to using Haswell's hardware lock elision as is.
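The baseline behaviour the abstract describes (speculate first, fall back to the lock after aborts) can be illustrated with a small control-flow model. This is a minimal sketch in Python, not the authors' mechanisms and not real HTM: `ElidedLock`, `try_transaction`, and the retry budget `max_attempts` are hypothetical names introduced only for illustration, and the "transaction" is a plain callback that reports commit or abort.

```python
import threading


class ElidedLock:
    """Toy model of lock elision with a lock-acquiring fallback path."""

    def __init__(self, max_attempts=3):
        self._lock = threading.Lock()      # the lock being elided
        self.max_attempts = max_attempts   # assumed retry budget (hypothetical)
        # Statistics only; not synchronized, intended for the
        # single-threaded demonstration below.
        self.elided = 0                    # sections completed speculatively
        self.fallbacks = 0                 # sections that took the lock

    def run(self, critical_section, try_transaction):
        """Run critical_section, speculatively if possible.

        try_transaction(fn) is a hypothetical stand-in for a hardware
        transaction attempt: it returns True if fn ran and committed,
        and False if the attempt aborted (fn then had no effect).
        """
        for _ in range(self.max_attempts):
            # A held lock means some thread is on the fallback path,
            # so speculation would abort; stop trying.
            if self._lock.locked():
                break
            if try_transaction(critical_section):
                self.elided += 1
                return
        # Fallback: take the lock non-transactionally. While it is held,
        # every other caller is serialized behind this one.
        with self._lock:
            critical_section()
            self.fallbacks += 1


counter = {"value": 0}


def increment():
    counter["value"] += 1


def always_commits(fn):
    """Model a conflict-free transaction: run fn and report a commit."""
    fn()
    return True


def always_aborts(fn):
    """Model a transaction that always aborts: fn never takes effect."""
    return False


lock = ElidedLock(max_attempts=3)
lock.run(increment, always_commits)  # completes without taking the lock
lock.run(increment, always_aborts)   # exhausts the budget, then takes the lock
print(counter["value"], lock.elided, lock.fallbacks)  # prints: 2 1 1
```

In this model, a caller that exhausts its attempts holds the lock, and any caller arriving meanwhile also skips speculation and queues on the lock. That is the serialization effect the abstract says a few aborts can trigger.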


Published in

PODC '14: Proceedings of the 2014 ACM symposium on Principles of distributed computing
July 2014
444 pages
ISBN: 9781450329446
DOI: 10.1145/2611462

Copyright © 2014 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

PODC '14 paper acceptance rate: 39 of 141 submissions, 28%
Overall acceptance rate: 740 of 2,477 submissions, 30%
