ABSTRACT
With hardware transactional memory (HTM) becoming available in mainstream processors, lock-based critical sections may now initiate a hardware transaction instead of taking the lock, enabling their concurrent execution unless a real data conflict occurs. However, just a few transactional aborts can cause the lock to be acquired non-transactionally, which serializes all the threads and severely degrades the speedup obtained. In this paper we provide two software extension mechanisms that considerably improve the concurrency and speedup levels attained by lock-based programs using HTM-based lock elision. The first sacrifices opacity to achieve higher levels of concurrency, and the second retains opacity while reaching slightly lower levels of concurrency.
Evaluation on STAMP and on data structure benchmarks on an Intel Haswell processor shows that these techniques improve the speedup by up to 3.5 times and 10 times, respectively, compared to using Haswell's hardware lock elision as is.
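The baseline behavior the abstract describes (try the critical section speculatively, and take the real lock after repeated aborts) can be sketched as a toy Python model. This is not the paper's mechanism and not real HTM: the `ElidedLock` class, the `speculate` hook, the `scripted_htm` helper, and the retry budget of 3 are all hypothetical names and values introduced here for illustration. A real implementation would start a hardware transaction and read the lock word inside it; here the hook simply replays a scripted commit or abort, and the counters are only meaningful in a single-threaded demo.

```python
import threading
from typing import Callable, Iterable

Section = Callable[[], None]
# Hypothetical stand-in for hardware speculation: runs the section and returns
# True if it "committed", or returns False with no effects if it "aborted".
Speculate = Callable[[Section], bool]


class ElidedLock:
    """Toy model of lock elision with a lock fallback (names are illustrative)."""

    def __init__(self, max_retries: int = 3) -> None:
        self._lock = threading.Lock()   # the real lock, used only on fallback
        self.max_retries = max_retries  # assumed retry budget, not from the paper
        self.elided = 0                 # sections that committed speculatively
        self.serialized = 0             # sections that had to take the lock

    def run(self, section: Section, speculate: Speculate) -> str:
        for _ in range(self.max_retries):
            # Speculate only while the lock looks free.
            if not self._lock.locked() and speculate(section):
                self.elided += 1
                return "elided"
        # Retry budget exhausted: take the lock, which serializes callers.
        with self._lock:
            section()
            self.serialized += 1
        return "serialized"


def scripted_htm(outcomes: Iterable[bool]) -> Speculate:
    """Build a fake speculation hook that replays commit (True) or abort (False)."""
    remaining = iter(outcomes)

    def speculate(section: Section) -> bool:
        if next(remaining, True):   # commit once the script runs out
            section()
            return True
        return False                # abort: the section's effects never happen

    return speculate
```

With `max_retries=2`, a section whose script is `[False, True]` commits on its second attempt and `run` returns `"elided"`, while a script of `[False, False]` exhausts the budget and `run` returns `"serialized"`. The second case is the one the abstract identifies as the source of lost speedup.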
Index Terms
- Software-improved hardware lock elision