ABSTRACT
Modern multithreaded applications, such as application servers and database engines, can severely stress the performance of user-level memory allocators like the ubiquitous malloc subsystem. Such allocators can prove to be a major scalability impediment for the applications that use them, particularly for applications with large numbers of threads running on high-order multiprocessor systems.

This paper introduces Multi-Processor Restartable Critical Sections, or MP-RCS. MP-RCS permits user-level threads to know precisely which processor they are executing on and then to safely manipulate CPU-specific data, such as malloc metadata, without locks or atomic instructions. MP-RCS avoids interference by using upcalls to notify user-level threads when preemption or migration has occurred. The upcall will abort and restart any interrupted critical sections.

We use MP-RCS to implement a malloc package, LFMalloc (Lock-Free Malloc). LFMalloc is scalable, has extremely low latency, excellent cache characteristics, and is memory efficient. We present data from some existing benchmarks showing that LFMalloc is often 10 times faster than Hoard, another malloc replacement package.
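The abort-and-restart idea can be illustrated with a small single-threaded simulation in C. This is only a sketch under stated assumptions, not the paper's implementation: the names `rcs_pop`, `push_block`, `simulated_upcall`, `pending_preemptions`, and `NCPU` are hypothetical, and the kernel upcall is modelled by a flag that the critical section checks before its single committing store. In a real MP-RCS the kernel itself would redirect the interrupted thread to the restart point, so no check in user code would be needed.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU 4 /* hypothetical processor count */

typedef struct block { struct block *next; } block_t;

/* One free list per CPU; only code running on that CPU touches its list. */
static block_t *free_list[NCPU];

static int interfered;          /* set by the simulated upcall */
static int restarts;            /* number of aborted attempts, for inspection */
static int pending_preemptions; /* how many preemptions the simulation injects */

/* Stand-in for the kernel upcall that reports preemption or migration. */
static void simulated_upcall(void) { interfered = 1; }

/* Simulated preemption point between the loads and the commit. */
static void maybe_preempt(void)
{
    if (pending_preemptions > 0) {
        pending_preemptions--;
        simulated_upcall();
    }
}

/* Push a block onto the free list of the given CPU (no interference modelled). */
void push_block(int cpu, block_t *b)
{
    assert(cpu >= 0 && cpu < NCPU);
    b->next = free_list[cpu];
    free_list[cpu] = b;
}

/* Pop a block using plain loads and one committing store, with no locks or
 * atomic instructions. If the section was interrupted, discard the values
 * read so far and start over. */
block_t *rcs_pop(int cpu)
{
    assert(cpu >= 0 && cpu < NCPU);
    for (;;) {
        interfered = 0;                 /* critical section begins */
        block_t *head = free_list[cpu]; /* plain load */
        if (head == NULL)
            return NULL;
        block_t *next = head->next;
        maybe_preempt();                /* interference may happen here */
        if (interfered) {               /* abort: nothing was written yet */
            restarts++;
            continue;                   /* restart from the top */
        }
        free_list[cpu] = next;          /* single committing store */
        return head;
    }
}
```

Because every step before the final store only reads, an aborted attempt leaves the per-CPU list untouched, which is what makes restarting safe. The check followed by the store is not itself atomic in this sketch; that gap is exactly what the kernel-driven restart closes in the scheme the abstract describes.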
Mostly lock-free malloc. MSP 2002 and ISMM 2002.