Mostly lock-free malloc

Authors:
Dave Dice, Sun Microsystems, Inc., Burlington, MA
Alex Garthwaite, Sun Microsystems Laboratories, Burlington, MA

ISMM '02: Proceedings of the 3rd international symposium on Memory management
June 2002
Pages 163–174
https://doi.org/10.1145/512429.512451

Published: 20 June 2002

ABSTRACT

Modern multithreaded applications, such as application servers and database engines, can severely stress the performance of user-level memory allocators like the ubiquitous malloc subsystem. Such allocators can prove to be a major scalability impediment for the applications that use them, particularly for applications with large numbers of threads running on high-order multiprocessor systems.

This paper introduces Multi-Processor Restartable Critical Sections, or MP-RCS. MP-RCS permits user-level threads to know precisely which processor they are executing on and then to safely manipulate CPU-specific data, such as malloc metadata, without locks or atomic instructions. MP-RCS avoids interference by using upcalls to notify user-level threads when preemption or migration has occurred. The upcall will abort and restart any interrupted critical sections.

We use MP-RCS to implement a malloc package, LFMalloc (Lock-Free Malloc). LFMalloc is scalable, has extremely low latency, excellent cache characteristics, and is memory efficient. We present data from some existing benchmarks showing that LFMalloc is often 10 times faster than Hoard, another malloc replacement package.
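
The abort-and-restart idea described in the abstract can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions, not LFMalloc's implementation: Python stands in for the real user-level and kernel code, and the names PerCpuHeap, Thread, upcall, and allocate are hypothetical. In particular, the explicit `interrupted` check below merely models the kernel upcall that, per the abstract, aborts and restarts an interrupted critical section; a real system has to guarantee that a thread cannot be preempted between that check and the commit.

```python
class PerCpuHeap:
    """One free list per CPU; the model takes no locks (hypothetical type)."""

    def __init__(self, n_cpus, blocks_per_cpu):
        # Block ids are plain integers, partitioned by CPU.
        self.free = [
            list(range(cpu * blocks_per_cpu, (cpu + 1) * blocks_per_cpu))
            for cpu in range(n_cpus)
        ]


class Thread:
    """A user-level thread that knows which CPU it is running on."""

    def __init__(self, cpu):
        self.cpu = cpu
        self.interrupted = False  # set by the simulated upcall


def upcall(thread, new_cpu):
    """Simulated notification that the thread was preempted or migrated."""
    thread.cpu = new_cpu
    thread.interrupted = True


def allocate(heap, thread, interfere=None):
    """Pop a block from the current CPU's free list, restarting if interrupted.

    `interfere` is an optional callback invoked once in the middle of the
    critical section to simulate preemption. Returns (block, restart_count);
    block is None when the CPU's free list is empty.
    """
    restarts = 0
    while True:
        # Start of the critical section: only reads until the commit below.
        thread.interrupted = False
        cpu = thread.cpu
        free_list = heap.free[cpu]
        if not free_list:
            return None, restarts
        candidate = free_list[-1]

        if interfere is not None:
            interfere(thread)  # a point where preemption could happen
            interfere = None   # simulate it only once

        if thread.interrupted:
            # Nothing has been written yet, so aborting is safe: restart.
            restarts += 1
            continue

        free_list.pop()  # commit: the single update to CPU-specific data
        return candidate, restarts


if __name__ == "__main__":
    heap = PerCpuHeap(n_cpus=2, blocks_per_cpu=2)
    t = Thread(cpu=0)
    print(allocate(heap, t))
    # Migrate the thread to CPU 1 in the middle of its critical section.
    print(allocate(heap, t, interfere=lambda th: upcall(th, 1)))
```

The point of the sketch is the shape of the critical section: all side effects are deferred to one final commit, so an interrupted attempt can be discarded and rerun against whichever CPU the thread now occupies.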

Index Terms

1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Allocation / deallocation strategies

Published in
ISMM '02: Proceedings of the 3rd international symposium on Memory management
June 2002
192 pages
ISBN:1581135394
DOI:10.1145/512429
General Chair: Hans-J. Boehm, Hewlett-Packard Labs, USA
Program Chair: David Detlefs, Sun Microsystems Labs, USA
ACM SIGPLAN Notices Volume 38, Issue 2 supplement
MSP 2002 and ISMM 2002
February 2003
291 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/773039
Copyright © 2002 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
affinity
locality
lock-free operations
malloc
restartable critical sections
Qualifiers
- Article
Conference

Acceptance Rates
ISMM '02 Paper Acceptance Rate: 17 of 41 submissions, 41%
Overall Acceptance Rate: 72 of 156 submissions, 46%