Mostly lock-free malloc

Published: 20 June 2002
DOI: 10.1145/512429.512451

ABSTRACT

Modern multithreaded applications, such as application servers and database engines, can severely stress the performance of user-level memory allocators like the ubiquitous malloc subsystem. Such allocators can become a major scalability impediment for the applications that use them, particularly for applications with large numbers of threads running on high-order multiprocessor systems.

This paper introduces Multi-Processor Restartable Critical Sections, or MP-RCS. MP-RCS permits user-level threads to know precisely which processor they are executing on and then to safely manipulate CPU-specific data, such as malloc metadata, without locks or atomic instructions. MP-RCS avoids interference by using upcalls to notify user-level threads when preemption or migration has occurred. The upcall aborts and restarts any interrupted critical section.

We use MP-RCS to implement a malloc package, LFMalloc (Lock-Free Malloc). LFMalloc is scalable and memory efficient, with extremely low latency and excellent cache characteristics. Data from existing benchmarks show that LFMalloc is often 10 times faster than Hoard, another malloc replacement package.
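The abort-and-restart idea in the abstract can be illustrated with a small single-threaded simulation in C. This is only a sketch under stated assumptions, not the paper's implementation: real MP-RCS depends on kernel upcalls, whereas here the "upcall" is a hypothetical counter (`pending_preemptions`) that injects an interruption and migrates the simulated thread to another CPU. All names (`cpu_heap_t`, `cpu_alloc`, `cpu_free`, `interrupted`, `NCPUS`) are invented for illustration. The point it shows is that a critical section which only reads until one final committing store can be abandoned and rerun without leaving per-CPU data in a bad state.

```c
#include <assert.h>
#include <stddef.h>

#define NCPUS 4                      /* hypothetical CPU count for the simulation */

typedef struct block { struct block *next; } block_t;

/* One free list per CPU; only code "running on" that CPU touches it. */
typedef struct {
    block_t *head;
    unsigned long restarts;          /* how many critical sections were aborted here */
} cpu_heap_t;

static cpu_heap_t heaps[NCPUS];
static int current_cpu = 0;          /* simulated: the CPU the thread is on */
static int pending_preemptions = 0;  /* simulated: interruptions still to inject */

/* Stand-in for the upcall: reports an interruption and migrates the thread. */
static int interrupted(void)
{
    if (pending_preemptions > 0) {
        pending_preemptions--;
        current_cpu = (current_cpu + 1) % NCPUS;
        return 1;
    }
    return 0;
}

/* Push a block onto a given CPU's free list (setup helper, never interrupted). */
static void cpu_free(int cpu, block_t *b)
{
    b->next = heaps[cpu].head;
    heaps[cpu].head = b;
}

/* Pop a block from the current CPU's free list.
 * The section only reads until the last step, so aborting before the
 * committing store leaves the list unchanged and the section can rerun. */
static block_t *cpu_alloc(void)
{
    for (;;) {
        int cpu = current_cpu;       /* critical section begins */
        cpu_heap_t *h = &heaps[cpu];
        block_t *b = h->head;
        if (b == NULL)
            return NULL;             /* this CPU's list is empty */
        block_t *next = b->next;
        if (interrupted()) {         /* abort: nothing has been written yet */
            h->restarts++;
            continue;                /* restart, possibly on a different CPU */
        }
        h->head = next;              /* commit with a single store */
        return b;
    }
}
```

If blocks are placed on the lists of CPUs 0 and 1 and `pending_preemptions` is set to 1 before calling `cpu_alloc()`, the first attempt on CPU 0 is abandoned with that list untouched, and the retry completes on CPU 1. No lock or atomic instruction appears in the pop path, which is the property the abstract attributes to MP-RCS.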


Published in

    ISMM '02: Proceedings of the 3rd international symposium on Memory management
    June 2002
    192 pages
    ISBN: 1581135394
    DOI: 10.1145/512429

    ACM SIGPLAN Notices, Volume 38, Issue 2 supplement
    MSP 2002 and ISMM 2002
    February 2003
    291 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/773039

    Copyright © 2002 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Article

    Acceptance Rates

    ISMM '02 Paper Acceptance Rate: 17 of 41 submissions, 41%
    Overall Acceptance Rate: 72 of 156 submissions, 46%
