DOI: 10.1145/1133956.1133968
Article

Scalable locality-conscious multithreaded memory allocation

Published: 10 June 2006

ABSTRACT

We present Streamflow, a new multithreaded memory manager designed for low-overhead, high-performance memory allocation while transparently favoring locality. Streamflow enables low-overhead simultaneous allocation by multiple threads and adapts to sequential allocation at speeds comparable to that of custom sequential allocators. It favors the transparent exploitation of temporal and spatial object access locality, and reduces allocator-induced cache conflicts and false sharing, all using a unified design based on segregated heaps. Streamflow introduces an innovative design which uses only synchronization-free operations in the most common case of local allocations and deallocations, while requiring minimal, non-blocking synchronization in the less common case of remote deallocations. Spatial locality at the cache and page level is favored by eliminating small-object headers, reducing allocator-induced conflicts via contiguous allocation of page blocks in physical memory, reducing allocator-induced false sharing by using segregated heaps, and achieving better TLB performance and fewer page faults via the use of superpages. Combining these locality optimizations with the drastic reduction of synchronization and latency overhead allows Streamflow to perform comparably with optimized sequential allocators and to outperform four state-of-the-art multiprocessor allocators by sizeable margins in our experiments on a shared-memory system with four two-way SMT processors. The allocation-intensive sequential and parallel benchmarks used in our experiments represent a variety of behaviors, including mostly local object allocation-deallocation patterns and producer-consumer allocation-deallocation patterns.
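
The split between synchronization-free local operations and non-blocking remote deallocations can be illustrated with a small sketch. This is not Streamflow's implementation: the abstract gives no code, so the `PageBlock` class, its member names, and the free-list layout below are all hypothetical. The sketch assumes one owner thread per block; the owner allocates and frees through a plain pointer, while other threads return objects by pushing onto a separate list with a compare-and-swap loop. Free objects store the list link in their own storage, so no per-object header is needed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <new>
#include <thread>
#include <vector>

// Hypothetical sketch; names and layout are assumptions, not Streamflow's code.
struct FreeNode { FreeNode* next; };

class PageBlock {
public:
    PageBlock(std::size_t object_size, std::size_t capacity)
        : object_size_(round_up(object_size)),
          capacity_(capacity),
          storage_(object_size_ * capacity) {}

    // Owner thread only. The common paths (local list, bump pointer) use no atomics.
    void* allocate() {
        if (local_free_ == nullptr && bump_ < capacity_)
            return storage_.data() + object_size_ * bump_++;
        if (local_free_ == nullptr)  // slow path: adopt all remotely freed objects at once
            local_free_ = remote_free_.exchange(nullptr, std::memory_order_acquire);
        if (local_free_ == nullptr) return nullptr;  // block exhausted
        FreeNode* node = local_free_;
        local_free_ = node->next;
        return node;
    }

    // Owner thread only: plain pointer push, no synchronization.
    void free_local(void* p) { local_free_ = new (p) FreeNode{local_free_}; }

    // Any other thread: lock-free push onto the remote list.
    void free_remote(void* p) {
        FreeNode* node = new (p) FreeNode{remote_free_.load(std::memory_order_relaxed)};
        // On failure, compare_exchange_weak reloads the current head into node->next.
        while (!remote_free_.compare_exchange_weak(node->next, node,
                                                   std::memory_order_release,
                                                   std::memory_order_relaxed)) {
        }
    }

private:
    // Make every slot large and aligned enough to hold a FreeNode.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(FreeNode)) n = sizeof(FreeNode);
        return (n + a - 1) / a * a;
    }

    std::size_t object_size_;
    std::size_t capacity_;
    std::vector<unsigned char> storage_;
    std::size_t bump_ = 0;
    FreeNode* local_free_ = nullptr;
    std::atomic<FreeNode*> remote_free_{nullptr};
};

// Usage sketch: the owner allocates every slot, worker threads free them remotely,
// and the owner then reclaims them. Returns how many slots were recovered.
std::size_t demo_remote_frees(std::size_t capacity, unsigned workers) {
    PageBlock block(32, capacity);
    std::vector<void*> objs;
    for (std::size_t i = 0; i < capacity; ++i) objs.push_back(block.allocate());
    assert(block.allocate() == nullptr);  // every slot is in use

    std::vector<std::thread> threads;
    for (unsigned w = 0; w < workers; ++w) {
        threads.emplace_back([&block, &objs, capacity, workers, w] {
            for (std::size_t i = w; i < capacity; i += workers) block.free_remote(objs[i]);
        });
    }
    for (auto& t : threads) t.join();

    std::size_t recovered = 0;
    while (block.allocate() != nullptr) ++recovered;
    return recovered;
}
```

Because remote threads only push and the owner only detaches the whole list with a single exchange, this particular scheme sidesteps the ABA problem that affects lock-free stacks with single-element pops. Whether Streamflow uses the same mechanism is not stated in the abstract.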

          Published in

          ISMM '06: Proceedings of the 5th international symposium on Memory management
          June 2006
          202 pages
          ISBN: 1595932216
          DOI: 10.1145/1133956
          General Chair: Erez Petrank
          Program Chair: Eliot Moss

          Copyright © 2006 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall Acceptance Rate: 72 of 156 submissions, 46%
