DOI: 10.1145/2749469.2750378
research-article

ArMOR: defending against memory consistency model mismatches in heterogeneous architectures

Published: 13 June 2015

ABSTRACT

Architectural heterogeneity is increasing: numerous products and studies have proven the benefits of combining cores and accelerators with varying ISAs into a single system. However, an underappreciated barrier to unlocking the full potential of heterogeneity is the need to specify and to reconcile differences in memory consistency models across layers of the hardware-software stack and among on-chip components.

This paper presents ArMOR, a framework for specifying, comparing, and translating between memory consistency models. ArMOR defines MOSTs, an architecture-independent and precise format for specifying the semantics of memory ordering requirements such as preserved program order or explicit fences. MOSTs allow any two consistency models to be directly and algorithmically compared, and they help avoid many of the pitfalls of traditional consistency model analysis. As a case study, we use ArMOR to automatically generate translation modules called shims that dynamically translate code compiled for one memory model to execute on hardware implementing a different model.
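The idea of comparing two consistency models algorithmically, and of deriving what a shim must enforce, can be illustrated with a deliberately simplified sketch. The tables below are not the paper's MOSTs: they are coarse boolean summaries of preserved program order for sequential consistency (SC), total store order (TSO), and partial store order (PSO) as those models are usually described in textbooks, and they ignore fences, same-address rules, and store atomicity. The names `OPS`, `at_least_as_strong`, and `missing_orderings` are hypothetical and chosen only for this illustration.

```python
from itertools import product

OPS = ("load", "store")

# Simplified ordering tables (an assumption for illustration, not MOSTs).
# Key: (earlier access, later access). Value: True if program order
# between the two accesses is preserved by the model.
SC = {pair: True for pair in product(OPS, OPS)}
TSO = {**SC, ("store", "load"): False}    # a later load may bypass an earlier store
PSO = {**TSO, ("store", "store"): False}  # stores may also be reordered with each other


def at_least_as_strong(a, b):
    """Return True if table a preserves every ordering that table b preserves."""
    return all(a[pair] or not b[pair] for pair in b)


def missing_orderings(assumed, provided):
    """List the orderings the code assumes but the hardware does not preserve.

    In this toy setting, these are the orderings a translation layer
    would have to restore, for example by inserting fences.
    """
    return sorted(pair for pair in assumed if assumed[pair] and not provided[pair])


if __name__ == "__main__":
    print(at_least_as_strong(SC, TSO))  # SC preserves everything TSO does
    print(missing_orderings(TSO, PSO))  # what TSO code would lose on PSO hardware
```

Under these assumptions, code compiled for TSO and run on PSO hardware loses only the store-to-store ordering, so that is the one ordering a translation layer would need to restore. The abstract describes MOSTs as a precise format for ordering semantics, so a real specification would carry far more detail than one boolean per cell.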


Published in

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture
June 2015
768 pages
ISBN: 9781450334020
DOI: 10.1145/2749469

Copyright © 2015 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 543 of 3,203 submissions, 17%
