research-article

ArMOR: defending against memory consistency model mismatches in heterogeneous architectures

Authors:
Daniel Lustig (Princeton University)
Caroline Trippel (Princeton University)
Michael Pellauer (NVIDIA Research)
Margaret Martonosi (Princeton University)

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture, June 2015, Pages 388–400, https://doi.org/10.1145/2749469.2750378

Published: 13 June 2015

ABSTRACT

Architectural heterogeneity is increasing: numerous products and studies have proven the benefits of combining cores and accelerators with varying ISAs into a single system. However, an underappreciated barrier to unlocking the full potential of heterogeneity is the need to specify and to reconcile differences in memory consistency models across layers of the hardware-software stack and among on-chip components.

This paper presents ArMOR, a framework for specifying, comparing, and translating between memory consistency models. ArMOR defines MOSTs, an architecture-independent and precise format for specifying the semantics of memory ordering requirements such as preserved program order or explicit fences. MOSTs allow any two consistency models to be directly and algorithmically compared, and they help avoid many of the pitfalls of traditional consistency model analysis. As a case study, we use ArMOR to automatically generate translation modules called shims that dynamically translate code compiled for one memory model to execute on hardware implementing a different model.
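As a rough illustration of what comparing two consistency models "directly and algorithmically" can look like, the sketch below models only preserved program order between loads and stores as a small boolean table. It is not the MOST format defined in the paper: the names `ordering_table`, `is_at_least_as_strong`, and `missing_orderings` are hypothetical, and the tables deliberately ignore fences, dependencies, and store atomicity. The only facts assumed are the textbook ones that sequential consistency (SC) preserves all four load/store orderings, TSO relaxes store-to-load ordering, and PSO additionally relaxes store-to-store ordering.

```python
from itertools import product

# Simplified illustration only: a table of (earlier access, later access)
# pairs mapped to True when program order between them is preserved.
# This is NOT the paper's MOST format; all names here are hypothetical.
ACCESS_TYPES = ("load", "store")


def ordering_table(relaxed=()):
    """Return a table that preserves every ordering except those in `relaxed`."""
    return {pair: pair not in relaxed for pair in product(ACCESS_TYPES, repeat=2)}


SC = ordering_table()
TSO = ordering_table(relaxed={("store", "load")})
PSO = ordering_table(relaxed={("store", "load"), ("store", "store")})


def is_at_least_as_strong(a, b):
    """True if model `a` preserves every ordering that model `b` preserves."""
    return all(a[pair] for pair in b if b[pair])


def missing_orderings(source, target):
    """Orderings that code written for `source` relies on but `target` drops.

    In this toy setting, these are the orderings a translation layer would
    have to restore, for example by inserting fences.
    """
    return sorted(pair for pair in source if source[pair] and not target[pair])


if __name__ == "__main__":
    print(is_at_least_as_strong(SC, TSO))
    print(missing_orderings(SC, TSO))
    print(missing_orderings(TSO, PSO))
```

Under these assumptions, running code compiled for TSO on PSO-like hardware would require restoring store-to-store ordering, while running it on SC hardware would require nothing extra. A real comparison, as the abstract notes, also has to account for explicit fences and other ordering requirements that a four-entry table cannot express.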

Index Terms

1. Computer systems organization
  1. Architectures
2. Hardware

Published in
ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture
June 2015
768 pages
ISBN: 9781450334020
DOI: 10.1145/2749469
General Chair: Debbie Marr (Intel)
Program Chair: David Albonesi (Cornell)

ACM SIGARCH Computer Architecture News, Volume 43, Issue 3S (ISCA '15)
June 2015
745 pages
ISSN: 0163-5964
DOI: 10.1145/2872887
Editor: Doug DeGroot
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org
Publisher
Association for Computing Machinery
New York, NY, United States