Performance Characterization of the 64-bit x86 Architecture from Compiler Optimizations’ Perspective

Liu, Jack; Wu, Youfeng

doi:10.1007/11688839_14

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3923)

Included in the following conference series:

International Conference on Compiler Construction

Abstract

Intel Extended Memory 64 Technology (EM64T) and AMD 64-bit architecture (AMD64) are emerging 64-bit x86 architectures that are fully x86 compatible. Compared with the 32-bit x86 architecture, the 64-bit x86 architectures offer applications several new features. For instance, applications can address 64 bits of virtual memory space, perform operations on 64-bit-wide operands, access 16 general-purpose registers (GPRs) and 16 extended multi-media (XMM) registers, and use a register-based argument passing convention. In this paper, we investigate the performance impact of these new features from the standpoint of compiler optimizations. Our research compiler is based on the Intel Fortran/C++ production compiler, and our experiments are conducted on the SPEC2000 benchmark suite. Results show that with 64-bit-wide pointer and long data types, several SPEC2000 C benchmarks slow down by more than 20%, mainly because of the enlarged memory footprint. To evaluate the performance potential of 64-bit x86 architectures, we designed and implemented the LP32 code model, in which pointer and long are 32 bits wide. Our experiments demonstrate that on average the LP32 code model speeds up the SPEC2000 C benchmarks by 13.4%. For the register-based argument passing convention, our experiments show that the performance gain is less than 1% because of the aggressive function inlining optimization. Finally, we observe that using 16 GPRs and 16 XMM registers significantly outperforms using only 8 GPRs and 8 XMM registers. However, our results also show that using 12 GPRs and 12 XMM registers achieves performance competitive with using 16 GPRs and 16 XMM registers.
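
The footprint effect mentioned in the abstract can be illustrated with a minimal C sketch. This is not the LP32 implementation from the paper; the struct and function names are hypothetical, and fixed-width integers stand in for the pointer and long fields so that the sizes do not depend on the compiler or ABI used to build the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical linked-list node as laid out when pointer and long
   are both 64 bits wide (assumption: uint64_t models the pointer). */
struct node_wide {
    uint64_t next;   /* stands in for a 64-bit pointer */
    int64_t  value;  /* stands in for a 64-bit long */
};

/* The same node when pointer and long are 32 bits wide,
   as in a 32-bit-pointer code model. */
struct node_narrow {
    uint32_t next;   /* stands in for a 32-bit pointer */
    int32_t  value;  /* stands in for a 32-bit long */
};

/* Bytes occupied by an array of n nodes in each layout. */
size_t footprint_wide(size_t n)   { return n * sizeof(struct node_wide); }
size_t footprint_narrow(size_t n) { return n * sizeof(struct node_narrow); }
```

Calling `footprint_wide(n)` and `footprint_narrow(n)` for the same `n` shows the wide layout taking twice the bytes, so fewer nodes fit in each cache line. The sketch only shows the size difference; it says nothing about the measured slowdowns, which depend on the benchmark and the hardware.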

Author information

Authors and Affiliations

Intel Corporation, 2200 Mission Blvd, Santa Clara, CA, USA
Jack Liu & Youfeng Wu

Editor information

Editors and Affiliations

Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, CB3 0FD, Cambridge, UK
Alan Mycroft
Saarland University, Germany
Andreas Zeller

About this paper

Cite this paper

Liu, J., Wu, Y. (2006). Performance Characterization of the 64-bit x86 Architecture from Compiler Optimizations’ Perspective. In: Mycroft, A., Zeller, A. (eds) Compiler Construction. CC 2006. Lecture Notes in Computer Science, vol 3923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11688839_14

DOI: https://doi.org/10.1007/11688839_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33050-9
Online ISBN: 978-3-540-33051-6
eBook Packages: Computer Science, Computer Science (R0)
