Student Research Poster: Software Out-of-Order Execution for In-Order Architectures

Author:
Kim-Anh Tran

Uppsala University, Uppsala, Sweden


PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, September 2016, Page 458. https://doi.org/10.1145/2967938.2971466

Published: 11 September 2016

ABSTRACT

Processor cores fall into two broad categories: fast but power-hungry out-of-order processors, and energy-efficient but slower in-order processors. To achieve high performance within a low energy budget, this proposal aims to deliver out-of-order processing in software (SWOOP) on in-order architectures.

Problem: A primary cause of slowdown on in-order processors is last-level cache misses (caused by difficult-to-predict, data-dependent loads), which stall the core.
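As an illustration only, the kind of data-dependent load described here can be sketched with an indirect array access. The function name `gather_sum` and its parameters are hypothetical and do not come from the poster; the comments describe the behaviour one would expect on an in-order core that stalls when a missing value is first used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel (not from the poster): sums values reached through
 * an index array. The address of data[idx[i]] depends on a value that
 * must itself be loaded first, which makes it hard to predict. */
long gather_sum(const long *data, const int *idx, size_t n)
{
    assert(data != NULL && idx != NULL);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        int j = idx[i];   /* first load: produces the index */
        sum += data[j];   /* data-dependent load: if it misses in the
                             last-level cache, the add has to wait */
    }
    return sum;
}
```

Each iteration uses the loaded value immediately, so there is little independent work between the load and its first use.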

Solution: Because loads are non-blocking operations, independent instructions are scheduled to run before the loads return. We execute critical load instructions earlier in the program for a three-fold benefit: increased memory-level parallelism, increased instruction-level parallelism, and hidden memory latency.
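A minimal source-level sketch of the idea of issuing critical loads early, assuming a hypothetical indirect-sum loop, is shown below. The chunk size `CHUNK`, the function name `gather_sum_hoisted`, and the split into an access and an execute phase are illustrative assumptions, not the poster's actual transformation. An optimizing compiler is also free to reorder this code again.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK 4  /* assumed chunk size; in practice bounded by register pressure */

/* Hypothetical sketch: the loads of CHUNK iterations are issued first,
 * and the values are consumed afterwards. */
long gather_sum_hoisted(const long *data, const int *idx, size_t n)
{
    assert(data != NULL && idx != NULL);
    long sum = 0;
    size_t i = 0;

    for (; i + CHUNK <= n; i += CHUNK) {
        long v[CHUNK];

        /* Access phase: the loads are independent of each other, so
         * several misses can be outstanding at once. */
        for (size_t k = 0; k < CHUNK; k++)
            v[k] = data[idx[i + k]];

        /* Execute phase: first use of the loaded values, after all
         * loads of the chunk have been issued. */
        for (size_t k = 0; k < CHUNK; k++)
            sum += v[k];
    }

    /* Remainder iterations that do not fill a whole chunk. */
    for (; i < n; i++)
        sum += data[idx[i]];

    return sum;
}
```

Moving the first use of each value away from its load is what gives independent instructions room to run while the loads are in flight.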

Related work: Some instruction scheduling policies attempt to hide memory latency, but scheduling is confined by basic-block boundaries and register pressure. Software pipelining is restricted by dependencies between instructions, and decoupled access-execute (DAE) suffers from address re-computation. Unlike EPIC (evolved from VLIW), SWOOP does not require hardware support for predicated execution, speculative loads and their verification, delayed exception handling, memory disambiguation, etc.
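One way to picture the address re-computation overhead attributed to DAE is the sketch below, in which a prefetch-only access phase computes each address and the execute phase computes it again. This is an assumed, simplified reading of DAE, not code from the cited work. The names `gather_sum_dae` and `DAE_CHUNK` are hypothetical, and `__builtin_prefetch` is a GCC/Clang builtin rather than standard C.

```c
#include <assert.h>
#include <stddef.h>

#define DAE_CHUNK 4  /* assumed chunk size for illustration */

/* Hypothetical DAE-style sketch: prefetch in the access phase, then run
 * the original loop body in the execute phase. */
long gather_sum_dae(const long *data, const int *idx, size_t n)
{
    assert(data != NULL && idx != NULL);
    long sum = 0;
    size_t i = 0;

    for (; i + DAE_CHUNK <= n; i += DAE_CHUNK) {
        /* Access phase: compute each address and prefetch it. */
        for (size_t k = 0; k < DAE_CHUNK; k++)
            __builtin_prefetch(&data[idx[i + k]]);

        /* Execute phase: idx[i + k] is loaded and the address of
         * data[idx[i + k]] is computed a second time. */
        for (size_t k = 0; k < DAE_CHUNK; k++)
            sum += data[idx[i + k]];
    }

    /* Remainder iterations run without a separate access phase. */
    for (; i < n; i++)
        sum += data[idx[i]];

    return sum;
}
```

The duplicated index load and address arithmetic in the two phases is the cost this sketch is meant to highlight.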

References

A. Jimborean et al. Fix the code. Don't tweak the hardware: A new compiler approach to voltage-frequency scaling. In CGO, 2014.
J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach, Appendix H. Morgan Kaufmann Publishers Inc., 2011.
M. Lam. Software pipelining: An effective scheduling technique for VLIW machines. In PLDI, 1988.

Index Terms

1. Hardware
  1. Electronic design automation
    1. High-level and register-transfer level synthesis
      1. Hardware-software codesign
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published in
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation
September 2016
474 pages
ISBN: 9781450341219
DOI: 10.1145/2967938
General Chairs: Ayal Zaks (Intel, Israel), Bilha Mendelson (Optitura, Israel)
Program Chairs: Lawrence Rauchwerger (Texas A&M University, USA), Wen-mei W. Hwu (University of Illinois at Urbana-Champaign, USA)
Copyright © 2016 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
compiler
energy
software out-of-order execution
Acceptance Rates
PACT '16 paper acceptance rate: 31 of 119 submissions (26%). Overall acceptance rate: 121 of 471 submissions (26%).