ABSTRACT
Processor cores fall into two categories: fast but power-hungry out-of-order processors, and efficient but slower in-order processors. To achieve high performance within a low energy budget, this proposal aims to deliver out-of-order processing by software (SWOOP) on in-order architectures.
Problem: A primary cause of slowdown in in-order processors is last-level cache misses (caused by difficult-to-predict, data-dependent loads), which stall the core.
Solution: Because loads are non-blocking operations, independent instructions can be scheduled to run before the loads return. We execute critical load instructions earlier in the program for a three-fold benefit: increased memory-level parallelism, increased instruction-level parallelism, and hidden memory latency.
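The idea of issuing loads early can be illustrated with a minimal hand-written C sketch. It is not the SWOOP transformation itself: the function names, the `CHUNK` size, and the arithmetic in the loop body are assumptions chosen for illustration, and an optimizing compiler may reorder either version on its own. The baseline consumes each data-dependent load immediately; the hoisted version first issues a group of independent loads and only then uses their values, so on a core that stalls on use rather than on issue, the misses can overlap.

```c
#include <stddef.h>

/* Hypothetical group size; in practice it is bounded by register pressure. */
enum { CHUNK = 4 };

/* Baseline: each data-dependent load is used right away, so an in-order
   core waits for every cache miss before doing any further work. */
long sum_baseline(const long *data, const size_t *idx, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        long v = data[idx[i]];  /* load through an index array */
        sum += v * v + 1;       /* value consumed immediately */
    }
    return sum;
}

/* Hoisted: the loads of a whole chunk are issued before any of their
   values is used, so independent misses can be outstanding together. */
long sum_hoisted(const long *data, const size_t *idx, size_t n) {
    long sum = 0;
    size_t i = 0;
    for (; i + CHUNK <= n; i += CHUNK) {
        long v[CHUNK];
        /* Access phase: only loads, no use of the loaded values. */
        for (size_t j = 0; j < CHUNK; j++) {
            v[j] = data[idx[i + j]];
        }
        /* Execute phase: compute on the values loaded above. */
        for (size_t j = 0; j < CHUNK; j++) {
            sum += v[j] * v[j] + 1;
        }
    }
    /* Remainder iterations that do not fill a whole chunk. */
    for (; i < n; i++) {
        long v = data[idx[i]];
        sum += v * v + 1;
    }
    return sum;
}
```

Both functions compute the same result for any input; only the order in which loads are issued relative to their uses differs.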
Related work: Some instruction scheduling policies attempt to hide memory latency, but scheduling is confined by basic-block boundaries and register pressure. Software pipelining is restricted by dependencies between instructions, and decoupled access-execute (DAE) suffers from address re-computation. Unlike EPIC (evolved from VLIW), SWOOP does not require hardware support for predicated execution, speculative loads and their verification, delayed exception handling, or memory disambiguation.
- A. Jimborean et al. Fix the code. Don't tweak the hardware: A new compiler approach to voltage-frequency scaling. In CGO, 2014.
- J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach, Appendix H. Morgan Kaufmann Publishers Inc., 2011.
- M. Lam. Software pipelining: An effective scheduling technique for VLIW machines. In PLDI, 1988.
Index Terms
- Student Research Poster: Software Out-of-Order Execution for In-Order Architectures