research-article

Predictor virtualization

Authors:
Ioana Burcea, University of Toronto, Toronto, ON, Canada
Stephen Somogyi, Carnegie-Mellon University, Pittsburgh, PA
Andreas Moshovos, University of Toronto, Toronto, ON, Canada
Babak Falsafi, Carnegie-Mellon University, Pittsburgh, PA

ASPLOS XIII: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems, March 2008, Pages 157–167
https://doi.org/10.1145/1346281.1346301

Published: 01 March 2008

ABSTRACT

Many hardware optimizations rely on collecting information about program behavior at runtime and storing it in lookup tables. To be accurate and effective, these optimizations usually require large dedicated on-chip tables. Although technology advances provide more on-chip resources, those resources go toward enlarging conventional on-chip cache hierarchies.

This work proposes Predictor Virtualization, a technique that uses the existing memory hierarchy to emulate large predictor tables. We demonstrate the benefits of this technique by virtualizing a state-of-the-art data prefetcher. Full-system, cycle-accurate simulations show that the virtualized prefetcher preserves the performance benefits of the original design while reducing the on-chip storage dedicated to the predictor table from 60 KB to less than 1 KB.
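
The general idea of backing a large predictor table with ordinary memory can be sketched in software. The following Python model is an illustration only and is not taken from the paper: the class name `VirtualizedPredictorTable`, the LRU replacement policy, the write-back-on-eviction behavior, and the use of a plain dictionary to stand in for the memory hierarchy are all assumptions made for this sketch.

```python
from collections import OrderedDict


class VirtualizedPredictorTable:
    """Hypothetical model: a small dedicated structure in front of a
    large predictor table that lives in (simulated) ordinary memory."""

    def __init__(self, on_chip_entries):
        self.on_chip_entries = on_chip_entries
        self.on_chip = OrderedDict()  # small dedicated storage, kept in LRU order
        self.backing = {}             # assumption: stands in for the memory hierarchy
        self.hits = 0
        self.misses = 0

    def lookup(self, key):
        """Return the predictor entry for key, or None if none exists."""
        if key in self.on_chip:
            self.hits += 1
            self.on_chip.move_to_end(key)  # mark as most recently used
            return self.on_chip[key]
        self.misses += 1
        value = self.backing.get(key)      # fetch from the backing table
        if value is not None:
            self._install(key, value)
        return value

    def update(self, key, value):
        """Record a new or changed predictor entry."""
        self._install(key, value)

    def _install(self, key, value):
        self.on_chip[key] = value
        self.on_chip.move_to_end(key)
        if len(self.on_chip) > self.on_chip_entries:
            # Evict the least recently used entry and write it back.
            old_key, old_value = self.on_chip.popitem(last=False)
            self.backing[old_key] = old_value
```

In this sketch the dedicated storage holds only `on_chip_entries` entries no matter how many entries the predictor accumulates, which is the property the abstract describes; how the actual design organizes, indexes, and fetches table entries is not stated in the text above.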

Index Terms

Predictor virtualization
1. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published in
ASPLOS XIII: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
March 2008
352 pages
ISBN:9781595939586
DOI:10.1145/1346281
General Chair: Susan Eggers, University of Washington, USA
Program Chair: James Larus, Microsoft Research, USA
ACM SIGPLAN Notices Volume 43, Issue 3
ASPLOS '08
March 2008
339 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1353536
ACM SIGARCH Computer Architecture News Volume 36, Issue 1
ASPLOS '08
March 2008
339 pages
ISSN:0163-5964
DOI:10.1145/1353534
ACM SIGOPS Operating Systems Review Volume 42, Issue 2
ASPLOS '08
March 2008
339 pages
ISSN:0163-5980
DOI:10.1145/1353535
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
caches
memory hierarchy
metadata
predictor virtualization
Qualifiers
- research-article
Acceptance Rates
ASPLOS XIII Paper Acceptance Rate: 31 of 127 submissions, 24%
Overall Acceptance Rate: 535 of 2,713 submissions, 20%