Phantom-BTB: a virtualized branch target buffer design

Authors:
Ioana Burcea, University of Toronto, Toronto, Canada
Andreas Moshovos, University of Toronto, Toronto, Canada

ASPLOS XIV: Proceedings of the 14th international conference on Architectural support for programming languages and operating systems, March 2009, Pages 313–324
https://doi.org/10.1145/1508244.1508281

Published: 07 March 2009

ABSTRACT

Modern processors use branch target buffers (BTBs) to predict the target addresses of branches so that they can fetch ahead in the instruction stream, increasing concurrency and performance. Ideally, BTBs would be large enough to capture the entire working set of the application, yet small enough for fast access and practical on-chip dedicated storage. Depending on the application, these requirements are at odds.

This work introduces a BTB design that accommodates large instruction footprints without dedicating expensive on-chip resources. In the proposed Phantom-BTB (PBTB) design, a conventional BTB is augmented with a virtual table that collects branch target information as the application runs. The virtual table does not have fixed dedicated storage. Instead, it is transparently allocated, on demand, in the on-chip caches, at cache line granularity. The entries in the virtual table are proactively prefetched and installed in the dedicated conventional BTB, thus increasing its perceived capacity. Experimental results with commercial workloads under full-system simulation demonstrate that PBTB improves IPC performance over a 1K-entry BTB by 6.9% on average and up to 12.7%, with a storage overhead of only 8%. Overall, the virtualized design performs within 1% of a conventional 4K-entry, single-cycle access BTB, while the dedicated storage is 3.6 times smaller.
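
The general idea in the paragraph above, a small dedicated BTB backed by a larger table stored at cache line granularity, can be illustrated with a toy software model. This is a minimal sketch under stated assumptions, not the authors' mechanism: the class name `VirtualizedBTB`, the sizes `BTB_ENTRIES` and `LINE_ENTRIES`, the LRU replacement policy, and the grouping of entries by neighbouring branch addresses are all hypothetical choices made for illustration, and a plain dictionary stands in for the on-chip cache lines that would hold the virtual table.

```python
from collections import OrderedDict

LINE_ENTRIES = 4   # assumed: (branch PC, target) pairs that fit in one cache line
BTB_ENTRIES = 8    # assumed: a deliberately tiny dedicated BTB


class VirtualizedBTB:
    """Toy model: a small dedicated BTB backed by a line-granular virtual table."""

    def __init__(self, btb_entries=BTB_ENTRIES, line_entries=LINE_ENTRIES):
        self.btb = OrderedDict()   # dedicated BTB in LRU order: pc -> target
        self.btb_entries = btb_entries
        self.line_entries = line_entries
        self.virtual = {}          # stand-in for cache lines: line id -> {pc: target}

    def _line_id(self, pc):
        # Assumed grouping: branches in the same small code region
        # (4-byte instructions) share one virtual-table line.
        return pc // (4 * self.line_entries)

    def _install(self, pc, target):
        # Insert into the dedicated BTB, evicting the least recently used entry.
        self.btb[pc] = target
        self.btb.move_to_end(pc)
        while len(self.btb) > self.btb_entries:
            self.btb.popitem(last=False)

    def predict(self, pc):
        """Return the predicted target, or None on a dedicated-BTB miss."""
        if pc in self.btb:
            self.btb.move_to_end(pc)
            return self.btb[pc]
        # Miss: fetch the matching virtual-table line and install every entry
        # in it, so that nearby branches hit on later lookups.
        line = self.virtual.get(self._line_id(pc), {})
        for line_pc, target in line.items():
            self._install(line_pc, target)
        return None

    def update(self, pc, target):
        """Record a resolved branch in the dedicated BTB and the virtual table."""
        self._install(pc, target)
        self.virtual.setdefault(self._line_id(pc), {})[pc] = target


# Train on 16 branches, twice the dedicated capacity, then look two of them up.
btb = VirtualizedBTB()
for pc in range(0, 64, 4):
    btb.update(pc, pc + 0x100)
first = btb.predict(0)    # evicted from the dedicated BTB: miss, line is fetched
second = btb.predict(4)   # installed by the line fetch above: hit
```

In this sketch the first lookup misses because the branch was evicted from the dedicated BTB, but the miss pulls its whole line in from the virtual table, so the second lookup to a neighbouring branch hits. That is the sense in which a backing table can raise the perceived capacity of a small dedicated structure; how PBTB forms its groups and when it triggers prefetches is described in the paper itself, not here.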

Index Terms

Phantom-BTB: a virtualized branch target buffer design
1. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published in
ASPLOS XIV: Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
March 2009
358 pages
ISBN:9781605584065
DOI:10.1145/1508244
General Chair: Mary Lou Soffa, University of Virginia, USA
Program Chair: Mary Jane Irwin, Penn State University, USA
ACM SIGARCH Computer Architecture News Volume 37, Issue 1
ASPLOS 2009
March 2009
346 pages
ISSN:0163-5964
DOI:10.1145/2528521
ACM SIGPLAN Notices Volume 44, Issue 3
ASPLOS 2009
March 2009
346 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1508284
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
branch target buffer
predictor metadata prefetching
predictor virtualization
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 535 of 2,713 submissions, 20%