An effective on-chip preloading scheme to reduce data access penalty

Authors:
Jean-Loup Baer
Department of Computer Science and Engineering, University of Washington, Seattle, WA

Tien-Fu Chen
Department of Computer Science and Engineering, University of Washington, Seattle, WA

Supercomputing '91: Proceedings of the 1991 ACM/IEEE conference on Supercomputing, August 1991, Pages 176–186. https://doi.org/10.1145/125826.125932

Published: 01 August 1991


Published in
Supercomputing '91: Proceedings of the 1991 ACM/IEEE conference on Supercomputing
August 1991
920 pages
ISBN:0897914597
DOI:10.1145/125826
Conference Chair:
Ray Elliott
Copyright © 1991 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Supercomputing '91 Paper Acceptance Rate: 83 of 215 submissions, 39%
Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%