Abstract
Tapeworm II is a software-based simulation tool that evaluates the cache and TLB performance of multi-task and operating-system-intensive workloads. Tapeworm resides in the OS kernel and causes the host machine's hardware to drive simulations with kernel traps rather than with address traces, as is conventionally done. This allows Tapeworm to capture complete memory-referencing behavior quickly and accurately, with limited degradation in overall system performance. This paper compares trap-driven simulation, as implemented in Tapeworm, with the more common technique of trace-driven memory simulation with respect to speed, accuracy, portability, and flexibility.
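The contrast between the two techniques can be illustrated with a toy model. The sketch below is not Tapeworm's implementation: it assumes a direct-mapped cache, models a hardware trap as membership in a Python set, and uses hypothetical names (`trace_driven`, `trap_driven`, `resident`) throughout. It is meant only to show that a trace-driven simulator does work on every reference, while a trap-driven one does work only when a reference misses in the simulated cache.

```python
def trace_driven(trace, num_lines, line_size=16):
    """Conventional approach: the simulator inspects every address in the trace."""
    cache = [None] * num_lines            # direct-mapped: one resident line per index
    misses = 0
    for addr in trace:                    # simulator code runs for every reference
        line = addr // line_size
        idx = line % num_lines
        if cache[idx] != line:
            misses += 1
            cache[idx] = line
    return misses, len(trace)             # (misses, simulator invocations)


def trap_driven(trace, num_lines, line_size=16):
    """Toy trap-driven model: only references that miss reach the simulator."""
    cache = [None] * num_lines
    resident = set()                      # lines whose (hypothetical) trap is cleared
    misses = 0
    handler_calls = 0
    for addr in trace:                    # stands in for the workload running natively
        line = addr // line_size
        if line in resident:
            continue                      # simulated hit: no simulator involvement
        # Simulated miss: the reference "traps" and the handler runs.
        handler_calls += 1
        misses += 1
        idx = line % num_lines
        victim = cache[idx]
        if victim is not None:
            resident.discard(victim)      # re-arm the trap on the displaced line
        cache[idx] = line
        resident.add(line)                # clear the trap on the newly cached line
    return misses, handler_calls


if __name__ == "__main__":
    trace = [0, 4, 16, 0, 64, 0, 16]      # made-up byte addresses for illustration
    print(trace_driven(trace, num_lines=4))
    print(trap_driven(trace, num_lines=4))
```

Both functions report the same miss count for the same reference stream; the difference is how often simulator code runs. In this toy model the `line in resident` check is still executed in software, whereas the point of a trap-driven design is that the host hardware performs that check, so hits cost nothing extra.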