Abstract
Nondeterminism is a characteristic of many parallel programs and requires dedicated support from analysis tools and programming environments. To allow cyclic debugging of such programs, record&replay mechanisms are most frequently used. Such techniques operate in two phases: the record phase traces a program’s execution, which can then be repeated arbitrarily often during subsequent replay phases. In contrast to most existing approaches, this paper describes a mechanism that is transparently integrated in the underlying message passing interface. The main advantage of this approach is its omnipresence: a program’s execution can be repeated immediately after it has been observed. Other benefits are the lack of instrumentation and a corresponding simplification of the whole technique for inexperienced users. The difficulties addressed by this approach concern the monitor overhead, which must neither perturb the program’s execution nor generate huge amounts of trace data.
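The two-phase idea can be illustrated with a small, self-contained sketch. It is not the mechanism described in the paper. It is a toy model that assumes the only source of nondeterminism is a wildcard receive, and that recording the sender chosen at each receive is enough to reproduce the run. The names `Mailbox`, `recv_any`, `recv_from`, and `run` are hypothetical and do not belong to any real message passing library.

```python
import random
from collections import deque


class Mailbox:
    """Toy stand-in for a message passing layer (hypothetical, not a real API)."""

    def __init__(self, messages, seed):
        # messages: dict mapping sender rank -> list of payloads.
        # Messages from the same sender keep their FIFO order.
        self.queues = {src: deque(msgs) for src, msgs in messages.items()}
        # The seeded generator models unpredictable arrival order.
        self.rng = random.Random(seed)

    def recv_any(self):
        """Wildcard receive: any sender with a pending message may win."""
        ready = sorted(src for src, q in self.queues.items() if q)
        src = self.rng.choice(ready)
        return src, self.queues[src].popleft()

    def recv_from(self, src):
        """Directed receive: deterministic because the sender is fixed."""
        return src, self.queues[src].popleft()


def run(messages, seed, trace=None):
    """Receive every message once.

    Without a trace this is the record phase: wildcard receives are used
    and the chosen sender of each receive is logged. With a trace this is
    the replay phase: each receive is forced to the logged sender.
    """
    box = Mailbox(messages, seed)
    total = sum(len(msgs) for msgs in messages.values())
    recorded, received = [], []
    for i in range(total):
        if trace is None:
            src, payload = box.recv_any()
        else:
            src, payload = box.recv_from(trace[i])
        recorded.append(src)
        received.append((src, payload))
    return recorded, received


if __name__ == "__main__":
    messages = {1: ["a1", "a2"], 2: ["b1"], 3: ["c1", "c2"]}
    trace, first = run(messages, seed=7)             # record phase
    _, again = run(messages, seed=99, trace=trace)   # replay phase
    assert first == again  # same receive order despite a different seed
```

In this sketch the trace holds one sender rank per receive rather than any message contents, which hints at why the amount of trace data can stay small. How the actual mechanism limits trace size and monitor overhead is not specified in the abstract.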
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kranzlmüller, D., Schaubschläger, C., Volkert, J. (2001). An Integrated Record&Replay Mechanism for Nondeterministic Message Passing Programs. In: Cotronis, Y., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2001. Lecture Notes in Computer Science, vol 2131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45417-9_28
DOI: https://doi.org/10.1007/3-540-45417-9_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42609-7
Online ISBN: 978-3-540-45417-5
eBook Packages: Springer Book Archive