
User-level process checkpoint and restore for migration

Published: 1 April 2001

Abstract

Process checkpointing means saving the state of a process so that it can be reconstructed later. Checkpointing followed by restore is important for both load balancing and fault tolerance. For load balancing, processes may have to be migrated among workstations; before migration, a process has to be checkpointed so that it can be restored from where it left off. For fault tolerance, a process must be ready to be restored at a different site, so an earlier checkpoint must be available. In both cases the process is restarted from its latest checkpoint, so the work done before that checkpoint is not wasted. This paper discusses simple techniques for implementing user-level checkpoint and restore operations for Unix processes. The technique does not require any changes to the user programs or the operating system. The details given show the simplicity of the implementation.
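The checkpoint-then-restart cycle described in the abstract can be illustrated with a small sketch. This is an application-level illustration only, not the paper's technique: the paper checkpoints Unix processes without modifying the user program, whereas this sketch saves an explicit state dictionary from inside the program. The function names `checkpoint`, `restore`, and `run`, the state layout, and the `stop_after` parameter are all hypothetical, introduced here for illustration.

```python
import os
import pickle
import tempfile


def checkpoint(state, path):
    """Write state to path atomically, so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # Atomic rename: readers see either the old checkpoint or the new one.
    os.replace(tmp_path, path)


def restore(path, default):
    """Return the latest checkpointed state, or default if none exists."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, stop_after=None):
    """Sum 0..total_steps-1, checkpointing after every step.

    stop_after (hypothetical) simulates a migration or failure after that
    many steps in this run; the function then returns None.
    """
    state = restore(path, {"next_step": 0, "acc": 0})
    done_this_run = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return None  # interrupted; progress is preserved in the checkpoint
        state["acc"] += state["next_step"]
        state["next_step"] += 1
        checkpoint(state, path)
        done_this_run += 1
    return state["acc"]
```

Calling `run` once with `stop_after` set and then again without it resumes from the saved step rather than from zero, which is the property the abstract relies on for both migration and fault tolerance. If the checkpoint file is on storage reachable from another workstation, the second call could in principle be made there.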



Published in: ACM SIGOPS Operating Systems Review, Volume 35, Issue 2, April 2001, 90 pages
ISSN: 0163-5980
DOI: 10.1145/377069

Copyright © 2001 Authors

Publisher: Association for Computing Machinery, New York, NY, United States
