User-level process checkpoint and restore for migration

ACM SIGOPS Operating Systems Review, Volume 35, Issue 2, April 2001, pp. 86–96. https://doi.org/10.1145/377069.377091

Published: 01 April 2001

Abstract

Process checkpointing means saving the state of a process so that it can be reconstructed later. Checkpointing followed by restore is important for load balancing and fault tolerance. For load balancing, processes may have to be migrated among workstations; before it migrates, a process must be checkpointed so that it can resume from where it left off. For fault tolerance, a process must be ready to be restored at a different site, so an earlier checkpoint must be available. In both cases the process is restarted from its latest checkpoint, so the work done before that checkpoint is not wasted. This paper discusses simple techniques for implementing user-level checkpoint and restore operations for Unix processes. The techniques do not require any changes to user programs or to the operating system. The details given show the simplicity of the implementation.

Index Terms

User-level process checkpoint and restore for migration
1. Software and its engineering
  1. Software notations and tools
    1. Software libraries and repositories
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        File systems management
    2. Extra-functional properties
      1. Software fault tolerance
        Checkpoint / restart

Published in
ACM SIGOPS Operating Systems Review, Volume 35, Issue 2
April 2001
90 pages
ISSN: 0163-5980
DOI: 10.1145/377069
Editor: William M. Waite, Univ. of Colorado, Boulder, CO
Copyright © 2001 Authors
Publisher: Association for Computing Machinery, New York, NY, United States