IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Regular Section
A Concurrent Partial Snapshot Algorithm for Large-Scale and Dynamic Distributed Systems
Yonghwan KIMTadashi ARARAGIJunya NAKAMURAToshimitsu MASUZAWA
Author information
JOURNAL FREE ACCESS

2014 Volume E97.D Issue 1 Pages 65-76

Details
Abstract

Checkpoint-rollback recovery, which is a universal method for restoring distributed systems after faults, requires a sophisticated snapshot algorithm especially if the systems are large-scale, since repeatedly taking global snapshots of the whole system requires unacceptable communication cost. As a sophisticated snapshot algorithm, a partial snapshot algorithm has been introduced that takes a snapshot of a subsystem consisting only of the nodes that are communication-related to the initiator instead of a global snapshot of the whole system. In this paper, we modify the previous partial snapshot algorithm to create a new one that can take a partial snapshot more efficiently, especially when multiple nodes concurrently initiate the algorithm. Experiments show that the proposed algorithm greatly reduces the amount of communication needed for taking partial snapshots.

Content from these authors
© 2014 The Institute of Electronics, Information and Communication Engineers
Previous article Next article
feedback
Top