Exploiting Data-Flow for Fault-Tolerance in a Wide-Area Parallel System

Report
Authors: NguyenTuong, Anh, Department of Computer Science, University of Virginia; Grimshaw, Andrew, Department of Computer Science, University of Virginia; Hyett, Mark, Department of Computer Science, University of Virginia
Abstract:

Wide-area parallel processing systems will soon be available to researchers to solve a range of problems. In these systems, it is certain that host failures and other faults will be a common occurrence. Unfortunately, most parallel processing systems have not been designed with fault-tolerance in mind. Mentat is a high-performance object-oriented parallel processing system that is based on an extension of the data-flow model. The functional nature of data-flow enables both parallelism and fault-tolerance. In this paper, we exploit the data-flow underpinning of Mentat to provide easy-to-use and transparent fault-tolerance. We present results on both a small-scale network and a wide-area heterogeneous environment that consists of three sites: the National Center for Supercomputing Applications, the University of Virginia and the NASA Langley Research Center.
Note: Abstract extracted from PDF file via OCR

Rights:
All rights reserved (no additional license for public reuse)
Language:
English
Source Citation:

NguyenTuong, Anh, Andrew Grimshaw, and Mark Hyett. "Exploiting Data-Flow for Fault-Tolerance in a Wide-Area Parallel System." University of Virginia Dept. of Computer Science Tech Report (1996).

Publisher:
University of Virginia, Department of Computer Science
Published Date:
1996