Abstract
Intermittent faults are a growing challenge in embedded and real-time systems. As process technologies shrink circuitry, hardware becomes increasingly susceptible to transient faults from radiation sources such as cosmic rays. Additionally, as software complexity increases, intermittent faults such as race conditions threaten software reliability. Given these motivations, research has addressed the paired problems of recovering from a fault and doing so predictably. However, most past research has focused on predictable recovery from faults at the application level. Examples include systems infrastructures [2] that enable application fault recovery, and scheduling theory [3] that considers periodic faults and the impact of recovery and re-execution of failed applications on schedulability.
- The Composite component-based system: http://composite.seas.gwu.edu.
- A. Egan, D. Kutz, D. Mikulin, R. Melhem, and D. Mosse. Fault-tolerant rt-mach and an application to real-time train control. Software Practice and Experience, 1999.
- P. Mejia-Alvarez and H. Aydin. Scheduling optional computations in fault-tolerant real-time systems. In RTCSA, 2000.
- K. Pattabiraman, V. Grover, and B. Zorn. Protecting critical data in unsafe languages. In Eurosys, 2008.