Abstract
This article presents a software protection technique against radiation-induced faults that is based on a multi-threaded strategy. Data triplication and instruction flow duplication or triplication are used to improve system reliability and thus ensure correct system operation. To achieve this objective, a relaxed lockstep model is defined to synchronize the execution of both the redundant threads and the protected variables on different processing units. The evaluation was performed by means of simulated fault injection campaigns in a multi-core ARM system. Results show that, although Duplication With Comparison and Re-Execution (DWC-R) and Triple Modular Redundancy (TMR) are regarded as techniques with an evident memory and instruction overhead, spreading the replicas across different instruction flows not only produces results similar to those of classic techniques but also improves computation and recovery time in the presence of soft errors. In addition, this paper highlights the importance of protecting memory-allocated data, since instruction flow triplication alone is not enough to improve overall system reliability.
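The redundancy schemes named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, which targets bare-metal ARM cores. It is written in Python only for readability, and every name in it (`vote3`, `TriplicatedWord`, `dwc_r`, `tmr_threads`) is hypothetical. Synchronizing the replicas only at the voting point is an assumption made here to approximate the relaxed lockstep idea.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def vote3(a: int, b: int, c: int) -> int:
    """Bitwise majority: each output bit is the value held by at least two copies."""
    return (a & b) | (a & c) | (b & c)


class TriplicatedWord:
    """Data triplication: a word is stored three times and voted on every read."""

    def __init__(self, value: int) -> None:
        self.copies = [value, value, value]

    def read(self) -> int:
        voted = vote3(*self.copies)
        # Scrub: overwrite all copies so a single upset does not accumulate.
        self.copies = [voted, voted, voted]
        return voted


def dwc_r(compute: Callable[[], int], max_retries: int = 3) -> int:
    """DWC-R: run two replicas, compare, and re-execute both on a mismatch."""
    for _ in range(max_retries + 1):
        first, second = compute(), compute()
        if first == second:
            return first
    raise RuntimeError("replicas kept disagreeing")


def tmr_threads(compute: Callable[[], int]) -> int:
    """TMR on three threads: replicas run independently and meet only at the vote."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(compute) for _ in range(3)]
        results = [f.result() for f in futures]  # wait for every replica
    return vote3(*results)


# Usage: flip one bit in one copy; the voted read masks and repairs it.
word = TriplicatedWord(0b1010)
word.copies[1] ^= 0b0100
recovered = word.read()
```

In this sketch a single corrupted copy is masked by the vote and then repaired, while DWC-R can only detect a mismatch and must pay for re-execution to recover. The DWC-R replicas run sequentially here for brevity; the article places them in separate instruction flows.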
Acknowledgements
This work was funded by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund through the project ‘Evaluación temprana de los efectos de radiación mediante simulación y virtualización. Estrategias de mitigación en arquitecturas de microprocesadores avanzados’ (Ref: ESP2015-68245-C4-3-P, MINECO/FEDER, UE).
Additional information
Responsible Editor: L.M.B. Pöhls
About this article
Cite this article
Serrano-Cases, A., Restrepo-Calle, F., Cuenca-Asensi, S. et al. Multi-Threaded Mitigation of Radiation-Induced Soft Errors in Bare-Metal Embedded Systems. J Electron Test 36, 47–57 (2020). https://doi.org/10.1007/s10836-019-05846-4
DOI: https://doi.org/10.1007/s10836-019-05846-4