Abstract
We propose a new, low-cost fault-tolerant structure for the hypercube that employs spare processors and extra links. The structure is designed to fully tolerate the first faulty node, wherever it occurs, and to "almost fully" tolerate the second: the underlying hypercube topology can be restored if the second faulty node occurs at most locations (an expected 92% of locations). The structure has two distinctive features. (1) It uses the otherwise unused extra link-ports in the processor nodes of the hypercube to obtain the proposed topology, so that minimal extra hardware is needed to construct the fault-tolerant structure. (2) Its node-degrees are low: the primary and spare nodes all have node-degree n + 2 for an n-dimensional hypercube. The number of spare nodes is one fourth of the number of primary nodes. The reconfiguration algorithm in the presence of faults is elegant and efficient. The proposed structure also enhances the diagnosability of the hypercube system: it is shown that the diagnosability of the structure increases to n + 2, whereas an ordinary n-dimensional hypercube has diagnosability n.
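The hardware figures quoted in the abstract can be tabulated for a given dimension n. The following is a minimal sketch, not the paper's construction: it only evaluates the stated counts (2^n primary nodes, one fourth as many spares, node-degree n + 2, diagnosability n versus n + 2). The names `StructureCosts` and `structure_costs` are hypothetical, and the total link count is an assumption derived from the handshake lemma on the premise that every primary and spare node has degree exactly n + 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructureCosts:
    """Counts implied by the abstract for an n-dimensional hypercube."""
    n: int
    primary_nodes: int
    spare_nodes: int
    node_degree: int
    hypercube_links: int           # links in the plain n-cube
    total_links: int               # assumption: derived via the handshake lemma
    hypercube_diagnosability: int
    structure_diagnosability: int


def structure_costs(n: int) -> StructureCosts:
    """Evaluate the abstract's stated figures for dimension n (hypothetical helper)."""
    if n < 2:
        # 2**n must be divisible by 4 for "one fourth" to be a whole number.
        raise ValueError("n must be at least 2")
    primary = 2 ** n
    spares = primary // 4                 # one fourth of the primary nodes
    degree = n + 2                        # stated degree of primary and spare nodes
    hypercube_links = n * 2 ** (n - 1)    # standard link count of an n-cube
    # Sum of degrees equals twice the number of links (handshake lemma).
    total_links = degree * (primary + spares) // 2
    return StructureCosts(
        n=n,
        primary_nodes=primary,
        spare_nodes=spares,
        node_degree=degree,
        hypercube_links=hypercube_links,
        total_links=total_links,
        hypercube_diagnosability=n,
        structure_diagnosability=n + 2,
    )


if __name__ == "__main__":
    for dim in range(2, 9):
        print(structure_costs(dim))
```

For example, `structure_costs(4)` gives 16 primary nodes, 4 spare nodes, and node-degree 6. The link total is only as reliable as the uniform-degree premise stated above.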
Cite this article
Wang, D. A Low-Cost Fault-Tolerant Structure for the Hypercube. The Journal of Supercomputing 20, 203–216 (2001). https://doi.org/10.1023/A:1011636631661
DOI: https://doi.org/10.1023/A:1011636631661