A Low-Cost Fault-Tolerant Structure for the Hypercube

The Journal of Supercomputing

Abstract

We propose a new, low-cost fault-tolerant structure for the hypercube that employs spare processors and extra links. The goal of the proposed structure is to fully tolerate the first faulty node, no matter where it occurs, and to “almost fully” tolerate the second, meaning that the underlying hypercube topology can be restored if the second faulty node occurs at most locations (92% of locations in expectation). The unique features of our structure are that (1) it uses the otherwise unused extra link-ports in the processor nodes of the hypercube to obtain the proposed topology, so that minimal extra hardware is needed to construct the fault-tolerant structure, and (2) the structure's node degrees are low, as desired: the primary and spare nodes all have node degree n + 2 for an n-dimensional hypercube. The number of spare nodes is one fourth of the number of primary nodes. The reconfiguration algorithm in the presence of faults is elegant and efficient. The proposed structure also effectively enhances the diagnosability of the hypercube system: the diagnosability of the structure is shown to increase to n + 2, whereas an ordinary n-dimensional hypercube has diagnosability n.
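
The quantities stated in the abstract can be tabulated for a given dimension n. The following is a minimal sketch, not the paper's construction: the names `StructureParams`, `structure_params`, and `hypercube_neighbors` are hypothetical, and the wiring of the spare nodes and extra links is not described in the abstract, so it is not modeled here. Only the ordinary hypercube adjacency (nodes whose binary labels differ in exactly one bit) and the counts quoted above are encoded.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StructureParams:
    """Figures quoted in the abstract for an n-dimensional hypercube (hypothetical container)."""
    n: int
    primary_nodes: int             # nodes of the ordinary hypercube: 2**n
    spare_nodes: int               # one fourth of the primary nodes
    node_degree: int               # degree of primary and spare nodes: n + 2
    hypercube_diagnosability: int  # ordinary hypercube: n
    structure_diagnosability: int  # proposed structure: n + 2


def structure_params(n: int) -> StructureParams:
    """Tabulate the abstract's figures for dimension n."""
    if n < 2:
        # Assumption: n >= 2 so that 2**n is divisible by 4.
        raise ValueError("n must be at least 2")
    primary = 2 ** n
    return StructureParams(
        n=n,
        primary_nodes=primary,
        spare_nodes=primary // 4,
        node_degree=n + 2,
        hypercube_diagnosability=n,
        structure_diagnosability=n + 2,
    )


def hypercube_neighbors(node: int, n: int) -> List[int]:
    """Neighbors of `node` in the ordinary n-cube: flip each of the n label bits."""
    return [node ^ (1 << d) for d in range(n)]


if __name__ == "__main__":
    print(structure_params(4))
    print(hypercube_neighbors(0, 3))
```

For n = 4, for example, this gives 16 primary nodes, 4 spare nodes, and node degree 6, of which 4 links per primary node are the ordinary hypercube links returned by `hypercube_neighbors`.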

About this article

Cite this article

Wang, D. A Low-Cost Fault-Tolerant Structure for the Hypercube. The Journal of Supercomputing 20, 203–216 (2001). https://doi.org/10.1023/A:1011636631661

  • DOI: https://doi.org/10.1023/A:1011636631661
