Elsevier

Information Sciences

Volume 120, Issues 1–4, November 1999, Pages 143-157

An efficient multipath routing for distributed computing systems with data replication

https://doi.org/10.1016/S0020-0255(99)00056-0

Abstract

In distributed computing environments, executing a program often requires access to remote data files. An efficient data routing scheme is thus important for time-critical applications. To guarantee a desired communication quality in advance, we present a connection-oriented routing scheme, multipath routing, which allows multiple routes to be established between the source and the destination. Based on this scheme, we address the problem of finding a collection of routing paths that minimizes an application's data transmission time. The problem becomes a complex combinatorial one when the application accesses multiple replicated data sources. Since finding an optimal solution is computationally infeasible in practice, we propose a heuristic method that obtains a sub-optimal solution.

Introduction

This paper presents a routing algorithm for time-critical applications that need to retrieve remote high-bandwidth data file(s) in a LAN-based distributed computing system (DCS). A connection-oriented data forwarding technique, virtual circuit [4], [6], [7], which enables bandwidth to be reserved throughout the lifetime of a connection, is used in the network. Since the inherent link delay (including propagation delay and transmission delay) is insignificant for bulk data transfer in a high-speed LAN, the design objective of the routing algorithm is to maximize the data transfer rate (or throughput). For a source-to-destination traffic session, throughput can frequently be improved by splitting the traffic over several paths. The technique of using multiple paths between a source–destination pair is called multipath routing.
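To make the throughput argument concrete, the following is a minimal sketch that compares the best single path with the total rate available when traffic is split over several paths. It is not the paper's algorithm: it uses a textbook maximum-flow computation (Edmonds-Karp) and a widest-path search, and the five-link network, its capacities, and all function names are hypothetical.

```python
import heapq
from collections import deque


def build(edges):
    """Capacity map for an undirected graph: each link is usable in both directions."""
    cap = {}
    for u, v, c in edges:
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v][u] = cap[v].get(u, 0) + c
    return cap


def widest_path(edges, source, target):
    """Largest bottleneck capacity achievable with ONE path (Dijkstra variant)."""
    cap = build(edges)
    best = {source: float("inf")}
    heap = [(-best[source], source)]
    while heap:
        neg_width, u = heapq.heappop(heap)
        width = -neg_width
        if u == target:
            return width
        if width < best.get(u, 0):
            continue  # stale heap entry
        for v, c in cap.get(u, {}).items():
            new_width = min(width, c)
            if new_width > best.get(v, 0):
                best[v] = new_width
                heapq.heappush(heap, (-new_width, v))
    return 0


def max_flow(edges, source, target):
    """Total rate when traffic may be split over several paths (Edmonds-Karp)."""
    cap = build(edges)
    total = 0
    while True:
        # Breadth-first search for a shortest path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and target not in parent:
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if target not in parent:
            return total
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = target
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = target
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical network: (node, node, available link capacity).
EDGES = [("s", "a", 4), ("a", "t", 4), ("s", "b", 3), ("b", "t", 3), ("a", "b", 1)]

single = widest_path(EDGES, "s", "t")  # best rate over one path
multi = max_flow(EDGES, "s", "t")      # best rate over several paths
```

In this toy network the single best path is limited by its narrowest link, while splitting the session over both branches uses the capacity of each, which is the effect multipath routing relies on.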

Under the multipath routing scheme, the optimal routing problem can be described as a multicommodity flow problem if the number of paths in a source–destination session is unlimited and the inherent link delay and control overhead are negligible. Linear programming is then a feasible technique for solving it. However, since data files in a DCS are often replicated to improve system (or program) reliability, the program may acquire data from any replica [1], [3]. The problem thus becomes a complex combinatorial one. An exhaustive approach can of course find the optimal solution, but only at a high computational cost. Our previous work developed an alternative, the critical-cut algorithm, to solve the optimal routing problem [2]. In most experimental cases that algorithm yields short execution times, but occasionally it still takes exponential time. This paper therefore proposes a heuristic routing algorithm that obtains acceptable routing paths. Its short and stable execution time makes the proposed algorithm applicable to existing systems.
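The combinatorial growth caused by replication can be illustrated with a small brute-force sketch. Each file is assigned to exactly one of its replica nodes, so the number of candidate assignments is the product of the replica counts. The cost model below is a deliberately simplified assumption and not the paper's formulation: files served by the same node share that node's rate equally, and the overall time is that of the slowest file. All names, sizes, and rates are hypothetical.

```python
from itertools import product


def assignments(replicas):
    """Yield every way of choosing one replica node per file."""
    files = sorted(replicas)
    for choice in product(*(replicas[f] for f in files)):
        yield dict(zip(files, choice))


def transfer_time(assignment, sizes, rate):
    """Toy cost (assumption): a node's rate is shared equally by the files it serves."""
    load = {}
    for node in assignment.values():
        load[node] = load.get(node, 0) + 1
    return max(sizes[f] * load[n] / rate[n] for f, n in assignment.items())


def exhaustive_best(replicas, sizes, rate):
    """Try every assignment and keep the one with the smallest transfer time."""
    return min(assignments(replicas), key=lambda a: transfer_time(a, sizes, rate))


# Hypothetical instance: three files, each replicated on two of three nodes.
REPLICAS = {"f1": ["A", "B"], "f2": ["A", "C"], "f3": ["B", "C"]}
SIZES = {"f1": 6, "f2": 4, "f3": 2}
RATE = {"A": 2, "B": 1, "C": 1}

candidates = list(assignments(REPLICAS))  # 2 * 2 * 2 assignments
best = exhaustive_best(REPLICAS, SIZES, RATE)
best_time = transfer_time(best, SIZES, RATE)
```

With k replicas per file and m files the loop visits k^m assignments, which is why exhaustive search becomes impractical as the number of replicated files grows.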

Assumptions: We assume that a distributed computing system can be described as an undirected graph with nodes representing computing sites and edges representing communication links. We assume virtual circuits for communication between a source and a destination, and that the virtual circuit scheme supports multipath routing. Intermediate nodes do not buffer packets but forward all received packets immediately. Inherent link delay is negligible. A given file cannot be split and distributed to multiple nodes. A communication path cannot have loops.

Critical-cut algorithm

This section presents the critical-cut algorithm for finding the optimal routing. Proofs of the theorems underlying the algorithm can be found in [2].

Notations

source: the node which holds data files
target: the node which issues the request for data files
cut: a set of edges whose removal separates a connected graph into two disconnected subgraphs
(X, X̄): the cut separating the nodes in set X from the other nodes (i.e., the nodes in X̄). Note that in presenting a cut by the notation (X, X̄), target is

Heuristic routing algorithm

The performance of the critical-cut method depends on the cut tree and the distribution of files. In cases such as the example in Section 2.5, the cut tree is a linear array and the algorithm therefore yields a very short execution time. In some other cases, however, the critical-cut method may take exponential time to obtain the optimal routing. In practical environments, a routing algorithm with a short and stable execution time is important for real-time applications. We thus propose a heuristic

Simulation results

To evaluate the performance of the proposed heuristic routing algorithm, we compared the routes found by the heuristic algorithm with the optimal routing paths. The optimal routing paths were obtained by the critical-cut method described in Section 2. Many factors, such as the network topology, the available link capacities, the number of replicated copies of each data file, and the dispersal of the program and the data files, influence the performance. In order to fairly

Conclusion

In this paper, we propose a heuristic multipath routing algorithm for virtual-circuit-based DCSs. Such a routing scheme is appropriate for the high-bandwidth data transfers issued by real-time applications. To evaluate how well the algorithm performs, we ran experiments on various networks to measure the algorithm's execution time and the communication capacities of the routing paths it found. These experimental data are compared with the results obtained by the critical-cut algorithm,
