Elsevier

Computer Networks

Volume 50, Issue 16, 14 November 2006, Pages 3064-3082
Structuring topologically aware overlay networks using domain names

https://doi.org/10.1016/j.comnet.2005.12.003

Abstract

Overlay networks are application-layer systems that let users perform distributed functions, such as searches over the contents of other users. An important problem in such networks is that connections among peers are arbitrary, so the resulting overlay topology does not match the underlying physical topology. This topology mismatch leads to large user-experienced delays, degraded performance and excessive resource consumption in Wide Area Networks. In this work we propose and evaluate the Distributed Domain Name Order (DDNO) technique, which makes unstructured overlay networks topologically aware. In DDNO, a node devotes half of its connections to nodes that share the same domain name and the remaining half to random nodes. The former connections achieve good performance, because the bulk of the overlay traffic is kept within the same domain, while the latter ensure that the topology remains connected. Discovery of nodes in the same domain is achieved through on-demand lookup messages, which are guided by local ZoneCaches. Our technique is entirely decentralized, making it appropriate for use in Wide Area Networks. Our simulation results, which are based on a real dataset of Internet latencies, indicate that DDNO outperforms other proposed techniques and that it optimizes many desirable properties such as end-to-end delays, connectivity and diameter.

Introduction

The advances of public networks in the last few years have increased the demand for Peer-to-Peer (P2P) application-layer protocols that can be used in the context of multicast [5], distributed object-location [24], [26], [27] and information retrieval [32]. Moreover, P2P file-sharing systems such as Napster [20] and Gnutella [9] have proven that large-scale distributed applications are feasible and that the P2P Computing model will play an important role in infrastructures of future Internet-scale systems.

In the P2P Computing model, participating nodes form a “virtual” overlay structure which serves as the communication medium between the participating computing units. In this model, each node acts both as a client and a server, allowing users to perform distributed functions such as keyword queries. This allows these systems to harness the power of many thousands of computing units rather than only utilizing resources from a monolithic system.

P2P overlays can be divided into two categories: Structured and Unstructured. In Structured P2P overlays [24], [26], [27], network hosts and objects are structured in such a way that object location can be guaranteed within some hop count boundaries. In Unstructured P2P overlays on the other hand, hosts have neither global knowledge nor structure. Early unstructured systems, such as Gnutella [9], rely on flooding the network with queries in order to locate the objects. Recently more efficient query routing techniques based on routing indices [6], heuristics [30] and caching [32] were proposed.

Unstructured P2P networks offer a number of important advantages: (i) An unstructured network imposes very small demands on individual nodes, and more specifically it allows nodes to join or leave the network without significantly affecting the system performance. (ii) Unstructured networks are appropriate for content-based retrieval (e.g., keyword searches) as opposed to object identifier location of structured overlays. (iii) Finally unstructured networks can easily accommodate nodes of varying power. Consequently, they scale to very large sizes and they offer more robust performance in the presence of node failures and connection unreliability.

In current unstructured systems, however, the connections between peers are not based on the underlying network latencies, which results in an inefficient overlay structure. This leads to excessive resource consumption in Wide Area Networks as well as a degraded user experience, because of the increased network delays between peers in the overlay network. On the other hand, the large-scale and ad-hoc nature of such systems makes it infeasible to pre-compute a network-efficient overlay structure in a centralized setting. Therefore, an important problem is how to structure, in a completely decentralized way, an overlay network with good topological properties (i.e., low end-to-end delays, small diameter and good connectivity). Our motivation is to improve application performance, reduce unnecessary traffic and scale well with the size of the network.

In this work, we propose and evaluate DDNO (Distributed Domain Name Order), which is a distributed technique to make unstructured overlay networks topologically aware. In DDNO, a node tries to connect to degree/2 nodes that belong to the same domain (sibling connections) and to another degree/2 random nodes (random connections). The resulting DDNO topology achieves high performance through sibling connections, while the additional random connections ensure that the topology remains connected. The choice of degree/2 sibling connections presents a good tradeoff between overlay performance and connectivity in networks of arbitrary degree, as we show in our experimental evaluation. Discovery of sibling nodes in DDNO is achieved through multicast lookup messages, which are sent out by each node and which traverse a set of ZoneCaches before finding other siblings. Our earlier study of the network traffic of the Gnutella [9] file-sharing network in [31] reveals that most of the participating nodes belong to only a few ISPs (see Fig. 1). Therefore, most nodes have a good probability of finding other sibling nodes, which makes our scheme beneficial for the largest portion of the network. Note that these measurements are consistent with similar studies performed in 2002 by Ripeanu et al. [25], in which they found that more than 40% of these nodes are located within the top ten Autonomous Systems. Additionally, the authors found that only 2–5% of Gnutella connections link nodes located within the same Autonomous System, which clearly indicates that application-layer overlay networks can unnecessarily impose a huge inter-AS traffic overhead.
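The degree/2 split described above can be illustrated with a small sketch. It is not the DDNO protocol itself: it assumes a node already holds a list of candidate hostnames, whereas DDNO discovers siblings through lookup messages and ZoneCaches. The names `domain_of` and `choose_neighbors`, the rule that two hosts are siblings when their last two hostname labels match, and the choice to leave sibling slots unfilled when too few siblings exist are all assumptions made for illustration.

```python
import random


def domain_of(hostname, levels=2):
    """Return the last `levels` labels of a hostname.

    Treating this suffix as "the domain" is an assumption of this sketch.
    """
    return ".".join(hostname.lower().split(".")[-levels:])


def choose_neighbors(node, candidates, degree, rng=None):
    """Pick up to degree/2 sibling and degree/2 random neighbors for `node`.

    `candidates` is a hypothetical, locally known list of hostnames.
    Returns a pair (sibling_connections, random_connections).
    """
    rng = rng or random.Random()
    others = [c for c in candidates if c != node]

    # Sibling connections: nodes that share the same domain as `node`.
    siblings = [c for c in others if domain_of(c) == domain_of(node)]
    sibling_slots = degree // 2
    chosen_siblings = rng.sample(siblings, min(sibling_slots, len(siblings)))

    # Random connections: drawn from every remaining node, whatever its domain.
    remaining = [c for c in others if c not in chosen_siblings]
    random_slots = degree - sibling_slots
    chosen_random = rng.sample(remaining, min(random_slots, len(remaining)))

    return chosen_siblings, chosen_random
```

For example, calling `choose_neighbors("a.cs.ucr.edu", hosts, 4)` on a list that contains two other `ucr.edu` hosts would return both of them as sibling connections and two other hosts as random connections.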

The DDNO overlay can become the middleware component for a variety of network-based applications. In the context of distributed file sharing, for instance, a user in Germany has a higher probability of finding German music if the search first covers the “.de” domains. If the overlay network is not topologically aware, the user’s query will end up traversing domains across many different countries and continents, which increases the delay of receiving all answers and decreases the probability of finding the desired results. Moreover, once the file is located, the actual download time might also be very large, as the file might physically reside far away from the user. Furthermore, our scheme can increase the performance of P2P Information Retrieval [32] systems. In [32] we built and evaluated a large-scale decentralized newspaper network of 1000 nodes using 75 workstations. In this context, our topologically aware scheme will let users direct their queries to newspaper proxies that are closer to their locations, and therefore help them locate local news.

In this paper, we consider a fully distributed technique for addressing the problem of efficient overlay construction in unstructured networks. More specifically:

  • We propose and evaluate DDNO (Distributed Domain Name Order), which is an efficient, scalable yet simple technique for constructing topologically aware overlay topologies. DDNO is entirely distributed, requires only local knowledge and therefore scales well with the size of the network.

  • We provide an extensive experimental study to evaluate the performance of our technique. In addition, we compare our technique with other heuristic-based techniques. Our results indicate that DDNO improves many desirable properties such as low end-to-end delays, connectivity and low diameter.

The remainder of the paper is organized as follows: In Section 2, we present the DDNO Algorithm, which is our proposed technique to construct topologically aware overlay networks. In Section 3, we describe three alternative methods for overlay construction in centralized and distributed environments. Section 4 describes our experimental methodology, datasets and evaluation parameters. In Section 5, we present our experimental results. Finally, in Section 6 we discuss related work and conclude the paper in Section 7.

Section snippets

DDNO—Distributed Domain Name Order protocol

In this section, we present the Distributed DNO (DDNO) algorithm which clusters nodes belonging to the same domain together without the need of a centralized component that usually assists in the overlay construction process. In particular, we explain how nodes join the DDNO topology and how domain-name lookups are performed, with the assistance of the Split-Hash and dnMatch functions. Then, we describe the topology maintenance process and how query routing works. An example of a DDNO Topology

Alternative heuristics for overlay construction

In this section, we describe various overlay construction heuristics that are later compared to DDNO. We start by defining the computation model of these algorithms. Specifically, each algorithm takes as input a vertex set V = {1, 2, …, n} and constructs an overlay topology G = (V, E), where the set E represents the overlay connections between the vertices in V. The construction of an optimal overlay is known to be NP-complete [8]; therefore, the algorithms presented below are, similarly to

Experimental methodology

Our experimental evaluation focuses on: (i) the Overlay Performance, in which we evaluate the generated overlays with respect to the overall end-to-end delays, the graph diameter and the number of clusters, and (ii) Lookup Performance, in which we evaluate the performance of lookupDN messages with respect to the number of hops each message traverses and the percentage of resolved queries.
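As a rough illustration of the overlay metrics named above, the following sketch computes the number of clusters (connected components), the hop diameter and a total link delay for a small overlay. The adjacency-dictionary representation, the function names and the definition of aggregate delay as the sum of latencies over overlay links are assumptions of this sketch; the paper's exact metric definitions are not given in this excerpt.

```python
from collections import deque


def bfs_hops(adj, source):
    """Hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def count_clusters(adj):
    """Number of connected components of an undirected overlay."""
    seen = set()
    clusters = 0
    for node in adj:
        if node not in seen:
            seen.update(bfs_hops(adj, node))  # adds every node of this component
            clusters += 1
    return clusters


def diameter(adj):
    """Longest shortest path in hops, over pairs in the same component."""
    return max(max(bfs_hops(adj, node).values()) for node in adj)


def total_link_delay(adj, latency):
    """Sum of latencies over overlay links, each undirected link counted once.

    `latency` maps frozenset({u, v}) to a delay value (assumed input format).
    """
    return sum(latency[frozenset((u, v))] for u in adj for v in adj[u] if u < v)
```

A partitioned overlay shows up here as `count_clusters(adj) > 1`, which is the situation the random connections in DDNO are meant to avoid.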

Experimental evaluation

In this section, we present the results of our extensive experimentation with DDNO. More specifically, we implemented centralized and distributed versions of the various algorithms presented in Sections 2 and 3. Note that in a distributed setting a node has no topological information other than the identity of its own neighbors. Therefore, global lists of other active nodes or IP latencies are not available.

Related work

The need for topologically aware unstructured overlay networks has been addressed in [23]. In the proposed BinSL algorithm [23], which was evaluated in this work, end-to-end delays are minimized using a system of k landmarks. Recently, an approach to create resilient unstructured overlays with small diameters was proposed in [28]. In the proposed algorithm, a node selects r nodes at random from a set of k nodes (r < k) and then finds, among the remaining f = k − r nodes, the ones that have the largest degree.
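The selection rule attributed to [28] can be sketched as follows, under stated assumptions: r random picks are drawn from k candidates, and the remaining k - r candidates are then ranked by degree. The function name `select_neighbors`, the `degree_of` mapping and the parameter `m` (how many high-degree nodes to keep) are hypothetical, since the text above does not say how many of the remaining nodes are retained.

```python
import random


def select_neighbors(candidates, degree_of, r, m, rng=None):
    """Pick r random candidates, then the m highest-degree ones among the rest.

    `candidates` is a list of unique node ids (the set of k nodes),
    `degree_of` maps each node id to its degree, and `m` is a hypothetical
    parameter introduced for this sketch.
    """
    rng = rng or random.Random()
    if not 0 <= r <= len(candidates):
        raise ValueError("r must be between 0 and the number of candidates")

    # Step 1: r nodes chosen uniformly at random from the k candidates.
    random_picks = rng.sample(candidates, r)

    # Step 2: among the remaining k - r nodes, prefer the largest degrees.
    rest = [c for c in candidates if c not in random_picks]
    preferred = sorted(rest, key=lambda c: degree_of[c], reverse=True)[:m]

    return random_picks, preferred
```

The random picks provide the kind of randomness that keeps the overlay connected, while the degree-based picks favor well-connected nodes, which tends to shorten paths.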

Conclusions and future work

In this work we propose and evaluate DDNO (Distributed Domain Name Order), which is a distributed technique to make unstructured overlays topologically aware. We compare DDNO with a number of other overlay construction techniques in both centralized and distributed settings. Our experiments indicate that DDNO is an attractive technique for topologically aware overlay construction as it optimizes many desirable properties such as end-to-end delays, diameter and avoids network partitioning,

Acknowledgements

We would like to thank Dimitrios Gunopulos (UCR), Neal Young (UCR and Akamai), Arthur W. Berger (Akamai and MIT) and Sylvia Ratnasamy (Intel) for the constructive discussions, ideas and suggestions.

References (34)

  • D. Zeinalipour-Yazti et al., Exploiting locality for scalable information retrieval in peer-to-peer systems, Information Systems Journal (2005)
  • V. Batagelj et al., PAJEK—Program for large network analysis, Connections (1998)
  • B. Bollobás (1998)
  • M. Castro, P. Druschel, Y. Charlie Hu, A. Rowstron, Topology-aware routing in structured peer-to-peer overlay networks,...
  • Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, S. Shenker, Making Gnutella-like P2P systems scalable, in:...
  • Y.-H. Chu, S.G. Rao, H. Zhang, A case for end system multicast, in: Proceedings of the 2000 ACM SIGMETRICS...
  • A. Crespo, H. Garcia-Molina, Routing indices for peer-to-peer systems, in: Proceedings of the 22nd International...
  • F. Dabek, R. Cox, F. Kaashoek, R. Morris, Vivaldi: A decentralized network coordinate system, in: Proceedings of the...
  • M.R. Garey et al., Computers and Intractability: A Guide to the Theory of NP-Completeness (1979)
  • Gnutella,...
  • J.L. Gross et al., Graph theory and its applications (1999)
  • T. Hansen, J. Otero, A. McGregor, H.-W. Braun, Active measurement data analysis techniques, in: Proceedings of the...
  • S. Iyer, A. Rowstron, P. Druschel, SQUIRREL: A decentralized, peer-to-peer web cache, in: Proceedings of the...
  • V. Kalogeraki, D. Gunopulos, D. Zeinalipour-Yazti, A local search mechanism for peer-to-peer networks, in: ACM CIKM’02,...
  • J. Jin, K. Nahrstedt, Large-scale service overlay networking with distance-based clustering, in: Proceedings of...
  • Kazaa,...
  • B. Krishnamurthy, J. Wang, Y. Xie, Early measurements of a cluster-based architecture for P2P systems, in: Internet...

Cited by (5)

    • Alleviating the topology mismatch problem in distributed overlay networks: A survey

      2016, Journal of Systems and Software
      Citation Excerpt :

      The following table depicts the PROPs behavior regarding the stated criteria: The Distributed Domain Name Order (DDNO) approach (Zeinalipour-Yazti and Kalogeraki, 2006) uses domain names to detect topologically-close nodes. The fundamental assumption of the approach is the nodes found in the same domain are also topologically close.

    • GMAC: An overlay multicast network for mobile agent platforms

      2008, Journal of Parallel and Distributed Computing
      Citation Excerpt :

      Furthermore, the assumption that on the Internet latency represents distance is not necessarily true. To improve the resource utilization, some efforts take into account the underlying network when building overlay networks [27,19,26,35]. As explained in Section 4.2, GMAC refrains from using optimized schemes to minimize the protocol overhead, ease the overlay construction, maintenance, and failure recovery mechanisms.

    • A Read-Only Distributed Hash Table

      2011, Journal of Grid Computing
    • A locality-based LFH cluster strategy for overlay network

      2008, 2008 International Conference on Information Networking, ICOIN
    • pFusion: A P2P architecture for internet-scale content-based search and retrieval

      2007, IEEE Transactions on Parallel and Distributed Systems

    Demetrios Zeinalipour-Yazti is a Visiting Lecturer in the Department of Computer Science at the University of Cyprus. His research interests include Network Data Management, Distributed Query Processing, Storage and Retrieval Methods for Peer-to-Peer and Sensor Networks. He holds a Ph.D. and M.Sc. in Computer Science and Engineering from the University of California—Riverside, and a B.Sc. in Computer Science from the University of Cyprus (2000). He has been a visiting researcher at the network intelligence lab of Akamai Technologies and is currently a reviewer of several scientific journals and conferences in his areas of study.

    Vana Kalogeraki is an Assistant Professor at the Department of Computer Science and Engineering at the University of California, Riverside. Her research interests include distributed and real-time systems, peer-to-peer systems and sensor systems. She received her Ph.D. from the University of California, Santa Barbara in 2000. In 2001–2002, she held a Research Scientist position at Hewlett-Packard Labs in Palo Alto, CA. She has published many technical papers, including co-authoring the Object Management Group (OMG) CORBA Dynamic Scheduling Standard and delivered tutorials. She has served as the Program co-Chair of the “International Workshop on Databases, Information Systems and Peer-to-Peer Computing (DBISP2P)” at VLDB’2003, the Program co-Chair of the “13th International Workshop on Parallel and Distributed Real-Time Systems (WPDRTS’05)” and the Program Chair of the “IEEE International Conference on Pervasive Services (ICPS’05)”. She is currently an Associate Editor for the Ad hoc Networks Journal. Her research is supported by NSF.