Elsevier

Computer Networks

Volume 48, Issue 3, 21 June 2005, Pages 377-399

Self-similarity and long range dependence on the internet: a second look at the evidence, origins and implications

https://doi.org/10.1016/j.comnet.2004.11.026

Abstract

In this paper we critically reexamine some long-standing beliefs regarding self-similarity and long range dependence (LRD) on the Internet. Power law tails have been conjectured to be a cause of LRD, and we reexamine the claims regarding heavy tails. We first examine the generative models proposed for heavy-tail phenomena, considering both the fragility of some proposed mechanisms under perturbations of their modeling assumptions and the weak statistical evidence for those mechanisms. Next, we look at some of the implications of LRD for key performance aspects of Internet algorithms. Finally, we present an alternative model explaining the LRD phenomena of Internet traffic. We argue that the multiple-time-scale nature of traffic generation, together with transport protocols, makes the observation of LRD inevitable.

Introduction

Long range dependence made a dramatic appearance in network traffic modeling with the publication of the seminal paper by Leland et al. [1]. The paper clearly established that simple Poisson models inherited from the telephony world were not appropriate for packet traffic, and that new modeling approaches and paradigms were needed. Network traffic was shown to be “fractal”, or more accurately long range dependent (LRD), and models and explanations were sought for this surprising behavior. The paper also postulated that heavy tails (specifically power laws) in some of the distributions associated with traffic generation were the likely cause of LRD.
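To make the Poisson versus LRD distinction concrete, the sketch below applies a variance-time (aggregated-variance) estimate of the Hurst parameter H, one of the diagnostics used in this line of work, to synthetic Poisson packet counts. If the variance of the series averaged over blocks of length m decays like m^(2H-2), a short-range dependent series such as i.i.d. Poisson counts gives H close to 0.5, while LRD traffic gives 0.5 < H < 1. The function names, slot rate, series length, seed, and aggregation levels are hypothetical illustrative choices, not taken from the paper.

```python
import math
import random


def poisson_counts(n, rate, rng):
    """Draw n i.i.d. Poisson(rate) packet counts, one per time slot.

    Knuth's multiplication method: multiply uniforms until the product
    drops to exp(-rate); the number of extra factors is the count.
    """
    limit = math.exp(-rate)
    counts = []
    for _ in range(n):
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        counts.append(k)
    return counts


def aggregated_variance_hurst(series, levels):
    """Estimate H from a variance-time plot.

    For each aggregation level m, average the series over non-overlapping
    blocks of length m and take the sample variance of the block means.
    If Var(X^(m)) ~ m^(2H - 2), the least-squares slope of log-variance
    against log m equals 2H - 2.
    """
    xs, ys = [], []
    for m in levels:
        n_blocks = len(series) // m
        means = [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        mean = sum(means) / n_blocks
        var = sum((b - mean) ** 2 for b in means) / (n_blocks - 1)
        xs.append(math.log(m))
        ys.append(math.log(var))
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return 1.0 + slope / 2.0


if __name__ == "__main__":
    rng = random.Random(7)
    counts = poisson_counts(2 ** 14, 5.0, rng)
    levels = [2 ** k for k in range(8)]  # m = 1, 2, 4, ..., 128
    print(f"H estimate for Poisson counts: {aggregated_variance_hurst(counts, levels):.3f}")
```

For the Poisson input the estimate should land near 0.5; applied to a measured packet-count trace, a value well above 0.5 is the kind of evidence that was read as LRD.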

Researchers focused their efforts on detecting power laws, and, sure enough, power laws were discovered for Web file sizes, Web site connectivity, and router connection degrees; see, for example, [2], [3], [4], [5]. In this paper we focus on the importance of these power laws in the context of LRD.

These discoveries are important for the study of various protocols and algorithms. For example, the Web file size distribution matters for Web server scheduling. More importantly, these discoveries motivate us to identify the mechanisms behind the observed power laws, which in turn could enable us to design mechanisms that improve the current operational structure of the Internet.

In the past few years there has been a significant amount of research on power laws. Various mechanisms have been proposed as the origins of power law phenomena, including reflected or drifted multiplicative processes, exponentially stopped geometric Brownian motion, self-organized criticality, preferential growth, and highly optimized tolerance, among others [3], [6], [7], [8], [9]. A group of researchers also advocates lognormal rather than power law distributions as the correct description of the data (see, for example, [10]). The debate has focused on “tail” behavior since, for example, the primary difference between a lognormal distribution and a power law distribution lies in the tail. We believe that this debate on the nature of the tail (power law vs. lognormal) is of little practical interest or consequence. Expanding on this view is the subject of this paper.
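The claim that a lognormal and a power law differ mainly in the tail can be checked numerically. The sketch below computes the local slope of a lognormal CCDF on log-log axes, the quantity a straight-line power-law fit tries to estimate; a Pareto distribution would give the same constant at every x. The parameter values (μ = 0, σ = 3) and the function names are illustrative assumptions.

```python
import math


def lognormal_ccdf(x, mu, sigma):
    """P(X > x) for X = exp(Z), Z ~ Normal(mu, sigma^2)."""
    z = (math.log(x) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def apparent_exponent(x, mu, sigma):
    """Local log-log slope -d ln P(X > x) / d ln x of the lognormal CCDF.

    A Pareto CCDF (x / x_m)**(-alpha) would return the constant alpha for
    every x >= x_m; for the lognormal the slope keeps growing, but only
    roughly like (ln x - mu) / sigma**2 for large x.
    """
    z = (math.log(x) - mu) / sigma
    density_z = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return density_z / (sigma * lognormal_ccdf(x, mu, sigma))


if __name__ == "__main__":
    for decade in range(5):
        x = 10.0 ** decade
        print(f"x = 1e{decade}: apparent alpha = {apparent_exponent(x, 0.0, 3.0):.2f}")
```

For large σ the slope changes only slowly with ln x, so data spanning a few decades can look close to a straight line on a log-log plot; the region where the two families clearly separate is the far tail, where data are scarcest. (For very large x the CCDF underflows to zero, so this direct formula is only meant for moderate z.)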

Our view on the “tail debate” is as follows: First, we believe that there is never sufficient data to support any analytical form summarizing the tail behavior, and therefore any such summary could be misleading and dangerous.
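One standard tool for summarizing a tail is the Hill estimator of the tail index α. The sketch below is our illustration, not a method from the paper: it applies the estimator to lognormal samples, which have no power-law tail at all. The estimate typically drifts as the number k of upper order statistics changes, and nothing in a finite sample says which k is the "right" one. The sample size, seed, σ = 2, the values of k, and the function name are arbitrary choices.

```python
import math
import random


def hill_estimator(samples, k):
    """Hill estimate of the tail index alpha from the k largest samples.

    With the data sorted in decreasing order (index 0 is the largest),
    alpha_hat = k / sum_{i<k} ln(x[i] / x[k]), where x[k] is the
    (k+1)-th largest value. Samples must be positive.
    """
    xs = sorted(samples, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < len(samples)")
    threshold = xs[k]
    log_excess = sum(math.log(x / threshold) for x in xs[:k])
    return k / log_excess


if __name__ == "__main__":
    rng = random.Random(1)
    # Lognormal data: every moment is finite, so there is no power-law tail
    # to find, yet the Hill estimator still returns a finite "alpha" for each k.
    data = [rng.lognormvariate(0.0, 2.0) for _ in range(10_000)]
    for k in (50, 200, 1000, 3000):
        print(f"k = {k:5d}: alpha_hat = {hill_estimator(data, k):.2f}")
```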

Second, all mechanisms aimed at explaining the power law or lognormal tail are fragile, in the sense that minor perturbations to their assumptions lead to different analytical forms for the tail. We will make this concrete in the paper. It is important to understand that such perturbations seem to always exist in engineering settings. For example, exponentially stopped geometric Brownian motion requires a stopping time with an asymptotically exponential tail, which can never be the case in engineering settings, since such a stopping time must end at some finite value. More generally, many of the conclusions in the studies mentioned above are statements about asymptotic tail behavior, a regime that never occurs in practice.
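The stopping-time example can be made quantitative. Assume (our parametrization) that X(T) = X0 · exp(νT + σW(T)) with W a standard Brownian motion and T ~ Exp(λ) independent of W. Conditioning on T, E[(X(T)/X0)^k] = E[exp(c_k T)] with c_k = kν + k²σ²/2, which is finite only while c_k < λ; the moments therefore blow up at k = α, the positive root of (σ²/2)a² + νa − λ = 0, which is the signature of a power-law tail with index α. If T is instead an exponential time conditioned on T ≤ t_max, every moment is finite and the power law disappears. The sketch below, with hypothetical function names and parameter values, computes both cases.

```python
import math


def tail_exponent(nu, sigma, lam):
    """Pareto tail index of X(T) = X0 * exp(nu*T + sigma*W(T)), T ~ Exp(lam).

    It is the positive root a of (sigma**2 / 2) * a**2 + nu * a - lam = 0.
    """
    return (-nu + math.sqrt(nu * nu + 2.0 * sigma * sigma * lam)) / (sigma * sigma)


def _growth(k, nu, sigma):
    # E[exp(k * (nu*t + sigma*W(t)))] = exp(c_k * t) with c_k as below.
    return k * nu + 0.5 * k * k * sigma * sigma


def moment_exponential_stop(k, nu, sigma, lam):
    """E[(X(T)/X0)**k] for T ~ Exp(lam); infinite once c_k >= lam."""
    c = _growth(k, nu, sigma)
    return math.inf if c >= lam else lam / (lam - c)


def moment_truncated_stop(k, nu, sigma, lam, t_max):
    """Same moment when T is an Exp(lam) time conditioned on T <= t_max."""
    c = _growth(k, nu, sigma)
    d = c - lam
    norm = -math.expm1(-lam * t_max)  # P(T <= t_max) = 1 - exp(-lam * t_max)
    if d == 0.0:
        return lam * t_max / norm
    return lam * math.expm1(d * t_max) / (d * norm)


if __name__ == "__main__":
    nu, sigma, lam = 0.0, 1.0, 0.5
    print("tail exponent:", tail_exponent(nu, sigma, lam))
    for k in (0.5, 2.0):
        print(k, moment_exponential_stop(k, nu, sigma, lam),
              moment_truncated_stop(k, nu, sigma, lam, t_max=50.0))
```

With ν = 0, σ = 1, λ = 0.5 the exponent is α = 1, so the second moment is infinite under exponential stopping but finite, if huge, once the stopping time is capped: a minor change to the assumptions changes the analytical form of the tail.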

Given the difficulty of identifying the tail of a distribution from limited data, and the fragility of the mechanisms proposed to explain different tails, what is an engineer to do? Fortunately, as we will observe in this paper, there is little evidence that the tail affects the design of various algorithms and infrastructure on the Internet. This leads to our next point: engineers should focus on the behavior of a distribution’s “waist”, the portion for which we have enough data to summarize the distributional information, not on its “tail”.

In the next part of the paper, we present a generative model for network traffic that does not rely on heavy tails to produce LRD-like traffic. We argue that the multiple-time-scale nature of traffic generation, coupled with transport protocol effects, makes the appearance of LRD-like behavior inevitable for Internet traffic.

The rest of the paper is organized as follows. In the next section, we cover some preliminaries regarding the usage of the terms self-similarity and long range dependence in the context of network traffic modeling and analysis. We present our point of view regarding these definitions. Next, we explore some of the weaknesses concerning the power-law hypothesis as an explanation for LRD in network traffic. We then present our generative model for network traffic and the impact of transport protocols on its correlation structure. Finally, we present our conclusions.


Preliminaries and definitions

We begin by addressing some semantic issues related to “self-similarity” and LRD. The term “self-similar” is encountered in almost all recent literature on traffic modeling, but its usage is often ambiguous, confusing, and misleading. Since a variety of related terms are in use, we present definitions of some commonly used ones.

Continuous self-similar process: A continuous time process Y(t) is said to be exactly self-similar with self-similarity parameter H (the Hurst parameter) if it

Terminology

We now turn our attention to the study of power law tails and file size distributions. Before we begin, we introduce terminology that will be used in the remainder of this section. First, consider two functions $f, g : \mathbb{R}^+ \to \mathbb{R}^+$. We say that $f \sim g$ iff $\lim_{x \to \infty} f(x)/g(x) = c$, $0 < c < \infty$. We say that $f \ll g$ iff $\lim_{x \to \infty} f(x)/g(x) = 0$.

The complementary cumulative distribution function (CCDF) of a random variable $X$ is defined as $\bar{F}_X(x) = P(X > x)$. We say that $X$ is characterized by a power law distribution if $\bar{F}_X(x) \sim x^{-\alpha}$, where $\alpha >$ …

A Markovian model for LRD traffic

We now present a model that is Markovian in its components and yet exhibits LRD-like correlation and spectral behavior. The model was originally proposed by us in [34]. The crucial observation in developing the model is that the layered nature of traffic generation plays an important role in determining the correlation structure of individual traffic sources.
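The general idea that Markovian components operating on many time scales can mimic LRD correlations can be sketched as follows. This is not the model of [34]; it is a minimal illustration under stated assumptions. We superpose independent two-state (on-off) Markov sources whose total switching rates κ_i = r^(-i) are spaced geometrically; each source alone has an exponentially decaying, short-range autocovariance w_i · exp(−κ_i τ). Choosing the hypothetical weights w_i = κ_i^β (for example through each source's peak rate) makes the summed autocorrelation behave like τ^(−β) over lags well inside the covered range of time scales, which for β = 0.4 resembles an LRD process with H = 1 − β/2 = 0.8. The values of r, β, the number of sources, and the function names are illustrative choices.

```python
import math


def superposed_acf(tau, n_sources=20, ratio=4.0, beta=0.4):
    """Normalized autocorrelation of a sum of independent on-off Markov sources.

    Source i switches state with total rate kappa_i = ratio**(-i), so its own
    autocovariance is w_i * exp(-kappa_i * tau): exponential, i.e. short-range.
    The weights w_i = kappa_i**beta are an illustrative choice, not a fitted model.
    """
    kappas = [ratio ** (-i) for i in range(n_sources)]
    weights = [kappa ** beta for kappa in kappas]
    acov = sum(w * math.exp(-kappa * tau) for w, kappa in zip(weights, kappas))
    return acov / sum(weights)


def local_slope(tau, h=1e-4, **params):
    """d ln R / d ln tau, estimated by a central difference in ln tau."""
    up = superposed_acf(tau * math.exp(h), **params)
    down = superposed_acf(tau * math.exp(-h), **params)
    return (math.log(up) - math.log(down)) / (2.0 * h)


if __name__ == "__main__":
    for tau in (10.0, 100.0, 1e3, 1e4):
        print(f"tau = {tau:8.0f}: R = {superposed_acf(tau):.4f}, slope = {local_slope(tau):+.3f}")
```

Across these lags the local log-log slope should stay close to −β, even though every component decorrelates exponentially; beyond the slowest time scale (lags much larger than r^(n_sources−1)) the correlation eventually decays exponentially, so the LRD is only apparent over a finite, but wide, range.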

Conclusions

In this paper we have argued that the focus on power law tails in the Internet is misguided. First, although many mechanisms have been proposed to explain where power laws come from, all of them are fragile, that is, sensitive to their underlying assumptions. Second, it is extremely difficult, if not impossible, to statistically characterize a distribution tail based on a finite amount of data. Third, in many applications, the tail plays little role in determining design and performance. Instead, it is the


References (36)

  • S. Robert et al., On a Markov modulated chain with pseudo-long range dependences, Performance Evaluation (1996)
  • D.R. Figueiredo et al., On the autocorrelation structure of TCP traffic, Computer Networks: Special Issue on Advances in Modeling and Engineering of Long-Range Dependent Traffic (2002)
  • W.E. Leland et al., On the self-similar nature of Ethernet traffic (extended version), IEEE/ACM Transactions on Networking (1999)
  • M.E. Crovella et al., Self-similarity in World Wide Web traffic: Evidence and possible causes, IEEE/ACM Transactions on Networking (1997)
  • J.M. Carlson et al., Highly optimized tolerance: A mechanism for power laws in designed systems, Physical Review E (1999)
  • X. Zhu, J. Yu, J. Doyle, Heavy tails, generalized coding, and optimal Web layout, in: Proceedings of IEEE/INFOCOM01,...
  • M. Faloutsos, P. Faloutsos, C. Faloutsos, On power-law relationships of the Internet topology, in: Proceedings of...
  • M. Mitzenmacher, A brief history of generative models for power law and lognormal distributions, Internet Mathematics (2003)
  • W.J. Reed, The double Pareto-lognormal distribution: a new parametric model for size distribution, Available from...
  • A.L. Barabasi et al., Emergence of scaling in random networks, Science (2000)
  • A.B. Downey, The structural cause of file size distributions, in: Proceedings of IEEE/MASCOTS’01,...
  • K. Thompson et al., Wide-area Internet traffic patterns and characteristics, IEEE Network (1997)
  • C. Fraleigh et al., Packet-level traffic measurements from the Sprint IP backbone, IEEE Network (2003)
  • G.W. Wornell et al., Estimation of fractal signals from noisy measurements using wavelets, IEEE Transactions on Signal Processing (1992)
  • J. Cao et al., Internet traffic tends toward Poisson and independent as the load increases
  • M. Mitzenmacher, Dynamic models for file sizes and double Pareto distributions, 2002, in...
  • X. Gabaix, Zipf’s law for cities: an explanation, Quarterly Journal of Economics (1997)

Wei-Bo Gong (S’87-M’87-SM’97-F’99) received his PhD degree from Harvard University in 1987, and has been with the Department of Electrical and Computer Engineering, University of Massachusetts, Amherst since then. He is also an adjunct professor in the Department of Computer Science at the same campus. His major research interests include control and systems methods in communication networks, network security, and network modeling and analysis. He is a recipient of the IEEE Transactions on Automatic Control’s George Axelby Outstanding Paper Award, an IEEE Fellow, and the Program Committee Chair for the 43rd IEEE Conference on Decision and Control.

Yong Liu is a senior postdoctoral research associate in the Computer Networks Research Group of the Computer Science Department at the University of Massachusetts, Amherst. He received his PhD degree from the Electrical and Computer Engineering Department at the same university in May 2002. From June to August 2000, he worked as a summer intern in the Networking Technologies and Performance Department of Bell Labs, Holmdel, NJ. From September 2000 to March 2001, he continued to work as a co-op in the same department. He received his master’s degree in automatic control from the University of Science and Technology of China in July 1997. He is a member of IEEE and ACM.

Vishal Misra has been an Assistant Professor of Computer Science and Electrical Engineering at Columbia University since 2001. He received a B.Tech degree from IIT Bombay, and an MS and PhD from the University of Massachusetts, Amherst, all in Electrical Engineering. His interests lie in the modeling, analysis and design of various aspects of communication networks, with particular emphasis on network traffic and congestion control mechanisms. He is currently also interested in the robustness of network architectures and protocols. He has received an NSF CAREER Award, an IBM Faculty Award and a DoE CAREER award.

Don Towsley (M’78-SM’93-F’95) holds a B.A. in Physics (1971) and a Ph.D. in Computer Science (1975) from the University of Texas. From 1976 to 1985 he was a member of the faculty of the Department of Electrical and Computer Engineering at the University of Massachusetts, Amherst. He is currently a Distinguished Professor at the University of Massachusetts in the Department of Computer Science. He has held visiting positions at IBM T.J. Watson Research Center, Yorktown Heights, NY; Laboratoire MASI, Paris, France; INRIA, Sophia-Antipolis, France; AT&T Labs - Research, Florham Park, NJ; and Microsoft Research Lab, Cambridge, UK. His research interests include networks and performance evaluation. He currently serves on the editorial boards of the Journal of the ACM and the IEEE Journal on Selected Areas in Communications, and has previously served on several editorial boards, including those of the IEEE Transactions on Communications and IEEE/ACM Transactions on Networking. He was a Program Co-chair of the joint ACM SIGMETRICS and PERFORMANCE ’92 conference and the Performance 2002 conference. He is a member of ACM and ORSA, and Chair of IFIP Working Group 7.3. He has received the 1998 IEEE Communications Society William Bennett Best Paper Award and numerous best conference/workshop paper awards. Finally, he has been elected a Fellow of both the ACM and IEEE.
