Self-similarity and long range dependence on the internet: a second look at the evidence, origins and implications
Introduction
Long range dependence made a dramatic appearance in network traffic modeling with the publication of the seminal paper by Leland et al. [1]. The paper clearly established that simple Poisson models inherited from the telephony world were not appropriate for packet traffic, and that new approaches and paradigms were needed for modeling. Network traffic was shown to be “fractal”, or more accurately long range dependent (LRD), and models and explanations were sought for this surprising behavior. The paper also postulated that heavy tails (specifically power laws) of some distributions associated with the generation of traffic were the likely cause of LRD.
Researchers focused their efforts on detecting power laws, and sure enough, power laws were discovered for Web file sizes, Website connectivities, and the router connection degrees, see, for example, [2], [3], [4], [5]. In this paper we focus on the importance of the power laws in the context of LRD.
These discoveries are important in the study of various protocols and algorithms. For example, the Web file size distribution is important for Web-server scheduling. More importantly, these discoveries motivate us to identify the mechanisms behind the observed power laws and therefore enable us to design mechanisms to improve the current operational structure of the Internet.
In the past few years there has been a significant amount of research on the issue of power laws. Various mechanisms have been proposed as the origins of power law phenomena, including reflected or drifted multiplicative processes, exponentially stopped geometric Brownian motion, self-organized criticality, preferential growth, and highly optimized tolerance among others [3], [6], [7], [8], [9]. There is also a group of researchers advocating lognormal rather than power law distribution as the correct description of the data (see, for example, [10]). The debate has focused on the “tail” behavior since, for example, the primary difference between a lognormal distribution and a power law distribution lies in the tail. We believe that this debate on the nature of the tail (power law vs. lognormal) is of little practical interest or consequence. Expanding on this is the subject of this paper.
Our view on the “tail debate” is as follows: First, we believe that there is never sufficient data to support any analytical form summarizing the tail behavior and therefore any summary could be misleading and dangerous.
Second, all mechanisms aimed at explaining the power law or lognormal tail are fragile in the sense that minor perturbations to their assumptions lead to different analytical forms for the tail. We will make this concrete in the paper. It is important to understand that such perturbations seem always to exist in engineering settings. For example, in the case of exponentially stopped geometric Brownian motion one needs a stopping time that has an asymptotic exponential tail, which can never be the case in engineering settings since such a stopping time has to end at some finite value. In general, many of the conclusions in the above mentioned studies are statements about asymptotic tail behavior, which never occurs in practice.
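The fragility of the stopped-growth mechanism can be seen in a deliberately simplified version of it. The sketch below is an illustration under stated assumptions, not the analysis of the cited works: the Brownian term is dropped so that size grows deterministically at rate μ, the observation time T is exponential with rate λ, and T_max is a hypothetical finite cap on the stopping time. The symbols μ, λ and T_max are introduced here for illustration only.

```latex
% Untruncated stopping time: an exact power law.
X = e^{\mu T}, \quad T \sim \mathrm{Exp}(\lambda)
\;\Longrightarrow\;
P(X > x) = P\!\left(T > \frac{\ln x}{\mu}\right)
         = e^{-\lambda \ln x / \mu}
         = x^{-\lambda/\mu}, \qquad x \ge 1 .

% Stopping time capped at T_max: the power law holds only up to a finite point.
X' = e^{\mu \min(T,\, T_{\max})}
\;\Longrightarrow\;
P(X' > x) =
\begin{cases}
x^{-\lambda/\mu}, & 1 \le x < e^{\mu T_{\max}}, \\
0, & x \ge e^{\mu T_{\max}} .
\end{cases}
```

In this simplified setting the two distributions agree everywhere below e^{μ T_max} and differ only in the asymptotic tail, so a bounded stopping time removes the power law tail without changing the body of the distribution.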
Given the difficulty of identifying the tail of a distribution from limited data and the fragility of the mechanisms proposed to explain different tails, what is an engineer to do? Fortunately, as we will observe in this paper, there is little evidence that the tail affects the design of algorithms and infrastructure on the Internet. This leads to our next point: engineers should focus on the behavior of a distribution’s “waist”, that is, the portion for which we have enough data to summarize the distributional information, rather than on its “tail”.
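One rough way to picture where the “waist” ends is to ask how many observations a sample of a given size is expected to place above a threshold. The sketch below assumes, purely for illustration, that the data follow an exact Pareto distribution; the function names, the `min_count` cutoff of 100 observations, and the parameter values are hypothetical choices, not quantities taken from the paper.

```python
def pareto_ccdf(x, alpha, x_min=1.0):
    """P(X > x) for a Pareto distribution with tail exponent alpha (assumed model)."""
    if x < x_min:
        return 1.0
    return (x_min / x) ** alpha


def expected_exceedances(n_samples, x, alpha, x_min=1.0):
    """Expected number of the n_samples observations that fall above x."""
    return n_samples * pareto_ccdf(x, alpha, x_min)


def waist_limit(n_samples, alpha, x_min=1.0, min_count=100):
    """Largest x at which about min_count observations are still expected above x.

    Solves n_samples * (x_min / x) ** alpha = min_count for x.
    min_count is an arbitrary illustrative cutoff, not a statistical rule.
    """
    return x_min * (n_samples / min_count) ** (1.0 / alpha)


if __name__ == "__main__":
    n, alpha = 1_000_000, 2.0
    limit = waist_limit(n, alpha)
    print("waist ends near x =", limit)
    print("expected points above 10x that:", expected_exceedances(n, 10 * limit, alpha))
```

Under these assumptions, a million observations with α = 2 leave about 100 points above x = 100 and about one point above x = 1000, so any analytical form fitted beyond that range rests on almost no data.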
In the next part of the paper, we present a generative model for network traffic that does not rely on heavy tails to produce LRD-like traffic. We argue that the multiple timescale nature of the generation of traffic, coupled with transport protocol issues, makes the appearance of LRD-like behavior inevitable for Internet traffic.
The rest of the paper is organized as follows. In the next section, we cover some preliminaries regarding the usage of the terms self-similarity and long range dependence in the context of network traffic modeling and analysis. We present our point of view regarding these definitions. Next, we explore some of the weaknesses concerning the power-law hypothesis as an explanation for LRD in network traffic. We then present our generative model for network traffic and the impact of transport protocols on its correlation structure. Finally, we present our conclusions.
Preliminaries and definitions
We begin by addressing some semantic issues related to “self-similarity” and LRD. The term “self-similar” is encountered in almost all recent literature dealing with traffic modeling, and its usage is often ambiguous, confusing and misleading. A variety of terms are in use; we present definitions of some of the most common ones.
Continuous self-similar process: A continuous time process Y(t) is said to be exactly self-similar with self-similarity parameter H (the Hurst parameter) if it
Terminology
We now turn our attention to the study of power law tails and file size distributions. Before we begin, we introduce terminology that will be used in the remainder of this section. First, consider two functions $f$ and $g$. We say that $f \sim g$ iff $\lim_{x\to\infty} f(x)/g(x) = c$, $0 < c < \infty$. We say that $f \prec g$ iff $\lim_{x\to\infty} f(x)/g(x) = 0$.
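As a quick check of the two relations defined above, the following worked examples use functions chosen here purely for illustration; they do not appear in the paper.

```latex
% f ~ g: the ratio tends to a finite positive constant (here c = 3).
f(x) = 3x^{-2} + x^{-3}, \quad g(x) = x^{-2}
\;\Longrightarrow\;
\lim_{x\to\infty} \frac{f(x)}{g(x)} = \lim_{x\to\infty} \left(3 + \frac{1}{x}\right) = 3,
\quad \text{so } f \sim g .

% f \prec g: the ratio tends to zero, for every alpha > 0.
f(x) = e^{-x}, \quad g(x) = x^{-\alpha}
\;\Longrightarrow\;
\lim_{x\to\infty} \frac{f(x)}{g(x)} = \lim_{x\to\infty} x^{\alpha} e^{-x} = 0,
\quad \text{so } f \prec g .
```

The second example says that an exponential tail is eventually lighter than any power law, whatever the exponent.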
The complementary cumulative distribution function (CCDF) of a random variable $X$ is defined as $\bar{F}(x) = P(X > x)$. We say that $X$ is characterized by a power law distribution if $\bar{F}(x) \sim x^{-\alpha}$, where α >
A Markovian model for LRD traffic
We now present a model that is Markovian in its components and yet exhibits LRD-like correlation and spectral behavior. The model was originally proposed by us in [34]. The crucial observation in developing the model is that the layered nature of the generation of traffic plays an important role in determining the correlation structure of individual traffic sources.
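The general idea that Markovian components operating at several timescales can produce slowly decaying correlation can be sketched numerically. The code below is not the model of [34]; it is a generic illustration that assumes a sum of independent, equal-variance, symmetric two-state Markov chains whose switching probabilities are spaced geometrically. The function names and the parameters `levels`, `fastest_switch_prob` and `timescale_ratio` are hypothetical.

```python
def two_state_acf(lag, switch_prob):
    """Autocorrelation at the given lag of a symmetric two-state Markov chain.

    A chain that switches state with probability p at each step has
    second eigenvalue 1 - 2p, so its autocorrelation at lag k is (1 - 2p) ** k.
    """
    return (1.0 - 2.0 * switch_prob) ** lag


def layered_acf(lag, levels=6, fastest_switch_prob=0.25, timescale_ratio=4.0):
    """Autocorrelation of the sum of `levels` independent chains.

    Level i switches with probability fastest_switch_prob / timescale_ratio ** i.
    With independent, equal-variance levels the autocorrelation of the sum
    is the average of the per-level autocorrelations.
    """
    total = 0.0
    for i in range(levels):
        p = fastest_switch_prob / timescale_ratio ** i
        total += two_state_acf(lag, p)
    return total / levels


if __name__ == "__main__":
    for lag in (1, 10, 100, 1000):
        print(lag, two_state_acf(lag, 0.25), layered_acf(lag))
```

Each component decays exponentially, yet the layered sum still shows substantial correlation at lags where the fastest component alone has decayed to essentially zero. Over the finite range of timescales covered by the levels, this can resemble the slow decay associated with LRD even though nothing in the construction is heavy tailed.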
Conclusions
In this paper we have argued that the focus on power law tails in the Internet is misguided. First, many mechanisms have been proposed to explain where they come from. However, they are all very fragile, sensitive to the underlying assumptions. Second, it is extremely difficult if not impossible to statistically characterize a distribution tail based on a finite amount of data. Third, in many applications, the tail plays little role in determining design and performance. Instead, it is the
References (36)
- et al., On a Markov modulated chain with pseudo-long range dependences, Performance Evaluation (1996)
- et al., On the autocorrelation structure of TCP traffic, Computer Networks: Special Issue on Advances in Modeling and Engineering of Long-Range Dependent Traffic (2002)
- et al., On the self-similar nature of Ethernet traffic (extended version), IEEE/ACM Transactions on Networking (1999)
- et al., Self-similarity in World Wide Web traffic: Evidence and possible causes, IEEE/ACM Transactions on Networking (1997)
- et al., Highly optimized tolerance: A mechanism for power laws in designed systems, Physical Review E (1999)
- X. Zhu, J. Yu, J. Doyle, Heavy tails, generalized coding, and optimal Web layout, in: Proceedings of IEEE/INFOCOM01,...
- M. Faloutsos, P. Faloutsos, C. Faloutsos, On power-law relationships of the Internet topology, in: Proceedings of...
- A brief history of generative models for power law and lognormal distributions, Internet Mathematics (2003)
- W.J. Reed, The double Pareto-lognormal distribution - a new parametric model for size distribution, Available from...
- Emergence of scaling in random networks, Science
- Wide-area Internet traffic patterns and characteristics, IEEE Network
- Packet-level traffic measurements from the Sprint IP backbone, IEEE Network
- Estimation of fractal signals from noisy measurements using wavelets, IEEE Transactions on Signal Processing
- Internet traffic tends toward Poisson and independent as the load increases
- Zipf’s law for cities: an explanation, Quarterly Journal of Economics
Wei-Bo Gong (S’87-M’87-SM’97-F’99) received his PhD degree from Harvard University in 1987, and has been with the Department of Electrical and Computer Engineering, University of Massachusetts, Amherst since then. He is also an adjunct professor in the Department of Computer Science at the same campus. His major research interests include control and systems methods in communication networks, network security, and network modeling and analysis. He is a recipient of the IEEE Transactions on Automatic Control’s George Axelby Outstanding Paper Award, an IEEE Fellow, and the Program Committee Chair for the 43rd IEEE Conference on Decision and Control.
Yong Liu is a senior postdoctoral research associate in the computer networks research group of the Computer Science department at the University of Massachusetts, Amherst. He received his PhD degree from the Electrical and Computer Engineering department at the same university in May 2002. From June to August 2000, he worked as a summer intern in the Networking Technologies and Performance Department of Bell Labs, Holmdel, NJ. From September 2000 to March 2001, he continued to work as a co-op in the same department. He received his master’s degree in the field of automatic control from the University of Science and Technology of China in July 1997. He is a member of IEEE and ACM.
Vishal Misra has been an Assistant Professor of Computer Science and Electrical Engineering at Columbia University since 2001. He received a B.Tech degree from IIT Bombay, and an MS and PhD from the University of Massachusetts, Amherst, all in Electrical Engineering. His interests lie in the modeling, analysis and design of various aspects of communication networks, with particular emphasis on network traffic and congestion control mechanisms. He is currently also interested in the robustness of network architectures and protocols. He has received an NSF CAREER Award, an IBM Faculty Award and a DoE CAREER award.
Don Towsley (M’78-SM’93-F’95) holds a B.A. in Physics (1971) and a Ph.D. in Computer Science (1975) from University of Texas. From 1976 to 1985 he was a member of the faculty of the Department of Electrical and Computer Engineering at the University of Massachusetts, Amherst. He is currently a Distinguished Professor at the University of Massachusetts in the Department of Computer Science. He has held visiting positions at IBM T.J. Watson Research Center, Yorktown Heights, NY; Laboratoire MASI, Paris, France; INRIA, Sophia-Antipolis, France; AT&T Labs - Research, Florham Park, NJ; and Microsoft Research Lab, Cambridge, UK. His research interests include networks and performance evaluation. He currently serves on the Editorial board of Journal of the ACM and IEEE Journal on Selected Areas in Communications and has previously served on several editorial boards including those of the IEEE Transactions on Communications and IEEE/ACM Transactions on Networking. He was a Program Co-chair of the joint ACM SIGMETRICS and PERFORMANCE ’92 conference and the Performance 2002 conference. He is a member of ACM and ORSA, and Chair of IFIP Working Group 7.3. He has received the 1998 IEEE Communications Society William Bennett Best Paper Award and numerous best conference/workshop paper awards. Last, he has been elected Fellow of both the ACM and IEEE.