

From network measurement collection to traffic performance modeling: challenges and lessons learned

Judith L. Jerkins and Jonathan L. Wang

Telcordia Technologies (Formerly Bellcore)

331 Newman Springs Road

Red Bank, NJ 07701, USA

{jlj,jwang}@research.telcordia.com

Abstract

Recent advances in network technologies have far outpaced our abilities to effectively manage and engineer them. The fundamental challenge in the traffic management of these new and emerging networks is to describe, analyze and control the complex traffic flows they carry in ways that can be applied in practice. Our approach for tackling this problem is experimental in nature and starts with collection and analysis of high-resolution traffic traces from working networks. Traffic models that can accurately and adequately describe these complex flows are devised based on the measurement analyses, which then lead to the development of appropriate and practical traffic management methods.

This paper reports on our experiences in analyzing a large number of traffic traces collected from a wide range of network technologies including Ethernet, ISDN packet, CCS, Internet, Frame Relay and ATM. We first describe the challenges in collecting, mining and filtering traffic traces from high-speed networks. Lessons learned from these efforts are then reported. Specifically, we discuss the choice of traffic models and the issues of traffic characterization and performance modeling from various viewpoints: i) aggregate versus individual traffic; ii) user session versus link layer traffic; and iii) user application versus network management traffic. The discussions are given so that the appropriate traffic models can be chosen in different scenarios, and subsequent traffic management methods can be developed and applied to these high-speed networks in practice.

Keywords: Traffic measurements, data collection, statistical analysis, performance modeling, high-speed networks

1 Introduction

Recent advances in network technologies have far outpaced our abilities to effectively manage and engineer them. The fundamental challenge in the traffic management of these new and emerging networks is to describe, analyze and control the complex traffic flows they carry in ways that can be applied in practice. Our approach for tackling this problem is experimental in nature and starts with collection and analysis of high-resolution traffic traces from working networks. Traffic models that can accurately and adequately describe these complex flows are devised based on the measurement analyses, which then lead to the development of appropriate and practical traffic management methods.

While it is widely accepted that traffic flows in high-speed packet networks such as Frame Relay and ATM are bursty, the notion of bursty traffic has eluded a definitive characterization. In recent years, however, a wealth of traffic trace data from a wide range of packet networks and services has yielded insights into this intuitive notion. In this paper, we will review and discuss the findings from these previous as well as recent statistical analyses based on actual network and application traffic. The purpose of the discussion is to point out that the choice of appropriate traffic models depends on the purpose of the modeling and should take into consideration the specific network and protocol scenarios.

The rest of the paper is organized as follows: In Section 2, we describe the challenges in the data collection effort as well as the subsequent data mining, filtering and extraction tasks in high-speed network measurement studies. Section 3 focuses on the issue of traffic models and traffic characterizations. First, a discussion of the choice of appropriate traffic models is given. We then describe some of the lessons learned regarding high-speed network traffic characteristics from various angles, specifically,

• aggregate versus individual traffic;

• user session versus link layer traffic;

• user application versus network management traffic.

The discussions are given so that the appropriate traffic models can be chosen in different scenarios, and subsequent traffic management methods can be developed and applied to these high-speed networks in practice. Finally, Section 4 summarizes the paper.

2 Challenges in Data Collection, Mining and Filtering

Traffic measurements form the basis of all traffic management functions. In general, three types of traffic measurements are feasible:

  • those regularly scheduled and collected by a switch,

  • those collected as part of special studies by a switch or adjunct measurement device, and

  • high-resolution measurements that require highly specialized traffic collection and storage capabilities.

For the first two types of measurements, the data collection reporting can either be done by the switch sending the measurements to downstream data collection systems autonomously (after setup in advance) or based on polling (e.g., using SNMP queries) through external data collection devices [11]. These two types of measurements are often specified in related technical and generic requirements (e.g., [5] and [14] specify the data collection requirements for the broadband ATM and Frame Relay switching systems respectively). However, most existing switching systems have only the first type of data collection capability; the switch reports certain counts (e.g., cell counts per VPI/VCI or frame counts per DLCI) over a coarse time scale period (e.g., 15 minutes). Currently, switching systems and the downstream Operations Support Systems (OSSs) lack the capacity to collect, transport, process and store a large number of measurements. More frequent measurements, for example, measurements based on more frequent SNMP queries or more frequent switch peg counts, may impose unacceptable measurement overheads. In the absence of the finer-time measurement capabilities provided by the switch or the adjunct device, the third type of data collection, that is, high-resolution measurement of carried traffic, is necessary to explore and analyze traffic patterns so as to support other traffic management functions.

2.1 Challenges in Data Collection

Unlike commercial protocol analyzers, traffic data collectors, especially collectors for high-speed networks such as ATM, are specialized in the sense that they are not intended to be used for protocol or network diagnostic purposes (although a by-product of the traffic data analysis, as we will discuss later, is that it can be used to identify specific protocol behaviors and vendor implementations). The main purpose of the data collector is to non-intrusively and losslessly collect traffic (cell-by-cell, frame-by-frame, or packet-by-packet depending on the underlying technology) for an extended period of time with high-resolution timestamps. The challenges in building data collectors of this kind lie not only in the ability to timestamp with high-resolution accuracy (so that arrival patterns can be accurately recorded), but also in the high-speed tape storage required (so that a large amount of traffic data can be collected to render the analysis and characterization statistically reliable). For all the network technologies (e.g., Ethernet, CCS, Frame Relay, ATM) for which we have collected data thus far, Bellcore has built customized traffic data collectors specific to each individual technology. As an example, the ATM data recording device used in the analyses of [17, 19, 20, 32] can store ATM traffic cell-by-cell for an extended period of time (many hours, depending on traffic levels) with a timestamp resolution of 50 nanoseconds. The collected traces consist of complete (header plus payload) copies of all ATM cells seen on the ATM link during the measurement collection period. This is in contrast to commercially available ATM traffic/protocol analyzers, which have limited storage space. For a more detailed description of the ATM traffic data recording device, readers are referred to [32].

In addition to the difficulties in building data collectors, the actual data collection tasks themselves are not easy, and in most cases are made more time-consuming and difficult largely due to the political environment and existing rules and regulations. Just imagine trying to convince people in network operations that the data collectors will be installed in some central offices (with many regulations on safety issues regarding equipment installation) and will literally "vacuum" all the traffic passing through a switch with all the user information (e.g., password) in it. Many justifications and security reassurances have to be documented and implemented before the actual collection can take place, in addition to all the troubles and surprises that may (and will) occur with the equipment shipping, installation, and collection.

2.2 Challenges in Data Mining and Filtering

Depending on the specific technologies and the network loading levels at the time of collection, a large amount of data may be collected. For example, in one ATM collection, about 2.6 billion cells were collected, which translated to about 150 Gigabytes of recorded data [17]. This large data set presented tremendous challenges not only in the data collection task itself, but also in subsequent data storage, extraction, transport and analysis. After collection, the traces were transferred from the recorder's tape drive to a high-performance computing environment for post-processing; scanning and reading the data from the original tapes onto which the cell information was recorded was time consuming. Care was taken to check the sanity of the data, as any anomalies or mistakes could set back the entire analysis for an extended period of time. The two-gigabyte file size limit of the UNIX operating system (prior to Solaris version 2.6) also presented problems in data storage. In most cases, the original data file was divided into smaller files for subsequent storage and analysis. Ad hoc practices and tricks were necessarily applied to divide the files, extract the necessary information, perform analysis and finally combine the results, either because of hardware or software limitations or simply to reduce the analysis time.
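The division and streaming of such traces can be handled with bounded-memory chunked reads. The sketch below is illustrative only: the record layout (an 8-byte timestamp followed by a 53-byte ATM cell) is a hypothetical format we assume for the example, not the recorder's actual one.

```python
import struct

# Hypothetical record layout, assumed for illustration: an 8-byte timestamp
# (e.g., in 50 ns ticks) followed by a raw 53-byte ATM cell.
RECORD = struct.Struct("<Q53s")

def count_cells(path, chunk_records=1_000_000):
    """Stream a large trace file in bounded-memory chunks and count records,
    avoiding loading a multi-gigabyte file into memory at once."""
    total = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(RECORD.size * chunk_records)
            if not buf:
                break
            total += len(buf) // RECORD.size
    return total
```

The same loop structure extends naturally to per-chunk filtering (e.g., selecting a single VPI/VCI) before writing out smaller per-connection files for analysis.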

To study application level characteristics, two approaches can be and have been used:

  1. direct application level measurements: this requires modification of the application source code (such as modification of the Mosaic source code as done in [6])

  2. mining and filtering based on network traffic measurements.

Each of these two approaches has its own advantages and challenges. The first may introduce intrusive overhead, alter application characteristics, and encounter difficulties as application software evolves (e.g., as Netscape became the more popular browser). The second requires filtering and reassembling collected network traffic data back into application-level protocol data units, which can be difficult both in identifying the specific application (without inspecting the protocol data unit and user data) for a variety of reasons (e.g., encryption) and in accounting for network-induced delays.

3 Traffic and Performance Modeling

The goal of traffic characterization and modeling is to allow concise (or parsimonious) descriptions of complex traffic flows in a way that can be practically used in other traffic management functions such as connection admission control algorithms, congestion and overload control mechanisms, network engineering and dimensioning guidelines, and network design and planning methods. The description needs to include the traffic characteristics of the arrivals as well as the amount of work they bring to systems of interest (e.g., an ATM switching system or a Frame Relay trunk). Parsimony is obviously desirable in theory, but it is essential in engineering; the difficulty in supplying large numbers of parameters demanded by many theoretical traffic models is one reason these models find limited use in practice. To achieve parsimony, models need to include statistical features that have performance impacts and ignore those that do not.

The choice of the appropriate model depends on the goal of the traffic modeling. For example, a call level model is needed to design and engineer a network resource that will be held for the duration of a call, while a frame or cell level model is required for designing and engineering network resources that are frame or cell level traffic sensitive. Similarly, models for individual traffic are needed for resources that handle individual sources such as line cards and for traffic management functions that require end-to-end demands, such as network planning and design. Models for aggregate traffic are required for resources that are shared among connections and for traffic management functions that focus on a single network element such as switch or trunk engineering.

In the following, we will discuss relevant traffic characteristics based on our measurement analyses of traffic from various technologies with a variety of different scenarios.

3.1 Aggregate versus Individual Traffic

Depending on the purpose of the traffic characterization, models may be required either for individual connections or on an aggregate basis. Characteristics, modeling and performance evaluation of aggregate and individual traffic are discussed in this section.

3.1.1 Aggregate Traffic

In general, traffic engineering of network equipment and facilities seeks to take advantage of the aggregation of a large number of independent users. This not only allows a more compact description of the aggregate traffic (the marginal distribution tends to Gaussian by the central limit theorem), but also allows us to neglect the detailed behavior of individual users and the traffic they generate. For example, aggregate traffic in the voice network (with voice applications) has long been successfully modeled with the simple Poisson model. Only a single parameter (i.e., the mean rate) is required for the Poisson model (assuming the average holding time is known from off-line studies), and many engineering solutions can be subsequently derived based on the model.

An essential aspect of traffic patterns and resource usage in new and emerging networks is complexity due to a wide range of services and applications. With these new services and applications being carried by various packet technologies, the only apparent common feature in all these traffic data sets which we have analyzed is that the traffic is bursty. The results in [25] (ISDN packet traffic), [8] (CCS traffic), [23] (Ethernet traffic), [4, 15] (variable-bit-rate video traffic), [30] (wide area TCP traffic), [6] (World Wide Web traffic), [22] (Frame Relay traffic), and [17] (ATM traffic) were striking for two reasons: (1) these studies demonstrate that it is possible to clearly distinguish between actual packet network traffic and traffic generated by currently employed theoretical models, e.g., batch-Poisson, and (2) in sharp contrast to the traditional packet traffic models, aggregate packet streams are statistically self-similar or fractal; that is, realistic network traffic looks the same "statistically" when measured over time scales ranging from milliseconds to minutes and hours.

One could therefore argue that these fractal features are associated with the most basic feature of packet traffic, which is burstiness. Figure 1 illustrates time series plots of the aggregate traffic collected from three different network technologies: 1.5 Mbps Frame Relay (left), 10 Mbps Ethernet (middle), and 155 Mbps ATM (right). It shows the time series of the number of frames or cells transmitted over a link or trunk, for four different time scales: 10 seconds in the top panel, 1 second in the second panel, 100 milliseconds in the third panel, and 10 milliseconds in the bottom panel. Subintervals viewed on a smaller time scale are indicated by a darker shade in each plot. We see that the traffic exhibits variations over four decades of time scales, which is an indication of its fractal nature.

Figure 1: Variations of frame/cell counts over many time scales

Figure 2 explores the correlation structure in these traffic streams based on various methods. In the figure, from top to bottom: the autocorrelation [3], the variance-time [23], and the wavelet estimation plots [1,2] are depicted. They all indicate that these fractal traffic streams have long-range dependent characteristics.

Figure 2: Evidence of long-range dependence

One model used to describe this aggregate packet traffic (the number of arrivals in a given time period) is the Fractional Brownian Motion (FBM) model, which incorporates the self-similarity observed in actual traffic with only three parameters [28]. The mean rate quantifies the volume of traffic carried by the network element. Two other parameters quantify the "burstiness" of the traffic. A "peakedness" coefficient, often estimated as the variance divided by the mean, describes the magnitude of fluctuations on a given time scale. An exponent called the Hurst parameter quantifies the intensity of the long-range dependence. For long-range dependent processes, the Hurst parameter is between 0.5 and 1, while the Hurst parameter equals 0.5 for short-range dependent (SRD) and independent processes such as batch-Poisson (shown as dotted lines in the middle and bottom panels of Figure 2). FBM is called an exactly self-similar model because it has the same burstiness structure on all time scales. In practice, data traffic shows this scaling behavior over a wide range of time scales, though there are lower cut-offs (for example, about 10 milliseconds for Ethernet traffic) below which short-range correlations dominate, and upper cut-offs beyond which nonstationary time-of-day effects govern. Many useful engineering methods can be derived based on the FBM traffic model [10, 12, 24, 27].
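The variance-time method mentioned above admits a compact implementation: aggregate the count process at level m, and the slope of log variance against log m is 2H − 2. The sketch below is a minimal illustration of this idea, not the estimator used in the cited studies.

```python
import numpy as np

def hurst_variance_time(counts, levels=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate the Hurst parameter H from the variance-time plot:
    for aggregation level m, Var(X^(m)) ~ m^(2H - 2)."""
    counts = np.asarray(counts, dtype=float)
    log_m, log_var = [], []
    for m in levels:
        n = len(counts) // m
        if n < 2:
            continue
        # Non-overlapping block means at aggregation level m.
        agg = counts[: n * m].reshape(n, m).mean(axis=1)
        log_m.append(np.log(m))
        log_var.append(np.log(agg.var()))
    slope = np.polyfit(log_m, log_var, 1)[0]  # slope = 2H - 2
    return 1.0 + slope / 2.0
```

For independent traffic the estimate falls near 0.5; values approaching 1 indicate increasingly intense long-range dependence.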

In terms of the performance evaluation of the aggregate traffic, recent analytical results (providing both approximate and exact relationships for the asymptotic distribution of buffer backlogs for a queue driven by FBM, assuming that the tail distribution of an infinite buffer queue is a surrogate for the cell loss rate in a finite buffer system [26, 28]) indicate that the queuing behavior is Weibullian (or stretched exponential), that is,

P(L > B) ≈ exp(−γ B^(2−2H)),

where L is the buffer backlog, B is the buffer level, and γ is a coefficient that depends on the traffic parameters (mean rate, peakedness, and Hurst parameter H) and the service rate.

Curve (b) in Figure 3 shows the Weibullian form of the asymptotic distribution of buffer backlogs on a log-linear scale (left) and a log-log scale (right). Curve (a) depicts the exponential tail behavior which is the queuing behavior obtained with the short-range dependent traffic models such as batch-Poisson. We see that for a given Quality-of-Service (QoS) requirement, the buffer required is far greater in the case of self-similar (or long-range dependent) traffic.

Figure 3: Traffic performance modeling

Note that the FBM traffic model is valid under the following conditions [10]:

  1. the time scales of interest in the queuing processes and the engineering period fall within the scaling region,

  2. the traffic is aggregated from a large number of independent users, and

  3. the effect of flow controls is negligible on the aggregation of users.

For scenarios where the above conditions are not satisfied, e.g., limited aggregation at the network access, the resulting (aggregate) traffic may exhibit multifractal properties and require further analysis [13].

3.1.2 Individual Traffic

Traffic characteristics of individual connections (for connectionless networks, this means individual source-destination pairs; and for connection-oriented networks, this means individual connections such as traffic on individual DLCIs for Frame Relay and VPI/VCIs for ATM) have also been analyzed [6, 19, 20, 30, 33]. In general, the individual traffic is much more complex than the aggregate, and depends sensitively on the applications being used. Specifically, individual traffic exhibits ON-OFF behavior with very different sojourn time characteristics depending on the applications.

The ON-OFF behavior of connections can often be observed through a textured plot [33], displaying one-dimensional data points (e.g., arrival times of single frames or cells) in a strip in order to show all data points individually; if necessary, the points are displaced vertically by small amounts that are partly random, partly constrained. Figure 4 shows the textured plots of five connections of Frame Relay (left) and ATM (right). We see evidence of the ON-OFF behavior and various ON and OFF sojourn time characteristics.

Figure 4: Evidence of the ON-OFF behavior and various sojourn time characteristics

[33] first reported characteristics of Ethernet traffic traces and showed that individual connections exhibit ON-OFF behavior with the sojourn times of the ON (active) and OFF (silent) periods being "heavy-tailed" with infinite variances (i.e., Noah Effect). By combining many connections each having heavy-tailed ON and/or OFF periods, the aggregate traffic can be shown to have long-range dependence (LRD) characteristics [33]. The physical reasons behind the heavy-tailed ON and OFF periods can be traced to application characteristics such as file sizes, CPU time for a job, and human behavior in working with computers [6, 33].

Thus, plausible causes of the self-similar aggregate traffic may be:

  1. the heavy-tailed distribution of the file sizes that users access through the network translates into heavy-tailed ON sojourn times in individual traffic;

  2. the wide range of user behaviors in accessing information through the network translates into heavy-tailed OFF sojourn times in individual traffic; and

  3. combining a large number of these individual traffic streams with heavy-tailed ON and OFF sojourn times yields self-similar aggregate traffic.

In terms of the performance evaluation of the individual traffic, an analysis based on modeling the individual traffic by chaotic maps [31] suggests that the queue length distribution function decays as a power law, that is,

P(L > B) ∼ B^(−(2−2H)),

where H here is the Hurst parameter of the traffic stream resulting from independently aggregating a large number of these individual traffic sources.

Curve (c) in Figure 3 shows the power law form of the asymptotic distribution of buffer backlogs on a log-linear scale (left) and a log-log scale (right). Compared to Curves (a) and (b), the power law behavior has the heaviest tail. Note that the figure is shown to illustrate the tail behavior of these three probability distributions (exponential, stretched exponential, and power law) with no consideration of any realistic traffic and system parameters.
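To make the contrast among the three tails concrete, one can invert each stylized model to find the buffer size needed for a target overflow probability. The constants below (c, g, b0) are illustrative placeholders, not fitted values from any of the traces.

```python
import math

def buffer_for_target(tail, target=1e-6, **p):
    """Invert three stylized tail models P(L > B) for the buffer B that
    meets a target overflow probability. All constants are illustrative."""
    if tail == "exponential":      # P(L > B) = exp(-c * B)
        return -math.log(target) / p["c"]
    if tail == "weibullian":       # P(L > B) = exp(-g * B**(2 - 2H))
        return (-math.log(target) / p["g"]) ** (1.0 / (2 - 2 * p["H"]))
    if tail == "power":            # P(L > B) = (B / b0) ** -(2 - 2H)
        return p["b0"] * target ** (-1.0 / (2 - 2 * p["H"]))
    raise ValueError(tail)
```

For H = 0.8, the required buffer grows from tens of units (exponential) to hundreds (Weibullian) to many orders of magnitude more (power law), mirroring the ordering of the curves' tails.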

As we have mentioned, characterization and modeling of traffic with limited aggregation is still not well understood. Figure 5 shows time series of aggregate traffic from a single ON-OFF stream up to 50 ON-OFF streams (with Pareto distributed sojourn times). For aggregations of more than 30 traffic streams, the aggregate traffic appears reasonably self-similar. Multifractal analysis, as mentioned before, may be helpful for cases with aggregation levels below 30 streams (as is likely at network access points).

Figure 5: Aggregation of ON-OFF traffic streams
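The aggregation experiment above is easy to reproduce in outline: generate ON-OFF sources with Pareto sojourn times and sum their per-slot activity. The parameter choices below (shape a = 1.5, location b = 1 slot, equal ON and OFF parameters) are illustrative assumptions, not values fitted to our traces.

```python
import numpy as np

def pareto_sojourn(rng, a=1.5, b=1.0):
    """One Pareto(shape a, location b) sojourn time via inverse-CDF sampling."""
    return b * rng.random() ** (-1.0 / a)

def onoff_stream(rng, slots, a=1.5, b=1.0):
    """0/1 activity per time slot for one ON-OFF source with
    heavy-tailed (Pareto) ON and OFF sojourn times."""
    out = np.zeros(slots)
    t, on = 0, rng.random() < 0.5
    while t < slots:
        d = int(np.ceil(pareto_sojourn(rng, a, b)))
        if on:
            out[t : t + d] = 1
        t += d
        on = not on
    return out

def aggregate(rng, n_sources, slots):
    """Superpose independent ON-OFF sources slot by slot."""
    return sum(onoff_stream(rng, slots) for _ in range(n_sources))
```

Plotting the result of aggregate() for 1, 10, 30, and 50 sources at several time scales reproduces the qualitative picture of Figure 5.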

3.2 User Session versus Link Layer Traffic

As mentioned, the choice of models depends on the purpose of the traffic modeling. A traffic model that incorporates all levels of detail may not be feasible, nor necessary. Even if it is possible to come up with such models, they may well be intractable or require too many input parameters. One way to cope with this difficulty is to carefully choose the models at the appropriate protocol layer based on the modeling requirements.

In terms of protocol layers, two levels of models are of special interest and importance; these represent the top and bottom of the protocol stack. The top level denotes user login sessions in computer communications and call arrivals in voice communications. The bottom or link level (OSI layer 2) denotes cell or frame arrivals, depending on the underlying technology.

3.2.1 User Session Traffic

For conventional voice networks, the top level and the bottom level are the same since the network resources are dedicated for the duration of the calls. In this case, the (first attempt) call arrivals, as mentioned, can be successfully modeled by the Poisson distribution. The Poisson model remains valid for new voice services such as 800 [18] and for computer communications such as user login sessions [21,22]. The intuitive explanation is that the Poisson assumption is valid for processes due to aggregation of a large number of independent users, which is true for the top level user or session arrivals whether the arrivals are for voice, 800, or data services. Figure 6 shows the statistics of the interarrival times of user calls and sessions for the 800 (left) and the WWW (right) services. The figure shows the process is independent (middle panels), and the interarrival time is exponentially distributed (the probability density function is depicted in the top panels, and the bottom panels display the Quantile-Quantile (Q-Q) plots against the exponential distribution).

Figure 6: Session arrivals are Poisson
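Two quick diagnostics often used informally for Poisson-ness correspond to the panels above: the coefficient of variation of the interarrival times (≈ 1 for an exponential distribution) and their serial correlation (≈ 0 for independence). A minimal sketch:

```python
import numpy as np

def interarrival_stats(arrival_times):
    """Return (cv, r1): the coefficient of variation of the interarrival
    times and their lag-1 autocorrelation. For Poisson arrivals, cv is
    close to 1 and r1 is close to 0."""
    gaps = np.diff(np.sort(np.asarray(arrival_times, dtype=float)))
    cv = gaps.std() / gaps.mean()
    r1 = np.corrcoef(gaps[:-1], gaps[1:])[0, 1]
    return cv, r1
```

These summary statistics complement, rather than replace, the density and Q-Q comparisons against the exponential distribution shown in the figure.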

The call holding times (the work that each individual call brings) for the voice application have often been modeled by the exponential distribution. This, however, has changed as new applications (such as fax, data/WWW, video) and network services (such as FR SVC and ATM SVC) have emerged. The call holding times due to these new and emerging services and applications tend to be longer for obvious reasons (consider the holding times for video and WWW applications). In these cases, not only does the average holding time increase, but the distribution of the holding times also changes. In essence, the new holding time distribution has a much longer (or heavier) tail than the exponential distribution; that is, there is a non-negligible probability that a call will hold the network resource for an extended period of time. One theoretical model with heavy tails is the Pareto distribution, with cumulative distribution function

F(x) = P(X ≤ x) = 1 − (b/x)^a,   x ≥ b,

where the shape parameter a can be estimated from the slope of the logarithm of the complementary cumulative distribution function (CCDF), and the location parameter b can be determined from the mean. Note that the random variable X has infinite mean when a ≤ 1, finite mean but infinite variance when 1 < a ≤ 2, and finite mean and variance when a > 2.

Figure 7 shows the log-log CCDF of the holding times for Internet/WWW sessions (top panels) and the Hill estimate (bottom panels) measured from the PSTN/CCS network (for dial-up connections) on the left and the Frame Relay network (for PSTN-offloaded Internet traffic) on the right [21,22]. Both measurements show that the holding times for web sessions have an average of about 20 minutes and exhibit heavy tails. The Hill method [16], a statistical technique for inference related to heavy-tailed phenomena, gave an estimate of a between 1.0 and 2.0, suggesting that web session duration distribution has infinite variance.

Figure 7: Web session duration is long-tailed with infinite variance
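The Hill estimate shown in the bottom panels of the figure is computed from the k largest order statistics of the sample. The sketch below is a bare-bones version, omitting the customary plot of the estimate across a range of k used to judge stability.

```python
import numpy as np

def hill_estimate(samples, k):
    """Hill estimator of the Pareto tail index a from the k largest
    order statistics: a_hat = 1 / mean(log(X_(i) / X_(k+1)))."""
    x = np.sort(np.asarray(samples, dtype=float))[::-1]  # descending order
    return 1.0 / np.mean(np.log(x[:k]) - np.log(x[k]))
```

An estimate between 1 and 2, as observed for the web session durations, corresponds to a finite mean but infinite variance.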

The following summarizes some of the potential impacts of long holding times (for a more detailed discussion, please refer to [9]):

1. Erlang effect: The most obvious effect of long holding times is the increased offered (or intended) erlang load per user. This additional load either increases the average blocking probability with existing capacity or requires more network equipment and facilities to maintain the same blocking performance. Due to the insensitivity of the Erlang-B formula to the distribution of the holding time, the average (long-term; see the discussion below) blocking probability can still be obtained through the standard Erlang-B formula. The impact of the change in distribution is less obvious; this is discussed next.

2. Slow convergence: Standard Erlang-B and Engset distribution tables are widely used in practice in designing networks with blocking, in part because of the well-known insensitivity of these results to the distribution of holding times. In principle, one would therefore expect these results to hold for any distribution, including those characterizing long holding time services such as WWW. However, it can be shown [9] that, as a consequence of the long holding times, convergence to the standard results can be so slow that these results may not be useful in practice: intuitively, it can take time scales on the order of many holding times for the blocking behavior to converge to the expected levels.

3. Nonstationarity: The slow convergence may suggest that the problem may be alleviated by increasing the length of the engineering period; however, stationarity assumptions on the call arrival process are less applicable as the engineering period is increased. The interaction of nonstationary arrivals in a long holding time environment can cause significantly higher blocking than is indicated by the average erlang load.

4. Reattempts: Because of long holding times, successive user reattempts are likely to be highly correlated, inflating actual blocking to levels beyond standard results. Blocking objectives must be set taking this into account. Unlike in voice telephony, a user may not receive service after a few reattempts due to long holding times of other users.
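The long-run average blocking invoked in points 1 and 2 comes from the standard Erlang-B recursion, sketched below; the slow-convergence caveat above still applies before the system reaches this long-run regime.

```python
def erlang_b(servers, erlangs):
    """Blocking probability via the Erlang-B recursion:
    B(0, a) = 1;  B(n, a) = a*B(n-1, a) / (n + a*B(n-1, a))."""
    b = 1.0
    for n in range(1, servers + 1):
        b = erlangs * b / (n + erlangs * b)
    return b
```

For example, erlang_b(10, 5.0) gives the long-run blocking seen by 5 erlangs offered to 10 circuits; lengthening per-user holding times raises the offered erlangs and hence the blocking.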

3.2.2 Link Layer Traffic

In the previous section, we mentioned that the aggregate traffic (at the network link level) can be considered self-similar. This specifically refers to the arrival process. The work that each arrival brings depends on the technologies and applications. For ATM, the work is fixed due to the fixed cell size. For other technologies that allow variable frame sizes, the work each arrival brings depends on the network element under consideration. For network elements that are sensitive to frame sizes, such as links and trunks, the work depends on the frame size distribution. For network elements that are insensitive to frame sizes, such as the switch processor (which in most cases only processes the header information), the work is fixed for each arrival. Our current analyses indicate that the frame size characterization depends very much on the applications that the network carries, and frame sizes may or may not exhibit serial correlation among themselves. Figure 8 shows the probability density function, the CCDF, the autocorrelation, and the variance-time plots of the frame sizes collected from a Frame Relay network and an Ethernet. We see that in these two data sets, frame sizes exhibit a correlation structure among themselves. The impact of frame size correlation on performance and engineering needs to be further explored.

Figure 8: Frame size characteristics

For layers between the top and the bottom, traffic characterization is harder to obtain from measurement analyses; further mining, filtering and protocol decoding are required and are beyond the scope of this paper.

3.3 User Application versus Network Management Traffic

This section discusses the characteristics of user generated application traffic versus network generated management traffic.

3.3.1 User Application Traffic

Given the rapid evolution in network customer usage and technical composition, the characterization of applications and their traffic flows may seem a moving target. It has been said that traffic characteristics differ by application and by hardware, software, protocol mix, site, recording period, the presence or absence of firewalls, and even implementations (such as automated scripts). Nonetheless, addressing these issues is imperative for network planning and maintenance, and the progress made at each stage facilitates understanding of the developments that may follow.

In general, for TCP applications [6,29,30], the arrival process of user-initiated sessions, such as remote-logins and file transfer connections, is well-modeled by Poisson models with fixed hourly rates. At the transport layer, the interarrival times of TCP connections for almost all applications are better modeled by heavy-tailed distributions, and the arrival process of these connections appears self-similar. Figures 9 and 10 show the (1-second) time series plots for popular TCP and UDP applications in a busy hour. Each application is identified by either the source port number or the destination port number. Two columns are shown for each application: the source is shown on the left (where the corresponding port number is found in the Source Port Number field of the TCP or UDP header) and the destination is shown on the right (where the corresponding port number is found in the Destination Port Number field). Note that some UDP applications cannot be identified from port numbers alone (their port numbers are not well-defined); in those cases the port numbers themselves are shown instead.
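The contrast between Poisson arrivals and heavy-tailed interarrivals can be illustrated with a small simulation. This is our illustration, not the measurement data; the parameter values are arbitrary:

```python
import random

def pareto_interarrivals(n, alpha=1.5, xm=1.0, seed=1):
    """Heavy-tailed samples: P(X > x) = (xm / x)**alpha for x >= xm.
    With 1 < alpha < 2 the mean is finite but the variance is
    infinite, the regime associated with self-similar aggregates."""
    rng = random.Random(seed)
    return [xm / rng.random() ** (1.0 / alpha) for _ in range(n)]

def exponential_interarrivals(n, rate=1.0, seed=1):
    """Light-tailed interarrival times of a Poisson process."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n)]

# The max-to-mean ratio hints at tail weight: it stays modest for
# the exponential sample but is orders of magnitude larger for the
# Pareto sample of the same size.
pa = pareto_interarrivals(100_000)
ex = exponential_interarrivals(100_000)
print(max(pa) / (sum(pa) / len(pa)), max(ex) / (sum(ex) / len(ex)))
```

In a Poisson stream the largest gap grows only logarithmically with sample size, whereas a single heavy-tailed gap can dominate the whole trace, which is why hourly Poisson fits work at the session level but fail at the connection level.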

Figure 9: Frame level 1-second time series plots for popular TCP applications
Figure 10: Frame level 1-second time series plots for popular UDP applications

For WWW applications, the top left panel of Figure 11 shows the variance-time plot of an http arrival process, which has a Hurst parameter estimate between 0.8 and 0.85 and is thus long-range dependent. The bottom left panel of Figure 11 shows the CCDF of the retrieved web page sizes, which exhibits a heavy tail.
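Heavy-tail claims of this kind are commonly quantified with the Hill estimator [16]. The sketch below is ours, verified on synthetic Pareto data rather than the measured page sizes; the function name and parameter choices are illustrative:

```python
import random
from math import log

def hill_tail_index(data, k):
    """Hill estimator [16]: with x(1) >= ... >= x(n) the descending
    order statistics, alpha_hat = k / sum_{i<=k} log(x(i) / x(k+1)).
    An estimate below 2 indicates a heavy tail (infinite variance)."""
    xs = sorted(data, reverse=True)
    return k / sum(log(xs[i] / xs[k]) for i in range(k))

# Sanity check on synthetic Pareto data with true tail index 1.2.
rng = random.Random(42)
pareto = [1.0 / rng.random() ** (1.0 / 1.2) for _ in range(50_000)]
alpha_hat = hill_tail_index(pareto, 2000)
print(round(alpha_hat, 2))
```

In practice the estimate is plotted against k (a "Hill plot") and read off where it stabilizes, since the choice of k trades bias against variance.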

Figure 11: WWW and VBR video application characterization

The right panels of Figure 11 compare the variance-time plots of the cell-level and burst-level (video frame level) arrival count processes for a video source [20]. The burst-level variance-time analysis resembles those shown previously [4,15], which exhibit both short- and long-range dependence (top panel), while the cell-level variance-time analysis displays only a long-range dependent structure (bottom panel). The difference may come from the fact that the cells in a video frame are buffered before they are transmitted over the network link. This cell buffering and emission in effect eliminates the short-range dependent (or high frequency) structure between bursts (or video frames). This again emphasizes the point made earlier: traffic description at the network link level is highly dependent on aggregation, protocols at other layers, hardware, link speed, etc. Both processes, however, yield approximately the same Hurst parameter estimate (which, as we recall, measures the intensity of long-range dependence). This VBR video traffic, with both long- and short-range dependence, can be modeled using a Fractional Auto-Regressive Integrated Moving Average (F-ARIMA) process [4,15].
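As a sketch of the F-ARIMA family, the purely fractionally differenced case F-ARIMA(0,d,0), with d = H - 1/2, can be generated from its truncated infinite moving-average representation. This is an illustrative construction of ours, not the fitting procedure of [4,15], and the parameter values are arbitrary:

```python
import random

def farima_0d0(n, d=0.3, trunc=500, seed=3):
    """F-ARIMA(0, d, 0) noise via the truncated MA(inf) form
    X_t = sum_{j < trunc} psi_j * eps_{t-j}, with psi_0 = 1 and
    psi_j = psi_{j-1} * (j - 1 + d) / j.  For 0 < d < 1/2 the
    process is long-range dependent with H = d + 1/2."""
    rng = random.Random(seed)
    psi = [1.0]
    for j in range(1, trunc):
        psi.append(psi[-1] * (j - 1 + d) / j)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + trunc)]
    return [sum(p * eps[t + trunc - j] for j, p in enumerate(psi))
            for t in range(n)]

# The lag-1 autocorrelation of F-ARIMA(0, d, 0) is d / (1 - d),
# so the sample value should come out near 0.43 for d = 0.3.
x = farima_0d0(5000)
m = sum(x) / len(x)
var = sum((v - m) ** 2 for v in x)
acf1 = sum((x[i] - m) * (x[i + 1] - m) for i in range(len(x) - 1)) / var
print(round(acf1, 2))
```

Adding AR and MA polynomials on top of the fractional differencing is what lets the full F-ARIMA(p,d,q) model capture the short-range structure visible at the burst level while keeping the long-range structure set by d.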

As new applications emerge (e.g., network games; the UDP port number 27001 shown in Figure 10 is believed to be used for the network game "Quake"), continued monitoring and characterization is necessary to understand the traffic patterns these applications generate as well as to assess their performance and engineering impacts.

3.3.2 Network Management Traffic

Unlike most user application traffic, network management traffic tends to be periodic. The purposes of network management packets include, among others, route discovery, routing information exchange, "heart-beat", failure recovery, congestion notification and service advertisement. From a performance viewpoint, network management traffic is considered overhead and should be kept to a minimum. On the other hand, the frequency of network management traffic should not be so low that the information the packets convey becomes out of date. For a MAN/WAN (such as a Frame Relay or ATM network) that interconnects various LANs to establish a virtual private network (VPN), several connections (e.g., DLCIs or VPI/VCIs) may be subscribed between various locations. Periodic network management updates (e.g., IP Routing Information Protocol (RIP) or IPX Service Advertisement Protocol (SAP)) may affect the traffic patterns in the following ways:

· Periodicity: Since a router interconnecting a LAN with a WAN sends periodic network management updates to the other routers within the same VPN through the WAN, traffic in the WAN will exhibit temporal periodic correlation corresponding to the periods of the specific network management protocols and their implementations. The left panel of Figure 12 shows the autocorrelation plot (for 1-second frame counts) for a subset of traffic carried via IPX, while the right panel shows the autocorrelation for the IP traffic. The 10-second and 60-second correlations are apparent in the IPX traffic, and the 30-second correlations are apparent in the IP traffic.


Figure 12: Periodicities of IPX and IP traffic

· Spatial correlation: Various connections within the WAN will be spatially correlated, since the network management updates are typically synchronized.
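Periodicities like those in Figure 12 can be detected with a plain sample autocorrelation of the 1-second frame counts. The sketch below is our illustration, with synthetic data standing in for a trace; the 30-second period mimics the default RIP update timer:

```python
import random

def autocorr(counts, max_lag):
    """Sample autocorrelation of a frame-count time series."""
    n = len(counts)
    m = sum(counts) / n
    var = sum((c - m) ** 2 for c in counts)
    return [sum((counts[i] - m) * (counts[i + k] - m)
                for i in range(n - k)) / var
            for k in range(1, max_lag + 1)]

# Synthetic stand-in for a one-hour trace: background traffic plus a
# burst of management frames every 30 seconds.
rng = random.Random(0)
counts = [rng.randint(90, 110) + (50 if t % 30 == 0 else 0)
          for t in range(3600)]
acf = autocorr(counts, 90)
peaks = [k + 1 for k, r in enumerate(acf) if r > 0.3]
print(peaks)  # correlation spikes at multiples of the update period
```

On measured counts, spikes at fixed multiples (10, 30 or 60 seconds) point directly at a specific management protocol and its configured timers.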

Both of these effects influence the choice of traffic models. For example, as we recall, one of the criteria for choosing the FBM model is that the traffic be aggregated from a large number of independent users; spatial correlation among the users may therefore nullify the validity of the FBM model.

One may argue that network management traffic, being overhead, will eventually have minimal effect on the aggregate traffic once user traffic grows. However, we have observed in working and relatively mature networks that users subscribe to channels or connections and use them exclusively for signaling or management traffic. In fact, in a network from which we collected traffic for two consecutive years, this use of the wide area network to transport management traffic between routers grew from unnoticeable to substantial. It remains to be seen whether a large portion of the traffic carried by future networks will still be network management traffic.

4 Discussion and Summary

In this paper, we described the challenges and lessons learned from traffic measurement analyses in a wide range of packet networks. We first discussed the challenges in collecting, mining, filtering and analyzing traffic traces from high-speed networks.

Using examples based on actual network measurements to facilitate the discussion, the lessons learned so far regarding the actual traffic patterns in these packet networks were then given, specifically, discussing traffic characteristics from various angles:

· aggregate network traffic versus individual connection traffic: the aggregate traffic is self-similar, can be modeled by FBM, and its queuing performance is Weibullian; the individual connection traffic, on the other hand, is ON-OFF with differing ON and OFF sojourn-time characteristics, and its queuing performance follows a power law.

· user session traffic versus network link traffic: user session arrivals are Poisson; the session holding time has a light tail for voice calls and a long tail for data and video applications, and the impacts of long holding times on network performance include the Erlang effect (i.e., heavier load), slow convergence to theoretical results, nonstationarity, and higher reattempt blocking. On the other hand, the work that network link traffic brings to a resource depends on whether the network protocol allows variable-size frames and whether the resource usage is sensitive to frame size.

· application traffic versus network management traffic: user-generated application traffic depends on the specific application and can have very different and bursty characteristics; network management traffic (e.g., routing and service advertisement updates), on the other hand, tends to be periodic, and its impact needs to be further examined.
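For reference, the Weibullian queue tail for FBM input mentioned in the first bullet takes the asymptotic form [26,28]:

```latex
\Pr\{Q > x\} \approx \exp\!\left(-\gamma\, x^{2-2H}\right), \qquad \tfrac{1}{2} < H < 1,
```

where the constant gamma depends on the mean rate, the variance coefficient of the FBM, and the link capacity. For H = 1/2 this reduces to the familiar exponential tail, so the stretched-exponential decay is precisely the penalty for long-range dependence.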

The discussion in the paper was given to raise readers' awareness of these issues and complications in traffic analyses, and to suggest careful selection of traffic models for needs in specific scenarios.

  • [1]  P. Abry and D. Veitch. Wavelet Analysis of Long-Range Dependent Traffic, IEEE Transactions Info. Theory, 44 (1): 2-15, 1998.
  • [2]  P. Abry and D. Veitch. Long-Range Dependence: Revisiting Aggregation with Wavelets, Journal of Time Series Analysis, 19 (3): 253-266, 1998.
  • [3]  J. Beran. Statistics for Long-Memory Process, Chapman & Hall, 1994.
  • [4]  J. Beran, R. Sherman, M.S. Taqqu and W. Willinger. Long-Range Dependence in Variable-Bit-Rate Video Traffic, IEEE Transactions on Communications, pp. 1566-1579, 1995.
  • [5]  Broadband ISDN Switching System Generic Requirements, Bellcore Generic Requirements, GR-1110-CORE, issue 1, Revision 5, October 1997.
  • [6]  M.E. Crovella and A. Bestavros. Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes, Proc. ACM Sigmetrics, pp. 160-169, May 1996.
  • [7]  L. Chappel, Novell’s Guide to NetWare LAN Analysis, Novell Press, San Jose, 1993.
  • [8]  D.E. Duffy, A.A. McIntosh, M. Rosenstein and W. Willinger, Statistical Analysis of CCSN/SS7 Traffic Data from Working Subnetworks, IEEE JSAC, pp. 544-551, 1994.
  • [9]  A. Erramilli, E.L. Lipper and J.L. Wang. Some Performance Considerations for Mass Market Broadband Services, Proc. IEEE International Workshop on Community Networking, pp. 109-116, San Francisco, CA, July 1994.
  • [10]  A. Erramilli, O. Narayan and W. Willinger. Experimental Queueing Analysis with Long-Range Dependent Packet Traffic, IEEE/ACM Trans. On Networking, 4(2), 209-223, April 1996.
  • [11]  A. Erramilli and J.L. Wang. Monitoring Packet Traffic Levels, Proc. IEEE Globecom, pp. 274-280, San Francisco, CA, 1994.
  • [12]  A. Erramilli, W. Willinger and J.L. Wang, Modeling and Management of Self-Similar Traffic Flows in High-Speed Networks, in Network Systems Design, Chap. 4, K. Bagchi (ed.), Gordon and Breach, 1999.
  • [13]  A. Feldmann, A.C. Gilbert and W. Willinger, Data Networks as Cascades: Investigating the Multifractal Nature of Internet WAN Traffic, Proc. ACM Sigcomm, Vancouver, Canada, September 1998.
  • [14]  Frame Relay Network Element Operations, Bellcore Generic Requirements, GR-1327-CORE, issue 1, March 1994.
  • [15]  M.W. Garrett and W. Willinger, Analysis, Modeling and Generation of Self-Similar VBR Video Traffic, Proc. ACM Sigcomm, pp. 269-280, London, UK, 1994.
  • [16]  B.M. Hill. A Simple General Approach to Inference About the Tail of a Distribution, Annals of Statistics, pp. 1163-1174, 1975.
  • [17]  J.L. Jerkins and J.L. Wang, A Measurement Analysis of ATM Cell-Level Aggregate Traffic, Proc. IEEE Globecom, pp. 1589-1595, Phoenix, AZ, November 1997.
  • [18]  J.L. Jerkins and J.L. Wang. Traffic Analysis and Engineering for CCS Links Carrying 800 or AIN Service, Proc. ISCOM, pp. 15-19, Hsinchu, Taiwan, December 1997.
  • [19]  J.L. Jerkins and J.L. Wang, Establishing Broadband Application Signatures through ATM Network Traffic Measurement Analyses, Proc. IEEE ICC, pp. 837-843, Atlanta, GA, June 1998.
  • [20]  J.L. Jerkins and J.L. Wang, A Cell-Level Measurement Analysis of Individual ATM Connections, Workshop on Workload Characterization in High-Performance Computing Environments, Montreal, Canada, July 1998.
  • [21]  J.L. Jerkins and J.L. Wang, A Close Look at Traffic Measurements from Packet Networks, Proc. IEEE Globecom, pp. 2405-2411, Sydney, Australia, November 1998.
  • [22]  J.L. Jerkins and J.L. Wang, Carrying PSTN-Offloaded Internet Traffic over Frame Relay: Frame and Call Level Traffic Analyses, to appear in Proc. IEEE NetWorld+Interop Engineers Conference, Las Vegas, NV, May 1999.
  • [23]  K.R. Krishnan, A.L. Neidhardt and A. Erramilli, Scaling Analysis in Traffic Management of Self-Similar Process, Proc. ITC, pp. 1087-1096, Washington, DC, June 1997.
  • [24]  W.E. Leland, M.S. Taqqu, W. Willinger and D.V. Wilson, On the Self-Similar Nature of Ethernet Traffic (Extended Version), IEEE/ACM Transactions on Networking, 2(1): 1-15, 1994.
  • [25]  K. Meier-Hellstern, P.E. Wirth, Y.-L. Yan and D.A. Hoeflin, Traffic Models for ISDN Data Users: Office Automation Application, Proc. 13th ITC, pp. 167-172, Copenhagen, 1991.
  • [26]  O. Narayan. Exact Asymptotic Queue Length Distribution for Fractional Brownian Traffic, Advances in Performance Analysis, 1 (1): 39-64, 1998.
  • [27]  A.L. Neidhardt and J.L. Wang. The Concept of Relevant Time Scales and Its Application to Queuing Analysis of Self-Similar Traffic (or Is Hurst Naughty or Nice?), Proc. ACM Sigmetrics, pp. 222-232, Madison, WI, June 1998.
  • [28]  I. Norros, A Storage Model with Self-Similar Input, Queueing Systems, Vol. 16, pp. 387-396, 1994.
  • [29]  V. Paxson. Empirically-Derived Analytic Models of Wide-Area TCP Connections, IEEE/ACM Transactions on Networking, 2 (4): 316-336, August 1994.
  • [30]  V. Paxson and S. Floyd, Wide-Area Traffic: The Failure of Poisson Modeling, Proc. ACM Sigcomm, pp. 257-268, London, UK, 1994.
  • [31]  P. Pruthi, An Application of Chaotic Maps to Packet Traffic Modeling, Royal Institute of Technology, Stockholm, Sweden, TRITA-IT R 95:19, ISSN 1103534X, October 1995.
  • [32]  W. Willinger, S. Devadhar, A. Heybey, R. Sherman, M. Sullivan and J. Vollaro, Measuring ATM Traffic Cell-by-Cell: Experiences and Preliminary Findings from BAGNet, Proc. PMCCN, pp. 91-110, Tsukuba, Japan, 1997.
  • [33]  W. Willinger, M.S. Taqqu, R. Sherman, and D.V. Wilson, Self-Similarity through High-Variability: Statistical Analysis of Ethernet LAN Traffic at the Source Level, Proc. ACM Sigcomm, pp. 100-113, Cambridge, MA, August 1995.

Publication Dates

  • Publication in this collection
    31 July 2000
  • Date of issue
    Feb 1999
Sociedade Brasileira de Computação - UFRGS, Av. Bento Gonçalves 9500, B. Agronomia, Caixa Postal 15064, 91501-970 Porto Alegre, RS - Brazil, Tel./Fax: (55 51) 316.6835
E-mail: jbcs@icmc.sc.usp.br